SlideShare a Scribd company logo
1 of 33
Download to read offline
The BADAPPLE promiscuity
plugin for BARD
Evidence-based promiscuity
scores
Jeremy Yang, UNM
Oleg Ursu, UNM
Cristian Bologa, UNM
Anna Waller, UNM
Larry Sklar, UNM
Tudor Oprea, UNM
ACS National Meeting – Indianapolis, IN -- Sep. 8-12, 2013
CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources
What is BADAPPLE?
• BioActivity Data Associative Promiscuity Pattern
Learning Engine
• Bioassay data analysis algorithm
• Scaffold association patterns
• Evidence-based
• Robust to noise and errors
UNM
What is promiscuity?
• Un-selective bioactivity
• Normally undesirable
• Evolving conceptions:
– Polypharmacology
– Systems biology
– Systems chemical biology
3
Promiscuity & bioassay data analysis:
Selected references
• Frequent hitters (2002, Schneider &al.)
• Aggregators (2003, Shoichet &al.)
• ALARM NMR (2004, Hajduk &al.)
• Promiscuous scaffolds [talk] (2007, Bologa)
• Pan-Assay Interference (2010, Baell &al.)
• PubChem Promiscuity (2011, Canny &al.)
4
Prerequisites for Promiscuity Data
Analysis
• Definition of unique chemical entity?
– Yes
• Definition of unique biological entity?
– Challenging
• I.e., Rigorously calibrated bioactivity matrix
5
CN/C(=C[N+](=O)[O-])/NCCSCC1=CC=C(O1)CN(C)C.Cl
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH
VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
2211 Chemicals On 159 Targets
Matrix c/o Tudor
Oprea, Scott Boyer
(AZ), 2011
Bioactivity matrices depend on rigorous
informatics
6
2211 Chemicals On 159 Targets
Bioactivity matrices depend on rigorous
informatics
chemistry
biology
“1 chemistry”
= unique compound
or… = unique scaffold
“1 biology”
= unique target?
or… = unique assay?
Chemical biology space:
To define a space must define a point 7
Matrix c/o Tudor
Oprea, Scott Boyer
(AZ), 2011
Promiscuity: related concepts
• Assay-interferers
• Experimental
artifacts
• Frequent hitters
• False positives
• True positives
• True non-selective
actives
• Aggregators
• Reactives
• Cytotoxic
8
Badapple promiscuity score:
a practical definition for bioassay
data analysis
• Purpose: Streamline molecular discovery
project workflow.
• Hence: Score designed to detect "false trails"
(true-promiscuous OR false positives) unlikely
to be productive leads.
9
What is evidence-based?
• Empirical data analysis
• Mistakes can be un-learned.
• Data driven
• Not: Expert systems
10
Drawbacks of evidence-based
• Theories proven wrong. Ouch.
• Reality is messy.
• GIGO.
11
Benefits of evidence-based
More wins. More knowledge. Lower costs. Progress.
(*) N.B. key role of Dick Cramer, ACS CINF 2013 Skolnik award winner.
12
(*)
BADAPPLE scoring function
• Scaffold score
• By scoring scaffolds, more relevant evidence
• Substances, assays and samples considered
• Penalize under-sampling
• Score is a statistic, not a "model"!
• Inherently "validated"
13
Avoiding overfitting; being skeptical
of scanty evidence
14
Avoiding overfitting: Moneyball style
Sabermetrics leads the way.
The Signal and the Noise: Why So Many
Predictions Fail—But Some Don't, Nate
Silver (2012).
http://mark.stubbornlights.org/phils/archive
s/2004_05.html
15
*http://pasilla.health.unm.edu/tomcat/biocomp/badapple
BADAPPLE public web app
16
Why Scaffolds?
biology - birds, bees, nature
chemistry - chemists
SAR - drugs
IP - lawyers
Molecular scaffolds are special
and useful guides for discovery
Jeremy Yang, UNM & IU
Cristian Bologa, UNM
David Wild, IU
Tudor Oprea, UNM
ACS National Meeting - Sept. 8-12, 2013 - Indianapolis, IN
CINF grad student symposium 9/8/13 & ACS SciMix 9/9/13 8pm
There's something about scaffolds...
Especially the "privileged few"...
pScore
Count
Dataset:
BARD,
MLSMR,
MLP HTS
Totals:
compounds:
373,802 ;
scaffolds:
146,024 ;
assays: 528 ;
wells/results:
30,612,714
19
Promiscuity score distribution
Scaffolds & drug-scaffolds, the privileged few
explaining a lot of activity...
% of
active
samples
i_scaf, descending pScore order
Dataset:
BARD,
MLSMR,
MLP HTS
Totals:
compounds:
373,802 ;
scaffolds:
146,024 ; assays:
528 ;
wells/results:
30,612,714;
drugs: 283;
drugscafs: 1958
20
Scaffolds & drug-scaffolds, the privileged few
explaining a lot of activity...
Dataset:
BARD,
MLSMR,
MLP HTS
Totals: compounds:
373,802 ; scaffolds:
146,024 ; assays: 528
; wells/results:
30,612,714;
drugs: 283;
drugscafs: 1958
% total
activity
# scaffolds %
scaffolds
All 50% 1979 1.4%
All 75% 11,645 8%
Drugs 50% 54 2.8%
Drugs 90% 327 16.7%
“total activity” = active scaffold-instances
21
Scaffold visualization: Molecule Cloud
(Badapple-scaled)
The Molecule Cloud - compact visualization of large collections of molecules,
Peter Ertl and Bernd Rohde, J. Cheminformatics, 2012, 4:12.
22
What is BARD?
• BioAssay Research Database, http://bard.nih.gov
• MLP: 1000+ assays, 400k+ cpds, 200M data
• Manual assay annotations + QA
• BARD Assay Ontology
• Assay Data Standard
• Community platform for bioassay data analysis
& computation
25
BARD + Badapple Synergy
• BARD ontology, based on BAO
• BARD raising "semantic IQ" of public bioassay
data.
• Evidence = information = data + metadata
http://bard.nih.gov/
BARD Plugin Platform:
Community development
Enterprise deployment
• BARD Plugin spec (IPlugin interface)
• BARD PluginValidator class
• Java, JAX-RS, Jersey, REST
• BARD REST API
• WAR deployment, discoverable
fig c/o Steve
Brudz, Broad
Badapple Plugin via BARD web client
28
Discovery
scenario:
Rapidly flags
potentially
problematic
(notorious)
scaffolds.
BARD Synergy, next steps:
Raising Badapple to the next level
Assay Definition
Assay Protocol
Biology
BARD Classes
biology
chemistry
Re-calibrating the
bioactivity matrix
for improved rigor,
accuracy &
sensitivity, using the
BARD ontology.
BARD Synergy, next steps:
Raising Badapple to the next level
Re-calibrating the bioactivity matrix for improved rigor,
accuracy & sensitivity, using the BARD ontology.
Biology
Types
Assay
Formats
Protein
Classes
Some re-calibrations of interest:
BARD Synergy, next steps:
Raising Badapple to the next level
cell-based format, 540,
60%single protein format,
161, 18%
biochemical format, 76,
8%
protein complex format,
38, 4%
whole-cell lysate format,
32, 4%
small-molecule format,
11, 1%
protein format, 10, 1%
Assays: assay format
cell-based format
single protein format
biochemical format
protein complex format
whole-cell lysate format
small-molecule format
protein format
tissue-based format
nucleic acid format
organism-based format
cell membrane format
microsome format
cell-free format
mitochondrion format
(blank)
BARD Synergy, next steps:
Raising Badapple to the next level
PROTEIN, 514, 47%
PROCESS, 341, 31%
GENE, 217, 20%
Biology Types
PROTEIN
PROCESS
GENE
biology
AASUBSTITUTION
NCBI
BARD Synergy, next steps:
Raising Badapple to the next level
33
transporter, 112, 28%
receptor, 56, 14%
nucleic acid binding, 52,
13%
enzyme modulator, 45, 11%
cell adhesion molecule, 24,
6%
transferase, 23, 6%
hydrolase, 22, 6%
signaling molecule, 19, 5%
extracellular matrix protein,
16, 4%
oxidoreductase, 11, 3%
membrane traffic protein,
5, 1%
Protein Classes
transporter
receptor
nucleic acid binding
enzyme modulator
cell adhesion molecule
transferase
hydrolase
signaling molecule
extracellular matrix protein
oxidoreductase
membrane traffic protein
transfer/carrier protein
transcription factor
cytoskeletal protein
storage protein
isomerase
defense/immunity protein
Conclusions
• Badapple exemplar as BARD plugin
• Badapple exemplar as evidence-based
algorithm
• New BARD semantic capabilities will
elevate Badapple to next level.
• Promiscuity a complex issue.
34
Acknowledgements
• Steve Mathias, UNM
• Chris Lipinski, Melior
• Rajarshi Guha, NCATS
• Stephan Schurer, UMiami
• Uma Vempati, UMiami
• Mark Southern, Scripps
• BARD Engineering
Working Group
35

More Related Content

What's hot

SolGS Hyderabad conference 2016
SolGS Hyderabad conference 2016SolGS Hyderabad conference 2016
SolGS Hyderabad conference 2016solgenomics
 
Using ADAGE for pathway-style analyses
Using ADAGE for pathway-style analysesUsing ADAGE for pathway-style analyses
Using ADAGE for pathway-style analysesCasey Greene
 
Cassavabase workshop ibadan March17
Cassavabase workshop ibadan March17Cassavabase workshop ibadan March17
Cassavabase workshop ibadan March17solgenomics
 
SolGS workshop 2016
SolGS workshop 2016SolGS workshop 2016
SolGS workshop 2016solgenomics
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...Syed Ahmad Chan Bukhari, PhD
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryResearch Information Network
 
breeding informatics solutions at SGN
breeding informatics solutions at SGNbreeding informatics solutions at SGN
breeding informatics solutions at SGNsolgenomics
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the futurePistoia Alliance
 
High-performance web services for gene and variant annotations
High-performance web services for gene and variant annotationsHigh-performance web services for gene and variant annotations
High-performance web services for gene and variant annotationsChunlei Wu
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance
 
eScience Institute presentation on eagle-i
eScience Institute presentation on eagle-ieScience Institute presentation on eagle-i
eScience Institute presentation on eagle-imhaendel
 

What's hot (20)

Enriching Scholarship Personal Genomics presentation
Enriching Scholarship Personal Genomics presentationEnriching Scholarship Personal Genomics presentation
Enriching Scholarship Personal Genomics presentation
 
SolGS Hyderabad conference 2016
SolGS Hyderabad conference 2016SolGS Hyderabad conference 2016
SolGS Hyderabad conference 2016
 
Using ADAGE for pathway-style analyses
Using ADAGE for pathway-style analysesUsing ADAGE for pathway-style analyses
Using ADAGE for pathway-style analyses
 
EVQLV Deck
EVQLV Deck EVQLV Deck
EVQLV Deck
 
Cassavabase workshop ibadan March17
Cassavabase workshop ibadan March17Cassavabase workshop ibadan March17
Cassavabase workshop ibadan March17
 
SolGS workshop 2016
SolGS workshop 2016SolGS workshop 2016
SolGS workshop 2016
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 
NETTAB 2012
NETTAB 2012NETTAB 2012
NETTAB 2012
 
David Grawoig resume 061215
David Grawoig resume 061215David Grawoig resume 061215
David Grawoig resume 061215
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
 
breeding informatics solutions at SGN
breeding informatics solutions at SGNbreeding informatics solutions at SGN
breeding informatics solutions at SGN
 
Canadian health census to lod
Canadian health census to lodCanadian health census to lod
Canadian health census to lod
 
Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012
 
Dr Julie Stahlhut - Barcode Data Life-cycle
Dr Julie Stahlhut - Barcode Data Life-cycleDr Julie Stahlhut - Barcode Data Life-cycle
Dr Julie Stahlhut - Barcode Data Life-cycle
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
Introduction to Database Research Projects @ CWHR
Introduction to Database Research Projects @ CWHRIntroduction to Database Research Projects @ CWHR
Introduction to Database Research Projects @ CWHR
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
High-performance web services for gene and variant annotations
High-performance web services for gene and variant annotationsHigh-performance web services for gene and variant annotations
High-performance web services for gene and variant annotations
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
eScience Institute presentation on eagle-i
eScience Institute presentation on eagle-ieScience Institute presentation on eagle-i
eScience Institute presentation on eagle-i
 

Viewers also liked

Just In Time - Operations Management
Just In Time - Operations ManagementJust In Time - Operations Management
Just In Time - Operations ManagementGargi Kapadia
 
5 s powerpoint presentation
5 s powerpoint presentation5 s powerpoint presentation
5 s powerpoint presentationmarigold-cherie
 
Just in time manufacturing ppt
Just in time manufacturing pptJust in time manufacturing ppt
Just in time manufacturing pptSwati Luthra
 

Viewers also liked (7)

Paradox
ParadoxParadox
Paradox
 
Just In Time - Operations Management
Just In Time - Operations ManagementJust In Time - Operations Management
Just In Time - Operations Management
 
Poka Yoke Final Ppt
Poka Yoke  Final PptPoka Yoke  Final Ppt
Poka Yoke Final Ppt
 
Process design
Process designProcess design
Process design
 
Motion and time study
Motion and time studyMotion and time study
Motion and time study
 
5 s powerpoint presentation
5 s powerpoint presentation5 s powerpoint presentation
5 s powerpoint presentation
 
Just in time manufacturing ppt
Just in time manufacturing pptJust in time manufacturing ppt
Just in time manufacturing ppt
 

Similar to The BADAPPLE promiscuity plugin for BARD

Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsCarole Goble
 
Aug2013 NIST program slides
Aug2013 NIST program slidesAug2013 NIST program slides
Aug2013 NIST program slidesGenomeInABottle
 
Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Jeremy Yang
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker, Inc.
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
Jeremy Ash Bioinformatics PhD Defense
Jeremy Ash Bioinformatics PhD DefenseJeremy Ash Bioinformatics PhD Defense
Jeremy Ash Bioinformatics PhD DefenseJeremy Ash
 
Best Practices for Building an End-to-End Workflow for Microbial Genomics
 Best Practices for Building an End-to-End Workflow for Microbial Genomics Best Practices for Building an End-to-End Workflow for Microbial Genomics
Best Practices for Building an End-to-End Workflow for Microbial GenomicsJonathan Jacobs, PhD
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryCarole Goble
 
Jax bio dataworldcongress.ngs.20181128finalwithoutbu
Jax bio dataworldcongress.ngs.20181128finalwithoutbuJax bio dataworldcongress.ngs.20181128finalwithoutbu
Jax bio dataworldcongress.ngs.20181128finalwithoutbuAnne Deslattes Mays
 
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Sage Base
 
Promiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNPromiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNJeremy Yang
 
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...ChemAxon
 
BioAssay Research Database Presentation at the Chem Axon UGM 2013
BioAssay Research Database Presentation at the Chem Axon UGM 2013BioAssay Research Database Presentation at the Chem Axon UGM 2013
BioAssay Research Database Presentation at the Chem Axon UGM 2013Andrea de Souza
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
Final Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.KeyFinal Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.Keyguest3d0531
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Databricks
 
Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Ian Fore
 

Similar to The BADAPPLE promiscuity plugin for BARD (20)

Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Aug2013 NIST program slides
Aug2013 NIST program slidesAug2013 NIST program slides
Aug2013 NIST program slides
 
In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...
In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...
In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...
 
Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce Hoff
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
Jeremy Ash Bioinformatics PhD Defense
Jeremy Ash Bioinformatics PhD DefenseJeremy Ash Bioinformatics PhD Defense
Jeremy Ash Bioinformatics PhD Defense
 
Best Practices for Building an End-to-End Workflow for Microbial Genomics
 Best Practices for Building an End-to-End Workflow for Microbial Genomics Best Practices for Building an End-to-End Workflow for Microbial Genomics
Best Practices for Building an End-to-End Workflow for Microbial Genomics
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
 
Jax bio dataworldcongress.ngs.20181128finalwithoutbu
Jax bio dataworldcongress.ngs.20181128finalwithoutbuJax bio dataworldcongress.ngs.20181128finalwithoutbu
Jax bio dataworldcongress.ngs.20181128finalwithoutbu
 
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
 
Promiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNPromiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCN
 
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
 
BioAssay Research Database Presentation at the Chem Axon UGM 2013
BioAssay Research Database Presentation at the Chem Axon UGM 2013BioAssay Research Database Presentation at the Chem Axon UGM 2013
BioAssay Research Database Presentation at the Chem Axon UGM 2013
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Final Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.KeyFinal Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.Key
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
 
BioData World Basel 2018
BioData World Basel 2018BioData World Basel 2018
BioData World Basel 2018
 
Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019
 

More from Jeremy Yang

TIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsTIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsJeremy Yang
 
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerDrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerJeremy Yang
 
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypothesesMining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypothesesJeremy Yang
 
TIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APITIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APIJeremy Yang
 
Ex-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerEx-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerJeremy Yang
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Jeremy Yang
 
Open Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterOpen Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterJeremy Yang
 
Bibliological data science and drug discovery
Bibliological data science and drug discoveryBibliological data science and drug discovery
Bibliological data science and drug discoveryJeremy Yang
 
BioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingBioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingJeremy Yang
 
The Language Diversity of Computing
The Language Diversity of ComputingThe Language Diversity of Computing
The Language Diversity of ComputingJeremy Yang
 
RMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsRMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsJeremy Yang
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsJeremy Yang
 
Molecular scaffolds poster
Molecular scaffolds posterMolecular scaffolds poster
Molecular scaffolds posterJeremy Yang
 
Molecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryMolecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryJeremy Yang
 
Cheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesCheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesJeremy Yang
 
How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...Jeremy Yang
 
UNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsUNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsJeremy Yang
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingJeremy Yang
 

More from Jeremy Yang (18)

TIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsTIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS Analytics
 
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerDrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
 
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypothesesMining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
 
TIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APITIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST API
 
Ex-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerEx-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles Explorer
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
 
Open Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterOpen Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource poster
 
Bibliological data science and drug discovery
Bibliological data science and drug discoveryBibliological data science and drug discovery
Bibliological data science and drug discovery
 
BioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingBioMISS: Language Diversity of Computing
BioMISS: Language Diversity of Computing
 
The Language Diversity of Computing
The Language Diversity of ComputingThe Language Diversity of Computing
The Language Diversity of Computing
 
RMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsRMSD: routine measure stirs doubts
RMSD: routine measure stirs doubts
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformatics
 
Molecular scaffolds poster
Molecular scaffolds posterMolecular scaffolds poster
Molecular scaffolds poster
 
Molecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryMolecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discovery
 
Cheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesCheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case Studies
 
How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...
 
UNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsUNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applications
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in Biocomputing
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

The BADAPPLE promiscuity plugin for BARD

  • 1. The BADAPPLE promiscuity plugin for BARD Evidence-based promiscuity scores Jeremy Yang, UNM Oleg Ursu, UNM Cristian Bologa, UNM Anna Waller, UNM Larry Sklar, UNM Tudor Oprea, UNM ACS National Meeting – Indianapolis, IN -- Sep. 8-12, 2013 CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources
  • 2. What is BADAPPLE? • BioActivity Data Associative Promiscuity Pattern Learning Engine • Bioassay data analysis algorithm • Scaffold association patterns • Evidence-based • Robust to noise and errors UNM
  • 3. What is promiscuity? • Un-selective bioactivity • Normally undesirable • Evolving conceptions: – Polypharmacology – Systems biology – Systems chemical biology 3
  • 4. Promiscuity & bioassay data analysis: Selected references • Frequent hitters (2002, Schneider &al.) • Aggregators (2003, Shoichet &al.) • ALARM NMR (2004, Hajduk &al.) • Promiscuous scaffolds [talk] (2007, Bologa) • Pan-Assay Interference (2010, Baell &al.) • PubChem Promiscuity (2011, Canny &al.) 4
  • 5. Prerequisites for Promiscuity Data Analysis • Definition of unique chemical entity? – Yes • Definition of unique biological entity? – Challenging • I.e., Rigorously calibrated bioactivity matrix 5 CN/C(=C[N+](=O)[O-])/NCCSCC1=CC=C(O1)CN(C)C.Cl VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
  • 6. 2211 Chemicals On 159 Targets Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011 Bioactivity matrices depend on rigorous informatics 6
  • 7. 2211 Chemicals On 159 Targets Bioactivity matrices depend on rigorous informatics chemistry biology “1 chemistry” = unique compound or… = unique scaffold “1 biology” = unique target? or… = unique assay? Chemical biology space: To define a space must define a point 7 Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011
  • 8. Promiscuity: related concepts • Assay-interferers • Experimental artifacts • Frequent hitters • False positives • True positives • True non-selective actives • Aggregators • Reactives • Cytotoxic 8
  • 9. Badapple promiscuity score: a practical definition for bioassay data analysis • Purpose: Streamline molecular discovery project workflow. • Hence: Score designed to detect "false trails" (true-promiscuous OR false positives) unlikely to be productive leads. 9
  • 10. What is evidence-based? • Empirical data analysis • Mistakes can be un-learned. • Data driven • Not: Expert systems 10
  • 11. Drawbacks of evidence-based • Theories proven wrong. Ouch. • Reality is messy. • GIGO. 11
  • 12. Benefits of evidence-based More wins. More knowledge. Lower costs. Progress. (*) N.B. key role of Dick Cramer, ACS CINF 2013 Skolnik award winner. 12 (*)
  • 13. BADAPPLE scoring function • Scaffold score • By scoring scaffolds, more relevant evidence • Substances, assays and samples considered • Penalize under-sampling • Score is a statistic, not a "model"! • Inherently "validated" 13
  • 14. Avoiding overfitting; being skeptical of scanty evidence 14
  • 15. Avoiding overfitting: Moneyball style Sabermetrics leads the way. The Signal and the Noise: Why So Many Predictions Fail—But Some Don't, Nate Silver (2012). http://mark.stubbornlights.org/phils/archive s/2004_05.html 15
  • 17. Why Scaffolds? biology - birds, bees, nature chemistry - chemists SAR - drugs IP - lawyers
  • 18. Molecular scaffolds are special and useful guides for discovery Jeremy Yang, UNM & IU Cristian Bologa, UNM David Wild, IU Tudor Oprea, UNM ACS National Meeting - Sept. 8-12, 2013 - Indianapolis, IN CINF grad student symposium 9/8/13 & ACS SciMix 9/9/13 8pm
  • 19. There's something about scaffolds... Especially the "privileged few"... pScore Count Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714 19 Promiscuity score distribution
  • 20. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... % of active samples i_scaf, descending pScore order Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958 20
  • 21. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958 % total activity # scaffolds % scaffolds All 50% 1979 1.4% All 75% 11,645 8% Drugs 50% 54 2.8% Drugs 90% 327 16.7% “total activity” = active scaffold-instances 21
  • 22. Scaffold visualization: Molecule Cloud (Badapple-scaled) The Molecule Cloud - compact visualization of large collections of molecules, Peter Ertl and Bernd Rohde, J. Cheminformatics, 2012, 4:12. 22
  • 23. What is BARD? • BioAssay Research Database, http://bard.nih.gov • MLP: 1000+ assays, 400k+ cpds, 200M data • Manual assay annotations + QA • BARD Assay Ontology • Assay Data Standard • Community platform for bioassay data analysis & computation 25
  • 24. BARD + Badapple Synergy • BARD ontology, based on BAO • BARD raising "semantic IQ" of public bioassay data. • Evidence = information = data + metadata http://bard.nih.gov/
  • 25. BARD Plugin Platform: Community development Enterprise deployment • BARD Plugin spec (IPlugin interface) • BARD PluginValidator class • Java, JAX-RS, Jersey, REST • BARD REST API • WAR deployment, discoverable fig c/o Steve Brudz, Broad
  • 26. Badapple Plugin via BARD web client 28 Discovery scenario: Rapidly flags potentially problematic (notorious) scaffolds.
  • 27. BARD Synergy, next steps: Raising Badapple to the next level Assay Definition Assay Protocol Biology BARD Classes biology chemistry Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology.
  • 28. BARD Synergy, next steps: Raising Badapple to the next level Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology. Biology Types Assay Formats Protein Classes Some re-calibrations of interest:
  • 29. BARD Synergy, next steps: Raising Badapple to the next level cell-based format, 540, 60%single protein format, 161, 18% biochemical format, 76, 8% protein complex format, 38, 4% whole-cell lysate format, 32, 4% small-molecule format, 11, 1% protein format, 10, 1% Assays: assay format cell-based format single protein format biochemical format protein complex format whole-cell lysate format small-molecule format protein format tissue-based format nucleic acid format organism-based format cell membrane format microsome format cell-free format mitochondrion format (blank)
  • 30. BARD Synergy, next steps: Raising Badapple to the next level PROTEIN, 514, 47% PROCESS, 341, 31% GENE, 217, 20% Biology Types PROTEIN PROCESS GENE biology AASUBSTITUTION NCBI
  • 31. BARD Synergy, next steps: Raising Badapple to the next level 33 transporter, 112, 28% receptor, 56, 14% nucleic acid binding, 52, 13% enzyme modulator, 45, 11% cell adhesion molecule, 24, 6% transferase, 23, 6% hydrolase, 22, 6% signaling molecule, 19, 5% extracellular matrix protein, 16, 4% oxidoreductase, 11, 3% membrane traffic protein, 5, 1% Protein Classes transporter receptor nucleic acid binding enzyme modulator cell adhesion molecule transferase hydrolase signaling molecule extracellular matrix protein oxidoreductase membrane traffic protein transfer/carrier protein transcription factor cytoskeletal protein storage protein isomerase defense/immunity protein
  • 32. Conclusions • Badapple exemplar as BARD plugin • Badapple exemplar as evidence-based algorithm • New BARD semantic capabilities will elevate Badapple to next level. • Promiscuity a complex issue. 34
  • 33. Acknowledgements • Steve Mathias, UNM • Chris Lipinski, Melior • Rajarshi Guha, NCATS • Stephan Schurer, UMiami • Uma Vempati, UMiami • Mark Southern, Scripps • BARD Engineering Working Group 35