Scaffold-Based Analytics: Enabling
Hit-to-Lead Decisions by Visualizing
Chemical Series Linked Across
Large Datasets
Deepak Bandyopadhyay,
Constantine Kreatsoulas,
Pat G. Brady, Genaro
Scavello, Dac-Trung Nguyen,
Tyler Peryea, Ajit Jadhav
GSK
NCATS
Thanks to:
Lena Dang and Josh Swamidass (WUSTL),
Rajarshi Guha, Stephen Pickett, Martin
Saunders, Nicola Richmond, Darren Green,
Eric Manas, Todd Graybill, Rob Young, Mike
Ouellette, Stan Martens, Javier Gamo,
Lourdes Rueda
Outline
– Intro: analyzing and merging screening output
– Methods for Scaffold-Based Analytics
– Examples – Linking series across datasets
– Hit Prioritization & Scaffold Hopping (TCAMS)
– Dataset Integration & Scaffold Progression (Kinase “X”)
– Conclusion
Small Molecule Lead Discovery at GSK
High Throughput Screening
- Maximize chemical diversity
Focused Screening
- Compound sets tailored to target families
- Small scale process
Fragment Hit ID
- Low mol weight, ligand efficient starting points
High-Content / Phenotypic Screen
- Disease-relevant assays
- Target agnostic
DNA Encoded Library Technology (ELT)
- Massive combinatorial libraries
- Binders found by Next-Gen Seq.
Screening output: large, diverse, and difficult to navigate
[Photo caption: GSK, Tres Cantos, Spain]
[Figure: Manual Data Surfing; axes: Primary bioassay (pIC50) vs. Orthogonal assay (pIC50)]
Historical Hit Triage - on Individual Compounds
Criteria
– Activity Data
– Potency in a suite of assays
– Selectivity against off-targets
– Inhibition Frequency Index (IFI)
– Physical/Chemical Properties
– MW, solubility, permeability,…
– Property Forecast Index (PFI)
Use case: isolate good chemical starting points and weed out bad ones
Filters
IFI (%) = (# HTS assays hit / # HTS assays tested) × 100
PFI = Chromatographic LogD + # of aromatic rings
Lower PFI improves chances of positive outcome in phys/chem assays correlated with developability
IFI: S. Chakravorty, ACS New Orleans 2013.
PFI: R. Young, D. V. S. Green, C. Luscombe, A. Hill. Drug Discovery Today, Volume 16, Numbers 17/18, September 2011.
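The two filter indices defined on this slide are simple arithmetic. A minimal sketch, assuming the counts and the measured LogD are already available per compound; the function names and the example compound are illustrative, not from the deck:

```python
def inhibition_frequency_index(assays_hit: int, assays_tested: int) -> float:
    """IFI (%) = (# HTS assays hit / # HTS assays tested) * 100."""
    if assays_tested <= 0:
        raise ValueError("assays_tested must be positive")
    return 100.0 * assays_hit / assays_tested


def property_forecast_index(chrom_logd: float, aromatic_rings: int) -> float:
    """PFI = chromatographic LogD + number of aromatic rings."""
    return chrom_logd + aromatic_rings


# Hypothetical compound: hit in 3 of 200 HTS assays, LogD 2.5, 3 aromatic rings.
ifi = inhibition_frequency_index(3, 200)
pfi = property_forecast_index(2.5, 3)
print(f"IFI = {ifi:.1f}%, PFI = {pfi:.1f}")
```

A high IFI flags a frequent hitter; a lower PFI is the favorable direction, as the slide notes.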
Datasets Used in this Presentation
– Tres Cantos Anti-Malarial Set (TCAMS)
– 13.5k public compounds from GSK HTS
– pIC50 against Plasmodium falciparum (PF) “susceptible” 3D7 strain
– Percent inhibition against “resistant” DD2 strain
– Other properties including IFI
– In-house data on Kinase “X”
– HTS, FBDD, ELT data
Use cases: Hit Prioritization, Scaffold Hopping, Dataset Integration
Automation is Necessary for Screening Hit Triage…
• Manual selection and scaffold/R-group based SAR do not scale
• 5-50k molecules, 1000’s of chemotypes!
• Traditional methods: clustering, substructure/similarity search, …
[Figure panels: Multiple Substructure Searches (SSS1, SSS2, SSS3) with results merged manually; Hierarchical Clustering; Agglomerative Clustering; Similarity Search (thresholds 0.9 and 0.75); Scaffold Network (adapted from J. Swamidass, swami.wustl.edu)]
… But Clustering Is Not Sufficient for SAR Navigation
– Agglomerative Clustering: many singletons; similar molecules ≠ same cluster (e.g. Cluster 3 vs. Cluster 10); molecule → single cluster, can be limiting
– Hierarchical Clustering: same underlying issues, adds complexity (level of hierarchy, e.g. # rings)
[Figure: cluster size vs. complete-link cluster ID; analogy with animal groups: seals (fur), ducks (bill), penguins (flipper), and a singleton that fits no group]
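The "similar molecules ≠ same cluster" problem can be reproduced on toy data. A minimal sketch, assuming fingerprints are plain sets of feature IDs and using a small hand-written complete-linkage routine; the fingerprints, the 0.6 threshold, and the function names are illustrative, not the GSK/ChemAxon implementation:

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two feature sets."""
    return len(a & b) / len(a | b)


def complete_link_clusters(fps, threshold):
    """Merge clusters while the LEAST similar cross pair stays above threshold."""
    clusters = [{name} for name in fps]

    def link(c1, c2):
        return min(tanimoto(fps[x], fps[y]) for x in c1 for y in c2)

    while True:
        best = None
        for c1, c2 in combinations(clusters, 2):
            s = link(c1, c2)
            if s >= threshold and (best is None or s > best[0]):
                best = (s, c1, c2)
        if best is None:
            return clusters
        _, c1, c2 = best
        clusters = [c for c in clusters if c is not c1 and c is not c2] + [c1 | c2]


# Toy fingerprints: B is equally similar to A and C, but A and C are dissimilar.
fingerprints = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {3, 4, 5, 6}}
clusters = complete_link_clusters(fingerprints, threshold=0.6)
print(clusters)
print(tanimoto(fingerprints["B"], fingerprints["C"]))
```

B and C meet the similarity threshold, yet C ends up as a singleton because each molecule may belong to only one cluster. Overlapping scaffolds avoid this forced choice.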
Proposed Improvement:
Automatic Decomposition into All (Overlapping) Scaffolds
[Figure: an example molecule (IFI 1.5%, PF 3D7 pIC50 8.1, PF 3D7 LE 0.34) decomposed into its scaffold(s), which link it to related molecules]
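The molecule → scaffold(s) → related molecules navigation amounts to an inverted index over scaffold membership. A minimal sketch, assuming the decomposition has already been done by one of the scaffold tools; the molecule and scaffold names are made up for illustration:

```python
from collections import defaultdict

# Hypothetical decomposition: each molecule maps to ALL scaffolds it contains.
molecule_scaffolds = {
    "mol1": {"quinazoline", "indazole", "tricycle"},
    "mol2": {"quinazoline"},
    "mol3": {"indazole", "tricycle"},
    "mol4": {"pyridine"},
}

# Invert the mapping: scaffold -> set of molecules containing it.
scaffold_members = defaultdict(set)
for mol, scaffolds in molecule_scaffolds.items():
    for scaffold in scaffolds:
        scaffold_members[scaffold].add(mol)


def related_molecules(mol):
    """Other molecules sharing a scaffold with `mol`, keyed by the shared scaffold."""
    return {
        scaffold: sorted(scaffold_members[scaffold] - {mol})
        for scaffold in sorted(molecule_scaffolds[mol])
    }


print(related_molecules("mol1"))
```

Unlike a cluster assignment, one molecule can reach different neighbors through each of its scaffolds.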
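[Figure: the example molecule's scaffolds, each linking to related molecules (49, 226, and 2 in total), with per-scaffold averages: Avg IFI 1.5%, Avg pIC50 8.15, Avg LE 0.32; Avg IFI 3.0%, Avg pIC50 7.8, Avg LE 0.45; Avg IFI 4.0%, Avg pIC50 7.8, Avg LE 0.46. One related molecule is shown with IFI 1.5%, pIC50 8.2, LE 0.31.]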
Next Step: Combine with Activities and Properties
[Figure: Molecule → Scaffold(s) → Annotation → Related Molecules. Each scaffold (49, 226, and 2 related molecules in total) and each related molecule is annotated with its IFI, pIC50, and LE values.]
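Combining scaffolds with activities and properties is a group-by over the compound-to-scaffold link table. A minimal sketch in plain Python, assuming a many-to-many membership list; the first two compounds reuse values shown on the slides, while the third compound, the scaffold names, and the field names are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Per-compound annotations (cpd3 is invented for illustration).
compounds = {
    "cpd1": {"ifi": 1.5, "pic50": 8.1, "le": 0.34},
    "cpd2": {"ifi": 1.5, "pic50": 8.2, "le": 0.31},
    "cpd3": {"ifi": 4.5, "pic50": 7.4, "le": 0.56},
}

# Many-to-many link table: a compound may appear under several scaffolds.
memberships = [
    ("scafA", "cpd1"), ("scafA", "cpd2"),
    ("scafB", "cpd1"), ("scafB", "cpd3"),
]


def aggregate_by_scaffold(compounds, memberships):
    """Return molecule count and mean of each property per scaffold."""
    members = defaultdict(list)
    for scaffold, cpd in memberships:
        members[scaffold].append(cpd)
    summary = {}
    for scaffold, cpds in members.items():
        row = {"n": len(cpds)}
        for key in ("ifi", "pic50", "le"):
            row[f"avg_{key}"] = mean(compounds[c][key] for c in cpds)
        summary[scaffold] = row
    return summary


summary = aggregate_by_scaffold(compounds, memberships)
for scaffold, row in summary.items():
    print(scaffold, row)
```

Because cpd1 belongs to both scaffolds, it contributes to both averages, which is the intended behavior for overlapping scaffolds.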
Methods Used to Exhaustively Generate Overlapping Scaffolds
– NCATS R-Group Tool: SSSR scaffolds optimized for R-group tables
– Frameworks (GSK): Bemis-Murcko like & RECAP. Exhaustive (pro: complete; con: redundant/too simple)
– Scaffold Network Generator: hierarchical directed graph of scaffolds (levels by # rings: 4, 3, 2). Scales to large datasets
Molecule → Scaffold(s) → Related Molecules
Details: Integrating Scaffold-Based Analytics into a Single Spotfire Visualization
Main Data Table: ChemBLNTD_TCAMS (Compound ID, SMILES, Properties, Activities)
Tables linked to the main table through Compound ID, each carrying method-specific group IDs:
– Cluster from Clustering: Cluster ID, Cluster Size, Cpd IDs
– Frames from Data-Driven Frameworks: Framework ID, FW SMILES, Cpd IDs
– Scaffolds from NCATS R-Group Tool: Scaffold info (IDs, SMILES); Cpd info (IDs, SMILES, Properties); Scaffold ID (many)
– Top-Level Scaffold from Scaffold Network Generator: scaffold → subscaffold and subscaffold → scaffold relations (n to n); Scaffold ID (many)
– Compound Exemplars from Top-Level Scaffolds: Scaffold ID (many)
– Properties & activities aggregated by scaffold
Molecule → Scaffold(s) → Annotation → Related Molecules
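The scaffold → subscaffold relation from a scaffold network is a directed graph, and the "top-level scaffold" of any node can be found by walking the edges upward. A minimal sketch, assuming the network is acyclic and available as an edge list; the edge data and function name are hypothetical and do not reflect the Scaffold Network Generator's actual output format:

```python
from collections import defaultdict

# Hypothetical edges (scaffold, subscaffold): the parent has one more ring.
edges = [
    ("tricycle_1", "bicycle_1"),
    ("tricycle_1", "bicycle_2"),
    ("tricycle_2", "bicycle_2"),
    ("bicycle_1", "ring_1"),
    ("bicycle_2", "ring_1"),
]

# subscaffold -> set of parent scaffolds
parents = defaultdict(set)
for scaffold, subscaffold in edges:
    parents[subscaffold].add(scaffold)


def top_level_scaffolds(node):
    """Follow subscaffold -> scaffold edges upward; return ancestors with no parent."""
    node_parents = parents.get(node, set())
    if not node_parents:
        return {node}
    tops = set()
    for parent in node_parents:
        tops |= top_level_scaffolds(parent)
    return tops


print(sorted(top_level_scaffolds("ring_1")))
print(sorted(top_level_scaffolds("bicycle_1")))
```

A subscaffold can resolve to several top-level scaffolds, which is why the link tables above are many-to-many.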
We found Scaffold Networks complex to integrate & navigate…
Framework Overlaps in Related Molecules
Reveal Substructures Associated with Activity
[Figure: scatter plot of pie charts, pIC50 in 3D7 (PF susceptible strain) vs. Percent inhibition in DD2 (PF resistant strain), with a panel of exemplar compounds on the same axes. Each pie is one compound; each sector/color is one framework. Color by: Framework; sector size: # molecules; size by: Ligand Efficiency (PF 3D7).]
– Frameworks active and overlapping
– Framework moderately active
– Framework not active in 3D7 strain; not found by R-group tool
(Hit Prioritization)
Scaffold Networks Example: Identify
Related Scaffolds with a Desirable Profile
[Figure: trellis by # rings in scaffold (… possibly more layers with higher # rings …); color by: Top-Level Scaffold; size by: Ligand Efficiency (PF 3D7)]
– Original tricyclic scaffold inactive against resistant DD2 strain
– Find new bicyclic and tricyclic scaffolds active against resistant DD2 strain
(Scaffold Hopping)
NCATS R-Group Tool Connects Molecules to
Scaffolds with Aggregate Data and Drill-Down
– Minimum # of “useful” scaffolds
– Tautomers under single scaffold
– Bonus: sensible R-group tables generated
– 5.7k scaffolds, filtered to 428 by max pIC50
[Figure axes: Avg. IFI vs. Avg. pIC50 in 3D7 (PF sensitive strain)]
NCATS R-Group Tool Example:
Deconstruct SAR of Related Molecules
– Quinazolines alone active, ligand efficient
– Indazoles alone only weakly active
– Discover alt. tricycles
– Fuse Design Ideas
[Figure: pie charts plotted by pIC50 in 3D7 (PF susceptible strain) vs. IFI. Each pie is one compound; each sector/color is one scaffold; size by Ligand Efficiency (3D7).]
(Scaffold Hopping)
NCATS R-Group Tool Example:
Iterative SAR Exploration
– New tricycle scaffold (1824) seems more active than indoles or quinazolines alone
[Figure: pie charts plotted by pIC50 in 3D7 (PF susceptible strain) vs. IFI. Each pie is one compound; each sector/color is one scaffold; size by Ligand Efficiency (3D7).]
(Scaffold Hopping)
Scaffold-Based Decision Making
and Hit ID Integration
– Kinase “X”
– Candidate compound demonstrates exquisite kinase selectivity
– Active against Wild-Type, Inactive against Mutant enzyme
– Backup program
– New screens analyzed & integrated using NCATS R-Group Tool
Hit ID datasets (9259 cpds):
– HTS 2012: 2M screened, 4564 pIC50s
– HTS 2014: 350K top-up, 3613 pIC50s
– Fragment hits: 288 pIC50s
– DNA ELT: 130 libraries, 824 features
[Figure: timeline 2011, 2012, 2014 (backup); legend: No activity data / Activity data available]
Goal: identify selective backup series from new Hit ID efforts
Dataset
Integration
Selective Lead Series Linked Across Datasets
[Figure: scaffolds plotted by Mean Δ(WT pIC50 – mutant pIC50) vs. Mean PFIpred; size: pIC50; scaffold classification by mutant binding: Selective WT/mut. or Non-selective]
– HTS 2014 hit
– HTS 2012 hit (not followed up)
– Scaffold-Level Details: Mech. pIC50: 7.1; Cell pIC50: 6.3; LE: 0.44
– Statistics for 8 exemplars: Mech. pIC50: 6.0 ± 0.88; Cell pIC50: 5.3 ± 0.81; LE: 0.35 ± 0.05
– Chemistry initiated on series!
[Figure: assay drill-down of pIC50 by GSK Compound ID (2012, 2014) for Mechanistic, Full-length WT, Truncated WT, Cell, and Mutant assays]
(Dataset Integration)
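The scaffold-level selectivity view can be approximated by averaging the wild-type minus mutant pIC50 difference over each scaffold's exemplars. A minimal sketch; the pIC50 pairs, the 1.0 log-unit cutoff, and the use of the sample standard deviation for the "±" spread are all assumptions made for illustration, not values or choices stated in the deck:

```python
from statistics import mean, stdev

# Hypothetical exemplars per scaffold: (wild-type pIC50, mutant pIC50).
exemplars = {
    "scaffold_1": [(7.1, 5.0), (6.5, 4.8), (6.0, 4.9)],
    "scaffold_2": [(6.8, 6.6), (6.2, 6.3)],
}

SELECTIVITY_CUTOFF = 1.0  # assumed cutoff in log units, not from the slides


def summarize(pairs, cutoff=SELECTIVITY_CUTOFF):
    """Mean WT-minus-mutant delta, WT mean and spread, and a selectivity label."""
    deltas = [wt - mut for wt, mut in pairs]
    wt_values = [wt for wt, _ in pairs]
    mean_delta = mean(deltas)
    return {
        "mean_delta": mean_delta,
        "wt_mean": mean(wt_values),
        "wt_sd": stdev(wt_values) if len(wt_values) > 1 else 0.0,
        "label": "Selective WT/mut." if mean_delta >= cutoff else "Non-selective",
    }


report = {name: summarize(pairs) for name, pairs in exemplars.items()}
for name, row in report.items():
    print(name, row["label"], f"mean delta = {row['mean_delta']:.2f}")
```

Summarizing at the scaffold level is what lets hits from different campaigns (e.g. HTS 2012 and HTS 2014) be compared as one series.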
Identify and Test Unmeasured Compounds Based on Overlap with Actives Across Datasets
[Figure: PFI vs. MW; trellis by Scaffold; color by LE]
– Ligand-efficient HTS hit
– Ligand-efficient HTS and fragment hits
– Weak active for Kinase “X”
– Low MW/PFI untested fragment
– Low MW/PFI ELT feature to synthesize
(Dataset Integration)
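Selecting unmeasured compounds by scaffold overlap with actives can be sketched as a two-pass filter: collect scaffolds seen in active compounds, then keep untested compounds that share one and have low MW and PFI. The records, field names, and all three thresholds below are hypothetical:

```python
# Hypothetical records pooled from different Hit ID sources.
records = [
    {"id": "hts_1", "source": "HTS", "scaffolds": {"s1"}, "pic50": 6.8, "mw": 410, "pfi": 7.2},
    {"id": "frag_1", "source": "Fragment", "scaffolds": {"s1"}, "pic50": 5.1, "mw": 210, "pfi": 3.9},
    {"id": "frag_2", "source": "Fragment", "scaffolds": {"s1", "s2"}, "pic50": None, "mw": 190, "pfi": 3.5},
    {"id": "elt_1", "source": "ELT", "scaffolds": {"s1"}, "pic50": None, "mw": 320, "pfi": 5.0},
    {"id": "elt_2", "source": "ELT", "scaffolds": {"s3"}, "pic50": None, "mw": 300, "pfi": 4.0},
]

# Assumed thresholds, chosen only for this example.
ACTIVE_PIC50 = 5.0
MAX_MW = 350
MAX_PFI = 6.0


def candidates_to_test(records):
    """Untested, low MW/PFI compounds sharing a scaffold with any measured active."""
    active_scaffolds = set()
    for rec in records:
        if rec["pic50"] is not None and rec["pic50"] >= ACTIVE_PIC50:
            active_scaffolds |= rec["scaffolds"]
    return [
        rec["id"]
        for rec in records
        if rec["pic50"] is None
        and rec["scaffolds"] & active_scaffolds
        and rec["mw"] <= MAX_MW
        and rec["pfi"] <= MAX_PFI
    ]


print(candidates_to_test(records))
```

Here the untested fragment and one ELT feature are proposed because they share scaffold s1 with measured actives; the ELT feature on an unrelated scaffold is not.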
Conclusions and Future Directions
• Merging datasets using scaffolds enables a cohesive visualization
of chemical series and suggests opportunities for hybridization
• Automated scaffold and R-group generation is a powerful way to
prioritize hits and replace scaffolds in large and diverse datasets
• Partitioning into clusters is ambiguous, incomplete for SAR navigation.
• Scaffold-Generation Methods (Frameworks, Scaffold Networks,
NCATS R-Group Tool) have their differences, pros and cons
• All methods revealed similar insights from the TCAMS dataset
• Future improvements:
• Scalability to larger and ever-changing datasets
• Automated selection of informative overlapping scaffolds
• Combining multiple scaffold-generation methods
Thank You & Questions
Backup and References
– Scaffold Generation Methods:
– NCATS R-group analysis (http://tripod.nih.gov/?p=46); NCATS R-group tool @ http://tripod.nih.gov
– Frameworks (Data-Driven Clustering, GSK/ChemAxon): G. Harper, G. S. Bravi, S. D. Pickett, J. Hussain, and D. V. S. Green. J. Chem. Inf. Comput. Sci., 44(6), 2145-2156 (2004)
– Scaffold Network Generator (http://swami.wustl.edu/sng): M. K. Matlock, J. M. Zaretzki, and S. J. Swamidass. Bioinformatics, 29(20), 2655-2656 (2013)
– Agglomerative Clustering (Complete Linkage, GSK/ChemAxon)
Hit Prioritization via Clustering:
Exploration within Pre-determined Groups Only
– ~2000 complete linkage clusters in TCAMS set
– Initial clustering limits neighbors you can discover
[Figure: query molecules (scatter plot); plotted quantities: Percent inh. in DD2 (PF resistant strain), IFI, pXC50 in 3D7 (PF susceptible strain), # aromatic rings]
Hit Prioritization Using GSK Frameworks
– 80k GSK frameworks, 7.5k RECAP fragments in TCAMS set
– Score of a framework = Average activity of molecules containing it
– Low scoring frameworks can be filtered out
– Issues identified:
– Many equivalent and redundant frameworks
– Tautomers not unified by current implementation
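The framework score and filter described above can be sketched in a few lines. The example also shows one possible way to collapse "equivalent" frameworks, by treating frameworks with identical member sets as redundant; that collapsing rule, the activity values, and the 6.0 cutoff are assumptions for illustration, not the GSK implementation:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical activities (e.g. pIC50) and framework memberships.
activity = {"m1": 8.0, "m2": 7.0, "m3": 5.0, "m4": 4.0}
framework_members = {
    "fw_a": {"m1", "m2"},
    "fw_b": {"m1", "m2"},  # same molecules as fw_a: treated as redundant here
    "fw_c": {"m3", "m4"},
    "fw_d": {"m1", "m3"},
}

MIN_SCORE = 6.0  # assumed cutoff for this example

# Score of a framework = average activity of the molecules containing it.
scores = {
    fw: mean(activity[m] for m in members)
    for fw, members in framework_members.items()
}

# Filter out low-scoring frameworks.
kept = {fw for fw, score in scores.items() if score >= MIN_SCORE}

# Collapse frameworks covering exactly the same molecules; keep one per group.
groups = defaultdict(list)
for fw in sorted(kept):
    groups[frozenset(framework_members[fw])].append(fw)
representatives = sorted(fws[0] for fws in groups.values())

print(scores)
print(representatives)
```

Grouping by member set does not address the tautomer issue listed above; that would need structure normalization before framework generation.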
Related Molecules with Framework Overlaps:
Reveal Potential Scaffold Hops
– Shared framework, related chemotypes
– Opportunity to design hybrid series
[Figure: pie charts plotted by pXC50 in 3D7 (PF susceptible strain) vs. Percent inhibition in DD2 (PF resistant strain). Each pie is one compound; each sector/color is one framework. Color by: Framework; sector size: # molecules; size by: Ligand Efficiency.]
Molecule → Scaffold(s) → Related Molecules
(Scaffold Hopping)
Hit Prioritization via Scaffold Networks:
Navigate to Related Scaffolds
13.5k compounds map to 7715 top-level scaffolds (28.5k total)
[Figure: Percent inhibition in DD2 (PF resistant strain) vs. pXC50 in 3D7 (PF susceptible strain); trellis by: number of rings in scaffold (2, 3, 4+; … possibly more layers with higher # rings …); color by: Top-Level Scaffold; size by: Ligand Efficiency]
(Hit Prioritization)
Related Molecules from NCATS R-Group Tool:
Visualizing Scaffold Overlap and Activity
– Co-occurring active scaffolds
– Scaffold 4719 active by itself
– Scaffold 978 alone not highly active
[Figure: pie charts plotted by pXC50 in 3D7 (PF susceptible strain) vs. IFI. Each pie is one compound; each sector/color is one scaffold.]
(Hit Prioritization)

Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 

Scaffold-based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemical Series Linked Across Large Datasets (ACS Boston 2015)

  • 1. Scaffold-Based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemical Series Linked Across Large Datasets. Authors: Deepak Bandyopadhyay, Constantine Kreatsoulas, Pat G. Brady, Genaro Scavello, Dac-Trung Nguyen, Tyler Peryea, Ajit Jadhav (GSK, NCATS). Thanks to: Lena Dang and Josh Swamidass (WUSTL), Rajarshi Guha, Stephen Pickett, Martin Saunders, Nicola Richmond, Darren Green, Eric Manas, Todd Graybill, Rob Young, Mike Ouellette, Stan Martens, Javier Gamo, Lourdes Rueda
  • 2. Outline – Intro: analyzing and merging screening output – Methods for Scaffold-Based Analytics – Examples: linking series across datasets – Hit Prioritization & Scaffold Hopping (TCAMS) – Dataset Integration & Scaffold Progression (Kinase “X”) – Conclusion
  • 3. Small Molecule Lead Discovery at GSK. High Throughput Screening: maximize chemical diversity. Focused Screening: compound sets tailored to target families; small-scale process. Fragment Hit ID: low molecular weight, ligand-efficient starting points. High-Content / Phenotypic Screen: disease-relevant assays; target agnostic. DNA Encoded Library Technology (ELT): massive combinatorial libraries; binders found by next-generation sequencing. Site label: GSK, Tres Cantos, Spain. Screening output: large, diverse, and difficult to navigate.
  • 4. Historical Hit Triage on Individual Compounds (Manual Data Surfing). Plot axes: primary bioassay (pIC50); orthogonal assay (pIC50). Criteria used as filters: activity data (potency in a suite of assays, selectivity against off-targets, Inhibition Frequency Index (IFI)) and physical/chemical properties (MW, solubility, permeability, …, Property Forecast Index (PFI)). Use case: isolate good chemical starting points and weed out bad ones. IFI (%) = (# HTS assays hit / # HTS assays tested) × 100. PFI = chromatographic LogD + # of aromatic rings. Lower PFI improves the chances of a positive outcome in phys/chem assays correlated with developability. References: IFI: S. Chakravorty, ACS New Orleans 2013. PFI: R. Young, D. V. S. Green, C. Luscombe, A. Hill, Drug Discovery Today, Volume 16, Numbers 17/18, September 2011.
  • 5. Datasets Used in this Presentation. Tres Cantos Anti-Malarial Set (TCAMS), used for hit prioritization and scaffold hopping: 13.5k public compounds from GSK HTS; pIC50 against the Plasmodium falciparum (PF) “susceptible” 3D7 strain; percent inhibition against the “resistant” DD2 strain; other properties including IFI. In-house data on Kinase “X”, used for dataset integration: HTS, FBDD, and ELT data.
  • 6. Outline – Intro: analyzing and merging screening output – Methods for Scaffold-Based Analytics – Examples: linking series across datasets – Hit Prioritization & Scaffold Hopping (TCAMS) – Dataset Integration & Scaffold Progression (Kinase “X”) – Conclusion
  • 7. Automation is Necessary for Screening Hit Triage… Manual selection and scaffold/R-group based SAR do not scale: 5-50k molecules, 1000s of chemotypes! Traditional methods: clustering, substructure/similarity search, … Figure panels: multiple substructure searches (SSS1, SSS2, SSS3) with manually merged results; similarity search (0.9, 0.75); agglomerative clustering; hierarchical clustering; scaffold network (adapted from J. Swamidass, swami.wustl.edu).
  • 8. … But Clustering Is Not Sufficient for SAR Navigation. Agglomerative clustering: similar molecules ≠ same cluster (e.g. Cluster 3 and Cluster 10); many singletons; molecule → single cluster, which can be limiting. Hierarchical clustering: same underlying issues, adds complexity (level of hierarchy, e.g. # rings). Illustration: seals (fur)? ducks (bill)? penguins (flipper)? singleton? Plot labels: complete-link cluster ID; cluster size.
  • 9. Proposed Improvement: Automatic Decomposition into All (Overlapping) Scaffolds. Columns: Molecule | Scaffold(s) | Related Molecules. Example molecule: IFI 1.5%, PF 3D7 pIC50 8.1, PF 3D7 LE 0.34. Related molecules per scaffold: 49 total; 226 total; 2 total.
  • 10. Next Step: Combine with Activities and Properties. Columns: Molecule | Scaffold(s) | Annotation | Related Molecules. Scaffold-level annotations: (Avg IFI 1.5%, Avg pIC50 8.15, Avg LE 0.32); (Avg IFI 3.0%, Avg pIC50 7.8, Avg LE 0.45); (Avg IFI 4.0%, Avg pIC50 7.8, Avg LE 0.46). Related molecules: 49 total; 226 total; 2 total. [Figure: each related-molecule structure is labeled with its own IFI, pIC50, and LE.]
  • 11. Methods Used to Exhaustively Generate Overlapping Scaffolds. Columns: Molecule | Scaffold(s) | Related Molecules. Frameworks (GSK): Bemis-Murcko-like & RECAP; exhaustive (pro: complete; con: redundant/too simple). NCATS R-Group Tool: SSSR scaffolds optimized for R-group tables. Scaffold Network Generator: hierarchical directed graph of scaffolds; scales to large datasets. Figure labels: 4, 3, 2 rings.
  • 12. Details: Integrating Scaffold-Based Analytics into a Single Spotfire Visualization. Main data table: ChemBLNTD_TCAMS (Compound ID, SMILES, properties, activities). Tables linked to it by Compound ID and by method-specific group IDs (Scaffold ID, many; n-to-n): scaffolds from the NCATS R-Group Tool (scaffold info: IDs, SMILES; compound info: IDs, SMILES, properties; properties & activities aggregated by scaffold); frames from Data-Driven Frameworks (Framework ID, FW SMILES, Cpd IDs); clusters from clustering (Cluster ID, Cluster Size, Cpd IDs); top-level scaffolds from the Scaffold Network Generator (scaffold → subscaffold and subscaffold → scaffold links; compound exemplars from top-level scaffolds). Columns: Molecule | Scaffold(s) | Annotation | Related Molecules. We found Scaffold Networks complex to integrate & navigate…
  • 13. Outline – Intro: analyzing and merging screening output – Methods for Scaffold-Based Analytics – Examples: linking series across datasets – Hit Prioritization & Scaffold Hopping (TCAMS) – Dataset Integration & Scaffold Progression (Kinase “X”) – Conclusion
  • 14. Hit Prioritization: Framework Overlaps in Related Molecules Reveal Substructures Associated with Activity. Axes: percent inhibition in DD2 (PF resistant strain); pIC50 in 3D7 (PF susceptible strain). Each pie is one compound; each sector/color is one framework. Color by: framework; sector size: # molecules; size by: ligand efficiency (PF 3D7). Annotations: frameworks active and overlapping; framework moderately active; framework not active in 3D7 strain and not found by R-group tool; exemplar compounds.
  • 15. Scaffold Hopping: Scaffold Networks Example: Identify Related Scaffolds with a Desirable Profile. Axes: percent inhibition in DD2 (resistant strain); pIC50 in 3D7 (PF susceptible strain). Trellis by: # rings in scaffold (… possibly more layers with higher # rings …); color by: top-level scaffold; size by: ligand efficiency (PF 3D7). Annotations: original tricyclic scaffold inactive against resistant DD2 strain; find new bicyclic and tricyclic scaffolds active against resistant DD2 strain.
  • 16. NCATS R-Group Tool Connects Molecules to Scaffolds with Aggregate Data and Drill-Down. Minimum # of “useful” scaffolds; tautomers grouped under a single scaffold. Bonus: sensible R-group tables generated. 5.7k scaffolds, filtered to 428 by max pIC50. Axes: avg. IFI; avg. pIC50 in 3D7 (PF sensitive strain).
  • 17. Scaffold Hopping: NCATS R-Group Tool Example: Deconstruct SAR of Related Molecules. Axes: IFI; pIC50 in 3D7 (PF susceptible strain). Each pie is one compound; each sector/color is one scaffold; size by ligand efficiency (3D7). Observations: quinazolines alone are active and ligand efficient; indazoles alone are only weakly active. Design ideas: fuse; discover alt. tricycles.
  • 18. Scaffold Hopping: NCATS R-Group Tool Example: Iterative SAR Exploration. Axes: IFI; pIC50 in 3D7 (PF susceptible strain). Each pie is one compound; each sector/color is one scaffold; size by ligand efficiency (3D7). Observation: new tricycle scaffold (1824) seems more active than indoles or quinazolines alone.
  • 19. Dataset Integration: Scaffold-Based Decision Making and Hit ID Integration for Kinase “X”. Candidate compound demonstrates exquisite kinase selectivity: active against wild-type, inactive against mutant enzyme. Backup program: new screens analyzed & integrated using the NCATS R-Group Tool. Timeline labels: 2011, 2012, 2014 (backup). Activity data available: fragment hits (288 pIC50s); HTS 2012 (2M screened, 4564 pIC50s); HTS 2014 (350K top-up, 3613 pIC50s). No activity data: DNA ELT (130 libraries, 824 features). 9259 cpds. Goal: identify selective backup series from new Hit ID efforts.
  • 20. Dataset Integration: Selective Lead Series Linked Across Datasets. Scaffold-level plot axes: mean Δ(WT pIC50 - mutant pIC50); mean predicted PFI. Size: pIC50. Scaffold classification by mutant binding: selective WT/mut.; non-selective. Labeled scaffolds: HTS 2014 hit; HTS 2012 hit (not followed up). Scaffold-level details: mech. pIC50 7.1; cell pIC50 6.3; LE 0.44. Statistics for 8 exemplars: mech. pIC50 6.0 ± 0.88; cell pIC50 5.3 ± 0.81; LE 0.35 ± 0.05. Assay drill-down (pIC50 by GSK compound ID; 2012, 2014): mechanistic, full-length WT, truncated WT, cell, mutant. Chemistry initiated on series!
  • 21. Dataset Integration: Identify and Test Unmeasured Compounds Based on Overlap with Actives Across Datasets. Axes: PFI; MW. Trellis by scaffold; color by LE; shape legend not transcribed. Annotations: ligand-efficient HTS hit; ligand-efficient HTS and fragment hits; weak active for Kinase “X”.
  • 22. Dataset Integration: Identify and Test Unmeasured Compounds Based on Overlap with Actives Across Datasets. Axes: PFI; MW. Trellis by scaffold; color by LE; shape legend not transcribed. Annotations: ligand-efficient HTS hit; ligand-efficient HTS and fragment hits; weak active for Kinase “X”; low MW/PFI untested fragment; low MW/PFI ELT feature to synthesize.
  • 23. Conclusions and Future Directions. Merging datasets using scaffolds enables a cohesive visualization of chemical series and suggests opportunities for hybridization. Automated scaffold and R-group generation is a powerful way to prioritize hits and replace scaffolds in large and diverse datasets. Partitioning into clusters is ambiguous and incomplete for SAR navigation. Scaffold-generation methods (Frameworks, Scaffold Networks, NCATS R-Group Tool) have their differences, pros, and cons; all methods revealed similar insights from the TCAMS dataset. Future improvements: scalability to larger and ever-changing datasets; automated selection of informative overlapping scaffolds; combining multiple scaffold-generation methods.
  • 24. Thank You & Questions
  • 25. Backup and References. Scaffold generation methods: NCATS R-group analysis (http://tripod.nih.gov/?p=46); Frameworks (Data-Driven Clustering, GSK/ChemAxon); Scaffold Network Generator (http://swami.wustl.edu/sng); Agglomerative Clustering (Complete Linkage, GSK/ChemAxon). References: G. Harper, G. S. Bravi, S. D. Pickett, J. Hussain, and D. V. S. Green, J. Chem. Inf. Comput. Sci., 44(6), 2145-2156 (2004); NCATS R-group tool at http://tripod.nih.gov; M. K. Matlock, J. M. Zaretzki, and S. J. Swamidass, Bioinformatics, 29(20), 2655-2656 (2013).
  • 26. Hit Prioritization via Clustering: Exploration within Pre-determined Groups Only. ~2000 complete-linkage clusters in TCAMS set; the initial clustering limits the neighbors you can discover. Scatter plot of query molecules; plotted variables: percent inhibition in DD2 (PF resistant strain); pXC50 in 3D7 (PF susceptible strain); IFI; # aromatic rings.
  • 27. Using GSK Frameworks. 80k GSK frameworks, 7.5k RECAP fragments in TCAMS set. Score of a framework = average activity of molecules containing it; low-scoring frameworks can be filtered out. Issues identified: many equivalent and redundant frameworks; tautomers not unified by current implementation.
  • 28. Scaffold Hopping: Related Molecules with Framework Overlaps Reveal Potential Scaffold Hops. Columns: Molecule | Scaffold(s) | Related Molecules. Axes: percent inhibition in DD2 (PF resistant strain); pXC50 in 3D7 (PF susceptible strain). Each pie is one compound; each sector/color is one framework. Color by: framework; sector size: # molecules; size by: ligand efficiency. Annotations: shared framework, related chemotypes; opportunity to design hybrid series.
  • 29. Hit Prioritization via Scaffold Networks: Navigate to Related Scaffolds. 13.5k compounds map to 7715 top-level scaffolds (28.5k total). Axes: percent inhibition in DD2 (PF resistant strain); pXC50 in 3D7 (PF susceptible strain). Trellis by: number of rings in scaffold (2, 3, 4+ rings; … possibly more layers with higher # rings …); color by: top-level scaffold; size by: ligand efficiency.
  • 30. Hit Prioritization: Related Molecules from NCATS R-Group Tool: Visualizing Scaffold Overlap and Activity. Axes: IFI; pXC50 in 3D7 (PF susceptible strain). Each pie is one compound; each sector/color is one scaffold. Annotations: co-occurring active scaffolds; scaffold 4719 active by itself; scaffold 978 alone not highly active.
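
The two filter formulas on slide 4 and the framework score on slide 27 are simple enough to sketch in code. The following is a minimal illustration only, not the authors' implementation: the function names, the input layout (compound-to-framework pairs plus a dictionary of pIC50 values), and all example values are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def ifi_percent(assays_hit: int, assays_tested: int) -> float:
    """Inhibition Frequency Index: percent of tested HTS assays the compound hit."""
    if assays_tested <= 0:
        raise ValueError("assays_tested must be positive")
    return 100.0 * assays_hit / assays_tested


def pfi(chrom_logd: float, aromatic_rings: int) -> float:
    """Property Forecast Index: chromatographic LogD plus aromatic ring count."""
    return chrom_logd + aromatic_rings


def framework_scores(memberships, activity):
    """Score each framework as the average activity of the molecules containing it.

    memberships: iterable of (compound_id, framework_id) pairs; a compound may
                 appear under several overlapping frameworks (hypothetical layout).
    activity:    dict mapping compound_id -> pIC50; unmeasured compounds are skipped.
    """
    by_framework = defaultdict(list)
    for compound_id, framework_id in memberships:
        if compound_id in activity:
            by_framework[framework_id].append(activity[compound_id])
    return {fw: mean(values) for fw, values in by_framework.items()}


# Hypothetical example data, not taken from TCAMS.
memberships = [("C1", "FW-A"), ("C1", "FW-B"), ("C2", "FW-A"), ("C3", "FW-B")]
activity = {"C1": 8.0, "C2": 7.0}  # C3 has no measured pIC50

scores = framework_scores(memberships, activity)
# Filter out low-scoring frameworks with an arbitrary illustrative cutoff.
kept = {fw for fw, score in scores.items() if score >= 7.8}
```

Because a compound contributes to every framework it contains, the same molecule can raise the score of several overlapping frameworks, which is the behaviour a single-cluster partition cannot express.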

Editor's Notes

  1. Data visualization & exploration environment (we use Spotfire). PFI is a lipophilicity measure akin to cLogP; lower is better. 30 sec.
  2. Adding the hierarchy does not fix the agglomerative clustering issues; it only adds complexity in navigation. Things at different levels may not be matched.
  3. What I will be describing is a method that exhaustively finds all possible shared (or common, or frequent) substructures, which we call scaffolds, within your dataset using a tool from the NIH. Here is a screening hit that I will use to demonstrate this. … (don’t need to go into gory details) The biaryl substructure is contained in these molecules, which have low similarity to the original hit molecule.
  4. We can aggregate activities & properties at the scaffold level and then drill down to the underlying data for individual compounds to progress scaffolds of interest.
  5. Text up top. Grey out clustering. Purple box for aggregate props.
  6. Preprocessed substructure search: which substructure encodes activity?
  7. 10 sec. short script
  8. We used the scaffolds to merge all of this data and identify more series that bind selectively
  9. Key message: prioritize ELT with no activity data, just based on overlap with actives from other datasets
  10. This slide can be backup.
  11. Automated substructure search to find part of molecule that’s active. Backup?
  12. Backup
  13. Backup?
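
Notes 8 and 9 describe merging datasets through shared scaffolds and prioritizing compounds that have no activity data purely by their overlap with actives. A minimal sketch of that idea follows, assuming a precomputed mapping from compound ID to its set of scaffold IDs; the function names, the pIC50 threshold of 6.0, and the compound and scaffold IDs are all hypothetical and are not taken from the talk.

```python
def active_scaffolds(scaffold_map, pic50, threshold=6.0):
    """Return scaffolds found in at least one measured compound at or above threshold.

    scaffold_map: dict compound_id -> set of scaffold IDs (overlapping scaffolds allowed).
    pic50:        dict compound_id -> measured pIC50 (untested compounds are absent).
    threshold:    illustrative activity cutoff, an assumption of this sketch.
    """
    return {
        scaffold
        for compound_id, scaffolds in scaffold_map.items()
        if pic50.get(compound_id, float("-inf")) >= threshold
        for scaffold in scaffolds
    }


def rank_untested(scaffold_map, pic50, threshold=6.0):
    """Rank untested compounds by how many active scaffolds they share."""
    actives = active_scaffolds(scaffold_map, pic50, threshold)
    ranked = []
    for compound_id, scaffolds in scaffold_map.items():
        if compound_id in pic50:
            continue  # already measured, nothing to prioritize
        shared = scaffolds & actives
        if shared:
            ranked.append((compound_id, sorted(shared)))
    # Most shared active scaffolds first; ties broken by compound ID.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


# Hypothetical merged view of HTS, fragment, and ELT compounds.
scaffold_map = {
    "HTS-1": {"S1", "S2"},
    "FRAG-1": {"S2"},
    "ELT-1": {"S1", "S2"},
    "ELT-2": {"S3"},
    "ELT-3": {"S2"},
}
pic50 = {"HTS-1": 7.1, "FRAG-1": 5.0}  # ELT compounds have no activity data

priorities = rank_untested(scaffold_map, pic50)
```

Here the scaffold ID acts as the join key between datasets, so an ELT feature with no measurements can still be ranked because it shares scaffolds with an HTS active.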