Molecular and Data Visualization
in Drug Discovery
Deepak Bandyopadhyay
GlaxoSmithKline
Intro: Human Body & Disease Biology
• Disease (from Wikipedia):
– Abnormal condition that affects part or all of an organism.
– Associated with specific symptoms and signs.
• Causes:
– Single cause, e.g. pathogen, poison, nutrient deficiency, genetics
– Multiple factors including environment, lifestyle, genetics
http://www.biologyguide.net/biol1/1_disease.htm
Mycobacterium tuberculosis
Chest X-ray showing lung cancer
Drug Discovery Parts/Timeline
Focus of Drug Discovery
• Narrow down on one or a
few substances to test in
humans and develop into a
drug that treats a disease
Components:
• Target Selection and Validation
– Diagram labels: genome, protein, link to disease, disease, genetics, pathology, biological target
• In Vitro Biology
• Medicinal Chemistry (Lead Optimization)
• Lead Discovery (a.k.a. Screening)
• In Vivo Biology
Molecular and Data Visualization
• The two parts of my job at GSK! 
• Molecules:
– small (drugs/peptides) and large
(proteins/DNA/RNA/lipids)
– visualized in 1D (SMILES), 2D (structure), 3D
(coords / conformations), 4D (Mol. Dynamics)
• Data:
– Format: numeric / text,
continuous / categorical,
Delimited/database/XML/proprietary
– Source: instruments, manual entry, calculation
– About drug discovery projects (key: molecule ID),
genomics/proteomics (key: gene/protein ID),
clinical studies (key: anon. patient ID), …
DRUG: Ibuprofen
PROTEIN: EGFR (ball and stick), EGFR (ribbons)
Movie: Introduction to Drug Design
By Schrödinger (molecular modeling software company): https://www.youtube.com/watch?v=u49k72rUdyc
Bioactivity 101
• Concentration-Response curve and IC50
• Structure Activity Relationship (SAR)
pIC50 = -log10(IC50), with IC50 in mol/L
IC50 = 12.8 µM (micromolar)
pIC50 = 6 - log10(IC50 in µM) = 6 - log10(12.8) = 4.89
Think Avogadro, pH…
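The conversion above can be sketched in a few lines of Python. This is only an illustration: the function names and the unit table are assumptions made for this example, not part of any tool mentioned in the talk.

```python
import math

# Offsets so that pIC50 = offset - log10(value expressed in that unit).
# The unit table is an assumption made for this sketch.
UNIT_OFFSETS = {"M": 0, "mM": 3, "uM": 6, "nM": 9}

def pic50_from_ic50(ic50, unit="uM"):
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in `unit`."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return UNIT_OFFSETS[unit] - math.log10(ic50)

def ic50_from_pic50(pic50, unit="uM"):
    """Invert the conversion: return the IC50 expressed in `unit`."""
    return 10 ** (UNIT_OFFSETS[unit] - pic50)

# Worked example from the slide: IC50 = 12.8 uM
print(round(pic50_from_ic50(12.8, "uM"), 2))  # 4.89
```

Because the scale is logarithmic, each 10-fold drop in IC50 adds one pIC50 unit, which is why higher pIC50 means a more potent compound.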
Molecular Visualization Deconstructed
• Representations
• Navigation
• Interaction
• What would you add?
Aspirin (ligand), Cox-1 (protein)
Binding pocket surface: polar, +ve charge, hydrophobic, -ve charge
XY translate, Z zoom; rotate about X/Y or Z
E.g. in program MOE: F1, F2, F3 save/restore scenes
Select, Hide/Show, Center, Prev/Next Scene, Expand Sel., Import/Export, Align, Compute…
Purposes of Molecule Visualization
• Understand and rationalize “SAR” in 3D
• (Protein) Structure-Based Drug Design. E.g.:
– Aspirin Binds COX1/2, Celebrex binds COX2 only
• Clearly illustrate biological systems / processes
• What other tasks can you think of?
Case study 1: Protein-Protein Interactions
HIV-1 coat protein gp120 bound to antibody 17b (Light, Heavy) and CD4
gp120/CD4 interface
gp120/antibody L/H interface
Rank color: ordered color scale (swatches not shown)
Ban, Y. E. A., Edelsbrunner, H., & Rudolph, J. (2006). Interface surfaces for protein-protein complexes. J. ACM, 53(3), 361-378.
Case-Study 2: Molecular Dynamics Simulation
of a drug entering into the binding site of a target protein
Decherchi et al., Nature Comms. 6(6155), 2015. https://www.youtube.com/watch?v=ckTqh50r_2w
From Molecules to Data
Mol spreadsheets, visualizations
StarDrop Glowing Molecules™ image from
http://www.asteris-app.com/technical-info.htm
Hybrid molecule/data visualization
Software Systems: Spotfire
• Feature set / distinguishing factors:
– Handling large datasets via filtering and
memory management
– Tabular file (CSV, Excel) or database input
– Multiple, configurable visualization types
– Easy enough for domain experts to use / share
– Life science add-ons
• Molecule depiction
• Specialized –omics packages
Binned pIC50 trellised by HBA and HBD (hydrogen-bond acceptor and donor counts)
pIC50 vs. % inhibition
Software Systems: LiveDesign
• Consolidate multiple disconnected tools for molecule design
– Integrated Single Platform
– Intuitive UI
– 2D, 3D, Data & Visuals
– Social aspect
Dimensions, dimensions…
• Molecules: 1D (SMILES e.g. c1ccccc1),
2D (depiction), 3D (coords), 4D (motion)
• Data:
– 100s of activities, measured and predicted properties
per row (compound)
– ~100K for gene expression, clinical trial data
– Millions for –omics, next-gen sequencing
– Then there’s systems biology…
• Dimensionality reduction is a key capability
– PCA, SOM, Stochastic Proximity Embedding,…
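As a rough illustration of the first technique named above, the sketch below projects a small property table onto its first two principal components using plain power iteration. The compound names and property values are hypothetical, and a production workflow would more likely call a tested linear-algebra library.

```python
import math
import random

def pca_2d(rows, iterations=200, seed=0):
    """Project rows of numeric properties onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    # Standardize each column (z-score) so large-valued properties do not dominate.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
            for j in range(d)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    # Covariance matrix of the standardized data (d x d).
    cov = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Keep the new direction orthogonal to components already found.
            for c in components:
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                v = [0.0] * d
                break
            v = [x / norm for x in w]
        components.append(v)
    return [[sum(row[j] * c[j] for j in range(d)) for c in components] for row in z]

# Hypothetical compounds: [molecular weight, logP, HBD, HBA]
compounds = {
    "cpd-1": [180.2, 1.2, 1, 4],
    "cpd-2": [206.3, 3.5, 1, 2],
    "cpd-3": [381.4, 3.4, 1, 7],
    "cpd-4": [151.2, 0.5, 2, 2],
    "cpd-5": [310.4, 2.8, 0, 5],
}
for name, (pc1, pc2) in zip(compounds, pca_2d(list(compounds.values()))):
    print(f"{name}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

The two output columns can be used directly as x/y coordinates in a scatter plot, so that hundreds of properties per compound collapse to a single 2D view.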
Challenges / Types of Visualization
• Key capabilities for data visualization
– Large data → human comprehension
– High-level summary + drill-down
– Quickly (auto?) isolate interesting data points
http://guides.library.duke.edu/datavis/vis_types
Visualization types shown: map, SOM, parallel coords, heat map, protein, volume rendering, radar plot, box plot, sunburst, dendrogram, network/graph layout
Categories: 2D, 3D, nD, hierarchical
Image sources: http://flagshipbio.com/amino-acid-structure-properties-using-self-organizing-maps/, Wikipedia
All the Data at Once: Vlaaivis
T. J. Howe, G. Mahieu, P. Marichal, T. Tabruyn and P. Vugts. Data reduction and representation in drug discovery. Drug Discovery Today 12(1/2):45-53, Jan 2007.
All the Data at Once (cont’d): Radar Plots
• Circular histogram for viewing multi-parameter results
The influence of the 'organizational factor' on compound quality in drug discovery
Paul D. Leeson & Stephen A. St-Gallay
Nature Reviews Drug Discovery 10, 749-765 (October 2011)
Property differences are scaled to either +1, whereby the company with a positive ('best') property value had the
highest magnitude, or −1, whereby the company with the lowest ('worst') value had the highest magnitude.
Visualizing Large Datasets
P. Ertl & B. Rohde, J. Cheminformatics 4(12), 2012
Gaspar et al. J. Chem. Inf. Model., 2015, 55 (1), pp 84–94
Network-like
similarity graph
Bajorath et al.
• Dimensionality reduction
• Graph layout
• Activity landscape
• Probabilistic property plots
• Scaffold abstraction
Steven Muchmore,
Abbott Labs
(now Abbvie)
Molecule cloud
Molecular Property 1
Molecular Property 2
Probability of success
(crossing cell membrane)
SAR Tables
• SAR: Structure-Activity Relationship
– Split molecule: core/scaffold, pendant R-groups
– SAR Table: molecule spreadsheet with
R-groups and Activity Data
(-OH)
(-COOH)
SAR Maps - R1 vs. R2 on a Core
Selective for protein 1 | pIC50(protein 2) ‒ pIC50(protein 1) | Selective for protein 2
R1
R2
Core “scaffold”:
D. K. Agrafiotis et al. SAR Maps: A New SAR Visualization Technique for Medicinal Chemists. J. Med. Chem., 2007, 50 (24), 5926–5937.
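A minimal sketch of the data behind such a map: pivot a SAR table into an R1 × R2 grid whose cells hold the selectivity pIC50(protein 2) ‒ pIC50(protein 1). The R-groups and activity values below are made up for illustration, and this is not the implementation from the cited paper.

```python
def sar_map(rows):
    """Pivot (r1, r2, pic50_protein1, pic50_protein2) rows into an R1 x R2 grid.

    Each cell holds pIC50(protein 2) - pIC50(protein 1); combinations that
    were never made or measured stay as None.
    """
    rows = list(rows)
    r1_groups = sorted({r[0] for r in rows})
    r2_groups = sorted({r[1] for r in rows})
    cells = {(r1, r2): round(p2 - p1, 2) for r1, r2, p1, p2 in rows}
    grid = [[cells.get((r1, r2)) for r2 in r2_groups] for r1 in r1_groups]
    return r1_groups, r2_groups, grid

# Hypothetical SAR table: R1, R2, pIC50 vs. protein 1, pIC50 vs. protein 2
table = [
    ("-H", "-OH", 6.1, 7.3),
    ("-H", "-COOH", 6.8, 6.5),
    ("-CH3", "-OH", 7.0, 7.1),
]
r1_groups, r2_groups, grid = sar_map(table)
print("R1 \\ R2", r2_groups)
for r1, row in zip(r1_groups, grid):
    print(r1, row)
```

Positive cells mark substituent pairs that favor protein 2, negative cells favor protein 1, and empty cells show combinations that have not been tested yet.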
Clustering
• Based on chemical descriptors, biological activity, etc…
• Agglomerative or hierarchical
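To make the idea concrete, here is a small sketch of agglomerative clustering with complete linkage on Tanimoto distance. Fingerprints are represented as plain sets of "on" bit positions; the molecule names, bit sets, and the 0.5 threshold are assumptions chosen for this example.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two fingerprints given as sets of bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)

def complete_link_clusters(fingerprints, threshold):
    """Merge clusters bottom-up while the farthest pair of members across two
    clusters is still within `threshold` (complete linkage)."""
    clusters = [[name] for name in fingerprints]

    def linkage(c1, c2):
        return max(tanimoto_distance(fingerprints[x], fingerprints[y])
                   for x in c1 for y in c2)

    while len(clusters) > 1:
        dist, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if dist > threshold:
            break  # no remaining pair of clusters is close enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical fingerprints (sets of "on" bit positions)
fps = {
    "mol-A": {1, 2, 3, 4},
    "mol-B": {1, 2, 3, 5},
    "mol-C": {1, 2, 6, 7},
    "mol-D": {8, 9},
}
print(complete_link_clusters(fps, threshold=0.5))
```

In this toy set mol-A and mol-B merge, while mol-C stays a singleton even though it shares bits with both, because each molecule can belong to only one cluster.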
Hoek, Keith S. et al.: Metastatic potential of melanomas defined by specific gene
expression profiles with no BRAF signature. Pigment Cell Research 19 (4), 290-302
http://chemmine.ucr.edu/help/
Molecules Genes
Limitations of Clustering
Molecule → single cluster, can be limiting
Analogy figure labels: seals (fur), ducks (bill), penguins (flipper), singleton (?)
Cluster 3, Cluster 10: similar molecules ≠ same cluster
Many singletons
Plot axes: Complete Link Cluster ID, Cluster Size
Automatic Decomposition into
(All) Overlapping Scaffolds
Figure columns: Molecule | Scaffold(s) | Related Molecules
Molecule: malarial parasite assay pIC50 8.1
Related molecules per scaffold: 49 total, 226 total, 2 total
Next Step: Combine with
Activities and Properties
Figure columns: Molecule | Scaffold(s) | Annotation | Related Molecules
Annotation: Avg pIC50 (values shown: 8.2, 8.15, 7.8, 7.8)
Related molecules per scaffold: 49 total, 226 total, 2 total
Individual pIC50 values shown: 8.5, 8.2, 8.0, 7.5, 7.7, 8.5, 7.4, 7.9, 7.7, 8.2
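The linking step can be sketched as follows, assuming the scaffold decomposition has already been done by some other tool. The molecule and scaffold IDs and the pIC50 values are hypothetical; the point is only that one molecule may sit under several scaffolds, and that each scaffold can carry an aggregate annotation such as the average pIC50.

```python
from collections import defaultdict

def aggregate_by_scaffold(molecule_scaffolds, pic50):
    """Group molecules under every scaffold they contain and annotate each
    scaffold with its member count and average pIC50."""
    members = defaultdict(list)
    for mol, scaffolds in molecule_scaffolds.items():
        for scaffold in scaffolds:  # a molecule may appear under several scaffolds
            members[scaffold].append(mol)
    return {
        scaffold: {
            "count": len(mols),
            "avg_pic50": round(sum(pic50[m] for m in mols) / len(mols), 2),
        }
        for scaffold, mols in members.items()
    }

def related_molecules(mol, molecule_scaffolds):
    """Molecules that share at least one scaffold with `mol`."""
    mine = set(molecule_scaffolds[mol])
    return sorted(other for other, scaffolds in molecule_scaffolds.items()
                  if other != mol and mine & set(scaffolds))

# Hypothetical input: scaffold IDs per molecule, and measured pIC50 values
molecule_scaffolds = {
    "mol-1": ["S1", "S2"],
    "mol-2": ["S1"],
    "mol-3": ["S2", "S3"],
    "mol-4": ["S3"],
}
pic50 = {"mol-1": 8.1, "mol-2": 8.5, "mol-3": 7.7, "mol-4": 7.4}

print(aggregate_by_scaffold(molecule_scaffolds, pic50))
print(related_molecules("mol-1", molecule_scaffolds))
```

Unlike clustering, nothing here forces a molecule into a single group, so mol-1 is related to both mol-2 (via S1) and mol-3 (via S2).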
Case Study: Linking Molecules By Scaffolds
• Use aggregate properties for decision making
• Find related molecules with improved properties
 Improving property 1
Improving activity 2
Aggregate
(scaffold)
↓
Drill down
(8 molecules)
Improving activity 3 
Improving property 4
Keep top half of molecule,
substitute bottom half
Example 1 Example 2
Summary and Lessons Learned
• Drug discovery has specialized types of data that are
best understood by visualization
• Good visualizations can support the making of good
decisions (and the converse: GIGO…)
• The human element is important – visuals and
analytics should be creatable/usable by scientists
• As new visual analytics experts, consider careers in
an industry where you can add value and be creative
– Subtle plug for drug discovery 
Future Directions and Challenges in
Data Visualization for Drug Discovery
• Human vs. Machine or Human + Machine ?
• Automate the tedious parts of data prep/integration
• Intuitiveness by design
• Interconnection by design
• Integration of latest visualization techniques
developed for other domains
• Using emerging media, e.g. VR, Kinect
• What can you think of?
Questions?
