Small Molecules and siRNA: Methods to Explore Bioactivity Data
Published in: Technology, Education
  • Outliers in a cliff prediction model are not as severe since SALI changes more slowly than just activity differences
  • For SALI = 0, had to set log10(SALI) = 0. Similar performance if we use SALI and not log10(SALI); at least more % variance is explained. Still fail on most significant cliffs
  • View plates (raw, normalized, adjusted, …). Highlight specific genes, siRNAs. View assay statistics. View pathway membership (via WikiPathways). Link out to external resources (Entrez, GeneCards, …). Hit selection, follow-up (DRC)
  • Proscillaridin A was not selected among the 20 compounds for further analysis in the paper. 2 cardiac glycosides are in the top 3; the target appears to be caspase-3 (activating it). CG inhibition of NF-κB is well known; see PNAS 2005, by Pollard. Trabectedin induces lethal DNA strand breaks and blocks the cell cycle in G2 phase
  • PSM* genes code for proteasome subunits, so they likely prevent the ubiquitination of the IκBα complex, so that RelA+cp50 cannot be released from the IκBα complex and enter the nucleus
  • Size of node indicates potency; larger is more potent. Lanatoside C and A have Tc = 1 and hence the edge was not shown (ideally it should be shown)
  • Good confirmation that SEA works. Size of node corresponds to SEA confidence score
  • We consider 41 compounds rather than 55, since a number of them did not have sufficiently confident target predictions. We then get to 18 compounds, since many of the predicted genes did not map to an NCI PID pathway
  • Phenotypic differences can arise when PPIs are involved
  • HPRD subnetwork corresponding to the Qiagen HDG has 6782 genes
  • Transcript

    • 1. Small Molecules and siRNA: Methods to Explore Bioactivity Data
      Rajarshi Guha
      NIH Center for Translational Therapeutics
      August 17, 2011
      Pfizer, Groton
    • 2. Background
      Cheminformatics methods
      QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
      More recently
      siRNA screening, high content imaging, combination screening
      Extensive use of machine learning
      All tied together with software development
      Integrate small molecule information & biosystems – systems chemical biology
    • 3. Outline
      Exploring the SAR landscape
      The landscape view of SAR data
      Quantifying SAR landscapes
      Extending an SAR landscape
      Linking small molecule & RNAi HTS
      Overview of the Trans NIH RNAi Screening Initiative
      Infrastructure components
      Linking small molecule & siRNA screens
    • 4. The Landscape View of Structure Activity Datasets
    • 5. Structure Activity Relationships
      Similar molecules will have similar activities
      Small changes in structure will lead to small changes in activity
      One implication is that SAR’s are additive
      This is the basis for QSAR modeling
      Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
    • 6. Structure Activity Landscapes
      Rugged gorges or rolling hills?
      Small structural changes associated with large activity changes represent steep slopes in the landscape
      But traditionally, QSAR assumes gentle slopes
      Machine learning is not very good for special cases
      Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
    • 7. Characterizing the Landscape
      A cliff can be numerically characterized
      Structure Activity Landscape Index (SALI)
      Cliffs are characterized by elements of the matrix with very large values
      Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
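The SALI matrix described on this slide can be sketched as follows. This is a minimal illustration, assuming the definition from the cited Guha and Van Drie paper, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), with fingerprints represented as sets of on-bit indices and Tanimoto similarity. The function names and the handling of identical fingerprints (infinite SALI when activities differ) are assumptions made for this sketch.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali_matrix(activities, fingerprints):
    """Symmetric matrix of pairwise SALI values.

    SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)).
    Assumption: for identical fingerprints (sim == 1) the value is
    infinity when the activities differ and 0 when they are equal.
    """
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        diff = abs(activities[i] - activities[j])
        if sim == 1.0:
            value = float("inf") if diff > 0 else 0.0
        else:
            value = diff / (1.0 - sim)
        sali[i][j] = sali[j][i] = value
    return sali
```

Large entries of the returned matrix correspond to pairs that are structurally similar but differ strongly in activity, i.e. candidate cliffs.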
    • 8. Visualizing SALI Values
      The SALI graph
      Compounds are nodes
      Nodes i,j are connected if SALI(i,j) > X
      Only display connected nodes
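The thresholding rule on this slide can be sketched as below. The input is a precomputed pairwise SALI matrix; the edge direction (from the less active to the more active compound) is an assumption added for this sketch, and the function name is hypothetical.

```python
def sali_graph(sali, activities, cutoff):
    """Build a SALI graph: connect i and j when SALI(i, j) > cutoff.

    Returns (nodes, edges). Only nodes that take part in at least one
    edge are returned, matching "only display connected nodes".
    Assumption: each edge points from the lower-activity compound to
    the higher-activity one.
    """
    edges = []
    n = len(sali)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                lo, hi = (i, j) if activities[i] <= activities[j] else (j, i)
                edges.append((lo, hi, sali[i][j]))
    nodes = sorted({k for lo, hi, _ in edges for k in (lo, hi)})
    return nodes, edges
```

Raising the cutoff prunes the graph down to the steepest cliffs; lowering it brings in progressively gentler parts of the landscape.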
    • 9. What Can We Do With SALI’s?
      SALI characterizes cliffs & non-cliffs
      For a given molecular representation, SALIs give us an idea of the smoothness of the SAR landscape
      Models try and encode this landscape
      Use the landscape to guide descriptor or model selection
    • 10. Descriptor Space Smoothness
      Edge count of the SALI graph for varying cutoffs
      Measures smoothness of the descriptor space
      Can reduce this to a single number (AUC)
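A rough sketch of the edge-count curve and its reduction to a single number is given below. The trapezoidal rule and the absence of any normalization are assumptions for illustration; the slide does not specify how the AUC was computed, and the function names are hypothetical.

```python
def edge_count_curve(sali, cutoffs):
    """Number of SALI graph edges (pairs with SALI > cutoff) per cutoff."""
    n = len(sali)
    values = [sali[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(v > c for v in values) for c in cutoffs]


def trapezoid_auc(xs, ys):
    """Area under the curve through the points (xs, ys), trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )
```

Comparing this area across descriptor sets, computed over the same range of cutoffs, gives one way to rank them by how smooth a landscape they induce.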
    • 11. Other Examples
      Instead of fingerprints, we use molecular descriptors
      SALI denominator now uses Euclidean distance
      2D & 3D random descriptor sets
      None are really good
      Too rough, or
      Too flat
      2D
      3D
    • 12. Feature Selection Using SALI
      Surprisingly, exhaustive search of 66,000 4-descriptor combinations did not yield semi-smoothly decreasing curves
      Not entirely clear what type of curve is desirable
    • 13. Measuring Model Quality
      A QSAR model should easily encode the “rolling hills”
      A good model captures the most significant cliffs
      Can be formalized as
      How many of the edge orderings of a SALI graph does the model predict correctly?
      Define S (X ), representing the number of edges correctly predicted for a SALI network at a threshold X
      Repeat for varying X and obtain the SALI curve
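The quantity S(X) can be sketched as a count of edges whose activity ordering the model reproduces. This is a simplified reading of the slide: an edge counts as correct when the sign of the predicted activity difference matches the sign of the observed one, and predicted ties count as incorrect. Any normalization or weighting used in the original work is not reproduced here.

```python
def _sign(x):
    return (x > 0) - (x < 0)


def s_of_x(sali, observed, predicted, cutoff):
    """Return (correct, total) for the SALI graph at the given cutoff.

    total   = number of pairs with SALI > cutoff
    correct = pairs whose observed ordering is reproduced by the model
    """
    correct = total = 0
    n = len(sali)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                total += 1
                obs = _sign(observed[i] - observed[j])
                pred = _sign(predicted[i] - predicted[j])
                if obs == pred:
                    correct += 1
    return correct, total


def sali_curve(sali, observed, predicted, cutoffs):
    """S(X) evaluated over a series of cutoffs."""
    return [s_of_x(sali, observed, predicted, c) for c in cutoffs]
```

Plotting the correct count (or the fraction correct / total) against the cutoff gives the curve; the right-hand end shows how well the model handles the steepest cliffs.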
    • 14. SALI Curves
    • 15. Model Search Using the SCI
      We’ve used the SALI to retrospectively analyze models
      Can we use SALI to develop models?
      Identify a model that captures the cliffs
      Tricky
      Cliffs are fundamentally outliers
      Optimizing for good SALI values implies overfitting
      Need to trade-off between SALI & generalizability
    • 16. Predicting the Landscape
      Rather than predicting activity directly, we can try to predict the SAR landscape
      Implies that we attempt to directly predict cliffs
      Observations are now pairs of molecules
      A more complex problem
      Choice of features is trickier
      Still face the problem of cliffs as outliers
      Somewhat similar to predicting activity differences
      Scheiber et al, Statistical Analysis and Data Mining, 2009, 2, 115-122
    • 17. Motivation
      Predicting activity cliffs corresponds to extending the SAR landscape
      Identify whether a new molecule will perform better or worse compared to the specific molecules in the dataset
      Can be useful for guiding lead optimization, but not necessarily useful for lead hopping
    • 18. Predicting Cliffs
      Dependent variables are pairwise SALI values, calculated using fingerprints
      Independent variables are molecular descriptors – but considered pairwise
      Absolute difference of descriptor pairs, or
      Geometric mean of descriptor pairs

      Develop a model to correlate pairwise descriptors to pairwise SALI values
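The two descriptor aggregation schemes can be sketched as below. Each molecule is a list of descriptor values and each output row corresponds to one unordered pair of molecules. The geometric mean is taken as sqrt(x * y), which assumes non-negative descriptor values; the function name and the `how` parameter are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_descriptors(descriptors, how="absdiff"):
    """Aggregate per-molecule descriptor vectors into per-pair vectors.

    how="absdiff": |x - y| for each descriptor
    how="geomean": sqrt(x * y) for each descriptor (assumes x * y >= 0)
    Rows follow the order of itertools.combinations over molecule indices.
    """
    rows = []
    for i, j in combinations(range(len(descriptors)), 2):
        a, b = descriptors[i], descriptors[j]
        if how == "absdiff":
            rows.append([abs(x - y) for x, y in zip(a, b)])
        elif how == "geomean":
            rows.append([sqrt(x * y) for x, y in zip(a, b)])
        else:
            raise ValueError(f"unknown aggregation: {how}")
    return rows
```

For n molecules this yields n(n - 1)/2 rows, so a 30-molecule dataset gives 435 pairwise observations.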
    • 19. A Test Case
      We first consider the Cavalli CoMFA dataset of 30 molecules with pIC50’s
      Evaluate topological and physicochemical descriptors
      Developed random forest models
      On the original observed values (30 obs)
      On the SALI values (435 observations)
      Cavalli, A. et al, J Med Chem, 2002, 45, 3844-3853
    • 20. Double Counting Structures?
      The dependent and independent variables both encode structure.
      But pretty low correlations between individual pairwise descriptors and the SALI values
    • 21. Model Summaries
      Original pIC50
      RMSE = 0.97
      SALI, AbsDiff
      RMSE = 1.10
      SALI, GeoMean
      RMSE = 1.04
      All models explain similar % of variance of their respective datasets
      Using geometric mean as the descriptor aggregation function seems to perform best
      SALI models are more robust due to larger size of the dataset
    • 22. Test Case 2
      Considered the Holloway docking dataset, 32 molecules with pIC50’s and Einter
      Similar strategy as before
      Need to transform SALI values
      Descriptors show minimal correlation
      Holloway, M.K. et al, J Med Chem, 1995, 38, 305-317
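The transformation mentioned on this slide appears, from the slide notes, to be log10(SALI), with log10(SALI) set to 0 where SALI = 0. A minimal sketch under that reading (the function name is hypothetical):

```python
from math import log10


def log_sali(values):
    """log10-transform SALI values, mapping SALI = 0 to 0 as in the notes."""
    return [log10(v) if v > 0 else 0.0 for v in values]
```

Note that this convention maps SALI = 0 and SALI = 1 to the same transformed value, which is one reason to compare against models built on the untransformed values.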
    • 23. Model Summaries
      Original pIC50
      RMSE = 1.05
      SALI, AbsDiff
      RMSE = 0.48
      SALI, GeoMean
      RMSE = 0.48
      The SALI models perform much worse in terms of % of variance explained
      Descriptor aggregation method does not seem to have much effect
      The SALI models appear to perform decently on the cliffs, but miss the most significant ones
    • 24. Model Summaries
      Original pIC50
      RMSE = 1.05
      SALI, AbsDiff
      RMSE = 9.76
      SALI, GeoMean
      RMSE = 10.01
      With untransformed SALI values, models perform similarly in terms of % of variance explained
      The most significant cliffs correspond to stereoisomers
    • 25. Test Case 3
      38 adenosine receptor antagonists with reported Ki values; use 35 for training and 3 for testing
      Random forest model on the SALI values performed reasonably well (RMSE = 7.51, R2 = 0.62)
      Upper end of SALI range is better predicted
      Kalla, R.V. et al, J. Med. Chem., 2006, 48, 1984-2008
    • 26. Test Case 3
      The dataset does not contain really big cliffs
    • 27. Generally, performance is poorer for smaller cliffs
      For any given hold out molecule, range of error in SALI prediction is large
      Suggests that some form of domain applicability metric would be useful
    • 28. Model Caveats
      Models based on SALI values are dependent on there being an SAR in the original activity data
      Scrambling results for these models are poorer than the original models but aren’t as random as expected
    • 29. Conclusions
      SALI is the first step in characterizing the SAR landscape
      Allows us to directly analyze the landscape, as opposed to individual molecules
      Being able to predict the landscape could serve as a useful way to extend an SAR landscape
    • 30. Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens
    • 31. RNAi Facility Mission
      Perform collaborative genome-wide RNAi screening-based projects with intramural investigators
      Advance the science of RNAi and miRNA screening and informatics via technology development to improve efficiency, reliability, and costs.
      Range of Assays
      Pathway (Reporter assays, e.g. luciferase, β-lactamase)
      Simple Phenotypes (Viability, cytotoxicity, oxidative stress, etc)
      Complex Phenotypes (High-content imaging, cell cycle, translocation, etc)
    • 32. RNAi Informatics Infrastructure
    • 33. RNAi Analysis Workflow
      Raw and Processed Data
      GO annotations
      Pathways
      Interactions
      Hit List
      Follow-up
    • 34. RNAi Informatics Toolset
      Local databases (screen data, pathways, interactions, etc).
      Commercial pathway tools.
      Custom software for loading, analysis and visualization.
    • 35. Back End Services
      Currently all computational analysis performed on the backend
      R & Bioconductor code
      Custom R package (ncgcrnai) to support NCGC infrastructure
      Partly derived from cellHTS2
      Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reports
      Some Java tools for
      Data loading
      Library and plate registration
    • 36. User Accessible Tools
    • 37. User Accessible Tools
    • 38. RNAi & Small Molecule Screens
      CAGCATGAGTACTACAGGCCA
      TACGGGAACTACCATAATTTA
      What targets mediate activity of siRNA and compound
      Pathway elucidation, identification of interactions
      Reuse pre-existing MLI data
    • 39. Develop new annotated libraries
      Target ID and validation
      Link RNAi generated pathway perturbations to small molecule activities. Could provide insight into polypharmacology
      Run parallel RNAi screen
      Goal: Develop systems level view of small molecule activity
    • 40. HTS for NF-κB Antagonists
      NF-κB controls DNA transcription
      Involved in cellular responses to stimuli
      Immune response, memory formation
      Inflammation, cancer, auto-immune diseases
      http://www.genego.com
    • 41. HTS for NF-κB Antagonists
      ME-180 cell line
      Stimulate cells using TNF, leading to NF-κB activation, readout via a β-lactamase reporter
      Identify small molecules and siRNA’s that block the resultant activation
    • 42. Small Molecule HTS Summary
      2,899 FDA-approved compounds screened
      55 compounds retested active
      Which components of the NF-κB pathway do they hit?
      17 molecules have target/pathway information in GeneGO
      Literature searches list a few more
      Most Potent Actives
      Proscillaridin A
      Trabectedin
      Digoxin
      Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
    • 43. RNAi HTS Summary
      Qiagen HDG library – 6886 genes, 4 siRNA’s per gene
      A total of 567 genes were knocked down by 1 or more siRNA’s
      We consider >= 2 as a “reliable” hit
      16 reliable hits
      Added in 66 genes for follow up via triage procedure
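The "reliable hit" rule (a gene counts as a hit when at least 2 of its siRNAs are active) can be sketched as below. The input format, a collection of (gene, siRNA id) pairs for active siRNAs, and the gene and siRNA names in the usage example are hypothetical.

```python
from collections import Counter


def reliable_hits(active_sirnas, min_sirnas=2):
    """Genes with at least `min_sirnas` distinct active siRNAs.

    active_sirnas: iterable of (gene, sirna_id) pairs, one per active siRNA.
    Duplicate pairs are counted once.
    """
    counts = Counter(gene for gene, _ in set(active_sirnas))
    return sorted(g for g, c in counts.items() if c >= min_sirnas)


# Hypothetical example: GENE1 has two active siRNAs, GENE2 only one.
example = [("GENE1", "si_1"), ("GENE1", "si_2"), ("GENE2", "si_1")]
print(reliable_hits(example))
```

Setting `min_sirnas=1` reproduces the looser list of genes with any active siRNA.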
    • 44. The Obvious Conclusion
      The active compounds target the 16 hits (at least) from the RNAi screen
      Useful if the RNAi screen was small & focused
      But what if we’re investigating a larger system?
      Is there a way to get more specific?
      Can compound data suggest RNAi non-hits?
    • 45. Small Molecule Targets
      Bortezomib (proteasome inhibitor)
      Some small molecules interact with core components
      Daunorubicin (IκBα inhibitor)
    • 46. Small Molecule Targets
      Montelukast (LTD4 antagonist)
      Others are active against upstream targets
      We also get an idea of off-target effects
    • 47. Compound Networks - Similarity
      Evaluate fingerprint-based similarity matrix for the 55 actives
      Connect pairs that exhibit Tc > 0.7
      Edges are weighted by the Tc value
      Most groupings are obvious
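A minimal sketch of the similarity network construction follows, assuming fingerprints are available as sets of on-bit indices keyed by compound name. The compound names in the usage example and the function names are hypothetical; the fingerprint type used in the talk is not stated.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient (Tc) for fingerprints given as sets of on bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def similarity_network(fingerprints, threshold=0.7):
    """Edges between compounds with Tc > threshold, weighted by Tc.

    fingerprints: dict mapping compound name -> set of on-bit indices.
    Returns a dict mapping (name_a, name_b) -> Tc.
    """
    names = list(fingerprints)
    edges = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            tc = tanimoto(fingerprints[a], fingerprints[b])
            if tc > threshold:
                edges[(a, b)] = tc
    return edges


# Hypothetical toy fingerprints
fps = {
    "cmpd_A": {1, 2, 3, 4, 5, 6, 7, 8},
    "cmpd_B": {1, 2, 3, 4, 5, 6, 7, 9},
    "cmpd_C": {20, 21},
}
print(similarity_network(fps))
```

With this rule, pairs with Tc = 1 are included as edges.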
    • 48. A “Dictionary” Based Approach
      Create a small-ish annotated library
      “Seed” compounds
      Use it in parallel small molecule/RNAi screens
      Use a similarity based approach to prioritize larger collections, in terms of anticipated targets
      Currently, we’d use structural similarity
      Diversity of prioritized structures is dependent on the diversity of the annotated library
    • 49. Compound Networks - Targets
      Predict targets for the actives using SEA
      Target based compound network maps nearly identically to the similarity based network
      But depending on the predicted target quality we get poor (or no) mappings to the RNAi targeted genes
      Keiser, M.J. et al, Nat. Biotech., 2007, 25, 197-206
    • 50. Gene Networks - Pathways
      Nodes are 1374 HDG genes contained in the NCI PID
      Edge indicates two genes/proteins are involved in the same pathway
      “Good” hits tend to be very highly connected
      Wang, L. et al, BMC Genomics, 2009, 10, 220
    • 51. (Reduced) Gene Networks – Pathways
      Nodes are 526 genes with >= 1 siRNA showing knockdown
      Edge indicates two genes/proteins are involved in the same pathway
    • 52. Pathway Based Integration
      Direct matching of targets is not very useful
      Try and map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway
      Considering 16 reliable hits, we cover 26 pathways
      Predicted compound targets cover 131 pathways
      For 18 out of 41 compounds
      3 RNAi-derived pathways not covered by compound-derived pathways
      Rhodopsin, alternative NFkB, FAS
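The pathway-based matching can be sketched as a set intersection: a compound is linked to an siRNA hit when any predicted target of the compound shares a pathway with the hit gene. All identifiers in this sketch (function name, argument names, and the toy gene, compound, and pathway names in the test data) are hypothetical; the actual analysis used SEA predictions and NCI PID pathways.

```python
def pathway_matches(compound_targets, sirna_hits, gene_pathways):
    """Link compounds to siRNA hit genes via shared pathway membership.

    compound_targets: dict compound -> iterable of predicted target genes
    sirna_hits:       iterable of hit genes from the RNAi screen
    gene_pathways:    dict gene -> set of pathway names
    Returns dict compound -> {hit gene: set of shared pathways}; compounds
    with no shared pathway (or no mappable target) are omitted.
    """
    hit_pathways = {g: gene_pathways.get(g, set()) for g in sirna_hits}
    matches = {}
    for compound, targets in compound_targets.items():
        cpd_pathways = set()
        for target in targets:
            cpd_pathways |= gene_pathways.get(target, set())
        linked = {
            gene: cpd_pathways & pathways
            for gene, pathways in hit_pathways.items()
            if cpd_pathways & pathways
        }
        if linked:
            matches[compound] = linked
    return matches
```

Compounds whose predicted targets do not map to any pathway drop out at this step, which mirrors the reduction from 41 to 18 compounds described in the notes.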
    • 53. Pathway Based Integration
      Still not completely useful, as it only handled 18 compounds
      Depending on target predictions is probably not a great idea
    • 54. Integration Caveats
      Biggest bottleneck is lack of resolution
      Currently, both small molecule and RNAi data are 1-D
      Active or inactive, high/low signal
      CRC’s for small molecules alleviate this a bit
      High content screens can provide significantly more information and so better resolution
      Data size & feature selection are of concern
    • 55. Integration Caveats
      Compound annotations are key
      Currently working on using ChEMBL data to provide target ‘suggestions’
      More comprehensive pathway data will be required
      RNAi and small molecule inhibition do not always lead to the same phenotype
      Could be indicative of promiscuity
      Could indicate true biological differences
      Weiss, W.A. et al, Nat. Chem. Biol., 2007, 12, 739-744
    • 56. Conclusions
      Building up a wealth of small molecule and RNAi data
      “Standard” analysis of RNAi screens relatively straightforward
      Challenges involve integrating RNAi data with other sources
      Primary bottleneck is dimensionality of the data
      Simple fluorescence-based approaches do not provide sufficient resolution
      High-content is required
    • 57. Acknowledgements
      John Van Drie
      Gerry Maggiora
      Mic Lajiness
      Jurgen Bajorath
      Scott Martin
      Pinar Tuzmen
      Carleen Klump
      Dac-Trung Nguyen
      Ruili Huang
      Yuhong Wang
    • 58. CPT Sensitization & “Central” Genes
      Yves Pommier, Nat. Rev. Cancer, 2006.
      TOP1 poisons prevent DNA religation resulting in replication-dependent double strand breaks. Cell activates DNA damage response (e.g. ATR).
    • 59. Screening Protocol
      Screen conducted in the human breast cancer cell line MDA-MB-231. Many variables to optimize including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls.
    • 60. Hit Selection
      [Figure: Follow-Up Dose Response Analysis. Panels for Screen #1 (ATR: siNeg, siATR-A, siATR-B, siATR-C) and Screen #2 (MAP3K7IP2: siNeg, siMAP3K7IP2-A, siMAP3K7IP2-B, siMAP3K7IP2-C, siMAP3K7IP2-D). Sensitization ranked by Log2 fold change; dose response plots of Viability (%) against CPT (Log M).]
      Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1.
    • 61. Are These Genes Relevant?
      Some are well known to be CPT-sensitizers
      Consider a HPRD PPI sub-network corresponding to the Qiagen HDG gene set
      How “central” are these selected genes?
      Larger values of betweenness indicate that the node lies on many shortest paths
      Makes sense - a number of them are stress-related
      But some of them have very low betweenness values
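Betweenness, as used above, counts how often a node lies on shortest paths between other nodes. The sketch below is a plain-Python version of Brandes' algorithm for an unweighted, undirected graph; it is an illustration rather than the tool used in the talk, and it assumes the network is supplied as an adjacency dict in which every neighbour also appears as a key.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adj: dict node -> set of neighbouring nodes (undirected, unweighted).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was visited from both ends.
    return {v: b / 2.0 for v, b in bc.items()}
```

On a three-node path a-b-c, the middle node gets a betweenness of 1 and the end nodes get 0.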
    • 62. Are These Genes Relevant?
      Most selected genes are densely connected
      A few are not
      Generally did not reconfirm
      Network metrics could be used to provide confidence in selections
