QMC-based Shape Fingerprints
Upcoming SlideShare
Loading in...5
×
 

QMC-based Shape Fingerprints

on

  • 805 views

Quasi-Monte Carlo integration for the fast and effective generation of molecular shape fingerprints.

Quasi-Monte Carlo integration for the fast and effective generation of molecular shape fingerprints.

ACS San Francisco 2010

Statistics

Views

Total Views
805
Views on SlideShare
788
Embed Views
17

Actions

Likes
0
Downloads
5
Comments
0

2 Embeds 17

http://www.linkedin.com 13
https://www.linkedin.com 4

Accessibility

Categories

Upload Details

Uploaded via as Apple Keynote

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />

QMC-based Shape Fingerprints QMC-based Shape Fingerprints Presentation Transcript

  • Quasi-Monte Carlo integration for the fast and effective generation of molecular shape fingerprints John D. MacCuish Norah E. MacCuish, Michael Hawrylycz, and Mitch Chapman ACS San Francisco 2010 COMP Drug Discovery john.maccuish@mesaac.com
  • Outline • Why Shape? Why Shape Fingerprints? • Prior Work 1.00 0.90 0.80 0.70 0.60 • quasi-Monte Carlo Integration 0.50 0.40 0.30 0.20 0.10 0.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 • Shape Fingerprint Generation • Performance, Examples • Future Work
  • Why Shape? • Shape is a component in binding... • 3D QSAR • similarity searching • compound acquisition (shape diversity) • virtual screening... • Can be confounding -- multiple binding sites, surface binding... • One more tool in the toolbox... • Boost to 2D • Pluses and minuses: Molecular Shape and Medicinal Chemistry: A Perspective, JMedChem, Feb. 2010, Nicholls, et al
  • Why Shape Fingerprints? • Compute shape comparisons efficiently on a large scale • Efficient storage • Simple.
  • Prior Work • Shape as a mixture of spherical gaussians (atoms) (Grant and Pickup...’96) • Fingerprints composed of key shapes, given the above method (Haigh, et al...’05). • Ultrafast Shape Recognition: Ballester, Richards 2007 • ShaEP, Vainio, et al 2009
  • Monte Carlo Integration • Approximate an integral (e.g., a 3D volume) by random sampling • Approximate volume of odd shapes or manifolds that are analytically difficult or have no closed form solution. • Better error convergence than grid sampling (Metropolis).
  • quasi-Monte Carlo Integration (QMC) • In practice, quasi-randomly generated points have best error convergence in low dimensions. • Beats uniform random sampling (e.g., pseudo, dart throwing, etc.) • QMC became popular in early-90s in computer graphics (Shirley, ’91) mid-90s in finance - options/futures pricing (Morokoff, Caflisch, ’95)
  • Approximate Volume quasi-random grid pseudo-random (low discrepancy) P 1.00 1.00 1.00 0.90 0.90 0.90 0.80 0.80 0.80 0.70 0.70 0.70 0.60 0.60 0.60 0.50 0.50 0.50 0.40 0.40 0.40 0.30 0.30 0.30 0.20 0.20 0.20 0.10 0.10 0.10 0.00 0.00 0.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 Error related to the Error related to just Error related to just number of points number of points number of points and the dimension
  • Approximation Error • grid -- for error ε, need 1/εd grid points -- error convergence exponential in the dimension • pseudo -- 1⁄√n convergence • quasi -- 1/n convergence in practice in low dimensions • quasi-random fewest points needed for equivalent approximation error.
  • Approximating the Volume of a molecule • E.g., CPK with van der Waals radii -- Set of intersecting spheres • Find a suitable bounding region for molecules • Point sample the bounding region and tally up the points either inside or outside of the atom spheres.
  • Approximating the Volume of a molecule • Bounding Volume times (Points Inside)/(All Points) ~= molecular volume. • Sphere or scalene ellipsoid as a bounding region reduces total overall points.
  • Volume Percent Error 95% below 95% below 4961 low energy 4.5% error 3.6% error conformations 1357 Molecules 100-550 MW flexible, inflexible balloon conformations 95% below 95% below 1-15 conformations 2% error 1.3% error
  • Fingerprint Generation Preliminaries • Need a bounding region that covers the conformational space of the database of small molecule conformations. • Cube? Sphere? Scalene Ellipsoid. • Need orientation and alignment.
  • Fewer points 1.00 0.90 0.80 0.70 0.60 Don’t need these points Don’t need these points 0.50 0.40 0.30 0.20 0.10 0.00 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
  • Fingerprint Generation 1. Generate a sample quasi-point set bounding region centered and fixed at (0,0,0); Sphere of ~11 Å radius. This is fixed, done once. 2. Mean center atom centers point set of a conformation to be fingerprinted. 3. Find sample points in the atom spheres (or function or your choice, e.g., solvent accessible surface ...) 4. Find the principal axes for a shape using the sampled points and the atom centers with SVD. Adjust for SVD sign ambiguity (Bro, et al, 2008).
  • Fingerprint Generation 5. Rotate atom centers point set to Principal Axes (PA) 6. Find points in PA configuration and create fingerprint with 4 orientations -- points in molecule ‘1’, points out ‘0’ 7. Subfingerprints, each the length of the number of points.
  • Molecule in a Volume of quasi-Random points
  • Four configurations
  • Alignment Overlay
  • Alignment Overlay
  • Shape Fingerprints • 4 fingerprints per shape. e.g., 10,240 X 4 bits = 5.12 Kbytes. • Number and density of points determines resolution, speed of comparison, and storage size. • Number of bits on is small, typically 3-10% with CPK model. Fingerprints compress significantly.
  • Fingerprint Similarity • Choose 1 subFP of Shape FP A and compare it with all 4 subFPs of Shape FP B. • The largest similarity comparison is chosen. • Bit difference per subFP is small in a given FP
  • Fingerprint Similarity • Inherent slight asymmetry of alignment of point sets of different sizes. • Aligning B to A’s principal axes, vs aligning A to B’s principal axes. • Try this on your favorite method... • Use bit string similarity measure of your choice... Tanimoto, Ochai (Cosine), ..., Tversky,... Baroni-Urbani/Bush...
  • Performance • With ~10K points and including IO: • Fingerprint generation: ~500K conformations per hour • Fingerprint Tanimoto comparisons: 130KX130K matrix in 24 hours • Large scale similarity searching trivial, e.g., 10X1M < 10 minutes • Numbers estimated with 2009 iMac, with an Intel Core i7 2.8GHz processor and 4GB RAM, running Mac OS X 10.6.2; small to large compounds;
  • Space! • 5.12 Kbytes per fingerprint -- ~90% compression • 2 million conformations = 1+ GBs • Space and CPU improving all of the time...
  • FP Experiments Typical all-pairs Tanimoto similarity distribution 1357, 2D 768 MACCS Keys fingerprints Where the action is: Similarity searching, Clustering, etc. above 0.7
  • FP Experiments All-pairs ShapeFingerprints 4961 confs. Where the action is: Similarity searching, Still values above 0.7 Clustering, Alignment, etc. above ~0.65
  • Examples • Multi-conformation set of actives in a secondary screen in ROCK2 from PubChem • Xray ROCK2 ligand • Fingerprint shape cluster • Analyze shape cluster that contains the xray structure with ChemTattoo 3D “Fragment database analysis using molecular shape fingerprints”, ACS San Francisco, Wednesday 8:40 , CINF 106
  • Shape Clustering Example - ROCK2
  • Future Work • Other shape functions • Other low discrepancy point sets in different distributions -- density where it is needed. • Speed (time) and compression (space) • More practical applications
  • Acknowledgments • JMol • Balloon • OpenBabel • R • PubChem • PDB john.maccuish@mesaac.com