1. CAESAR II:
The Combination of Direct Geometry
Method and CAESAR Algorithm for Super
Fast Conformational Search
Jiabo Li, Ph.D.
ACS Meeting, March 21-25, 2010, San Francisco
2. Product Roadmap Disclaimer
• This presentation and/or any related documents contains statements
regarding our plans or expectations for future features, enhancements
or functionalities of current or future products (collectively
"Enhancements"). Our plans or expectations are subject to change at
any time at our discretion. Accordingly, Accelrys is making no
representation, undertaking no commitment or legal obligation to
create, develop or license any product or Enhancements. The
presentation, documents or any related statements are not intended
to, nor shall, create any legal obligation upon Accelrys, and shall not be
relied upon in purchasing any product. Any such obligation shall only
result from a written agreement executed by both parties. In addition,
information disclosed in this presentation and related documents,
whether oral or written, is confidential or proprietary information of
Accelrys. It shall be used only for the purpose of furthering our
business relationship, and shall not be disclosed to third parties.
© 2009 Accelrys, Inc. 2
3. Outline
• Original CAESAR: very efficient torsion search
– Recursive partition of a molecule
– Recursive build-up of conformations
– Fast energy screening for bad clashes
– New strategy for eliminating TopSym duplicates
• CAESAR II: Deal with various geometric constraints
– New algorithm for 3D structure generation for ring systems
– Enforce stereo-chemistry (chirality, cis-trans, stereogenic)
– Ring conformation library and access of ring conformations
• Validation of CAESAR II: Speed, Quality, Diversity
– Quality (binding conformations)
– Diversity: 3D pharmacophore space, distribution of radius of gyration
– Test on macrocycles
4. Conformation Sampling is important
The Conformation Search Problem
• 3D conformation search is important in many applications
– How a new compound binds to a protein is usually unknown
– 3D pharmacophore modeling
– Docking
• Efficient search is challenging!
– Most drug molecules are conformationally flexible
– The low-energy conformation space is high-dimensional and highly
irregular in shape
– Stereo-chemistry and ring closure constraints
• CAESAR is an efficient search algorithm
5. CAESAR: Conformation Algorithm based on Energy
Screening And Recursive Buildup
1. Recursively partition
2. Recursively assemble: $E = E_A + E_B + E_{A-B}$
3. Quickly filter out bad clashes
4. New method for eliminating duplicates
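The recursive build-up and clash screening on this slide can be sketched roughly as follows. This is a minimal illustration, not the CAESAR implementation: conformers are reduced to (name, energy) pairs, `build_conformers` and `toy_pair_energy` are hypothetical names, the clash cutoff is an assumed value, and duplicate elimination (step 4) is left out. The assembled energy is the sum of the two fragment energies plus an A-B interaction term.

```python
from itertools import product

def build_conformers(node, pair_energy, clash_cutoff):
    """Recursively assemble conformers for a partition tree.

    A leaf is a list of (name, energy) fragment conformers; an internal
    node is a (left, right) tuple of sub-trees.
    """
    if isinstance(node, list):          # leaf: nothing left to partition
        return node
    left, right = node
    confs_a = build_conformers(left, pair_energy, clash_cutoff)
    confs_b = build_conformers(right, pair_energy, clash_cutoff)
    assembled = []
    for (name_a, e_a), (name_b, e_b) in product(confs_a, confs_b):
        e_ab = pair_energy(name_a, name_b)
        if e_ab > clash_cutoff:         # quick screen: drop bad clashes early
            continue
        # E = E_A + E_B + E_{A-B}
        assembled.append((name_a + "+" + name_b, e_a + e_b + e_ab))
    return assembled

# Toy data (made up for illustration): two fragments, two conformers each.
frag_a = [("a1", 0.0), ("a2", 1.0)]
frag_b = [("b1", 0.5), ("b2", 2.0)]

def toy_pair_energy(name_a, name_b):
    # Pretend that only the a2/b2 combination clashes badly.
    return 100.0 if (name_a, name_b) == ("a2", "b2") else 0.2

conformers = build_conformers((frag_a, frag_b), toy_pair_energy, clash_cutoff=10.0)
```

Because clashing combinations are discarded at each level of the tree, they never get multiplied into larger assemblies further up.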
6. 3D Search results
Database: built with both Catalyst/FAST and CAESAR for a test set of 50,000 molecules
Table 1. Number of catSearch hits with three 3D pharmacophore queries
Query             Catalyst/FAST   CAESAR   Common
ang-IIHypo        90              100      73
Hypo2             247             236      215
ang-IIHypoShape   22              21       13
7. Reference of CAESAR I
J. Li, T. Ehlers, J. Sutter, S. Varma-O’Brien, and J. Kirchmair,
CAESAR: A New Conformer Generation Algorithm Based on
Recursive Buildup and Local Rotational Symmetry
Consideration, J. Chem. Inf. Model. 2007, 47, 1923-1932
8. New Developments of CAESAR
• CAESAR II
– Direct Geometry Method for conformation generation of
constrained structures
– New method for stereochemistry control (chirality, cis-trans,
etc.)
– New strategy for the ring conformation library
9. Traditional method for ring structure
(Distance Geometry)
– Bound smoothing (can be very time consuming)
– Embedding from a distance matrix
– Optimization of the generated structures
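To give a feel for why bound smoothing can be slow, here is a minimal sketch of triangle-inequality smoothing of upper distance bounds, written in the style of the Floyd-Warshall shortest-path algorithm. The function name `smooth_upper_bounds` and the toy three-atom chain are illustrative assumptions, not taken from the slides; lower bounds are tightened in an analogous pass and are omitted here.

```python
def smooth_upper_bounds(upper):
    """Tighten upper distance bounds with the triangle inequality.

    upper[i][j] is the current upper bound on the i-j distance.
    The triple loop costs O(N^3) for N atoms.
    """
    n = len(upper)
    upper = [row[:] for row in upper]   # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = upper[i][k] + upper[k][j]
                if via_k < upper[i][j]:
                    upper[i][j] = via_k
    return upper

# Toy chain 0-1-2: bonds of 1.5 (illustrative), 0-2 distance initially unknown.
BIG = 100.0
bounds = [
    [0.0, 1.5, BIG],
    [1.5, 0.0, 1.5],
    [BIG, 1.5, 0.0],
]
smoothed = smooth_upper_bounds(bounds)
```

The cubic cost of the triple loop grows quickly for large ring systems, which is consistent with the slide's remark that this step can be very time consuming.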
10. Direct Geometry Method
• Direct 3D coordinate modification according to geometric
constraints
• Type of geometric constraints
– Bond length
– Bond angle (co-linear, 180 degree)
– Torsion (co-planar, 180 degree)
– Stereo chemistry (chirality, cis-trans, stereogenic)
– VDW clash
• All types of constraints can be converted into distance
constraints
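As an example of converting a constraint into a distance, a bond angle can be expressed as a 1-3 distance through the law of cosines. The sketch below assumes illustrative values (two 1.54 angstrom bonds and a 109.5 degree angle, typical of sp3 carbon); the function name `one_three_distance` is hypothetical.

```python
import math

def one_three_distance(d12, d23, angle_deg):
    """Distance between atoms 1 and 3 given both bond lengths and the 1-2-3 angle."""
    theta = math.radians(angle_deg)
    return math.sqrt(d12 ** 2 + d23 ** 2 - 2.0 * d12 * d23 * math.cos(theta))

tetrahedral = one_three_distance(1.54, 1.54, 109.5)   # roughly 2.5 angstrom
linear = one_three_distance(1.54, 1.54, 180.0)        # co-linear case: d12 + d23
```

In the co-linear (180 degree) case the 1-3 distance is simply the sum of the two bond lengths, which is why linear bonds fit the same distance-constraint picture.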
11. Bond Length
• Bond Length correction
The bond between C16 and N15 is too long.
Correction: Move the two atoms toward each other.
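A single correction of this kind might look like the following sketch, which moves both atoms symmetrically along the line joining them until the target distance is met. The function name `correct_pair`, the symmetric split of the displacement, and the coordinates are assumptions for illustration; the slides do not specify how the move is distributed between the two atoms.

```python
import math

def correct_pair(p, q, target):
    """Move p and q symmetrically along their connecting line to the target distance."""
    d = math.dist(p, q)
    if d < 1e-12:                       # coincident atoms: direction undefined
        return list(p), list(q)
    shift = 0.5 * (d - target) / d      # positive when the pair is too far apart
    delta = [(b - a) * shift for a, b in zip(p, q)]
    new_p = [a + x for a, x in zip(p, delta)]
    new_q = [b - x for b, x in zip(q, delta)]
    return new_p, new_q

# Illustrative coordinates: a 2.0 angstrom "bond" pulled back to 1.5 angstrom.
atom_a, atom_b = correct_pair((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), target=1.5)
```

The same move with a target larger than the current distance pushes the atoms apart, which covers corrections such as a too-small bond angle or a VDW clash.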
12. Bond Angle
• Bond angle correction
Bond angle C12-C13-C14 is too small.
Correction: Increase the distance between C12 and C14.
13. Linear bonds
• Linear bond correction
Carbon C2 is off the line.
Correction: Move C2 to its correct position.
14. VDW Clash
• Remove VDW clash
Two atoms, H3 and H6, are too close.
Correction: Move H3 and H6 away from each other.
15. Other types of geometric constraints
can also be converted into distance
constraints
• For instance, simple distance constraints do help for
chiral centers. We can use stereo templates with the correct
chirality to guide each atom’s move toward the
correct geometry. SOS by Zhu and Agrafiotis is based on a
similar idea.
• If the chirality is unknown, no additional constraints are
needed
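One common way to check whether a generated chiral center has the intended handedness is the sign of the volume spanned by its substituents. The sketch below is a generic illustration of that test with made-up coordinates; it is not claimed to be how CAESAR II or its stereo templates work, and `signed_volume` is a hypothetical helper.

```python
def signed_volume(a, b, c, d):
    """Signed volume of the tetrahedron a, b, c, d (sign flips under mirroring)."""
    u = [a[k] - d[k] for k in range(3)]
    v = [b[k] - d[k] for k in range(3)]
    w = [c[k] - d[k] for k in range(3)]
    cross = [
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    ]
    return sum(u[k] * cross[k] for k in range(3)) / 6.0

# Four substituent positions and their mirror image (x negated).
original = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
mirrored = [(-x, y, z) for x, y, z in original]

vol_original = signed_volume(*original)
vol_mirrored = signed_volume(*mirrored)
```

A structure whose signed volume has the wrong sign relative to the template has the opposite chirality, even though all its pairwise distances are identical.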
16. Put the simple ideas into practice
• No single correction can satisfy all the
constraints.
• Correction needs to be done iteratively
• Control of convergence is important
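A minimal sketch of such an iterative scheme is given below, assuming every constraint has been reduced to a (lower, upper) distance bound between two atoms. The names `refine` and `max_violation`, the tolerance, the sweep limit, and the triangle test case are illustrative assumptions rather than details from the slides.

```python
import math

def max_violation(coords, constraints):
    """Largest amount by which any distance falls outside its [lower, upper] range."""
    worst = 0.0
    for i, j, lower, upper in constraints:
        d = math.dist(coords[i], coords[j])
        worst = max(worst, lower - d, d - upper)
    return worst

def refine(coords, constraints, tol=1e-4, max_sweeps=1000):
    """Repeatedly correct violated pairs until all constraints hold within tol."""
    coords = [list(p) for p in coords]
    for sweep in range(1, max_sweeps + 1):
        for i, j, lower, upper in constraints:
            d = math.dist(coords[i], coords[j])
            if d < 1e-12 or lower <= d <= upper:
                continue                          # satisfied, or direction undefined
            target = lower if d < lower else upper
            shift = 0.5 * (d - target) / d
            for axis in range(3):
                delta = (coords[j][axis] - coords[i][axis]) * shift
                coords[i][axis] += delta
                coords[j][axis] -= delta
        if max_violation(coords, constraints) < tol:   # convergence control
            return coords, sweep, True
    return coords, max_sweeps, False

# Three atoms that should form an equilateral triangle with 1.5 angstrom sides.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
triangle = [(0, 1, 1.5, 1.5), (0, 2, 1.5, 1.5), (1, 2, 1.5, 1.5)]
final_coords, sweeps_used, converged = refine(start, triangle)
```

Fixing one pair generally disturbs its neighbours, so the loop sweeps over all constraints and stops only when the worst remaining violation drops below the tolerance or the sweep limit is reached. A VDW clash would be expressed as a lower bound with `math.inf` as the upper bound.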
17. Test 1: Diamond structures from random
starting coordinates
20. Timing: Direct Geometry Method vs.
Distance Geometry Method
Table 2: CPU time (seconds) for generating 3D structures
Molecule   Direct Method   DG Method   Ratio
C60        0.12            1           8
Diamond    0.04            43          1000
22. Test 4: Conformation sampling of macrocycles (Pascal
Bonnet data set). Diversity by fingerprints
Table 3: Number of pharmacophore fingerprints of conformation models of macrocycles
        CAESAR II                                   OMEGA
Mol     Num. of   3-point        4-point        Num. of   3-point        4-point
        Conf.     fingerprints   fingerprints   Conf.     fingerprints   fingerprints
P1      45        10567          251585         0         0              0
P2      71        8375           236449         76        8948           448917
P3      8         714            4468           157       3550           37078
P4      206       6104           165997         200       6440           109051
P5      11        3735           60097          22        4142           83547
P6      19        10481          221494         49        7326           322580
CD6     25        1207           44184          6         1062           29948
G6      6         250            498            9         131            342
G8      6         648            2476           13        359            1828
G10     6         1374           7503           11        760            5922
G12     6         2354           20640          5         969            8202
G14     6         3735           41996          2         817            6458
G16     6         4784           101998         2         1578           14654
G18     6         7345           165467         3         1154           23631
G20     6         8626           245656         2         1305           26909
SUM-1   433       70299          1570508        557       38541          1119067
SUM-2   388       59732          1318923        557       38541          1119067
Notes: (1) Bin size 1.5 Å. All other settings are default in DS 2.1.
(2) D8-D14 molecules failed in fingerprint generation and were therefore excluded.
(3) SUM-1: summation of all 15 molecules. SUM-2: P1 excluded.
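To illustrate what counting 3-point pharmacophore fingerprints with a distance bin size means, here is a simplified sketch. It is not the fingerprint implementation in DS 2.1: the feature labels, the key construction, and the names `three_point_keys` and `model_fingerprint` are assumptions chosen for illustration. Each triplet of features is turned into a key built from its three feature-type pairs and binned distances, and the number of distinct keys over all conformers measures how much pharmacophore space the model covers.

```python
import math
from itertools import combinations

def three_point_keys(features, bin_size=1.5):
    """Set of permutation-invariant keys for all feature triplets of one conformer.

    features is a list of (feature_type, (x, y, z)) tuples.
    """
    keys = set()
    for triplet in combinations(features, 3):
        edges = []
        for (type_a, xyz_a), (type_b, xyz_b) in combinations(triplet, 2):
            distance_bin = int(math.dist(xyz_a, xyz_b) // bin_size)
            edges.append((tuple(sorted((type_a, type_b))), distance_bin))
        keys.add(tuple(sorted(edges)))
    return keys

def model_fingerprint(conformers, bin_size=1.5):
    """Union of triplet keys over every conformer in a conformer model."""
    fingerprint = set()
    for features in conformers:
        fingerprint |= three_point_keys(features, bin_size)
    return fingerprint

# Made-up conformers with donor (D), acceptor (A) and hydrophobe (H) features.
compact = [("D", (0.0, 0.0, 0.0)), ("A", (1.0, 0.0, 0.0)), ("H", (0.0, 1.0, 0.0))]
extended = [("D", (0.0, 0.0, 0.0)), ("A", (4.0, 0.0, 0.0)), ("H", (0.0, 1.0, 0.0))]

count_single = len(model_fingerprint([compact]))
count_duplicate = len(model_fingerprint([compact, compact]))
count_diverse = len(model_fingerprint([compact, extended]))
```

Adding a duplicate conformer does not raise the count, while adding a geometrically different one does, so the count reflects diversity rather than the raw number of conformers.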
23. Test 4: Conformation sampling of macrocycles
(Pascal Bonnet data set). Radius of gyration
Figure 1. Distribution of radius of gyration of conformations generated
by OMEGA and CAESAR II
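For reference, the radius of gyration of a conformation can be computed as the root-mean-square distance of the atoms from their centroid. The sketch below uses the unweighted form; whether the slides used a mass-weighted variant is not stated, and the coordinates are made up.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of atoms from their centroid."""
    n = len(coords)
    centroid = [sum(p[axis] for p in coords) / n for axis in range(3)]
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)

rg_compact = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
rg_extended = radius_of_gyration([(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

A broad distribution of this value across a conformer model indicates that both compact and extended shapes were sampled.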
24. Distribution of sum of atom-atom distances
Figure 2. Distribution of atom-atom distance summation of
conformations generated by OMEGA and CAESAR II
25. Test 5. Find bioactive conformation with
CAESAR I test dataset
Table 4. RMSD of the best fitting conformation to the bioactive conformations
PDB ligands (CAESAR I test data)   CAESAR II (maxconf=400)   OMEGA (maxconf=400)
Average RMSD (angstrom)            0.96                      0.93
CPU time (s)                       226                       4385
RMSD < 0.5                         26.6%                     22.9%
RMSD < 1.0                         61.0%                     61.9%
RMSD < 1.5                         82.1%                     87.6%
RMSD < 2.0                         92.4%                     94.4%
*Machine: Intel(R) Xeon(R) CPU E7420 @ 2.13GHz
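The quantities in Table 4 can be illustrated with the sketch below: the RMSD of the best-fitting conformer to a reference structure, and the fraction of ligands whose best RMSD falls below a threshold. For brevity it assumes the conformers are already superimposed on the reference, whereas a real comparison would perform an optimal superposition first. All names and numbers are made up for illustration and are not taken from the test set.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered coordinate lists (no superposition)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def best_fit_rmsd(conformers, reference):
    """Smallest RMSD of any conformer in the model to the reference."""
    return min(rmsd(conf, reference) for conf in conformers)

def fraction_below(values, threshold):
    """Fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)],
]
best = best_fit_rmsd(model, reference)
share = fraction_below([0.3, 0.8, 1.2, 2.5], 1.0)
```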
26. Push efficiency to the new limit: reuse ring
conformations by creating a library
• Scan 6M compounds => 100,000 ring/rigid structures
• Generate conformations for all rings in the library using the BEST
method
• Build index for the library
• ~100MB file size
27. Retrieve ring conformations from library
efficiently
• Load the index file
• Read a ring conformation from file if it is not cached in
memory; otherwise use the cached copy
• If the ring is not in the library, generate ring conformations on
the fly using the direct method, and save them in the memory
for reuse
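The retrieval logic above can be sketched as a small cache in front of an indexed file. Everything concrete here is an assumption for illustration: the class name `RingLibrary`, the JSON payload, the (offset, length) index layout, the SMILES-like ring keys, and the fallback generator standing in for the direct method.

```python
import io
import json

class RingLibrary:
    """Indexed ring-conformation store with an in-memory cache."""

    def __init__(self, index, data_file, generate):
        self.index = index            # ring key -> (offset, length) in data_file
        self.data_file = data_file    # seekable binary file object
        self.generate = generate      # fallback used when a ring is not indexed
        self.cache = {}
        self.file_reads = 0

    def get(self, ring_key):
        if ring_key in self.cache:                 # already in memory
            return self.cache[ring_key]
        if ring_key in self.index:                 # in the library: read from file
            offset, length = self.index[ring_key]
            self.data_file.seek(offset)
            conformations = json.loads(self.data_file.read(length))
            self.file_reads += 1
        else:                                      # not in the library: build on the fly
            conformations = self.generate(ring_key)
        self.cache[ring_key] = conformations       # keep for reuse
        return conformations

# Toy library holding one ring with a single (made-up) conformation.
payload = json.dumps([[[0.0, 0.0, 0.0], [1.4, 0.0, 0.0]]]).encode()
generated = []

def toy_generator(ring_key):
    generated.append(ring_key)
    return [[[0.0, 0.0, 0.0]]]

library = RingLibrary({"c1ccccc1": (0, len(payload))}, io.BytesIO(payload), toy_generator)
first = library.get("c1ccccc1")     # read from file
second = library.get("c1ccccc1")    # served from memory
novel_a = library.get("C1CCCCCCC1") # generated on the fly
novel_b = library.get("C1CCCCCCC1") # reused from memory
```

Keeping only the index and the rings actually encountered in memory avoids loading the whole library up front, while repeated rings cost a single dictionary lookup.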
28. Speed Test of on-the-fly conformation
generation using CAESAR II
• Test condition
– Data set: CAP2008 database, ~6 million compounds
– Max Conformations/compound = 100
– Ring conformation library is pre-generated using Catalyst/BEST
method
– Quad-core CPU, 2.2 GHz, parallel computing
– Without saving conformations in SD file (I/O bottleneck)
• Speed
– Generating conformations for all 6 million compounds takes
1.5 hours, or 250 compounds/second/processor
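As a rough sanity check of the throughput figure, the arithmetic can be reproduced as below, assuming that all four cores of the quad-core CPU were in use. The slide's inputs are rounded ("~6 million", "1.5 hours"), so the estimate is only expected to land in the same range as the quoted 250 compounds/second/processor.

```python
compounds = 6_000_000          # "~6 million" from the slide, taken at face value
wall_seconds = 1.5 * 3600      # 1.5 hours
cores = 4                      # assumption: one worker per core of the quad-core CPU

total_rate = compounds / wall_seconds      # compounds per second overall
per_core_rate = total_rate / cores         # compounds per second per processor
```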
29. Summary
• There are two new technologies in CAESAR II which make the new
algorithm much more robust and efficient than the original CAESAR
– Direct method for 3D structure generation of ring and rigid structures
– New ring/rigid structure library and retrieving method
• The ring conformation generation using Direct Geometry Method is
highly efficient and robust
• The conformer model of ring molecules has good coverage of 3D
pharmacophore space.
30. Acknowledgment
• Jon Sutter
• Honglin Li
• Fang Bai
• David Zhang
• Paul Flook
• Frank Brown