RMSD: routine measure stirs doubts
Jeremy J. Yang

Poster presented at the 230th National ACS meeting in Washington, D.C., 2005.

Introduction

Root mean square deviation (RMSD) measurements between molecule conformations are routine and widespread in the scientific literature. It appears widely accepted that a single RMSD calculation is sufficient to measure the difference or sameness between conformers, that is, a distance in "conformation space". Is the current widespread use of RMSD justified, and if so, how? This study examines the uses and limitations of RMSD, and some alternative or supplementary measures of distance between conformers.

What about the variance?

Fundamental RMSD problem: the tasks of interest require an assessment of a geometric relationship which cannot always be summarized well by one scalar measurement.

Example: 2 conformations of NCI 130813
RMSD = 4.47, Variance = 11.34, Maxd = 12.33, ShapeTanimoto = 0.88

A large substructure may be perfectly aligned despite a large overall RMSD. The variance among distances and/or the maximum distance can reveal this.

Methods for further study

Shape similarity
(+): intuitive, rigorous, physical, fast using OE Shape Toolkit
(-): shape alone ignores chemistry

Median distance
(+): can reveal substructure equality
(-): ignores outliers

Mean distance
(+): most intuitive
(-): no unique analytic minimum
(-): no variance weighting

Definition of RMSD

The root mean square deviation can be calculated for any two equal-sized vectors:

\[ \mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2} \]

RMSD reduces N comparisons to a single scalar measure. In the realm of molecular geometry, RMSD is used to compare two sets of atomic coordinates: $x_i$ and $y_i$ are then the two 3D positions of the $i$th atom, and $(x_i - y_i)^2$ is the squared distance between them. The set of atoms may comprise a complete molecule or a substructure. The coordinates may exist in a defined reference frame (e.g. a protein receptor site), in which case each geometry is said to represent a pose of the molecule. Or the coordinates may have only relative meaning and thus comprise a conformer of the molecule.
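The RMSD definition above, together with the variance and maximum-distance statistics quoted in the NCI 130813 example, can be sketched in Python with NumPy. This is a minimal illustration and not the poster's OEChem implementation: the function name `deviation_summary`, the choice of population variance over per-atom distances, and the toy coordinates are all assumptions made for the example.

```python
import numpy as np


def deviation_summary(x, y):
    """Summarize per-atom deviations between two (N, 3) coordinate arrays.

    Hypothetical helper: assumes atoms are already matched by index and
    the coordinates share a reference frame (no alignment is done here).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("coordinate sets must have the same shape")
    d = np.linalg.norm(x - y, axis=1)  # distance between matched atoms
    return {
        "rmsd": float(np.sqrt(np.mean(d ** 2))),
        "mean": float(np.mean(d)),
        "median": float(np.median(d)),
        "variance": float(np.var(d)),  # assumed: population variance of distances
        "max": float(np.max(d)),
    }


# Toy data (not from the poster): nine atoms coincide, one is displaced by 10.
x = np.zeros((10, 3))
y = np.zeros((10, 3))
y[9, 0] = 10.0
summary = deviation_summary(x, y)
```

For the toy data, nine of the ten atoms coincide exactly, yet the RMSD is about 3.16; the median of 0 and the maximum of 10 expose the single outlier. The sketch applies as written to poses, which share a reference frame. Conformers, whose coordinates have only relative meaning, first need an alignment.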
In that case, the RMSD is normally minimized by finding the optimal alignment of the conformers.

Minimized RMSD

For any two conformers, an alignment exists for which RMSD is minimized. This alignment can be determined analytically [1]. The alignment can be the means or the end: many modelling tasks require geometrical alignment.

Symmetry - a critical detail

Calculating the correct min-RMSD requires the optimal auto-isomorphism for cases of molecular symmetry, an implementation detail which requires rigorous chemoinformatics to avoid errors.

For what tasks is RMSD used?

•  Compare two conformations.
•  Compare two poses of the same or different conformations.
•  Compare the coordinates of substructures.
•  Measure the quality of a computed model vs. reference data (X-ray crystallographic or NMR).
•  Measure the diversity of an ensemble of conformers or poses.
•  Characterize and compare ensembles of conformations.

Advantages of RMSD

•  Easily calculated.
•  Unique, analytic minimum [1].
•  Metric property [2] (triangle inequality satisfied), thus a more intuitive measure of “conformational space”.
•  Emphasizes variance (relative to the ordinary mean).

Variance (of RMSD-aligned conformations)
(+): simple, intuitive, provides criterion for further analysis

Max distance (of RMSD-aligned conformations)
(+): simple, intuitive, provides criterion for further analysis

RMST (torsion)
(+): not size dependent
(+): fast, no minimization
(-): ignores rings

Low correlation reflects information missed by RMSD.

Big RMSDs less informative

Here “big” is relative to molecule size. Hence, while RMSD can define a conformation “space” for a single molecule, its scale depends on molecule size and other graph-topological descriptors, which hampers its use in describing heterogeneous databases. N-atoms is an arbitrary descriptor of size. Other descriptors investigated: Nrotors, mol-length (3D), enhanced Wiener index, Randic coefficient (branching).
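The minimized RMSD discussed above can be sketched with the analytic best-rotation solution of Kabsch (reference 1), here via a singular value decomposition in NumPy. This is a hedged illustration, not the poster's OEChem code: the function name `kabsch_rmsd` and the test coordinates are assumptions, and the sketch takes the atom correspondence as given, so it does not search over auto-isomorphisms for symmetric molecules.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """Minimum RMSD between two (N, 3) coordinate sets over rigid motions.

    Hypothetical helper: atom i of p is assumed to correspond to atom i of q.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("coordinate sets must have the same shape")
    pc = p - p.mean(axis=0)  # remove translation by centering both sets
    qc = q - q.mean(axis=0)
    h = pc.T @ qc  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = pc @ rot.T - qc  # rotate p onto q, then compare
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Toy check (coordinates are illustrative only): rotate and translate a
# small non-planar point set, then recover a near-zero minimized RMSD.
q = np.array([[0.0, 0.0, 0.0],
              [1.5, 0.0, 0.0],
              [0.0, 1.2, 0.0],
              [0.0, 0.0, 0.9],
              [1.0, 1.0, 1.0]])
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
p = q @ rz.T + np.array([1.0, -2.0, 0.5])

raw_rmsd = float(np.sqrt(np.mean(np.sum((p - q) ** 2, axis=1))))
aligned_rmsd = kabsch_rmsd(p, q)
```

Because `p` is only a rotated and translated copy of `q`, the minimized RMSD is zero to within floating-point error, while the unaligned value is not. The determinant check keeps the result a proper rotation, so a mirror image of a non-planar structure is not reported as identical.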
Pharmacophore RMSD
Instead of all atoms, use pharmacophore points.
(+): involves expertise
(-): requires expertise
(-): not readily automated

Uncolored graph RMSD
Like shape similarity, indicates some chemistry-suppressed geometrical equivalence.

3D max common substructure
3D match criteria needed.
(-): high computational cost

New methods yield new information [4]

Test molecule: benzylpenicillin (41 atoms, 28 conformations [5])
Also tested: dopamine (22 atoms, 7 conformations), methotrexate (54 atoms, 290 conformations)

[Plot panels: benzylpenicillin, dopamine, methotrexate]

RMST, centrality-weighted
Weight = subtree ratio
(+): lever-arm effect considered
(+): still fast
Increased correlation confirms that centrality-weighting works. Weights could be squared to increase the effect.

RMST, all torsions including rings
Could also be centrality-weighted.
(+): more comprehensive than straight RMST

Distance matrix RMSD
(+): no minimization, fast
(-): less intuitive

Conclusions

RMSD is a useful and convenient measure but has limitations which can lead to errors and oversights. At minimum, investigators should be alert to cases where RMSD should be supplemented by other measures.

•  In some cases RMSD is insufficient.
•  In general, larger RMSDs warrant closer inspection.
•  As molecule size increases, the RMSD range increases and its descriptive power decreases.
•  Several other measures are available to help characterize geometric relationships not well handled by RMSD.
•  Low correlations indicate that RMSD does not reveal information provided by these alternative methods for geometry comparison.
•  Conformer discrimination tests should include a variance or maximum atom-distance test in addition to RMSD.

References and Notes

1.  "A solution for the best rotation to relate two sets of vectors", Wolfgang Kabsch, Acta Cryst., 1976, A32, p922-923.
2.  "Metric properties of the root-mean-square deviation of vector sets", K. Kaindl and B. Steipe, Acta Cryst., 1997, A53, p809.
3.  "Efficient RMSD measures for the comparison of two molecular ensembles", Rafael Brüschweiler, Proteins: Structure, Function, and Genetics, 2003, Volume 50, Issue 1, p26-34.
4.  All algorithms implemented using OEChem (OpenEye Scientific Software).
5.  All conformations generated with Omega (OpenEye Scientific Software).
6.  Shape similarity calculated using ROCS (OpenEye Scientific Software).

3600 Cerrillos Road, Suite 1107, Santa Fe, New Mexico 87507
505.473.7385
info@eyesopen.com
www.eyesopen.com