RMSD: routine measure stirs doubts
Jeremy J. Yang
Introduction
What about the variance?
Root mean square deviation (RMSD) measurements between molecular
conformations are routine and widespread in the scientific literature. It appears
widely accepted that a single RMSD calculation suffices to measure the
difference or sameness between conformers, that is, a distance in
"conformation space". Is the current widespread use of RMSD justified, and
if so, how? This study examines the uses and limitations of RMSD, and some
alternative or supplementary measures of distance between conformers.
Definition of RMSD
The root mean square deviation can be calculated for any two equal-sized
vectors:
Fundamental RMSD problem:
the tasks of interest require
an assessment of a geometric
relationship which cannot
always be summarized well by
one scalar measurement.
Example: two conformations of NCI 130813
RMSD = 4.47, variance = 11.34, max distance = 12.33, shape Tanimoto = 0.88
Shape similarity
(+): intuitive, rigorous,
physical, fast using OE
Shape Toolkit
(-): shape alone ignores
chemistry
Median distance
(+): can reveal
substructure equality
(-): ignores outliers
A large substructure may be
perfectly aligned despite large
overall RMSD. The variance
among distances and/or the
maximum distance can reveal
this.
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2}$$
Mean distance
(+): most intuitive
(-): no unique analytic
minimum
(-): no variance weighting
Methods for further study:
RMSD reduces N comparisons to a single scalar measure.
In the realm of molecular geometry, RMSD is used to compare two sets of
atomic coordinates, where x_i and y_i are the two 3D positions of the i-th atom.
The set of atoms may comprise a complete molecule or a substructure. The
coordinates may exist in a defined reference frame (e.g. a protein receptor
site), in which case each geometry is said to represent a pose of the
molecule. Alternatively, the coordinates may have only relative meaning and
thus define a conformer of the molecule. In that case, the RMSD is normally
minimized by finding the optimal alignment of the conformers.
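For two poses in a shared reference frame, the definition applies directly, with no alignment step. A minimal Python sketch, assuming each geometry is a list of (x, y, z) tuples in matching atom order (the function name `rmsd` and the toy coordinates are illustrative, not from the poster):

```python
import math

def rmsd(x, y):
    """Root mean square deviation between two equal-length lists of 3D points."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared atom-to-atom distances, accumulated per Cartesian component.
    squared = sum((a - b) ** 2 for p, q in zip(x, y) for a, b in zip(p, q))
    return math.sqrt(squared / len(x))

# Two hypothetical poses of a two-atom fragment in the same reference frame.
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(pose_a, pose_b))
```

Only the second atom moves (by 2 units), so the result is sqrt(4 / 2), the square root of 2.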
Minimized RMSD
For any two conformers, an alignment exists for which RMSD is minimized.
This alignment can be determined analytically [1]. This alignment can be the
means or the end. Many modelling tasks require geometrical alignment.
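One way to obtain the minimized RMSD without building the rotation explicitly is sketched below. It is an illustration under stated assumptions, not the poster's implementation: it uses Horn's quaternion formulation rather than Kabsch's matrix method [1] (restricted to proper rotations, the two give the same minimum), finds the largest eigenvalue of the 4x4 matrix by plain Jacobi rotations, and assumes matching atom order. All function names are hypothetical.

```python
import math

def _centered(coords):
    """Translate coordinates so that their centroid sits at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]

def _largest_eigenvalue(m, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi method)."""
    n = len(m)
    a = [row[:] for row in m]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def min_rmsd(x, y):
    """RMSD after optimal translation and proper rotation."""
    a, b = _centered(x), _centered(y)
    # 3x3 correlation matrix between the two centered coordinate sets.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(key)
    ga = sum(v * v for p in a for v in p)
    gb = sum(v * v for p in b for v in p)
    # Clamp at zero to absorb rounding error for identical geometries.
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))

# Hypothetical conformer and a copy rotated 90 degrees about z, then translated.
conf_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (2.4, 1.2, 0.9)]
conf_b = [(-py + 3.0, px - 2.0, pz + 1.0) for px, py, pz in conf_a]
print(min_rmsd(conf_a, conf_b))
```

A rotated and translated copy of a conformer gives a minimized RMSD of zero to within rounding, whereas the unaligned RMSD of the same pair is nonzero.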
Symmetry - a critical detail
Calculating the correct min-RMSD requires the optimal auto-isomorphism for
cases of molecular symmetry, an implementation detail which requires
rigorous chemoinformatics to avoid errors.
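The symmetry issue can be illustrated with a minimal Python sketch. It assumes the automorphisms (atom-index permutations that map the molecular graph onto itself) are supplied by the caller; here they are written by hand for a toy three-atom fragment, whereas in practice they would come from a chemoinformatics toolkit's graph matching. For brevity the sketch skips the alignment step, which a full min-RMSD calculation would repeat for each permutation. The function names are hypothetical.

```python
import math

def rmsd(x, y):
    """Plain RMSD between two equal-length lists of 3D points."""
    squared = sum((a - b) ** 2 for p, q in zip(x, y) for a, b in zip(p, q))
    return math.sqrt(squared / len(x))

def symmetry_corrected_rmsd(x, y, automorphisms):
    """Smallest RMSD over all supplied atom renumberings of y."""
    return min(rmsd(x, [y[j] for j in perm]) for perm in automorphisms)

# Toy carboxylate-like fragment: atom 0 is the carbon, atoms 1 and 2 are
# equivalent oxygens. The second geometry has the two oxygens swapped.
geom_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
geom_b = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]
perms = [(0, 1, 2), (0, 2, 1)]  # identity and the oxygen swap (hand-written)

print(rmsd(geom_a, geom_b))
print(symmetry_corrected_rmsd(geom_a, geom_b, perms))
```

The naive calculation reports a nonzero RMSD for what is physically the same geometry; taking the minimum over the automorphisms recovers zero.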
For what tasks is RMSD used?
• Compare two conformations.
• Compare two poses of the same or different conformations.
• Compare the coordinates of substructures.
• Measure the quality of a computed model vs. reference data (X-ray
crystallographic or NMR).
• Measure the diversity of an ensemble of conformers or poses.
• Characterize and compare ensembles of conformations.
Advantages of RMSD
• Easily calculated.
• Unique, analytic minimum [1].
• Metric property [2] (triangle inequality satisfied), thus a more intuitive measure
of “conformational space”.
• Emphasizes variance (relative to ordinary mean).
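The last point can be made explicit with a standard identity for a mean of squares (not stated on the poster). Writing d_i for the distance between the two positions of atom i, with mean \bar{d} and population variance \sigma_d^2:

$$\mathrm{RMSD}^2 = \frac{1}{N}\sum_{i=1}^{N} d_i^2 = \bar{d}^{\,2} + \sigma_d^2$$

Two geometry pairs with the same mean distance therefore differ in RMSD only through the spread of their per-atom distances. If the variance quoted for the NCI 130813 example is the population variance of per-atom distances (an assumption), the values 4.47 and 11.34 would imply a mean distance of about 2.94, since 4.47^2 - 11.34 ≈ 8.64.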
Variance (of RMSD-aligned conformations)
(+): simple, intuitive, provides criterion for further analysis
Max distance (of RMSD-aligned conformations)
(+): simple, intuitive, provides criterion for further analysis
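A large substructure can be perfectly aligned while the overall RMSD is large, and the per-atom distance profile exposes this. The Python sketch below assumes the two geometries are already aligned and share atom order; the function name, the use of population variance, and the toy coordinates are assumptions made for illustration.

```python
import math
import statistics

def distance_profile(x, y):
    """Summary statistics of per-atom distances between two aligned geometries."""
    d = [math.dist(p, q) for p, q in zip(x, y)]
    return {
        "rmsd": math.sqrt(sum(v * v for v in d) / len(d)),
        "mean": statistics.fmean(d),
        "median": statistics.median(d),
        "variance": statistics.pvariance(d),  # population variance (assumed)
        "max": max(d),
    }

# Hypothetical 10-atom chain: eight atoms coincide, two are displaced by 5 units.
ref = [(float(i), 0.0, 0.0) for i in range(10)]
moved = ref[:8] + [(px, py + 5.0, pz) for px, py, pz in ref[8:]]
profile = distance_profile(ref, moved)
print(profile)
```

In this toy case the RMSD is about 2.24, yet the median distance is 0 and the maximum is 5: most atoms are identically placed and the whole deviation sits in two atoms, which the single RMSD value cannot show.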
RMST (torsion)
(+): not size dependent
(+): fast, no minimization
(-): ignores rings
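The poster does not give a formula for RMST, so the following Python sketch assumes the natural reading: the root mean square of differences between corresponding torsion angles, each difference wrapped into [-180, 180) degrees so that 170 and -170 count as 20 degrees apart. The function names are hypothetical, and the torsion lists are assumed to be in matching order.

```python
import math

def wrap_degrees(delta):
    """Map an angle difference in degrees into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def rmst(torsions_a, torsions_b):
    """Root mean square torsion difference between two conformers (degrees)."""
    if len(torsions_a) != len(torsions_b) or not torsions_a:
        raise ValueError("need two non-empty torsion lists of equal length")
    diffs = [wrap_degrees(a - b) for a, b in zip(torsions_a, torsions_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Two hypothetical conformers described by two rotatable-bond torsions each.
print(rmst([170.0, 60.0], [-170.0, 60.0]))
```

No alignment or minimization is needed, and the value does not grow with the number of atoms, which matches the advantages listed above. Ring torsions are simply absent from the input lists in this form.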
Low correlation reflects
information missed by RMSD.
Big RMSDs less informative
Where “big” is dependent on molecule size. Hence while RMSD can define
a conformation “space” for a single molecule, its scale is dependent on
molecule size and other graph-topological descriptors, thus its use in
describing heterogeneous databases is hampered.
N-atoms is an arbitrary
descriptor of size. Other
descriptors investigated: N-rotors,
mol-length (3D),
enhanced Wiener index,
Randic coefficient
(branching).
Pharmacophore RMSD
Instead of all atoms, use
pharmacophore points
(-): requires expertise
(-): not readily automated
(+): involves expertise
Uncolored graph RMSD
Like shape similarity, indicates
some chemistry-suppressed
geometrical equivalence.
3D max common substructure
3D match criteria needed
(-): high computational cost
Conclusions
RMSD is a useful and convenient measure but has limitations which can lead to
errors and oversights. At minimum, investigators should be alert to cases
where RMSD should be supplemented by other measures.
• In some cases RMSD is insufficient.
• In general larger RMSDs warrant closer inspection.
• As molecule size increases, RMSD range increases, and descriptive power
decreases.
• Several other measures are available to help characterize geometric
relationships not well handled by RMSD.
• Low correlations indicate RMSD does not reveal information provided by
these alternative methods for geometry comparison.
• Conformer discrimination tests should include a variance or max atom-distance test in addition to RMSD.
References and Notes
New methods yield new information [4]
Test molecule: benzylpenicillin (41 atoms, 28 conformations [5])
Also tested: dopamine (22 atoms, 7 conformations), methotrexate (54
atoms, 290 conformations)
benzylpenicillin
RMST, centrality-weighted
Weight = subtree ratio
(+): lever-arm effect
considered
(+): still fast
Increased correlation confirms
that centrality-weighting works.
Weights could be squared to
increase effect.
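A weighted RMST can be sketched in Python as follows. The poster names the weight ("subtree ratio") but its exact definition is not given here, so the weights are taken as caller-supplied numbers, larger for central bonds; the weighted-mean normalization, the `power` parameter (2 squares the weights, as suggested above), and all names are assumptions.

```python
import math

def wrap_degrees(delta):
    """Map an angle difference in degrees into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def weighted_rmst(torsions_a, torsions_b, weights, power=1):
    """Weighted root mean square torsion difference (degrees).

    weights: one non-negative centrality weight per torsion, supplied by the
    caller (hypothetical stand-in for the poster's subtree ratio).
    """
    w = [wt ** power for wt in weights]
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    sq = sum(wi * wrap_degrees(a - b) ** 2
             for wi, a, b in zip(w, torsions_a, torsions_b))
    return math.sqrt(sq / total)

# Hypothetical case: the central torsion is unchanged, a peripheral one moves 40 degrees.
a, b = [60.0, 40.0], [60.0, 0.0]
print(weighted_rmst(a, b, [1.0, 0.5]))
print(weighted_rmst(a, b, [1.0, 0.5], power=2))
```

With squared weights, the peripheral torsion contributes less, so the second value is smaller than the first; a change at a central bond would be damped far less.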
RMST, all torsions including
rings
Could also be centrality weighted
(+): more comprehensive than
straight RMST
dopamine
methotrexate
Distance matrix RMSD
(+): no minimization, fast
(-): less intuitive
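A distance matrix RMSD compares the intramolecular atom-pair distances of two conformers instead of their coordinates, so no superposition is required. The Python sketch below assumes the average is taken over all unique atom pairs (the poster does not state the normalization) and that atom order matches; the function name is hypothetical.

```python
import math
from itertools import combinations

def distance_matrix_rmsd(x, y):
    """RMS difference between corresponding interatomic distances."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two coordinate lists of equal length >= 2")
    pairs = list(combinations(range(len(x)), 2))
    sq = sum((math.dist(x[i], x[j]) - math.dist(y[i], y[j])) ** 2
             for i, j in pairs)
    return math.sqrt(sq / len(pairs))

# Hypothetical three-atom chain, with the last atom stretched by one unit.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(distance_matrix_rmsd(chain_a, chain_b))
```

Because only internal distances enter, any rigid rotation or translation of either conformer leaves the value unchanged. The same holds for a mirror image, so this measure cannot distinguish a conformer from its reflection.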
1. "A solution for the best rotation to relate two sets of vectors", Wolfgang Kabsch, Acta
Cryst. (1976), A32, p922-923.
2. "Metric properties of the root-mean-square deviation of vector sets", K. Kaindl and
B. Steipe, Acta Cryst. (1997), A53, p809.
3. "Efficient RMSD measures for the comparison of two molecular ensembles", Rafael
Brüschweiler, Proteins: Structure, Function, and Genetics (2003), 50(1),
p26-34.
4. All algorithms implemented using OEChem (OpenEye Scientific Software).
5. All conformations generated with Omega (OpenEye Scientific Software).
6. Shape similarity calculated using ROCS (OpenEye Scientific Software).
3600 Cerrillos Road
Suite 1107
Santa Fe, New Mexico 87507
505.473.7385
info@eyesopen.com
www.eyesopen.com