RMSD: routine measure stirs doubts
Jeremy J. Yang
Introduction

What about the variance?

Root mean square deviation (RMSD) measurements between molecular
conformations are routine and widespread in the scientific literature. It appears
widely accepted that a single RMSD calculation is sufficient to measure the
difference or sameness between conformers, that is, a distance in
"conformation space". Is the current widespread use of RMSD justified, and
if so, how? This study examines the uses and limitations of RMSD, and some
alternative or supplementary measures of distance between conformers.

Definition of RMSD
The root mean square deviation can be calculated for any two equal-sized
vectors:

Fundamental RMSD problem:
the tasks of interest require
an assessment of a geometric
relationship which cannot
always be summarized well by
one scalar measurement.

Example: two conformations of NCI 130813
RMSD = 4.47
Variance = 11.34
Max distance = 12.33
Shape Tanimoto = 0.88

Shape similarity
(+): intuitive, rigorous,
physical, fast using OE
Shape Toolkit
(-): shape alone ignores
chemistry

Median distance
(+): can reveal
substructure equality
(-): ignores outliers

A large substructure may be
perfectly aligned despite large
overall RMSD. The variance
among distances and/or the
maximum distance can reveal
this.
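
The point above can be illustrated with a minimal sketch that summarizes the per-atom distances between two already-aligned coordinate sets. The function name `distance_profile`, the toy coordinates, and the use of the population variance are assumptions made for illustration; the poster does not state which variance estimator was used.

```python
import math
import statistics

def distance_profile(xs, ys):
    """Summarize per-atom distances between two already-aligned coordinate sets."""
    d = [math.dist(x, y) for x, y in zip(xs, ys)]
    return {
        "rmsd": math.sqrt(sum(v * v for v in d) / len(d)),
        "mean": statistics.fmean(d),
        "median": statistics.median(d),
        "variance": statistics.pvariance(d),  # population variance (an assumption)
        "max": max(d),
    }

# Toy data: nine atoms coincide exactly; one terminal atom is displaced by 10 units.
ref = [(float(i), 0.0, 0.0) for i in range(10)]
alt = ref[:9] + [(9.0, 10.0, 0.0)]
profile = distance_profile(ref, alt)
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the RMSD is about 3.16 even though 90% of the atoms overlap perfectly; the median of 0 and the maximum of 10 expose what the single scalar hides.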

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2}$$

Mean distance
(+): most intuitive
(-): no unique analytic
minimum
(-): no variance weighting

Methods for further study:

RMSD reduces N comparisons to a single scalar measure.
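
As a minimal sketch of that reduction, the function below computes the RMSD of two equal-sized lists of 3D points with no alignment step. The name `rmsd` and the two-atom poses are illustrative assumptions, not part of the original study (which used OEChem).

```python
import math

def rmsd(xs, ys):
    """Root mean square deviation between two equal-length lists of 3D points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    # math.dist gives the Euclidean distance between each pair of matched atoms.
    total = sum(math.dist(x, y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

# Two hypothetical poses: the second atom moves by 2 units along z.
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(pose_a, pose_b), 3))  # 1.414
```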

In the realm of molecular geometry, RMSD is used to compare two sets of
atomic coordinates, where x_i and y_i represent two 3D positions of the i-th atom.
The set of atoms may comprise a complete molecule or a substructure. The
coordinates may exist in a defined reference frame (e.g. a protein receptor
site), in which case each geometry is said to represent a pose of the
molecule. Alternatively, the coordinates may have only relative meaning and thus
comprise a conformer of the molecule. In that case, the RMSD is normally
minimized by finding the optimal alignment of the conformers.

Minimized RMSD
For any two conformers, an alignment exists for which RMSD is minimized.
This alignment can be determined analytically [1]. The alignment can be a
means to an end or the end itself; many modelling tasks require geometrical alignment.
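
A minimal, dependency-free sketch of a minimized RMSD follows. Note the substitution: instead of the Kabsch procedure cited above, it uses Horn's quaternion formulation, in which the largest eigenvalue of a symmetric 4x4 matrix gives the best overlap, and it finds that eigenvalue by shifted power iteration rather than a closed-form solve. The names `centered` and `min_rmsd`, the iteration count, and the test coordinates are assumptions for illustration. Only proper rotations are considered, so mirror images are not superimposed.

```python
import math

def centered(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]

def min_rmsd(xs, ys, iterations=2000):
    """Minimum RMSD over rigid translations and proper rotations."""
    a, b = centered(xs), centered(ys)
    n = len(a)
    # Cross-covariance: s[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue is the best overlap.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes the largest eigenvalue dominant.
    shift = math.sqrt(sum(v * v for row in m for v in row))
    lam = 0.0
    if shift > 0.0:
        q = [1.0, 0.5, 0.25, 0.125]  # arbitrary start vector
        for _ in range(iterations):
            q = [sum(m[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
            norm = math.sqrt(sum(v * v for v in q))
            q = [v / norm for v in q]
        # Rayleigh quotient of the converged unit vector.
        lam = sum(q[i] * m[i][j] * q[j] for i in range(4) for j in range(4))
    sq = sum(v * v for p in a + b for v in p) - 2.0 * lam
    return math.sqrt(max(sq, 0.0) / n)

ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
# The same conformer rotated 90 degrees about z and translated.
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in ref]
print(round(min_rmsd(ref, moved), 4))  # 0.0
```

This sketch returns only the minimized value; recovering the rotation itself (the "means" case) would require converting the converged quaternion to a rotation matrix.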

Symmetry - a critical detail
For symmetric molecules, calculating the correct min-RMSD requires finding the
optimal auto-isomorphism (automorphism) of the molecular graph, an implementation
detail which requires rigorous chemoinformatics to avoid errors.
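
The effect of symmetry can be sketched by taking the smallest RMSD over a list of atom-index permutations. Here the automorphisms are supplied by hand for a hypothetical three-atom fragment; in practice they would come from a cheminformatics toolkit's graph-matching routine. The names `symmetry_rmsd` and `automorphisms` are illustrative assumptions.

```python
import math

def rmsd(xs, ys):
    """Plain RMSD between matched atoms, with no alignment."""
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def symmetry_rmsd(xs, ys, automorphisms):
    """Smallest RMSD over the supplied atom-index permutations of ys."""
    return min(rmsd(xs, [ys[j] for j in perm]) for perm in automorphisms)

# Hypothetical fragment: atoms 1 and 2 are topologically equivalent.
automorphisms = [(0, 1, 2), (0, 2, 1)]
pose_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
# Identical geometry, but the two equivalent atoms carry swapped labels.
pose_b = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]

print(round(rmsd(pose_a, pose_b), 3))                # 1.633
print(symmetry_rmsd(pose_a, pose_b, automorphisms))  # 0.0
```

This sketch compares poses in a fixed frame. For conformers, each permutation would also need its own optimal alignment before the minimum is taken.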

For what tasks is RMSD used?
•  Compare two conformations.
•  Compare two poses of the same or different conformations.
•  Compare the coordinates of substructures.
•  Measure the quality of a computed model vs. reference data (X-ray
crystallographic or NMR).
•  Measure the diversity of an ensemble of conformers or poses.
•  Characterize and compare ensembles of conformations.

Advantages of RMSD
•  Easily calculated.
•  Unique, analytic minimum [1].
•  Metric property [2] (triangle inequality satisfied), thus a more intuitive measure
of “conformational space”.
•  Emphasizes variance (relative to ordinary mean)
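
The last advantage can be made explicit with a short identity. The symbols d_i (per-atom distance), its mean, and the population variance are notation introduced here for illustration and do not appear on the original poster.

```latex
\[
d_i = \lVert x_i - y_i \rVert, \qquad
\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i, \qquad
\mathrm{Var}(d) = \frac{1}{N}\sum_{i=1}^{N} \left(d_i - \bar{d}\right)^2
\]
\[
\mathrm{RMSD}^2 = \frac{1}{N}\sum_{i=1}^{N} d_i^2 = \bar{d}^{\,2} + \mathrm{Var}(d)
\quad\Longrightarrow\quad
\mathrm{RMSD} \ge \bar{d}
\]
```

Equality holds only when every atom is displaced by the same distance, so for a fixed mean distance the RMSD grows with the spread of the per-atom distances.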

Variance (of RMSD-aligned conformations)
(+): simple, intuitive, provides criterion for further analysis
Max distance (of RMSD-aligned conformations)
(+): simple, intuitive, provides criterion for further analysis

RMST (torsion)
(+): not size dependent
(+): fast, no minimization
(-): ignores rings
Low correlation reflects
information missed by RMSD.
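
A minimal sketch of a torsion-based measure follows. The poster does not give the RMST formula, so this assumes the root mean square of torsion differences over matched rotatable bonds, with each difference wrapped to account for angular periodicity. The names `torsion_delta` and `rmst` are illustrative.

```python
import math

def torsion_delta(a, b):
    """Signed difference between two torsions in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def rmst(torsions_a, torsions_b):
    """Root mean square torsion difference over matched rotatable bonds."""
    deltas = [torsion_delta(a, b) for a, b in zip(torsions_a, torsions_b)]
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))

# 170 and -170 degrees are only 20 degrees apart once periodicity is respected.
print(round(rmst([170.0, -60.0], [-170.0, -60.0]), 3))  # 14.142
```

Because only torsion angles enter, no superposition is needed and the value does not scale with atom count, which matches the advantages listed above.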

Big RMSDs less informative
“Big” depends on molecule size. Hence, while RMSD can define
a conformation “space” for a single molecule, its scale depends on
molecule size and other graph-topological descriptors, which hampers its use in
describing heterogeneous databases.
N-atoms is an arbitrary descriptor of size. Other descriptors investigated:
N-rotors, mol-length (3D), enhanced Wiener index, Randić coefficient (branching).

Pharmacophore RMSD
Instead of all atoms, use
pharmacophore points
(-): requires expertise
(-): not readily automated
(+): involves expertise

Uncolored graph RMSD
Like shape similarity, indicates
some chemistry-suppressed
geometrical equivalence.

3D max common substructure
3D match criteria needed
(-): high computational cost

Conclusions
RMSD is a useful and convenient measure but has limitations which can lead to
errors and oversights. At minimum, investigators should be alert to cases
where RMSD should be supplemented by other measures.
•  In some cases RMSD is insufficient.
•  In general larger RMSDs warrant closer inspection.
•  As molecule size increases, RMSD range increases, and descriptive power
decreases.
•  Several other measures are available to help characterize geometric
relationships not well handled by RMSD.
•  Low correlations indicate RMSD does not reveal information provided by
these alternative methods for geometry comparison.
•  Conformer discrimination tests should include a variance or max atom-distance test in addition to RMSD.

References and Notes

New methods yield new information [4]
Test molecule: benzylpenicillin (41 atoms, 28 conformations [5])
Also tested: dopamine (22 atoms, 7 conformations), methotrexate (54
atoms, 290 conformations)

benzylpenicillin

RMST, centrality-weighted
Weight = subtree ratio
(+): lever-arm effect
considered
(+): still fast
Increased correlation confirms
that centrality-weighting works.
Weights could be squared to
increase effect.
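
A weighted variant can be sketched as below. The exact definition of the "subtree ratio" is not given on the poster, so the weights are simply caller-supplied numbers here; one possible reading is the fraction of atoms on the smaller side of each rotatable bond. The function names and the example weights are hypothetical.

```python
import math

def torsion_delta(a, b):
    """Torsion difference in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def weighted_rmst(torsions_a, torsions_b, weights):
    """Weighted root mean square torsion difference; weights are caller-supplied."""
    num = sum(
        w * torsion_delta(a, b) ** 2
        for a, b, w in zip(torsions_a, torsions_b, weights)
    )
    return math.sqrt(num / sum(weights))

# Hypothetical molecule: bond 0 is central (weight 0.5), bond 1 is near a terminus (0.1).
weights = [0.5, 0.1]
central_change = weighted_rmst([60.0, 180.0], [0.0, 180.0], weights)
terminal_change = weighted_rmst([60.0, 180.0], [60.0, 120.0], weights)
print(central_change > terminal_change)  # True
```

The same 60 degree rotation counts for more at the central bond, which is the lever-arm effect the weighting is meant to capture. Squaring the weights before the call would strengthen that effect.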

RMST, all torsions including
rings
Could also be centrality weighted
(+): more comprehensive than
straight RMST

dopamine

methotrexate

Distance matrix RMSD
(+): no minimization, fast
(-): less intuitive
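
A minimal sketch of a distance-matrix RMSD, assuming it is defined as the root mean square difference between corresponding intramolecular atom-pair distances (the poster does not give the formula). The function name and the three-atom conformers are illustrative.

```python
import math
from itertools import combinations

def distance_matrix_rmsd(xs, ys):
    """RMS difference between corresponding intramolecular atom-pair distances."""
    pairs = list(combinations(range(len(xs)), 2))
    total = sum(
        (math.dist(xs[i], xs[j]) - math.dist(ys[i], ys[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))

conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
# Same shape rotated 90 degrees about z and shifted: no alignment step is needed.
conf_b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 6.0, 5.0)]
# The bent fragment straightened into a line.
conf_c = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

print(distance_matrix_rmsd(conf_a, conf_b))            # 0.0
print(round(distance_matrix_rmsd(conf_a, conf_c), 3))  # 0.338
```

Because only internal distances are compared, the value is unchanged by rotation and translation. It is also unchanged by reflection, so mirror-image conformers score as identical.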

1.  W. Kabsch, "A solution for the best rotation to relate two sets of vectors", Acta
Cryst. (1976), A32, p922-923.
2.  K. Kaindl and B. Steipe, "Metric properties of the root-mean-square deviation of
vector sets", Acta Cryst. (1997), A53, p809.
3.  R. Brüschweiler, "Efficient RMSD measures for the comparison of two molecular
ensembles", Proteins: Structure, Function, and Genetics (2003), Volume 50, Issue 1,
p26-34.
4.  All algorithms implemented using OEChem (OpenEye Scientific Software).
5.  All conformations generated with Omega (OpenEye Scientific Software).
6.  Shape similarity calculated using ROCS (OpenEye Scientific Software).

3600 Cerrillos Road
Suite 1107
Santa Fe, New Mexico 87507

505.473.7385
info@eyesopen.com
www.eyesopen.com
