SlideShare a Scribd company logo
1 of 32
Download to read offline
Chemical Similarity – An overview

         Dr. Nina Jeliazkova
          Ideaconsult Ltd.
         www.ideaconsult.net
Similarity : philosophers’ view
• exploiting the similarity concept is a sign of immature science
  (Quine)
• “it is ill defined to say “A is similar to B” and it is only
  meaningful to say “A is similar to B with respect to C”




   A chemical “A” cannot be similar to a chemical “B”
                    in absolute terms
  but only with respect to some measurable key feature

           ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   2
Similarity : chemists’ view
• Intuitively, based on expert judgment
  A chemist would describe “similar” compounds in
    terms of “approximately similar backbone and
    almost the same functional groups”.
• Chemists have different views on similarity
  Experience, context
  Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists in
      Reviewing Sets of Compounds, J. Med. Chem., 47(20), 4891-4896.




           ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   3
Chemical similarity
• Computerized similarity assessment needs
  unambiguous definitions
• Structurally similar molecules have similar biological
  activities
   – The basic tenet of chemical similarity
   – Long supporting experience
   – Many exceptions Exceptions are important!
• Identification of the most informative representation of
  molecular structures Avoiding information loss is important!
• Similarity measures
            ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   4
Chemical similarity quantified
• Numerical representation of chemical structure
   –   Structural similarity
   –   Descriptor –based similarity
   –   3D similarity
   –   Field –based
   –   Spectral
   –   Quantum mechanics
   –   More…
• Comparison between numerical representations
   – Distance-like
   – Association,
   – Correlation

              ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   5
Structural similarity
• Substructure searching
• Maximum Common Substructure
• Fragment approach
    – Atom, bond or ring counts, degree of connectivity
    – Atom-centred, bond-centred, ring-centred fragments
    – Fingerprints, molecular holograms, atom environments
• Topological descriptors
    – Hosoya’ Z, Wiener number, Randic index, indices on distance matrices of graph
      (Bonchev & Trinajstic), bonding connectivity indices (Basak), Balaban J indices,
      etc.
    – Initially designed to account for branching, linearity, presence of cycles and other
      topological features
    – Attempts to include 3D information (e.g. distance matrices instead of adjacency
      matrices)
• Molecular eigenvalues (BCUT)


                 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)       6
Structural similarity
                                          •     Oral LD50 for male rats =            A single group makes difference …
                                                2.5g/kg
                                          •     Dermal LD50 for male rats                          but …
                                                = 3.54g/kg
                                       •        Not irritating to eyes of
3-(2-chloro-4-(trifluoromethyl)phenoxy-)
            phenyl acetate,                     rabbits                              Isosteric replacements of groups
                                          •     Slightly irritating to skin of
          CAS# 50594-77-9
                                                rabbits                              •Substituents:
                                          •     Not mutagenic in                          •F, Cl, Br, I, CF3,NO2
                                                Salmonella strains                        •Methyl,Ethyl, Isoprpyl, Cyclopropyl, t-
                                                                                          Butyl,-OH,-SH,-NH2,-OMe,-N(Me)2
                                          •     Higher potential binding
                                                affinity to the estrogen             •Atoms and groups in rings:
                                                receptor than the
                                                nitrophenyl acetate                       •-CH=,-N=
                                                                                          •-CH2-,-NH-,-O-,-S-

                                                                                     •More …
                                          •     Higher potential
                                                to cause cancer
                                                than the phenyl                      Depends on the endpoint!
                                                acetate
5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-                                           (e.g. lipophilicity, receptor binding,
           nitrophenyl acetate,
            CAS# 50594-44-0
                                                                                     many nice examples in Kubinyi H.
           Walker . J. (2003) ,QSARs for pollution prevention, Toxicity Screening,
                                                                                     Chemical similarity and biological
                   Risk Assessment and Web Applications, SETAC Press                 activities)

                                  ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)                              7
Structural similarity
•   Rosenkranz H.S., Cunningham A.R. (2001)
    Chemical Categories for Health Hazard
    Identification: A feasibility Study,
    Regulatory Toxicology and Pharmacology
    33, 313-318.
•   Examined the reliability of using chemical
    categories to classify HPV chemicals as
    toxic or nontoxic
•   Found: “most often only a proportion of
    chemicals in a category were toxic”
•   Conclusion: "traditional organic
    chemical categories do not encompass
    groups of chemical that are
    predominately either toxic or nontoxic
    across a number of toxicological
    endpoints or even for specific toxic
    activities”                                         The bold portion of the chemical in the Category column defined
                                                         the fragment used to query each data set. Abbreviations: EyI,eye
                                                       irritation;LD50, rat LD50; Dev, developmental toxicity;CA, rodent
                                                       carcinogenesis; Mnt, in vivo induction of micronuclei; Sal, Salmonella
                                                       mutagenesis; MLA, mutagenesis in cultured mouse lymphoma cells.


                  ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)                                         8
3D Similarity
•   Distance-based and angle-based descriptors (e.g. inter-atomic distance)
•   Field similarity (not exhaustive list)
     –   Comparative Molecular Field Analysis (CoMFA), CoMSIA
     –   Electrostatic potential
     –   Shape
     –   Electron density
     –   Test probe
     –   Any grid-based structural property
•   Molecular multi-pole moments (CoMMA)
•   Shape descriptors (not exhaustive list)
     –   van der Waals volume and surface (reflect the size of substituents)
     –   Taft steric parameter
     –   STERIMOL
     –   Molecular Shape Analysis
     –   4D QSAR
     –   WHIM descriptors
•   Receptor binding


                    ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   9
Structurally similar compounds can have very
            different 3D properties




                                                        Kubinyi, H., Chemical Similarity and Biological activity


         ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)                               10
Physicochemical properties
•   Molecular weight
•   Octanol - water partition coefficient
•   Total energy
•   Heat of formation
•   Ionization potential
•   Molar refractivity
•   More…

            ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   11
Quantum chemistry approaches
• The wave function and the density function contain all
  the information of a system.
   – All the information about any molecule could be extracted
     from the electron density. Bond creation and bond breaking
     in chemical reactions, as well as the shape changes in
     conformational processes, are expressed by changes in the
     electronic density of molecules. The electronic density fully
     determines the nuclear distribution, hence the electronic
     density and its changes account for all the relevant chemical
     information about the molecule.
   – In principle, quantum-chemical theory should be able to
     provide precise quantitative descriptions of molecular
     structures and their chemical properties.

            ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   12
Quantum chemistry approaches
• Quantum chemical descriptors - characterize the
  reactivity, shape and binding properties of a complete
  molecule or molecular fragments and substituents:
   HOMO and LUMO energies, total energy, number of filled
    orbitals, standard deviation of partial atomic charges and
    electron densities, dipole moment, partial atomic charges
• Approaches from The Theory of Atom in Molecules –
  BCP space, TAE/RECON, MEDLA, QShAR
  (additive density fragments)
• Quantum chemistry calculations depend on several
  levels of approximation
• Computationally intensive

            ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   13
Reactivity
• Similarity between reactions
• Similarity of chemical structures assessed by
  generalized reaction types and by gross
  structural features. Two structures are
  considered similar if they can be converted by
  reactions belonging to the same predefined
  groups (for example oxidation or substitution
  reactions).

          ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   14
Similarity indices
• Association, correlation, distance coefficients
• Most popular :
   – Tanimoto distance (fingerprints)                   TAB 
                                                                       N AB
                                                                 N A  N B  N AB
   – Euclidean distance (descriptors)
   – Carbo index (fields)         C           AB
                                                      Z AB
                                                     Z AA Z BB



• Essentially a classification problem has to be solved
  (decide if a query compound is closer to one or another
  set of compounds)
   – Many methods available (Discriminant Analysis, Neural
     networks, SVM, Bayesian classification, etc.)
   – Statistical assumptions and statistical error is involved
             ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   15
Similarity indices




                Association indices                                         Correlation indices

J. D. Holliday, C-Y. Hu† and P. Willett,(2002) Grouping of Coefficients for the Calculation of Inter-
Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High
Throughput Screening,5, 155-166 155
                      ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)             16
Fingerprint similarity
                                                        • Information loss – fragments
                                                          presence and absence instead of
                                                          counts
                                                        • Bit string saturation – within a
                                                          large database almost all bits are set
                                                        • Can give nonintuitive results
                                                        • The average similarity appears to
                                                          increase        with the complexity
                                                          of the query compound
                                                        • Larger queries are more
 The distribution of Tanimoto values                      discriminating (flatter curve,
  found in database searches with a                       Tanimoto values spread wider)
      range of query molecules                          • Smaller queries have sharp peak,
                                                          unable to distinguish between
Flower D., On the Properties of             Bit           molecules
String-Based Measures of Chemical Similarity, J.
Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998

                       ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   17
Distance indices
                                                   • Euclidean distance

                                                   • City-block distance

                                                   • Mahalanobis distance



                                                       Distances obey triangle inequality


Equidistant contours = Points on the
equal distance from the query point

                 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)      18
Similarity in descriptor space




Comparison between a point and groups of points is a classification problem. Euclidean
  distance performs very well if groups are separable (left). Other classification methods
                                    help in other cases.

                  ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)      19
What do we measure
• We compare numerical
  representations of chemical
  compounds
   – The numerical representation
     is not unique
   – The numerical representation
     includes only part of all the
     information about the
     compound
   – A distance measure reflects
     “closeness” only if the data
     holds specific assumptions

            ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   20
Example: Y. Martin et al ( 2002)
Do structurally similar molecules have similar
             biological activity ?
  • Set of 1645 chemicals with IC50s for monoamine
    oxidase inhibition
  • Daylight fingerprints 1024 bits long ( 0-7 bonds)
  • When using Tanimoto coefficient with a cut off value of
    0.85 only 30 % of actives were detected
Cutoff values        % of actives detected                      % False positives

                                                       J. Med. Chem. 2002,45,4350-4358
                ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   21
Chemical similarity caveats
• The similarity computation may not correctly represent
  the intuitive similarity between two chemical structures
   – The properties of a chemical might not be implicit in its
     molecular structure
   – Molecular structure might not be fully measured and
     represented by a set of numbers (information loss)
   – Comparison by similarity indices may be counterintuitive
• Intuitively similar chemical structures may not have
  similar biological activity
   – Bioisosteric compounds
   – Structurally similar molecules may have different mechanisms of action

              ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   22
Similarity and Activity
                “Neighbourhood principle”
• Proximity with respect to          Similar activity
  descriptors does not necessary           values
  mean proximity with respect to the
  activity
• Depends on the relationship
  between descriptor and activity
    – True if a continuous &
      monotonous (e.g. linear,…)
      relationship holds between
      descriptors and activity
    – The linear relationship is only a
      special case, given the complexity
      of biochemical interactions. Its use
      should be justified in every specific
      case and/or used only locally                   Neighbourhood in the
                                                          descriptor space

                 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   23
Similarity vs. Activity
  Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98)
  Red circle: Glende et al. 2001 set: alkyl-substituted (ortho to the amino function) derivatives not included in
  original Debnath data set




logP, Ehomo, Elumo




                                    Similar compounds, Relatively small data set

                       ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)                        24
Similarity by atom environments vs. logP




     Syracuse Research KOWWin training set, 2400 compounds
                 (diverse compounds, large data set)
      ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   25
Neighbourhood principle (Paterson plot)




The differences between the descriptor values are plotted on X axis, while the
     differences between activity values are plotted on Y axis. For a good
 neighbourhood behaviour, the upper left triangle region should be empty (no
         large differences in activity for small differences in descriptors).

               ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   26
Neighbourhood principle (Paterson plot)




      ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   27
Molecular representation requirements
• Information preserving or allowing only controlled loss
  of information
• Feature selection
   – By domain knowledge (e.g. receptor binding, any knowledge
     of mechanism of action)
   – By verification of the « neighbourhood » assumption
   – By feature selection methods
      • Examples: PCA, Entropy, Gini index, Kullback-Leibler distance, filter
        and wrapper methods
      • Compounds should cluster tightly within a class and be far apart for
        different classes
• Combining different measures (consensus approach)

             ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   28
Structure is not the sole factor for biological
                   activity
• Interactions with
  environment
   –   Solvation effects
   –   Metabolism
   –   Time dependence
   –   More...
• Biological activity in
  different species

              ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   29
Conclusions
• Molecular similarity is relative
• Molecular representation and similarity index have to account for
  the underlying bio-chemistry
• Validation of the similarity formulation and its algorithmic
  solution is essential
   – “Neighbourhood” assumption has to be proved case by case

“As understanding of the chemistry and biology of
  drug action improves and a greater ability to model
  the underlying mechanisms appears, the need for
  ‘similarity’ approaches will diminish.”
   Bender, A.; Glen, R. C. (2004)
     Molecular similarity: a key technique in molecular informatics.
     Org. Biomol. Chem., 2(22), 3204-3218


              ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005)   30
Thank you!
Nikolova N., Jaworska J.,
   Approaches to Measure Chemical
        Similarity - a Review,
QSAR Comb. Sci. 22 (2003) pp.1006-1024

More Related Content

What's hot

Fragment screening library workshop (IQPC 2008)
Fragment screening library workshop (IQPC 2008)Fragment screening library workshop (IQPC 2008)
Fragment screening library workshop (IQPC 2008)Peter Kenny
 
Non-covalent protein-ligand interactions? Easy as Pi
Non-covalent protein-ligand interactions? Easy as PiNon-covalent protein-ligand interactions? Easy as Pi
Non-covalent protein-ligand interactions? Easy as PiDavid Thompson
 
Spectroscopic and chemical techniques for structure elucidation of alkaloids
Spectroscopic and chemical techniques for structure elucidation of alkaloidsSpectroscopic and chemical techniques for structure elucidation of alkaloids
Spectroscopic and chemical techniques for structure elucidation of alkaloidsSidratal Muntaha
 
Purification techniques chromatography
Purification techniques chromatographyPurification techniques chromatography
Purification techniques chromatographyHalavath Ramesh
 
Study of nanostructured perovskite oxide NSMO using CN-gel method.
Study of nanostructured perovskite oxide NSMO using CN-gel method. Study of nanostructured perovskite oxide NSMO using CN-gel method.
Study of nanostructured perovskite oxide NSMO using CN-gel method. Pooja29081984
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Affinity chromatography
Affinity chromatographyAffinity chromatography
Affinity chromatographySIHAS
 
Chiral chromatography & ion pair chromatography
Chiral chromatography & ion pair chromatographyChiral chromatography & ion pair chromatography
Chiral chromatography & ion pair chromatographyHemantBansode2
 
POLYMERS IN CHIRAL SEPARATION
POLYMERS IN CHIRAL SEPARATIONPOLYMERS IN CHIRAL SEPARATION
POLYMERS IN CHIRAL SEPARATIONlavanyak49
 
Spectral studies of pinacyanol chloride in sodium alkyl sulfate
Spectral studies of pinacyanol chloride in sodium alkyl sulfateSpectral studies of pinacyanol chloride in sodium alkyl sulfate
Spectral studies of pinacyanol chloride in sodium alkyl sulfateAlexander Decker
 
Correlation between corrosion inhibitive effect and quantum molecular structu...
Correlation between corrosion inhibitive effect and quantum molecular structu...Correlation between corrosion inhibitive effect and quantum molecular structu...
Correlation between corrosion inhibitive effect and quantum molecular structu...Al Baha University
 
Oral presentation at JCS spring meeting-2014
Oral presentation at JCS spring meeting-2014Oral presentation at JCS spring meeting-2014
Oral presentation at JCS spring meeting-2014Gururaj M. Shivashimpi
 

What's hot (19)

Fragment screening library workshop (IQPC 2008)
Fragment screening library workshop (IQPC 2008)Fragment screening library workshop (IQPC 2008)
Fragment screening library workshop (IQPC 2008)
 
Non-covalent protein-ligand interactions? Easy as Pi
Non-covalent protein-ligand interactions? Easy as PiNon-covalent protein-ligand interactions? Easy as Pi
Non-covalent protein-ligand interactions? Easy as Pi
 
Spectroscopic and chemical techniques for structure elucidation of alkaloids
Spectroscopic and chemical techniques for structure elucidation of alkaloidsSpectroscopic and chemical techniques for structure elucidation of alkaloids
Spectroscopic and chemical techniques for structure elucidation of alkaloids
 
Purification techniques chromatography
Purification techniques chromatographyPurification techniques chromatography
Purification techniques chromatography
 
Campbell6e lecture ch5
Campbell6e lecture ch5Campbell6e lecture ch5
Campbell6e lecture ch5
 
Study of nanostructured perovskite oxide NSMO using CN-gel method.
Study of nanostructured perovskite oxide NSMO using CN-gel method. Study of nanostructured perovskite oxide NSMO using CN-gel method.
Study of nanostructured perovskite oxide NSMO using CN-gel method.
 
D04010021033
D04010021033D04010021033
D04010021033
 
Chromic phenomena
Chromic phenomenaChromic phenomena
Chromic phenomena
 
Poster qamar
Poster qamarPoster qamar
Poster qamar
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Affinity chromatography
Affinity chromatographyAffinity chromatography
Affinity chromatography
 
Characterization of proteins
Characterization of proteinsCharacterization of proteins
Characterization of proteins
 
Chiral chromatography & ion pair chromatography
Chiral chromatography & ion pair chromatographyChiral chromatography & ion pair chromatography
Chiral chromatography & ion pair chromatography
 
POLYMERS IN CHIRAL SEPARATION
POLYMERS IN CHIRAL SEPARATIONPOLYMERS IN CHIRAL SEPARATION
POLYMERS IN CHIRAL SEPARATION
 
Effects of addition of BaTiO3 Nano particles on the conductivity of PVdF/PMMA...
Effects of addition of BaTiO3 Nano particles on the conductivity of PVdF/PMMA...Effects of addition of BaTiO3 Nano particles on the conductivity of PVdF/PMMA...
Effects of addition of BaTiO3 Nano particles on the conductivity of PVdF/PMMA...
 
Spectral studies of pinacyanol chloride in sodium alkyl sulfate
Spectral studies of pinacyanol chloride in sodium alkyl sulfateSpectral studies of pinacyanol chloride in sodium alkyl sulfate
Spectral studies of pinacyanol chloride in sodium alkyl sulfate
 
Correlation between corrosion inhibitive effect and quantum molecular structu...
Correlation between corrosion inhibitive effect and quantum molecular structu...Correlation between corrosion inhibitive effect and quantum molecular structu...
Correlation between corrosion inhibitive effect and quantum molecular structu...
 
Msc chiral chromatography
Msc chiral chromatographyMsc chiral chromatography
Msc chiral chromatography
 
Oral presentation at JCS spring meeting-2014
Oral presentation at JCS spring meeting-2014Oral presentation at JCS spring meeting-2014
Oral presentation at JCS spring meeting-2014
 

Viewers also liked

Hierarchical multi-label classification of ToxCast datasets
Hierarchical multi-label classification of ToxCast datasetsHierarchical multi-label classification of ToxCast datasets
Hierarchical multi-label classification of ToxCast datasetsNina Jeliazkova
 
Seamless and uniform access to chemical data and tools experience gained in d...
Seamless and uniform access to chemical data and tools experience gained in d...Seamless and uniform access to chemical data and tools experience gained in d...
Seamless and uniform access to chemical data and tools experience gained in d...Nina Jeliazkova
 
Linking the silos. Data and predictive models integration in toxicology.
Linking the silos. Data and predictive models integration in toxicology.Linking the silos. Data and predictive models integration in toxicology.
Linking the silos. Data and predictive models integration in toxicology.Nina Jeliazkova
 
ISA-TAB and ISA-TAB-Nano overview
ISA-TAB and ISA-TAB-Nano overviewISA-TAB and ISA-TAB-Nano overview
ISA-TAB and ISA-TAB-Nano overviewNina Jeliazkova
 
The eNanoMapper database for nanomaterial safety information: storage and query
The eNanoMapper database for nanomaterial safety information: storage and queryThe eNanoMapper database for nanomaterial safety information: storage and query
The eNanoMapper database for nanomaterial safety information: storage and queryNina Jeliazkova
 
On chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsOn chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsNina Jeliazkova
 
Making the data available with AMBIT cheminformatics platform
Making the data available with AMBIT cheminformatics platformMaking the data available with AMBIT cheminformatics platform
Making the data available with AMBIT cheminformatics platformNina Jeliazkova
 

Viewers also liked (7)

Hierarchical multi-label classification of ToxCast datasets
Hierarchical multi-label classification of ToxCast datasetsHierarchical multi-label classification of ToxCast datasets
Hierarchical multi-label classification of ToxCast datasets
 
Seamless and uniform access to chemical data and tools experience gained in d...
Seamless and uniform access to chemical data and tools experience gained in d...Seamless and uniform access to chemical data and tools experience gained in d...
Seamless and uniform access to chemical data and tools experience gained in d...
 
Linking the silos. Data and predictive models integration in toxicology.
Linking the silos. Data and predictive models integration in toxicology.Linking the silos. Data and predictive models integration in toxicology.
Linking the silos. Data and predictive models integration in toxicology.
 
ISA-TAB and ISA-TAB-Nano overview
ISA-TAB and ISA-TAB-Nano overviewISA-TAB and ISA-TAB-Nano overview
ISA-TAB and ISA-TAB-Nano overview
 
The eNanoMapper database for nanomaterial safety information: storage and query
The eNanoMapper database for nanomaterial safety information: storage and queryThe eNanoMapper database for nanomaterial safety information: storage and query
The eNanoMapper database for nanomaterial safety information: storage and query
 
On chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsOn chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurements
 
Making the data available with AMBIT cheminformatics platform
Making the data available with AMBIT cheminformatics platformMaking the data available with AMBIT cheminformatics platform
Making the data available with AMBIT cheminformatics platform
 

Similar to Chemical similarity

Overview of cheminformatics
Overview of cheminformaticsOverview of cheminformatics
Overview of cheminformaticsBenjamin Bucior
 
Protein Structure Determination
Protein Structure DeterminationProtein Structure Determination
Protein Structure DeterminationAmjad Ibrahim
 
Davey l1 macromolec-struc-anlys(1) lec 1
Davey l1 macromolec-struc-anlys(1) lec 1Davey l1 macromolec-struc-anlys(1) lec 1
Davey l1 macromolec-struc-anlys(1) lec 1RANJANI001
 
MDC Connects: Proteins, structures and how to get them
MDC Connects: Proteins, structures and how to get themMDC Connects: Proteins, structures and how to get them
MDC Connects: Proteins, structures and how to get themMedicines Discovery Catapult
 
Design of fragment screening libraries (IQPC 2008)
Design of fragment screening libraries (IQPC 2008)Design of fragment screening libraries (IQPC 2008)
Design of fragment screening libraries (IQPC 2008)Peter Kenny
 
1 -val_gillet_-_ligand-based_and_structure-based_virtual_screening
1  -val_gillet_-_ligand-based_and_structure-based_virtual_screening1  -val_gillet_-_ligand-based_and_structure-based_virtual_screening
1 -val_gillet_-_ligand-based_and_structure-based_virtual_screeningDeependra Ban
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomicssonam786
 
CASTEP Software.pptx
CASTEP Software.pptxCASTEP Software.pptx
CASTEP Software.pptxMohsan10
 
Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer
Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer
Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer Farhat Ansari
 
Mass Spectrometry.ppt
Mass Spectrometry.pptMass Spectrometry.ppt
Mass Spectrometry.pptBangaluru
 
Lecture6 100717171815-phpapp01
Lecture6 100717171815-phpapp01Lecture6 100717171815-phpapp01
Lecture6 100717171815-phpapp01Cleophas Rwemera
 

Similar to Chemical similarity (20)

Assignment 105B.pptx
Assignment 105B.pptxAssignment 105B.pptx
Assignment 105B.pptx
 
Overview of cheminformatics
Overview of cheminformaticsOverview of cheminformatics
Overview of cheminformatics
 
Resume
ResumeResume
Resume
 
Ontology work at the Royal Society of Chemistry
Ontology work at the Royal Society of ChemistryOntology work at the Royal Society of Chemistry
Ontology work at the Royal Society of Chemistry
 
Computer aided Drug designing (CADD)
Computer aided Drug designing (CADD)Computer aided Drug designing (CADD)
Computer aided Drug designing (CADD)
 
Protein Structure Determination
Protein Structure DeterminationProtein Structure Determination
Protein Structure Determination
 
Davey l1 macromolec-struc-anlys(1) lec 1
Davey l1 macromolec-struc-anlys(1) lec 1Davey l1 macromolec-struc-anlys(1) lec 1
Davey l1 macromolec-struc-anlys(1) lec 1
 
MDC Connects: Proteins, structures and how to get them
MDC Connects: Proteins, structures and how to get themMDC Connects: Proteins, structures and how to get them
MDC Connects: Proteins, structures and how to get them
 
Design of fragment screening libraries (IQPC 2008)
Design of fragment screening libraries (IQPC 2008)Design of fragment screening libraries (IQPC 2008)
Design of fragment screening libraries (IQPC 2008)
 
1 -val_gillet_-_ligand-based_and_structure-based_virtual_screening
1  -val_gillet_-_ligand-based_and_structure-based_virtual_screening1  -val_gillet_-_ligand-based_and_structure-based_virtual_screening
1 -val_gillet_-_ligand-based_and_structure-based_virtual_screening
 
INBRE 4
INBRE 4INBRE 4
INBRE 4
 
Descriptors
DescriptorsDescriptors
Descriptors
 
F.y.b. pharm syllbus
F.y.b. pharm syllbusF.y.b. pharm syllbus
F.y.b. pharm syllbus
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomics
 
59 330-l2-ms1
59 330-l2-ms159 330-l2-ms1
59 330-l2-ms1
 
CASTEP Software.pptx
CASTEP Software.pptxCASTEP Software.pptx
CASTEP Software.pptx
 
Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer
Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer
Unit II: Polymer, Basic Concepts, Classification, Uses of Polymer
 
Mass Spectrometry.ppt
Mass Spectrometry.pptMass Spectrometry.ppt
Mass Spectrometry.ppt
 
Food metabolomics Arapitsas 2017
Food metabolomics Arapitsas 2017Food metabolomics Arapitsas 2017
Food metabolomics Arapitsas 2017
 
Lecture6 100717171815-phpapp01
Lecture6 100717171815-phpapp01Lecture6 100717171815-phpapp01
Lecture6 100717171815-phpapp01
 

Recently uploaded

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 

Recently uploaded (20)

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 

Chemical similarity

  • 1. Chemical Similarity – An overview Dr. Nina Jeliazkova Ideaconsult Ltd. www.ideaconsult.net
  • 2. Similarity : philosophers’ view • exploiting the similarity concept is a sign of immature science (Quine) • “it is ill defined to say “A is similar to B” and it is only meaningful to say “A is similar to B with respect to C” A chemical “A” cannot be similar to a chemical “B” in absolute terms but only with respect to some measurable key feature ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 2
  • 3. Similarity : chemists’ view • Intuitively, based on expert judgment A chemist would describe “similar” compounds in terms of “approximately similar backbone and almost the same functional groups”. • Chemists have different views on similarity Experience, context Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds, J. Med. Chem., 47(20), 4891-4896. ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 3
  • 4. Chemical similarity • Computerized similarity assessment needs unambiguous definitions • Structurally similar molecules have similar biological activities – The basic tenet of chemical similarity – Long supporting experience – Many exceptions Exceptions are important! • Identification of the most informative representation of molecular structures Avoiding information loss is important! • Similarity measures ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 4
  • 5. Chemical similarity quantified • Numerical representation of chemical structure – Structural similarity – Descriptor –based similarity – 3D similarity – Field –based – Spectral – Quantum mechanics – More… • Comparison between numerical representations – Distance-like – Association, – Correlation ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 5
  • 6. Structural similarity • Substructure searching • Maximum Common Substructure • Fragment approach – Atom, bond or ring counts, degree of connectivity – Atom-centred, bond-centred, ring-centred fragments – Fingerprints, molecular holograms, atom environments • Topological descriptors – Hosoya’ Z, Wiener number, Randic index, indices on distance matrices of graph (Bonchev & Trinajstic), bonding connectivity indices (Basak), Balaban J indices, etc. – Initially designed to account for branching, linearity, presence of cycles and other topological features – Attempts to include 3D information (e.g. distance matrices instead of adjacency matrices) • Molecular eigenvalues (BCUT) ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 6
  • 7. Structural similarity • Oral LD50 for male rats = A single group makes difference … 2.5g/kg • Dermal LD50 for male rats but … = 3.54g/kg • Not irritating to eyes of 3-(2-chloro-4-(trifluoromethyl)phenoxy-) phenyl acetate, rabbits Isosteric replacements of groups • Slightly irritating to skin of CAS# 50594-77-9 rabbits •Substituents: • Not mutagenic in •F, Cl, Br, I, CF3,NO2 Salmonella strains •Methyl,Ethyl, Isoprpyl, Cyclopropyl, t- Butyl,-OH,-SH,-NH2,-OMe,-N(Me)2 • Higher potential binding affinity to the estrogen •Atoms and groups in rings: receptor than the nitrophenyl acetate •-CH=,-N= •-CH2-,-NH-,-O-,-S- •More … • Higher potential to cause cancer than the phenyl Depends on the endpoint! acetate 5-(2-chloro-4-(trifluoromethyl)phenoxy)-2- (e.g. lipophilicity, receptor binding, nitrophenyl acetate, CAS# 50594-44-0 many nice examples in Kubinyi H. Walker . J. (2003) ,QSARs for pollution prevention, Toxicity Screening, Chemical similarity and biological Risk Assessment and Web Applications, SETAC Press activities) ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 7
  • 8. Structural similarity • Rosenkranz H.S., Cunningham A.R. (2001) Chemical Categories for Health Hazard Identification: A feasibility Study, Regulatory Toxicology and Pharmacology 33, 313-318. • Examined the reliability of using chemical categories to classify HPV chemicals as toxic or nontoxic • Found: “most often only a proportion of chemicals in a category were toxic” • Conclusion: "traditional organic chemical categories do not encompass groups of chemical that are predominately either toxic or nontoxic across a number of toxicological endpoints or even for specific toxic activities” The bold portion of the chemical in the Category column defined the fragment used to query each data set. Abbreviations: EyI,eye irritation;LD50, rat LD50; Dev, developmental toxicity;CA, rodent carcinogenesis; Mnt, in vivo induction of micronuclei; Sal, Salmonella mutagenesis; MLA, mutagenesis in cultured mouse lymphoma cells. ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 8
  • 9. 3D Similarity • Distance-based and angle-based descriptors (e.g. inter-atomic distance) • Field similarity (not exhaustive list) – Comparative Molecular Field Analysis (CoMFA), CoMSIA – Electrostatic potential – Shape – Electron density – Test probe – Any grid-based structural property • Molecular multi-pole moments (CoMMA) • Shape descriptors (not exhaustive list) – van der Waals volume and surface (reflect the size of substituents) – Taft steric parameter – STERIMOL – Molecular Shape Analysis – 4D QSAR – WHIM descriptors • Receptor binding ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 9
  • 10. Structurally similar compounds can have very different 3D properties Kubinyi, H., Chemical Similarity and Biological activity ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 10
  • 11. Physicochemical properties • Molecular weight • Octanol - water partition coefficient • Total energy • Heat of formation • Ionization potential • Molar refractivity • More… ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 11
  • 12. Quantum chemistry approaches • The wave function and the density function contain all the information of a system. – All the information about any molecule could be extracted from the electron density. Bond creation and bond breaking in chemical reactions, as well as the shape changes in conformational processes, are expressed by changes in the electronic density of molecules. The electronic density fully determines the nuclear distribution, hence the electronic density and its changes account for all the relevant chemical information about the molecule. – In principle, quantum-chemical theory should be able to provide precise quantitative descriptions of molecular structures and their chemical properties. ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 12
  • 13. Quantum chemistry approaches • Quantum chemical descriptors - characterize the reactivity, shape and binding properties of a complete molecule or molecular fragments and substituents: HOMO and LUMO energies, total energy, number of filled orbitals, standard deviation of partial atomic charges and electron densities, dipole moment, partial atomic charges • Approaches from The Theory of Atom in Molecules – BCP space, TAE/RECON, MEDLA, QShAR (additive density fragments) • Quantum chemistry calculations depend on several levels of approximation • Computationally intensive ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 13
  • 14. Reactivity • Similarity between reactions • Similarity of chemical structures assessed by generalized reaction types and by gross structural features. Two structures are considered similar if they can be converted by reactions belonging to the same predefined groups (for example oxidation or substitution reactions). ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 14
  • 15. Similarity indices • Association, correlation, distance coefficients • Most popular : – Tanimoto distance (fingerprints) TAB  N AB N A  N B  N AB – Euclidean distance (descriptors) – Carbo index (fields) C  AB Z AB Z AA Z BB • Essentially a classification problem has to be solved (decide if a query compound is closer to one or another set of compounds) – Many methods available (Discriminant Analysis, Neural networks, SVM, Bayesian classification, etc.) – Statistical assumptions and statistical error is involved ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 15
  • 16. Similarity indices Association indices Correlation indices J. D. Holliday, C-Y. Hu† and P. Willett,(2002) Grouping of Coefficients for the Calculation of Inter- Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening,5, 155-166 155 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 16
  • 17. Fingerprint similarity • Information loss – fragments presence and absence instead of counts • Bit string saturation – within a large database almost all bits are set • Can give nonintuitive results • The average similarity appears to increase with the complexity of the query compound • Larger queries are more The distribution of Tanimoto values discriminating (flatter curve, found in database searches with a Tanimoto values spread wider) range of query molecules • Smaller queries have sharp peak, unable to distinguish between Flower D., On the Properties of Bit molecules String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 17
  • 18. Distance indices • Euclidean distance • City-block distance • Mahalanobis distance Distances obey triangle inequality Equidistant contours = Points on the equal distance from the query point ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 18
  • 19. Similarity in descriptor space Comparison between a point and groups of points is a classification problem. Euclidean distance performs very well if groups are separable (left). Other classification methods help in other cases. ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 19
  • 20. What do we measure • We compare numerical representations of chemical compounds – The numerical representation is not unique – The numerical representation includes only part of all the information about the compound – A distance measure reflects “closeness” only if the data holds specific assumptions ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 20
  • 21. Example: Y. Martin et al ( 2002) Do structurally similar molecules have similar biological activity ? • Set of 1645 chemicals with IC50s for monoamine oxidase inhibition • Daylight fingerprints 1024 bits long ( 0-7 bonds) • When using Tanimoto coefficient with a cut off value of 0.85 only 30 % of actives were detected Cutoff values % of actives detected % False positives J. Med. Chem. 2002,45,4350-4358 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 21
  • 22. Chemical similarity caveats • The similarity computation may not correctly represent the intuitive similarity between two chemical structures – The properties of a chemical might not be implicit in its molecular structure – Molecular structure might not be fully measured and represented by a set of numbers (information loss) – Comparison by similarity indices may be counterintuitive • Intuitively similar chemical structures may not have similar biological activity – Bioisosteric compounds – Structurally similar molecules may have different mechanisms of action ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 22
  • 23. Similarity and Activity “Neighbourhood principle” • Proximity with respect to Similar activity descriptors does not necessary values mean proximity with respect to the activity • Depends on the relationship between descriptor and activity – True if a continuous & monotonous (e.g. linear,…) relationship holds between descriptors and activity – The linear relationship is only a special case, given the complexity of biochemical interactions. Its use should be justified in every specific case and/or used only locally Neighbourhood in the descriptor space ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 23
  • 24. Similarity vs. Activity Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98) Red circle: Glende et al. 2001 set: alkyl-substituted (ortho to the amino function) derivatives not included in original Debnath data set logP, Ehomo, Elumo Similar compounds, Relatively small data set ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 24
  • 25. Similarity by atom environments vs. logP Syracuse Research KOWWin training set, 2400 compounds (diverse compounds, large data set) ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 25
  • 26. Neighbourhood principle (Paterson plot) The differences between the descriptor values are plotted on X axis, while the differences between activity values are plotted on Y axis. For a good neighbourhood behaviour, the upper left triangle region should be empty (no large differences in activity for small differences in descriptors). ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 26
  • 27. Neighbourhood principle (Paterson plot) ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 27
  • 28. Molecular representation requirements • Information preserving or allowing only controlled loss of information • Feature selection – By domain knowledge (e.g. receptor binding, any knowledge of mechanism of action) – By verification of the « neighbourhood » assumption – By feature selection methods • Examples: PCA, Entropy, Gini index, Kullback-Leibler distance, filter and wrapper methods • Compounds should cluster tightly within a class and be far apart for different classes • Combining different measures (consensus approach) ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 28
  • 29. Structure is not the sole factor for biological activity • Interactions with environment – Solvation effects – Metabolism – Time dependence – More... • Biological activity in different species ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 29
  • 30. Conclusions • Molecular similarity is relative • Molecular representation and similarity index have to account for the underlying bio-chemistry • Validation of the similarity formulation and its algorithmic solution is essential – “Neighbourhood” assumption has to be proved case by case “As understanding of the chemistry and biology of drug action improves and a greater ability to model the underlying mechanisms appears, the need for ‘similarity’ approaches will diminish.” Bender, A.; Glen, R. C. (2004) Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem., 2(22), 3204-3218 ECB Workshop on Chemical Similarity and TTC approaches (7-8 Nov 2005) 30
  • 32. Nikolova N., Jaworska J., Approaches to Measure Chemical Similarity - a Review, QSAR Comb. Sci. 22 (2003) pp.1006-1024