Computational Chemistry:
 From Theory to Practice
     6th December 2007
     David C. Thompson
Overview
 An introduction to computational chemistry
  – Which method, where, and why?


 A novel 3D QM-based descriptor (perhaps?)

 Computational chemistry for drug design
  – Fragment-based de novo design
Some background, and some
         theory
The Problem
       Motivation: Top 20 best-selling drugs
        in America had sales of ~ $65bn in
        2005[1]
       New drug development costs are in
        excess of $800M[2]
       Roughly 10K structures are made and
        tested for every new drug reaching the
        market[3]
[1] The Best-Selling drugs in America, IMS health, 2006
[2] The Tufts Center for the study of drug development
[3] Boston Consulting Group, 2005
The Solution
 Solve the Schrödinger equation:

             ĤΨ = EΨ
 Ψ determines all properties of the
  system
Unfortunately…
“The underlying physical laws necessary
for the mathematical theory of a large part of
physics and the whole of chemistry are thus
completely known, and the difficulty is only
that the application of these laws leads to
equations much too complicated to be
soluble.” – P. A. M. Dirac (1929)
The Solution - DFT?
 The electron density, ρ, can be derived from
  Ψ

 And, it turns out that all properties of a system
  can be derived from ρ
   – ρ is a function of 3 variables
   – Ψ is a function of 4N variables

 This is great, right?
   – Sure, but didn’t I tell you? In getting this far, I made a functional which
     contains all of the “confusion”, and I don’t rightly know what it looks like. . .
Accuracy vs. Speed
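[Figure: methods arranged along opposing Accuracy and Speed arrows, with markers Ph.D, PD1, and PD2. Axis labels, in order: 10^5-6, E_MM, 10^4, E_HF, 10^2, E_DFT, 10^1, E]

        E_DFT can be improved but we need to
        understand the physics of how
        “electrons get along”: E_c = E − E_HF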
Gas phase water: An example
      A DFT calculation
       takes ~9s

      An “Exact”
       calculation[4] took
       150h, 250Gb of
       memory, and 800Gb
       of disk

[4] G. K.-L. Chan and M. Head-Gordon, J. Chem. Phys. 118, 8551 2003
Gas phase water: An example
 Water has 10 electrons

 The 1A4Q receptor has
  ~10^4 valence electrons

 A full quantum
  mechanical calculation
  is just not practical
The Hospital that ate my Wife. . .
 Information theoretic properties of a model system:


              S_r = -\int \rho(\mathbf{r}) \ln[\rho(\mathbf{r})] \, d\mathbf{r}
              S_p = -\int \gamma(\mathbf{p}) \ln[\gamma(\mathbf{p})] \, d\mathbf{p}
              S_T = S_r + S_p

 Doesn’t S_r look a little familiar?
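A quick numerical check of the position-space entropy, as a minimal sketch: the hydrogen 1s density ρ(r) = exp(−2r)/π (atomic units) is used as a stand-in model system, because its entropy is known in closed form, S_r = 3 + ln π. The function name, grid size, and cut-off radius are illustrative assumptions, not part of the talk.

```python
import math

def shannon_entropy_radial(rho, r_max=30.0, n=20000):
    """S_r = -∫ rho ln(rho) d^3r for a spherically symmetric density.

    Uses the trapezoid rule on a uniform radial grid with the 4*pi*r^2
    volume element. Points where rho <= 0 contribute nothing.
    """
    h = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * h
        d = rho(r)
        f = 0.0 if d <= 0.0 else -d * math.log(d) * 4.0 * math.pi * r * r
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * f
    return total * h

def rho_1s(r):
    """Hydrogen 1s electron density in atomic units."""
    return math.exp(-2.0 * r) / math.pi

s_r = shannon_entropy_radial(rho_1s)
exact = 3.0 + math.log(math.pi)  # closed-form value for the 1s density
print(f"numerical S_r = {s_r:.5f}, analytic S_r = {exact:.5f}")
```

The same routine works for any spherically symmetric ρ; a molecular density needs a genuine 3D grid.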
A novel descriptor?
        Continuous form of a measure used in molecular
         similarity:
                                S = -\sum_i p_i \ln[p_i]
         Could we use S_r as a measure of similarity?

         Moreover, could S_r be a 3D QM-based
         structural descriptor?
            – Literature search has shown that this has not been
              considered before (I think)[5]
[5] M. Karelson, “Quantum-chemical descriptors in QSAR”, in Computational Medicinal
Chemistry for Drug Discovery, P. Bultinck et al., Eds. (New York, Dekker, 2003), pp. 641-667
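For comparison with the continuous form, here is a minimal sketch of the discrete measure S = −Σ p_i ln[p_i]. The two distributions are made-up examples: one spread evenly over four bins, one concentrated in a single bin.

```python
import math

def shannon_entropy(probabilities):
    """S = -sum_i p_i ln(p_i); bins with p_i = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximally spread over four bins
peaked = [0.97, 0.01, 0.01, 0.01]   # almost all weight in one bin

s_uniform = shannon_entropy(uniform)  # equals ln(4) for four equal bins
s_peaked = shannon_entropy(peaked)
print(f"uniform: {s_uniform:.4f}, peaked: {s_peaked:.4f}")
```

The evenly spread distribution gives the larger value, which is the behaviour the continuous S_r carries over to ρ.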
A novel descriptor?

         We want to make this useful
             – But we still have the problem of finding ρ in a
               timely fashion
         Why don’t we approximate ρ?
             – We construct a pro-molecular density from a sum
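               of fitted s-Gaussians[6]

            \rho(\mathbf{r}) \approx \rho_{Mol}(\mathbf{r}) = \sum_\alpha \rho_\alpha(\mathbf{r}) = \sum_\alpha \sum_i c_{\alpha i} \exp(-\zeta_{\alpha i} (\mathbf{r} - \mathbf{R}_\alpha)^2)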

         Turns out that this isn’t as bad as you might
          think[7]
 [6] P. Constans and R. Carbó, J. Chem. Inf. Comput. Sci. 35, 1046 1995
 [7] J. I. Rodriguez, D. C. Thompson, and P. W. Ayers, Unpublished data
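A minimal sketch of how such a pro-molecular density can be assembled. The exponents and coefficients below are placeholders chosen only so that each atom integrates to its electron count; they are not the fitted values of ref. [6]. The water geometry is approximate, in bohr, and all names are illustrative.

```python
import math

def normalised(n_electrons, zetas):
    """Hypothetical 'fit': share n_electrons equally over the exponents.

    A term c*exp(-zeta*r^2) integrates to c*(pi/zeta)^1.5, so the
    coefficient of each term is share*(zeta/pi)^1.5.
    """
    share = n_electrons / len(zetas)
    return [(share * (z / math.pi) ** 1.5, z) for z in zetas]

# Placeholder s-Gaussian expansions (coefficient, exponent) per element.
ATOM_FITS = {"O": normalised(8, [20.0, 1.0]), "H": normalised(1, [1.0])}

def promolecular_density(point, atoms):
    """Sum of atom-centred s-Gaussians evaluated at a 3D point."""
    total = 0.0
    for symbol, centre in atoms:
        r2 = sum((a - b) ** 2 for a, b in zip(point, centre))
        total += sum(c * math.exp(-zeta * r2) for c, zeta in ATOM_FITS[symbol])
    return total

def electron_count(atoms):
    """Analytic integral of the pro-molecular density over all space."""
    return sum(c * (math.pi / zeta) ** 1.5
               for symbol, _ in atoms for c, zeta in ATOM_FITS[symbol])

water = [("O", (0.0, 0.0, 0.0)),
         ("H", (1.43, 1.11, 0.0)),
         ("H", (-1.43, 1.11, 0.0))]  # approximate geometry, bohr

print(electron_count(water))                         # close to 10 electrons
print(promolecular_density((0.0, 0.0, 0.0), water))  # density at the O nucleus
```

Because every term is an s-Gaussian, the density and its integral are cheap to evaluate, which is the point of the approximation.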
Homebrew quantum mechanics

       All of this has been done on my iMac at home

       Molecular integrations performed using the
        Becke/Lebedev grids in PyQuante[8]

       Co-opted graduate students into doing
        MathCad checks for me. . .



[8] Python Quantum Chemistry - http://pyquante.sourceforge.net/
Homebrew quantum mechanics




[Figure: diagram labelled H1, Rz, H2]
Homebrew quantum mechanics

  Molecule                 S_r
  H2O                     -7.42
  H2S                      3.94
  Benzene                -27.09
  Cyclohexane (chair)    -35.94

                 Perhaps S_r isn’t that discriminatory?
                 Plan B - S_r(\mathbf{r}) = -\rho(\mathbf{r}) \ln[\rho(\mathbf{r})]
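A minimal sketch of the Plan B quantity, the local entropy density −ρ ln ρ, evaluated point by point. A hydrogen-like 1s density with nuclear charge Z is used as an assumed toy model. The integrand is negative wherever ρ > 1, so compact, high-density regions pull the integrated S_r below zero.

```python
import math

def local_entropy(rho):
    """Pointwise entropy density -rho*ln(rho); zero where rho <= 0."""
    return 0.0 if rho <= 0.0 else -rho * math.log(rho)

def rho_1s(r, z=1.0):
    """Hydrogen-like 1s density for nuclear charge z (atomic units)."""
    return (z ** 3 / math.pi) * math.exp(-2.0 * z * r)

for z in (1.0, 2.0):
    values = [local_entropy(rho_1s(r, z)) for r in (0.0, 0.5, 1.0, 2.0)]
    print(f"Z = {z}: " + ", ".join(f"{v:+.4f}" for v in values))
```

For Z = 1 the density never exceeds 1 and the local entropy is positive everywhere; for Z = 2 it is negative at the nucleus.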
And that might look like. . .
Summary

 Introduced a novel, 3D, quantum
  mechanics based structural descriptor
  – Its utility, if any, will be further examined


 Feedback is encouraged
Some background, and some
         practice
Project involvement
 Detailed analysis of in-house high-throughput virtual
  screening protocol
   − Detailed curation of large data set of protein-ligand
     complexes


 Late-stage discovery project support
   − Lead optimization
   − Lead generation

 Fragment-based de novo design
Fragment-based de novo design:
      The problem at hand
 Search space of new molecular entities is essentially
  infinite
   – The number of chemically feasible, drug like molecules
     ~10^60-10^100

 Such a large space cannot be searched exhaustively

 De novo design offers a broad exploration of
  chemical space
   – The range of molecules generated is only limited by the
     heuristics of the de novo design program
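A back-of-the-envelope sketch of why exhaustive search is ruled out. The evaluation rate of 10^9 molecules per second is an assumed, deliberately generous figure, and the age of the universe is rounded to 1.38 × 10^10 years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # rounded value

def years_to_enumerate(n_molecules, rate_per_second):
    """Time needed to visit every molecule once at a fixed rate."""
    return n_molecules / rate_per_second / SECONDS_PER_YEAR

# Lower bound of the quoted range, at an assumed 1e9 molecules per second.
lower = years_to_enumerate(1e60, 1e9)
print(f"{lower:.2e} years, or {lower / AGE_OF_UNIVERSE_YEARS:.1e} ages of the universe")
```

Even the low end of the range is out of reach by tens of orders of magnitude, so any practical method has to sample chemical space rather than enumerate it.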
Ligand Efficiency



[Figure: plot divided into a high ligand efficiency area and a low ligand efficiency area]

            LE = -\frac{\Delta G}{N} \approx -\frac{RT \ln(IC_{50})}{N}
R. Carr et al., Drug Discov. Today, 10, 987 2005
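A worked example of the ligand efficiency formula, as a minimal sketch. It assumes N is the number of non-hydrogen atoms (the usual convention), IC50 is given in mol/L, and T = 300 K. The function name and the example compound are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def ligand_efficiency(ic50_molar, n_heavy_atoms, temperature=300.0):
    """LE = -RT ln(IC50) / N, in kcal/mol per heavy atom."""
    delta_g = R_KCAL * temperature * math.log(ic50_molar)  # approximate binding free energy
    return -delta_g / n_heavy_atoms

# Hypothetical compound: IC50 = 10 nM with 30 heavy atoms.
le = ligand_efficiency(10e-9, 30)
print(f"LE = {le:.3f} kcal/mol per heavy atom")
```

A small fragment with the same potency scores higher, because the same free energy is divided over fewer atoms.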
Project requirements
 Exploit potential gaps in literature

 If possible use in-house chemical equity

 Modular design

 Efficient deployment strategy
De novo design: Link or Grow?



[Figure: schematic of the two de novo design strategies, LINK and GROW]
G. Schneider et al., Nature Reviews Drug Discovery, 4, 649 2005
CONFIRM
 A pre-prepared bridge library is
    searched using the atom type of the
    connection points, and the distance
    d as a search query

 Bridges that match the search query
    are attached to the fragments

 Complete molecules are prepared
    for docking – enumeration of
    tautomers, isomers, and ionization
    states

 Prepared molecules are docked
    into the target binding site

[Figure: two fragments a distance d apart are matched against the bridge library db and joined by the retrieved bridges]
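The search step can be sketched as a simple filter. This is an illustration under stated assumptions, not the CONFIRM implementation: the `Bridge` record, the atom-type labels, the stored end-to-end length, and the 0.5 Å tolerance are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bridge:
    name: str
    end_types: tuple  # atom types expected at the two connection points
    length: float     # end-to-end distance of the bridge, in Å

def search_bridges(library, type_a, type_b, d, tolerance=0.5):
    """Return bridges whose end atom types and length match the query.

    The pair of atom types is compared without regard to order, and the
    length must lie within `tolerance` Å of the query distance d.
    """
    wanted = sorted((type_a, type_b))
    return [b for b in library
            if sorted(b.end_types) == wanted and abs(b.length - d) <= tolerance]

# Hypothetical three-entry library.
library = [
    Bridge("amide", ("C.ar", "C.ar"), 3.8),
    Bridge("ether", ("C.ar", "C.ar"), 2.4),
    Bridge("urea", ("C.ar", "N.ar"), 3.6),
]

hits = search_bridges(library, "C.ar", "C.ar", d=3.7)
print([b.name for b in hits])
```

Only the first entry matches both the atom types and the distance window. The attachment, preparation, and docking steps are left to the downstream tools.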
Bridge Libraries
[Figure: flowchart. Bridge library derived from corporate database → Lib3 (≤ 3 rot. bonds) and Lib4 (≤ 4 rot. bonds) → OMEGA Expansion → Lib3E and Lib4E]

 Application of filters
   − Molecular Weight
       • <200 MW
   − No. of rotatable bonds
       • ≤3
       • ≤4

 Conformational expansion with OMEGA
   – 4 bridge libraries
       • Lib3 → Lib3E
       • Lib4 → Lib4E
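The two filters can be sketched as follows, assuming molecular weight and rotatable-bond counts have already been computed for each candidate bridge. The `Candidate` record and the example entries are hypothetical, and the OMEGA expansion step is not reproduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    mol_weight: float     # g/mol
    rotatable_bonds: int

def build_library(candidates, max_rotatable, max_weight=200.0):
    """Keep bridges with MW < max_weight and at most max_rotatable rotatable bonds."""
    return [c for c in candidates
            if c.mol_weight < max_weight and c.rotatable_bonds <= max_rotatable]

# Hypothetical candidates extracted from a corporate database.
candidates = [
    Candidate("b1", 58.0, 2),
    Candidate("b2", 114.0, 4),
    Candidate("b3", 212.0, 3),  # fails the molecular weight filter
    Candidate("b4", 86.0, 5),   # fails both rotatable-bond cut-offs
]

lib3 = build_library(candidates, max_rotatable=3)
lib4 = build_library(candidates, max_rotatable=4)
print([c.name for c in lib3], [c.name for c in lib4])
```

With these cut-offs Lib3 is always a subset of Lib4.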
CONFIRM: Novelty

       Bridges come from molecules within the Wyeth
        CORP database:
           – Bridges obtained “…from a given ring scaffold by removing
             all of the atoms, except acyclic linker atoms, between pairs
             of ring systems, and the anchor atoms on the ring system.”
             [9]

       Similar to CAVEAT[10], however:
           – We do not use orientation of bonds, but location of atoms
             (vector vs. scalar)
           – CAVEAT searches 3D databases looking for suitable
             molecular frameworks to satisfy the vector pairs
               • We already have well defined positions of small molecule
                 binders
[9] R. Nilakantan et al., J. Chem. Inf. Mod. 46(3), 1069-1077 2006
[10] G. Lauri, and P. A. Bartlett, J. Comp.-Aided. Mol. Design 8(1), 51-66 1994
CONFIRM: Test Sets

 Taken from the curated data set of protein-ligand
  complexes
   – High crystallographic resolution ≤ 2.2Å
   – Two well resolved fragment moieties connected via a bridge
   – Both fragments interact with spatially disparate regions of
     the protein
   PDB Accession Code    Resolution/Å    RMSD/Å (SP)    RMSD/Å (XP)
   1SRJ                  1.80            1.19           0.95
   1A4Q                  1.90            0.27           0.29
   1YDR                  2.20            0.40           0.43
   1FCZ                  1.38            0.30           0.43
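A minimal sketch of the RMSD reported in the table. It assumes the docked pose and the crystal ligand list the same atoms in the same order and share the receptor frame, so no superposition is applied. The coordinates are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists (Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Invented example: every atom of the pose is displaced by (0.3, 0.0, 0.4) Å.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0)]
pose = [(x + 0.3, y, z + 0.4) for x, y, z in crystal]

value = rmsd(crystal, pose)
print(f"RMSD = {value:.2f} Å")
```

A uniform 0.5 Å displacement gives an RMSD of 0.5 Å, comfortably inside a 2 Å success threshold.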
CONFIRM: 1SRJ example

[Figure: 1SRJ ligand drawn as Fragment 1 and Fragment 2 joined by a Bridge spanning 3.7Å. Legend: 1SRJ X-ray Structure (green carbons), CONFIRM XP Pose (orange carbons)]
CONFIRM: 1A4Q example

[Figure: 1A4Q ligand drawn as Fragment 1 and Fragment 2 joined by a Bridge spanning 5.9Å. Legend: 1A4Q X-ray Structure (green carbons), CONFIRM XP Pose (orange carbons)]

Library    No. of Unique Hits    No. with Fragment 1 and 2 RMSD < 2Å
Lib4       274                   84
Lib4E      370                   154
CONFIRM: 1MTU example
      Important for binding – we wish to keep this fragment




Search bridge library for suggestions for bridging atoms
      Use ROCS to search for alternative groups to go here
CONFIRM: 1MTU example
 Search Lib4E with distance query of 5Å
   – 2852 bridges
 Search Lead-like database using ROCS and this
  query:
[Figure: ROCS query structure, with atom labels X, O, N, HN]
 Use Combo score, only keep top 100

 Use CONFIRM to enumerate, prepare, and dock
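The "only keep top 100" step can be sketched as a ranked selection. ROCS itself is not called here: the hit list, identifiers, and scores are made up, and the only assumption is that each hit carries a Combo score where higher is better.

```python
import heapq

def keep_top(hits, n=100):
    """Return the n hits with the highest Combo score, best first."""
    return heapq.nlargest(n, hits, key=lambda hit: hit[1])

# Made-up (identifier, combo_score) pairs standing in for shape-search output.
hits = [("mol-a", 1.12), ("mol-b", 1.47), ("mol-c", 0.98), ("mol-d", 1.31)]

top_two = keep_top(hits, n=2)
print(top_two)
```

The retained groups would then be passed to CONFIRM together with the bridges returned by the distance query.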
CONFIRM: 1MTU example
[Figures: three image-only slides]
Summary
 Following comprehensive literature search, multiple
  algorithms for linking/growing fragments were developed
 Final linking approach, dubbed ‘CONFIRM’, uses
  in-house chemical equity
 Modular design allowed for rapid:
   − Implementation
   − Testing
   − Analysis and modification
 Publication completed, submitted to . . .
 Currently exploring use on drug discovery projects
Acknowledgments
 Computational Chemistry Group at Wyeth
  Research Cambridge
 Dr. Christine Humblet
 Prof. K. D. Sen
 Prof. P. W. Ayers
  – J. S. M. Anderson
  – J. I. Rodriguez
