Chemical Space Exploration

Slides for talk at AstraZeneca (SE), January 21, 2020

Published in: Science

  1. Chemical Space Exploration. Jan H. Jensen, University of Copenhagen. The game of Go has 10^170 possible positions, yet computers can now beat grandmasters. Can we use similar approaches for chemistry? Chemical space: 10^60 possible molecules (compare 10^23 stars in the observable universe), 10^8 molecules made so far. Almost all of chemical space is unexplored, but how do we search such a large space?
  2. GAN 2014, VAE 2013, RL 2013. VAE applied to molecules: Oct. 2016.
  3. The Fundamental Challenge: 10^60 → 10^6 → 100 → 1. AI? Recurrent NNs: autocomplete for molecules. Autoencoders: molecules as vectors. Genetic algorithms: evolving new molecules.
  4. A Simple Example from Shakespeare: “to be or not to be that is the question”. 27 characters and 39 positions: 27^39 = 6.7 x 10^55 possible sentences. Yet a genetic algorithm can consistently find the correct sentence by considering only 50,000 sentences. How? (10^55 → 10^4 → 1)
  5. Genetic Algorithm. Steps: generate 100 random sequences; score the sequences; pick a pair of sequences based on score; mate/crossover; mutate; then repeat (score, mate, mutate). Target: "to be or not to be that is the question". Example: the parents "ll hczcoanysflshfkeoomatsinswqm ld jpzn" and "pssogzosqrnapy ywuwqakdvrs snibjoqmziwx" are cut as "ll hczcoanysflshf + keoomatsinswqm ld jpzn" and "pssogzosqrnapy yw + uwqakdvrs snibjoqmziwx". Crossover gives "pssogzosqrnapy ywkeoomatsinswqm ld jpzn" and mutation gives "pssogzosqrnapy ywketomatsinswqm ld jpzn". Scores shown on the slide: 1 and 1 for the parents, 2 after crossover, 3 after mutation.
  6. Sequence Space: 1 - (26/27)^39, or 77%, of the 6.7 x 10^55 possible sequences have at least one character placed correctly, i.e. 77% of sequences have score ≥ 1. (Figure: a path through sequence space.) Maria H. Rasmussen
  7. Need Additive and Semi-Continuous Scores
  8. Is it possible to find one specific molecule among 10^60? Rediscovery: score = Tanimoto similarity. (Figure: two molecular structures, atom labels OH, H2N, OH.) Tanimoto = 0.33
  9. Is it possible to find one specific molecule among 10^60? Rediscovery: score = Tanimoto similarity. (Figure: two molecular structures, atom labels OH, H2N, OH.) Tanimoto = (3 fragments in common) / (9 fragments in total)
  10. Can we find Troglitazone? (55 unique fragments) (Figure: structure of troglitazone.)
  11. So what’s the problem? Troglitazone as a SMILES string: CC1=C(O)C(C)=C2CCC(C)(COC3=CC=C(CC4SC(=O)NC4=O)C=C3)OC2=C1C. The string can easily be matched with a GA, but scoring requires the sequence to correspond to a real molecule. Most matings/mutations fail, i.e. there are many fewer paths. Example crossover: CC(C + OC = CC(COC. Rediscovery using SMILES fails, despite a lot of help*. (*The starting population has Tanimoto scores between 0.23 and 0.32, and only one fragment is not represented.) Emilie Henault
  12. Success Using Graph-Based Methods. Molecules are more like crossword puzzles. (Figure: crossover.) Chem. Sci. 2019; J. Chem. Inf. Comput. Sci. 2004; JACS 2013. github.com/jensengroup/GB-GA. Emilie Henault
  13. Some molecules are harder to find: missing fragments. (Figure: molecular structures.)
  14. Finding Chromophores using Genetic Algorithms. score = λ-score + f-score. (Molecules absorbing at 300-500 nm are removed from the starting population.) (Computed using xTB-STDA//MMFF, population = 20.) (Figure: chemical space.) Emilie Henault
  15. Finding Chromophores using Genetic Algorithms. These molecules absorb strongly around 400 nm.
  16. Docking using Genetic Algorithms. These molecules have better docking scores than the native ligand. (Target = β2-adrenergic receptor, minimizing Glide htvs_ds score.) (Population = 400, 50 generations, 20 GA searches.) Casper Steinmann (Aalborg U)
  17. Docking using Genetic Algorithms. (Figure: native ligand.) Casper Steinmann (Aalborg U)
  18. Is it possible to find 1 specific molecule among 10^60? Yes, if the property of interest is a cumulative function of structure and most building blocks can be identified beforehand, because there are then many paths to the molecule. Most properties of interest have many solutions, each with many paths. (Figure: paths through chemical space.)
  19. Future Directions: 10^60 → 10^x → 100 → 1. How small can we make x? Smaller x, better scoring function.
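
The Shakespeare example on slides 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the talk: the population of 100, the character-match score, score-based parent picking, crossover, mutation, and the 50,000-sentence budget come from the slides, while the exact selection weights, the single-point crossover, the mutation rate, and keeping the best 100 of parents plus children are assumptions made here. All function names are hypothetical.

```python
import random
import string

TARGET = "to be or not to be that is the question"
ALPHABET = string.ascii_lowercase + " "  # 27 characters

# Size of the search space and the fraction of it with score >= 1 (slides 4 and 6).
SPACE_SIZE = len(ALPHABET) ** len(TARGET)
FRACTION_WITH_HIT = 1 - (26 / 27) ** len(TARGET)


def score(seq):
    """Additive score: number of characters in the correct position."""
    return sum(a == b for a, b in zip(seq, TARGET))


def crossover(mum, dad, rng):
    """Single-point crossover (assumed): head of one parent, tail of the other."""
    cut = rng.randrange(1, len(mum))
    return mum[:cut] + dad[cut:]


def mutate(seq, rng, rate=0.5):
    """With probability `rate` (assumed value), replace one random character."""
    if rng.random() < rate:
        i = rng.randrange(len(seq))
        seq = seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]
    return seq


def run_ga(pop_size=100, max_evals=50_000, seed=0):
    """Return (best sequence, sequences scored, best score per generation)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    evals = pop_size
    history = [max(score(s) for s in population)]
    while history[-1] < len(TARGET) and evals + pop_size <= max_evals:
        # Pick parents with probability proportional to score (+1 so that
        # an all-zero population still has valid weights).
        weights = [score(s) + 1 for s in population]
        children = []
        for _ in range(pop_size):
            mum, dad = rng.choices(population, weights=weights, k=2)
            children.append(mutate(crossover(mum, dad, rng), rng))
        evals += pop_size
        # Assumed survivor rule: keep the best unique sequences from
        # parents and children; ties are broken alphabetically.
        merged = sorted(set(population + children), key=lambda s: (-score(s), s))
        population = merged[:pop_size]
        history.append(score(population[0]))
    return population[0], evals, history


if __name__ == "__main__":
    best, evals, history = run_ga()
    print(f"search space: {SPACE_SIZE:.1e} sequences")
    print(f"fraction with score >= 1: {FRACTION_WITH_HIT:.0%}")
    print(f"best after {evals} sequences: {best!r} (score {score(best)})")
```

Because the score is additive, every correctly placed character is rewarded on its own, so the search can improve one position at a time rather than having to hit the full sentence at once. That is the property slide 7 asks for.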
