Chemical Space Exploration
Jan H. Jensen
University of Copenhagen
The game of Go has 10^170 possible positions, yet
computers can now beat top professional players.
Can we use similar approaches for chemistry?
Chemical Space
10^60 possible molecules
(10^23 stars in the observable universe)
10^8 molecules made so far
Almost all of chemical space is unexplored, but how do we search such a large space?
GAN 2014, VAE 2013, RL 2013
VAE applied to molecules: Oct. 2016
The Fundamental Challenge
10^60 → 10^6 → 100 → 1
AI?
Recurrent NNs: autocomplete for molecules
Autoencoders: molecules as vectors
Genetic algorithms: evolving new molecules
A Simple Example from Shakespeare
“to be or not to be that is the question”
27 characters (26 letters and a space) and 39 positions
27^39 = 6.7 × 10^55 possible sentences
Yet a genetic algorithm can consistently find the correct sentence by considering only 50,000 sentences.
How?
10^55 → 10^4 → 1
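The size of the search space, and the tiny share of it the GA needs to look at, can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and the only inputs are the numbers on the slide.

```python
# Size of the search space for the Shakespeare example:
# 27 possible characters (26 lowercase letters plus a space) at each of 39 positions.
n_chars = 27
n_positions = 39

space_size = n_chars ** n_positions  # exact integer, about 6.7 x 10^55
evaluated = 50_000                   # sentences scored by the GA, as stated on the slide

print(f"possible sentences: {space_size:.1e}")  # → possible sentences: 6.7e+55
print(f"fraction evaluated: {evaluated / space_size:.1e}")
```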
to be or not to be that is the question
ll hczcoanysflshfkeoomatsinswqm ld jpzn
pssogzosqrnapy ywuwqakdvrs snibjoqmziwx
ll hczcoanysflshf + keoomatsinswqm ld jpzn
pssogzosqrnapy yw + uwqakdvrs snibjoqmziwx
pssogzosqrnapy ywkeoomatsinswqm ld jpzn
pssogzosqrnapy ywketomatsinswqm ld jpzn
score = 1
score = 1
score = 2
score = 3
Genetic Algorithm
Generate 100 random sequences → score sequences → pick a pair of sequences based on score → mate (crossover) → mutate → score → mate → mutate (repeat)
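The loop above can be sketched in Python. This is a minimal illustration, not the code used in the talk. Fitness-proportional selection, single-point crossover, a per-character mutation rate of 1/39 and elitism (carrying the best sequence over unchanged) are all assumptions. The score is the number of correctly placed characters. Under this definition, the mated and mutated strings shown above score 2 and 3.

```python
import random
import string

TARGET = "to be or not to be that is the question"
ALPHABET = string.ascii_lowercase + " "  # 26 letters + space = 27 characters


def score(seq: str) -> int:
    """Additive score: number of positions where seq matches TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Single-point crossover: head of p1 joined to tail of p2."""
    cut = rng.randrange(1, len(TARGET))
    return p1[:cut] + p2[cut:]


def mutate(seq: str, rng: random.Random, rate: float) -> str:
    """Replace each character by a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in seq)


def run_ga(pop_size: int = 100, generations: int = 500, rate: float = 1 / 39, seed: int = 0):
    """Return the best sequence found and the best score in each generation."""
    rng = random.Random(seed)
    # Step 1: random starting population.
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Step 2: score the population and record the best sequence.
        scores = [score(s) for s in population]
        best_score, best = max(zip(scores, population))
        history.append(best_score)
        if best_score == len(TARGET):
            break
        # Step 3: pick parents with probability proportional to score (+1 so score 0 can be picked).
        weights = [s + 1 for s in scores]
        children = [best]  # elitism (an assumption): keep the best sequence unchanged
        while len(children) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            # Steps 4-5: mate, then mutate.
            children.append(mutate(crossover(p1, p2, rng), rng, rate))
        population = children
    return best, history


if __name__ == "__main__":
    best, history = run_ga()
    print(f"{best!r}: score {history[-1]}/39 after {len(history)} generations")
```

With elitism, the best score can never decrease from one generation to the next. The number of sequences scored is roughly the number of generations times the population size.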
1 − (26/27)^39, or 77%, of the 6.7 × 10^55 possible sequences have at least one character placed correctly:
77% of sequences have score ≥ 1
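The 77% figure follows because each position is wrong with probability 26/27, independently of the others. The chance that all 39 positions are wrong is therefore (26/27)^39. Here is a short Python check using exact fractions:

```python
from fractions import Fraction

# Probability that a random 39-character sequence has no character in the right place:
# each position independently misses with probability 26/27.
p_none = Fraction(26, 27) ** 39
p_at_least_one = 1 - p_none

print(f"{float(p_at_least_one):.0%}")  # share of sequences with score >= 1
```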
(Figure: paths through sequence space)
Maria H. Rasmussen
Need Additive and Semi-Continuous Scores
Is it possible to find one specific molecule among 10^60?
Rediscovery
Score = Tanimoto similarity
(Figure: two example structures, one with an OH group and one with H2N and OH groups)
Tanimoto = 3 in common / 9 total = 0.33
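The Tanimoto similarity is the number of shared features divided by the number of distinct features in either molecule. In the slide's example, that is 3 / 9. Below is a minimal Python sketch on hypothetical feature sets, where each integer stands for one "on" fingerprint bit. The talk does not say which fingerprint was used. In practice, fingerprints are usually generated with a cheminformatics toolkit such as RDKit, which also provides `DataStructs.TanimotoSimilarity`.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets: shared / total distinct."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: each integer stands for one structural feature (bit) that is "on".
mol_a = {1, 2, 3, 4, 5, 6}
mol_b = {4, 5, 6, 7, 8, 9}

print(tanimoto(mol_a, mol_b))  # 3 features in common / 9 in total
```

The score is 1 for identical fingerprints and grows gradually as shared features accumulate. That is the additive, semi-continuous behaviour the GA needs.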
(Figure: structure of troglitazone)
Can we find Troglitazone?
(55 unique fragments)
CC1=C(O)C(C)=C2CCC(C)(COC3=CC=C(CC4SC(=O)NC4=O)C=C3)OC2=C1C
So what’s the problem?
A string can easily be matched with a GA, but …
scoring requires the sequence to correspond to a real molecule, and
most matings/mutations fail, i.e., there are many fewer paths.
Rediscovery using SMILES fails, despite a lot of help*
*Starting population: Tanimoto scores between 0.23 and 0.32; only one fragment not represented
CC(C + OC = CC(COC (the child string has an unclosed parenthesis, so it is not a valid SMILES)
Emilie Henault
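The failure mode can be illustrated with a rough Python sketch. `crude_smiles_check` is a hypothetical helper that only tests two necessary conditions: balanced parentheses and paired ring-closure digits. `string_crossover` is a naive cut-and-splice of SMILES text. Crossing troglitazone with itself at random cut points is an arbitrary choice made for this demonstration. A real pipeline would parse candidates with a cheminformatics toolkit, for example RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse.

```python
import random

TROGLITAZONE = "CC1=C(O)C(C)=C2CCC(C)(COC3=CC=C(CC4SC(=O)NC4=O)C=C3)OC2=C1C"


def crude_smiles_check(smiles: str) -> bool:
    """Necessary (not sufficient) checks: balanced parentheses and paired ring-closure digits.
    Characters inside [...] are skipped; %nn ring closures are not handled."""
    depth = 0
    ring_counts = {}
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit():
            ring_counts[ch] = ring_counts.get(ch, 0) + 1
    # Every branch must be closed and every ring-closure digit must appear in pairs.
    return depth == 0 and all(n % 2 == 0 for n in ring_counts.values())


def string_crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Naive crossover on SMILES text: head of p1 + tail of p2 at independent random cut points."""
    return p1[: rng.randrange(1, len(p1))] + p2[rng.randrange(1, len(p2)):]


rng = random.Random(0)
children = [string_crossover(TROGLITAZONE, TROGLITAZONE, rng) for _ in range(1000)]
ok = sum(crude_smiles_check(c) for c in children)
print(f"{ok}/1000 children pass even this crude check")
```

Because the check only tests necessary conditions, the printed count is an upper bound on the number of children that are valid molecules.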
Success Using Graph-Based Methods
Molecules are more like crossword puzzles
(Figure: graph-based crossover)
Chem. Sci. 2019, J. Chem. Inf. Comput. Sci. 2004, JACS 2013
github.com/jensengroup/GB-GA
Emilie Henault
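To see why graph operations avoid this problem, here is a toy cut-and-join crossover on molecular graphs. It is a minimal sketch under heavy simplifying assumptions: heavy atoms only, single bonds, acyclic graphs, and a hypothetical valence table. It is not the GB-GA code from the repository above. The sketch breaks one bond in each parent and joins the two fragments with a single new bond. This restores each attachment atom's original bond count, so every child respects valence by construction, unlike cutting a SMILES string.

```python
import random
from collections import deque

# Hypothetical toy representation: heavy atoms only, single bonds only, no rings.
# A molecule is (atoms, bonds): atoms maps index -> element, bonds is a set of frozenset({i, j}).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


def reachable(bonds, start):
    """Indices of all atoms connected to `start` through `bonds` (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        atom = queue.popleft()
        for bond in bonds:
            if atom in bond:
                (other,) = bond - {atom}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


def cut(mol, rng):
    """Break one random bond and keep the fragment containing one of its atoms."""
    atoms, bonds = mol
    bond = rng.choice(sorted(bonds, key=sorted))
    attach = min(bond)  # attachment point: this atom has just lost one bond
    remaining = bonds - {bond}
    keep = reachable(remaining, attach)
    frag_bonds = {b for b in remaining if b <= keep}
    return {i: atoms[i] for i in keep}, frag_bonds, attach


def graph_crossover(mol1, mol2, rng):
    """Join a fragment of mol1 to a fragment of mol2 at their attachment points."""
    atoms1, bonds1, a1 = cut(mol1, rng)
    atoms2, bonds2, a2 = cut(mol2, rng)
    offset = max(atoms1) + 1  # renumber fragment 2 so indices do not clash
    atoms = {**atoms1, **{i + offset: el for i, el in atoms2.items()}}
    bonds = bonds1 | {frozenset(i + offset for i in b) for b in bonds2}
    bonds.add(frozenset({a1, a2 + offset}))  # one new bond replaces the two broken ones
    return atoms, bonds


def valences_ok(mol):
    """True if no atom has more bonds than its (hypothetical) maximum valence."""
    atoms, bonds = mol
    degree = {i: 0 for i in atoms}
    for bond in bonds:
        for i in bond:
            degree[i] += 1
    return all(degree[i] <= MAX_VALENCE[el] for i, el in atoms.items())


ethanol = ({0: "C", 1: "C", 2: "O"}, {frozenset({0, 1}), frozenset({1, 2})})
ethylamine = ({0: "N", 1: "C", 2: "C"}, {frozenset({0, 1}), frozenset({1, 2})})

child = graph_crossover(ethanol, ethylamine, random.Random(0))
print(child, valences_ok(child))
```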
(Figure: structures of target molecules)
Some molecules are harder to find (missing fragments)
Finding Chromophores using Genetic Algorithms
(molecules absorbing at 300–500 nm are removed from the starting population)
(computed using xTB-STDA//MMFF, population = 20)
score = λ-score + f-score
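The slide does not define λ-score or f-score. The following is a purely hypothetical sketch of how an additive, semi-continuous score of this form might look. One term rewards an absorption maximum near 400 nm, and the other rewards the oscillator strength f. The Gaussian shape, the 50 nm width and the cap on f are all assumptions. In the talk, the wavelength and f would come from the xTB-STDA calculation.

```python
import math

TARGET_NM = 400.0  # hypothetical: centre of the desired absorption window
WIDTH_NM = 50.0    # hypothetical: tolerance around the target wavelength


def lambda_score(wavelength_nm: float) -> float:
    """Hypothetical: 1 at the target wavelength, decaying smoothly away from it."""
    return math.exp(-((wavelength_nm - TARGET_NM) / WIDTH_NM) ** 2)


def f_score(osc_strength: float, f_max: float = 1.0) -> float:
    """Hypothetical: oscillator strength, capped at f_max so it cannot dominate the sum."""
    return min(osc_strength, f_max) / f_max


def chromophore_score(wavelength_nm: float, osc_strength: float) -> float:
    """Additive score: partial credit for getting either property closer to the goal."""
    return lambda_score(wavelength_nm) + f_score(osc_strength)


print(chromophore_score(400.0, 0.5), chromophore_score(500.0, 0.5))
```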
(Figure: chemical space)
Emilie Henault
Finding Chromophores using Genetic Algorithms
These molecules absorb strongly around 400 nm
Docking using Genetic Algorithms
These molecules have better docking scores than the native ligand
(Target = β2-adrenergic receptor, minimizing Glide htvs_ds score)
(Population = 400, 50 generations, 20 GA searches)
Casper Steinmann (Aalborg U)
(Figure: native ligand)
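Here is a rough count of the docking calculations implied by these settings. It assumes, though the slide does not say so, that every generation docks a full population of new molecules. It also ignores the initial population and any caching of repeated molecules.

```python
import math

population = 400
generations = 50
searches = 20

per_search = population * generations  # molecules docked in one GA run (assumed)
total = per_search * searches          # across all 20 independent runs

print(per_search, total, f"log10(total) = {math.log10(total):.1f}")  # → 20000 400000 log10(total) = 5.6
```

On this rough count, all 20 searches together dock on the order of 10^5 to 10^6 molecules. That is a vanishing fraction of the 10^60 in chemical space.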
Is it possible to find 1 specific molecule among 10^60?
Yes, if
the property of interest is a cumulative function of structure, and
most building blocks can be identified beforehand,
because then there are many paths to the molecule.
Most properties of interest have many solutions, each with many paths.
(Figure: paths through chemical space)
Future Directions
10^60 → 10^x → 100 → 1
How small can we make x?
Smaller x, better scoring function
