Using AI/ML in drug discovery to repurpose drugs. General cautions about the use of artificial intelligence, plus common pitfalls and best practices for generating data.
Do Not Distribute - Copyright BioTeam, Inc., All Rights Reserved (2020)
2. What is AI?
AI means using empirical data to generate an algorithm that can predict, or make decisions on, new data.
• Artificial Intelligence (AI): methods where computers make decisions imitating humans
• Machine Learning (ML): methods that improve with experience via implicit algorithms
• Deep Learning (DL): methods where features are not explicitly outlined
Each is a subset of the one before it: DL sits inside ML, which sits inside AI.
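As a rough illustration of the definition above (empirical data in, a predictive rule out), here is a minimal sketch that fits a straight line to a handful of observations and then applies it to an unseen input. The dose and response numbers are made up for illustration, and the function names `fit_line` and `predict` are hypothetical, not from any library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted rule to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up "empirical data": dose (x) versus measured response (y).
doses = [1.0, 2.0, 3.0, 4.0]
responses = [2.1, 3.9, 6.2, 7.8]

model = fit_line(doses, responses)  # the "algorithm" generated from data
print(predict(model, 5.0))          # a prediction for a dose never observed
```

The same pattern (fit on observed data, predict on new data) holds for far larger models; only the form of the fitted rule changes.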
3. What is AI?
• It is not a black box
• Results are not fact
• It will probably not replace traditional methods
• It is not difficult to get started
AI means using empirical data to generate an algorithm that can predict, or make decisions on, new data.
• It is a method with inputs and outputs
• Results are mathematical
• It may replace traditional methods
• It is a tool
4. When is it appropriate to use AI?
• Now
• You may be using it already
• When time-to-solution matters
• When throughput matters
• When there are many covariates to consider
• When modeling is too difficult and will take time to develop
• When you have lots of data
5. What if I don’t know where to start?
• Start with a specific problem statement
• Are there aliens out there?
• Can I identify something in an image?
6. Tips and Tricks
• Check out Google’s Teachable Machine https://teachablemachine.withgoogle.com
7. Tips and Tricks
• Check out TensorFlow’s Playground http://playground.tensorflow.org
8. Tips and Tricks
• Check out Andrej Karpathy’s blog http://karpathy.github.io/2019/04/25/recipe/
9. Data tips
• Your data will likely:
• Have artifacts
• Be incomplete
• Be skewed
• Be biased
• Be wrong
• Be multi-modal
• Be noisy
This is part of AI. Embrace it. Accept now that your data will be bad.
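One practical way to accept that the data will be imperfect is to measure how imperfect it is before training. The sketch below counts missing values and label balance in a small list of records. The records, the field names `smiles` and `activity`, and the `audit` helper are all hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter


def audit(rows, label_key="activity"):
    """Summarize missing values and label balance for a list of dict records."""
    missing = Counter()
    labels = Counter()
    for row in rows:
        for key, value in row.items():
            # Treat None and empty strings as missing entries.
            if value is None or value == "":
                missing[key] += 1
        label = row.get(label_key)
        if label not in (None, ""):
            labels[label] += 1
    return {"n_rows": len(rows), "missing": dict(missing), "labels": dict(labels)}


# Hypothetical records with the kinds of flaws listed above.
records = [
    {"smiles": "CCO", "activity": 1},
    {"smiles": "", "activity": 1},       # incomplete: no structure
    {"smiles": "CCN", "activity": None},  # incomplete: no label
    {"smiles": "CCC", "activity": 0},
    {"smiles": "CCCl", "activity": 1},
]

report = audit(records)
print(report)
```

A report like this will not fix skew or bias, but it makes them visible early, which is usually the first step.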
11. Tools for drug discovery
• Molecular docking aims to find drugs that fit into areas of an organism (typically a binding site on a target) and interfere with typical function
• It can take minutes to days to sample a single molecule across its various conformations
• We may not have a good idea of the target site
Source: Wikimedia Commons
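To see why per-molecule docking time matters for throughput, a back-of-the-envelope estimate helps. The sketch below assumes a hypothetical library of 10,000 molecules at 10 minutes each, which sits at the fast end of the "minutes to days" range above. Both numbers and the `screening_days` helper are illustrative assumptions, not measurements.

```python
def screening_days(n_molecules, minutes_per_molecule, n_workers=1):
    """Wall-clock days to dock every molecule, assuming perfect parallelism."""
    total_minutes = n_molecules * minutes_per_molecule
    return total_minutes / n_workers / (60 * 24)


# Hypothetical screen: 10,000 molecules at 10 minutes each.
single = screening_days(10_000, 10)
parallel = screening_days(10_000, 10, n_workers=100)

print(f"1 worker:    {single:.1f} days")
print(f"100 workers: {parallel:.2f} days")
```

Even at the optimistic end, a single worker needs about ten weeks, which is the gap a fast learned predictor is meant to narrow.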
12. ChemProp
• A deep learning framework for drug discovery
• Developed by MIT’s CSAIL
• Pulls drugs from the Broad Repurposing Hub
• Uses a Message Passing Neural Network (MPNN)
• Input features are fairly simple
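For readers new to message passing, the toy sketch below shows the core idea on a three-atom graph: each atom repeatedly updates its feature from its bonded neighbors, and the atom features are then pooled into one molecule-level value. This is not ChemProp's implementation. A real MPNN uses learned weights, nonlinearities, and vector-valued features, whereas this sketch assumes a single scalar feature per atom and plain summation.

```python
def message_passing_round(features, neighbors):
    """One round: each atom adds the sum of its neighbors' features to its own."""
    return [
        features[atom] + sum(features[n] for n in neighbors[atom])
        for atom in range(len(features))
    ]


# Ethanol (SMILES "CCO") as a graph: atom index -> bonded atom indices.
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# Assumed starting feature: the atomic number of each atom (C, C, O).
features = [6.0, 6.0, 8.0]

for _ in range(2):  # two rounds let information travel two bonds
    features = message_passing_round(features, neighbors)

# Readout: pool atom features into a single molecule-level number.
molecule_value = sum(features)
print(features, molecule_value)
```

After two rounds the first carbon's feature already reflects the oxygen two bonds away, which is how such networks capture local chemical context.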
Data encoding for training data
SMILES Activity
COC1=CC(=C(C=C1)OC)C2=C3C=C(C(=O)C=C3OC4=CC(=C(C=C42)O)O)O 1
COC1=CC(=C(C=C1)/C=N/NC(=O)C2=NN(C(=N2)C3=CC=CC=C3)C4=CC=CC=C4)O 1
CN1C2=C(C=C(C=C2)NC(=O)CCl)N(C1=O)C 1
CCS(=O)(=O)N1C(CC(=N1)C2=CC(=CC=C2)NS(=O)(=O)C)C3=CC=C(C=C3)C 0
CCOC1=CC=C(C=C1)NC(=O)CSC2=NN=C(C=C2)C3=CC=CC=N3 1
CCOC1=CC=C(C=C1)CNC(=O)C2CCN(CC2)S(=O)(=O)C3=CC4=C(C=C3)NC(=O)CCC4 1
CCOC(=O)N1CCN(CC1)S(=O)(=O)C2=CC=C(C=C2)C(=O)NNC3=NC4=C(C=CC=C4S3)C 1
CCN(CC)S(=O)(=O)C1=CC=CC(=C1)C(=O)N[C@@H](C(C)C)C(=O)NNC(=O)C2=CC=CC=C2 0
CCN(CC)S(=O)(=O)C1=CC=C(C=C1)S(=O)(=O)N2CCCC2C(=O)O 1
CCN(CC)C1=CC(=C(C=C1)/C=N/NC(=O)C2=CC(=CC=C2)S(=O)(=O)NC3=CC=CC=C3OC)O 1
CCCN1C=NC2=C1C=C(C(=C2N)C)C 0
CCCN1C(=O)C(SC1=O)CC(=O)NC2=CC=C(C=C2)C 1
CCC(C)NC(=O)C1CCN(CC1)S(=O)(=O)C2=CC=CC3=C2N=CC=C3 0
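The SMILES/Activity encoding above maps naturally onto a two-column CSV file. The sketch below parses a few rows copied from the table, checks that each label is 0 or 1, and reports the class balance. The column names `smiles` and `activity`, the CSV layout, and the `load_rows` helper are assumptions for illustration; check the ChemProp documentation for the exact input format your version expects.

```python
import csv
import io
from collections import Counter

# Four rows copied from the table above, laid out as CSV with a header row.
# The header names are an assumption made for this sketch.
raw = """smiles,activity
CN1C2=C(C=C(C=C2)NC(=O)CCl)N(C1=O)C,1
CCOC1=CC=C(C=C1)NC(=O)CSC2=NN=C(C=C2)C3=CC=CC=N3,1
CCCN1C=NC2=C1C=C(C(=C2N)C)C,0
CCC(C)NC(=O)C1CCN(CC1)S(=O)(=O)C2=CC=CC3=C2N=CC=C3,0
"""


def load_rows(text):
    """Parse (smiles, label) pairs, rejecting empty SMILES or non-binary labels."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        smiles = record["smiles"].strip()
        label = int(record["activity"])
        if not smiles or label not in (0, 1):
            raise ValueError(f"bad row: {record}")
        rows.append((smiles, label))
    return rows


rows = load_rows(raw)
balance = Counter(label for _, label in rows)
print(len(rows), dict(balance))
```

This only checks the file's shape. Confirming that each SMILES string is chemically valid would need a cheminformatics toolkit, which is outside this sketch.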
16. Playing with ChemProp
3CLpro inhibition prediction from SARS-CoV model
Drug Name SMILES Activity Probability
Zafirlukast Cc1ccccc1S(=O)(=O)NC(=O)c2cc(OC)c(cc2)Cc3cn(C)c4ccc(cc43)NC(=O)OC5CCCC5 0.72431216
Montelukast CC(C)(C1=CC=CC=C1CCC(C2=CC=CC(=C2)C=CC3=NC4=C(C=CC(=C4)Cl)C=C3)SCC5(CC5)CC(=O)O)O 0.60056485
Ritonavir CC(C)C1=NC(=CS1)CN(C)C(=O)NC(C(C)C)C(=O)NC(CC2=CC=CC=C2)CC(C(CC3=CC=CC=C3)NC(=O)OCC4=CN=CS4)O 0.51782315
Remdesivir CCC(CC)COC(=O)C(C)NP(=O)(OCC1C(C(C(O1)(C#N)C2=CC=C3N2N=CN=C3N)O)O)OC4=CC=CC=C4 0.46806238
Indinavir CC(C)(C)NC(=O)C1CN(CCN1CC(CC(CC2=CC=CC=C2)C(=O)NC3C(CC4=CC=CC=C34)O)O)CC5=CN=CC=C5 0.42568066
Carfilzomib CC(C)CC(C(=O)C1(CO1)C)NC(=O)C(CC2=CC=CC=C2)NC(=O)C(CC(C)C)NC(=O)C(CCC3=CC=CC=C3)NC(=O)CN4CCOCC4 0.40163301
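Predicted probabilities like those above are usually turned into a ranked shortlist for follow-up rather than read as yes/no answers. The sketch below sorts the six predictions and keeps those at or above a cutoff. The 0.5 cutoff and the `shortlist` helper are arbitrary assumptions for illustration, and a shortlist from a toy model is a starting point for further checks, not a finding.

```python
# Activity probabilities copied from the table above.
predictions = {
    "Zafirlukast": 0.72431216,
    "Montelukast": 0.60056485,
    "Ritonavir": 0.51782315,
    "Remdesivir": 0.46806238,
    "Indinavir": 0.42568066,
    "Carfilzomib": 0.40163301,
}


def shortlist(preds, threshold=0.5):
    """Return drug names ranked by probability, keeping those >= threshold."""
    ranked = sorted(preds.items(), key=lambda item: item[1], reverse=True)
    return [name for name, prob in ranked if prob >= threshold]


candidates = shortlist(predictions)
print(candidates)
```

Moving the threshold changes the list, so in practice the cutoff should come from validation data rather than a default.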
17. Disclaimer
This was an exercise to explore ChemProp, not SARS-CoV. These results are preliminary at best and need to be thoroughly explored and peer reviewed before any conclusions or medically-relevant actions can be taken. Please note that the information presented has not been formally peer reviewed and expresses the opinions of the BioTeam.
In short: it was a toy example and does not constitute any medical advice!
Come talk to BioTeam about your scientific goals