©2016 BioTeam, Inc. All Rights Reserved.©2020 BioTeam, Inc. All Rights Reserved.
Fernanda Foertter
Sr. Scientific Consultant
With the Power of AI
Comes Great Responsibility
Do Not Distribute - Copyright BioTeam, Inc., All Rights Reserved
What is AI?
AI means using empirical data to generate an algorithm
that can make predictions or decisions on new data.
Deep Learning
Method where features are not
explicitly outlined
Machine Learning
Methods that improve with
experience via implicit algorithms
Artificial Intelligence
Methods where computers
make decisions imitating humans
[Diagram labels: AI, ML, DL]
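As a rough illustration of "using empirical data to generate an algorithm", the sketch below fits a straight line to a handful of made-up observations and then applies it to an unseen input. The data, the function names, and the choice of ordinary least squares are all assumptions for illustration; they are not from the talk.

```python
# Minimal sketch: derive a predictive rule from observed data.
# The observations and helper names below are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted rule to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Empirical (toy) data goes in, a reusable rule comes out.
observed_x = [1.0, 2.0, 3.0, 4.0]
observed_y = [2.0, 4.0, 6.0, 8.0]
model = fit_line(observed_x, observed_y)

# The rule is then applied to data it has not seen.
print(predict(model, 5.0))
```

Deep learning follows the same pattern at a much larger scale: the rule has many more parameters, and the features are learned rather than chosen by hand.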
What is AI?
• It is not a black box
• Results are not fact
• It will probably not replace
traditional methods
• Not difficult to get started
AI means using empirical data to generate an algorithm
that can make predictions or decisions on new data.
• It is a method with inputs and
outputs
• Results are mathematical
• It may replace traditional
methods
• A tool
When is it appropriate to use AI?
• Now
• You may be using it already
• When time-to-solution matters
• When throughput matters
• When there are many covariates to consider
• When modeling is too difficult and will take time to develop
• When you have lots of data
What if I don’t know where to start?
• Start with a specific problem statement
• Are there aliens out there?
• Can I identify something in an image?
Tips and Tricks
• Check out Google’s Teachable Machine https://teachablemachine.withgoogle.com
Tips and Tricks
• Check out TensorFlow’s Playground http://playground.tensorflow.org
Tips and Tricks
• Check out Andrej Karpathy’s blog http://karpathy.github.io/2019/04/25/recipe/
Data tips
• Your data will likely:
• Have artifacts
• Be incomplete
• Be skewed
• Be biased
• Be wrong
• Be multi-modal
• Be noisy
This is part of AI. Embrace it. Accept now that your data will be bad.
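Accepting that data will be bad is easier once its problems are measured. The sketch below is one hypothetical way to count missing labels and check class balance before training; the `audit_labels` helper and the toy rows are assumptions for illustration, not part of the talk.

```python
from collections import Counter

# Minimal sketch of a pre-training data audit.
# Rows are (identifier, label) pairs; a label of None means "missing".

def audit_labels(rows):
    """Report the row count, missing labels, and class balance."""
    missing = sum(1 for _, label in rows if label is None)
    counts = Counter(label for _, label in rows if label is not None)
    labelled = sum(counts.values())
    if labelled:
        balance = {label: count / labelled for label, count in counts.items()}
    else:
        balance = {}
    return {"total": len(rows), "missing": missing, "balance": balance}


# Hypothetical toy data: incomplete (one missing label) and skewed (3:1).
toy_rows = [("a", 1), ("b", 1), ("c", 1), ("d", 0), ("e", None)]
report = audit_labels(toy_rows)
print(report)
```

A report like this does not fix anything by itself, but it makes skew and gaps visible before they are silently baked into a model.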
Project by Jennifer Hart
Tools for drug discovery
• Molecular docking aims to find
drugs that fit into sites in an
organism and interfere with
typical function
• It can take minutes to days to
sample the various conformations
of a single molecule
• We may not have a good idea of
the target site
Source: Wikimedia Commons
ChemProp
• A deep learning framework for
drug discovery
• Developed by MIT’s CSAIL
• Pulls drugs from the Broad
Repurposing Hub
• Uses a Message Passing Neural
Network (MPNN)
• Input features are fairly simple
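To give a feel for what "message passing" means, the sketch below runs one round of neighbour aggregation on a toy three-atom graph and then sums the result into a single molecule-level number. It is a deliberately simplified illustration: the fixed "add your neighbours" update, the scalar atom features, and the function names are all assumptions, and ChemProp's actual network uses learned weights and a more elaborate scheme.

```python
# Minimal sketch of one message-passing round on a molecular graph.
# Atoms are nodes, bonds are undirected edges. Not ChemProp's implementation.

def message_passing_step(features, bonds):
    """Update each atom's feature with the sum of its neighbours' features."""
    neighbours = {atom: [] for atom in features}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return {
        atom: features[atom] + sum(features[n] for n in neighbours[atom])
        for atom in features
    }


def readout(features):
    """Collapse per-atom features into one molecule-level value."""
    return sum(features.values())


# Toy graph for the SMILES "CCO": atoms 0 and 1 are carbon, atom 2 is oxygen.
# The atomic number serves as a (very simple) input feature.
atoms = {0: 6.0, 1: 6.0, 2: 8.0}
bonds = [(0, 1), (1, 2)]

hidden = message_passing_step(atoms, bonds)
print(hidden)
print(readout(hidden))
```

After one round each atom "knows" about its direct neighbours; repeating the step spreads information further across the molecule, which is why the input features can stay simple.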
Data encoding for training data
SMILES | Activity
COC1=CC(=C(C=C1)OC)C2=C3C=C(C(=O)C=C3OC4=CC(=C(C=C42)O)O)O | 1
COC1=CC(=C(C=C1)/C=N/NC(=O)C2=NN(C(=N2)C3=CC=CC=C3)C4=CC=CC=C4)O | 1
CN1C2=C(C=C(C=C2)NC(=O)CCl)N(C1=O)C | 1
CCS(=O)(=O)N1C(CC(=N1)C2=CC(=CC=C2)NS(=O)(=O)C)C3=CC=C(C=C3)C | 0
CCOC1=CC=C(C=C1)NC(=O)CSC2=NN=C(C=C2)C3=CC=CC=N3 | 1
CCOC1=CC=C(C=C1)CNC(=O)C2CCN(CC2)S(=O)(=O)C3=CC4=C(C=C3)NC(=O)CCC4 | 1
CCOC(=O)N1CCN(CC1)S(=O)(=O)C2=CC=C(C=C2)C(=O)NNC3=NC4=C(C=CC=C4S3)C | 1
CCN(CC)S(=O)(=O)C1=CC=CC(=C1)C(=O)N[C@@H](C(C)C)C(=O)NNC(=O)C2=CC=CC=C2 | 0
CCN(CC)S(=O)(=O)C1=CC=C(C=C1)S(=O)(=O)N2CCCC2C(=O)O | 1
CCN(CC)C1=CC(=C(C=C1)/C=N/NC(=O)C2=CC(=CC=C2)S(=O)(=O)NC3=CC=CC=C3OC)O | 1
CCCN1C=NC2=C1C=C(C(=C2N)C)C | 0
CCCN1C(=O)C(SC1=O)CC(=O)NC2=CC=C(C=C2)C | 1
CCC(C)NC(=O)C1CCN(CC1)S(=O)(=O)C2=CC=CC3=C2N=CC=C3 | 0
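The two-column encoding above is simple enough to load and sanity-check with the standard library. The sketch below assumes the data is stored as a CSV with a header row; the column names `smiles` and `activity` and the helper `load_training_rows` are assumptions for illustration, and the three rows are copied from the training data shown in the talk.

```python
import csv
import io

# Minimal sketch: read SMILES/activity pairs from CSV text.
# The column names "smiles" and "activity" are assumed, not confirmed.
TOY_CSV = """smiles,activity
CN1C2=C(C=C(C=C2)NC(=O)CCl)N(C1=O)C,1
CCCN1C=NC2=C1C=C(C(=C2N)C)C,0
CCCN1C(=O)C(SC1=O)CC(=O)NC2=CC=C(C=C2)C,1
"""


def load_training_rows(handle):
    """Return (smiles, label) pairs, rejecting labels other than 0 or 1."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["activity"])
        if label not in (0, 1):
            raise ValueError(f"unexpected activity label: {label}")
        rows.append((record["smiles"], label))
    return rows


rows = load_training_rows(io.StringIO(TOY_CSV))

# Fraction of molecules labelled active: a quick check for skew.
active_fraction = sum(label for _, label in rows) / len(rows)
print(len(rows), active_fraction)
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper.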
Playing with ChemProp
3CLpro Inhibition prediction from SARS-CoV model
Drug Name | SMILES | Activity Probability
Zafirlukast | Cc1ccccc1S(=O)(=O)NC(=O)c2cc(OC)c(cc2)Cc3cn(C)c4ccc(cc43)NC(=O)OC5CCCC5 | 0.72431216
Montelukast | CC(C)(C1=CC=CC=C1CCC(C2=CC=CC(=C2)C=CC3=NC4=C(C=CC(=C4)Cl)C=C3)SCC5(CC5)CC(=O)O)O | 0.60056485
Ritonavir | CC(C)C1=NC(=CS1)CN(C)C(=O)NC(C(C)C)C(=O)NC(CC2=CC=CC=C2)CC(C(CC3=CC=CC=C3)NC(=O)OCC4=CN=CS4)O | 0.51782315
Remdesivir | CCC(CC)COC(=O)C(C)NP(=O)(OCC1C(C(C(O1)(C#N)C2=CC=C3N2N=CN=C3N)O)O)OC4=CC=CC=C4 | 0.46806238
Indinavir | CC(C)(C)NC(=O)C1CN(CCN1CC(CC(CC2=CC=CC=C2)C(=O)NC3C(CC4=CC=CC=C34)O)O)CC5=CN=CC=C5 | 0.42568066
Carfilzomib | CC(C)CC(C(=O)C1(CO1)C)NC(=O)C(CC2=CC=CC=C2)NC(=O)C(CC(C)C)NC(=O)C(CCC3=CC=CC=C3)NC(=O)CN4CCOCC4 | 0.40163301
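Model outputs like these are probabilities, not verdicts, so a typical next step is to rank them and apply a cutoff. The sketch below uses the values from the table; the 0.5 threshold and the helper name `rank_predictions` are assumptions for illustration and were not part of the talk.

```python
# Minimal sketch: rank predicted probabilities and apply a cutoff.
# Values are copied from the table above; the 0.5 threshold is an
# illustrative assumption, not a recommendation.
predictions = {
    "Remdesivir": 0.46806238,
    "Zafirlukast": 0.72431216,
    "Carfilzomib": 0.40163301,
    "Ritonavir": 0.51782315,
    "Montelukast": 0.60056485,
    "Indinavir": 0.42568066,
}


def rank_predictions(scores, threshold):
    """Sort by probability (highest first) and flag scores >= threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score >= threshold) for name, score in ranked]


ranked = rank_predictions(predictions, threshold=0.5)
above_cutoff = [name for name, _, flagged in ranked if flagged]

for name, score, flagged in ranked:
    print(f"{name:12s} {score:.3f} {'above' if flagged else 'below'} cutoff")
```

Moving the threshold changes which compounds are flagged, which is one reason such results are mathematical outputs rather than facts.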
Disclaimer:
This was an exercise to explore ChemProp, not SARS-CoV. These results are preliminary at best and need to be thoroughly explored and peer reviewed before any conclusions or medically relevant actions can be taken. Please note that the information presented has not been formally peer reviewed and expresses the opinions of BioTeam.
In short: it was a toy example and does not constitute any medical advice!
Come talk to BioTeam about
your scientific goals

BioIT Webinar on AI and data methods for drug discovery
