2. Practical example
Story:
A company that produces, or intends
to produce, a particular compound
(a drug, makeup, paint, glue, toilet
freshener, whatever) is obliged to test
whether the compound is toxic to
humans and, if so, how toxic it is.
What are the options for checking
this?
Tefluthrin, Cyclopropanecarboxylic acid
3. Practical example
Bioassay
In vivo and in vitro assays with
mice, dogs, rats or other species.
Computer modeling
In silico: using QSAR (QSPR) based
on machine learning to predict
properties of interest without direct
experiment.
4. Option 1: Bioassay
The classical and still widely used method
for measuring toxicity is a bioassay with
mice, rats, dogs or other species.
What are its advantages and disadvantages?
5. Option 1: Bioassay
For a bioassay we would typically have to account for the following:
• Dozens of mice are needed to check several concentrations of
the tested compound
• In some assays we need to wait for the next generation
• We may need to test against several organisms (rat,
mouse) and different administration routes (oral, skin, IV
injection)
• A test can take up to several months
• A test can cost up to tens of thousands of dollars
What if we need to measure toxicity for 100 000 compounds?
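A rough back-of-the-envelope estimate shows why this does not scale. The per-compound cost, the test duration and the lab capacity below are illustrative assumptions chosen to fit the ranges on this slide; they are not measured figures.

```python
# Illustrative assumptions only, not measured figures.
N_COMPOUNDS = 100_000
COST_PER_COMPOUND_USD = 10_000   # assumed: low end of "tens of thousands"
MONTHS_PER_TEST = 3              # assumed: "several months"
PARALLEL_TESTS = 100             # hypothetical lab capacity


def bioassay_budget(n, cost, months, parallel):
    """Return (total cost in USD, total duration in months)."""
    total_cost = n * cost
    batches = -(-n // parallel)  # ceiling division
    return total_cost, batches * months


total_cost, total_months = bioassay_budget(
    N_COMPOUNDS, COST_PER_COMPOUND_USD, MONTHS_PER_TEST, PARALLEL_TESTS
)
print(f"cost: {total_cost:,} USD, duration: {total_months / 12:.0f} years")
```

Under these assumptions the campaign costs a billion dollars and runs for centuries, which is the motivation for the modeling option.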
6. Option 2: Modeling
What steps are required to build a
predictive model for a physicochemical or
biological property?
• Prepare dataset of experimental data
• Choose and calculate molecular
descriptors
• Apply machine learning method
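The three steps above can be sketched end to end in a few lines. This is a toy illustration only: the property values are made-up placeholders (not experimental data), the single descriptor (number of chlorine atoms, counted directly in the SMILES text) is a deliberately crude stand-in for real descriptors, and the "machine learning method" is ordinary one-variable least squares.

```python
# Step 1: dataset of (SMILES, property value).
# The values are made-up placeholders, NOT experimental measurements.
dataset = [("CC", 0.0), ("CCl", 1.0), ("C(Cl)Cl", 2.0), ("C(Cl)(Cl)Cl", 3.0)]


# Step 2: one crude descriptor, the number of chlorine atoms.
def descriptor(smiles):
    return float(smiles.count("Cl"))


# Step 3: fit y = slope * x + intercept by ordinary least squares.
def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [descriptor(smiles) for smiles, _ in dataset]
ys = [value for _, value in dataset]
slope, intercept = fit_line(xs, ys)

# Predict the property of a compound that is not in the dataset.
predicted = slope * descriptor("ClC(Cl)(Cl)Cl") + intercept
print(predicted)
```

In practice each step is far larger (curated experimental data, hundreds of descriptors, a validated model), but the shape of the workflow stays the same.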
7. Molecular descriptors
What is a descriptor? What are the simplest examples?
A descriptor is a numerical property of a chemical
compound.
• Simplest constitutional descriptors: MW (molecular weight), NA (number of atoms), nDB (number of double bonds), ...
• Molecular properties: LogP, hydrophilic factor, ..
• Randic molecular profiles
• Various topological and 3D indices and profiles
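As a minimal sketch of the simplest constitutional descriptors, the snippet below computes the molecular weight and the number of atoms from a molecular formula. The function names are hypothetical, the atomic-mass table covers only a handful of elements, and formulas with brackets or charges are not handled.

```python
import re

# Standard atomic masses (rounded) for the few elements used here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999,
               "F": 18.998, "Cl": 35.45, "Br": 79.904}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H5Br' into {'C': 6, 'H': 5, 'Br': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def molecular_weight(formula):
    return sum(ATOMIC_MASS[s] * n for s, n in parse_formula(formula).items())


def number_of_atoms(formula):
    return sum(parse_formula(formula).values())


# Bromobenzene, C6H5Br
print(round(molecular_weight("C6H5Br"), 2), number_of_atoms("C6H5Br"))
```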
9. Machine learning
What kind of machine learning methods do
you know?
• Linear regression
• K nearest neighbors (KNN)
• Partial Least Squares (PLS) regression
• Neural networks
• Support Vector Machines
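To make one of these methods concrete, here is a minimal sketch of k-nearest-neighbors regression in plain Python: the property of a new compound is predicted as the average over the k training compounds whose descriptor vectors are closest. The training values are synthetic numbers chosen for illustration, and `knn_predict` is a hypothetical helper, not a library function.

```python
def knn_predict(train, query, k=2):
    """Average the property values of the k nearest training points.

    train: list of (descriptor_vector, property_value) pairs
    query: descriptor vector of the compound to predict
    """
    def distance(a, b):
        # Euclidean distance between two descriptor vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    nearest = sorted(train, key=lambda row: distance(row[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Synthetic one-descriptor training set (illustrative values only).
train = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0), ([10.0], 20.0)]
prediction = knn_predict(train, [2.1], k=2)
print(prediction)
```

With real descriptors the columns usually need to be scaled first, otherwise the descriptor with the largest numeric range dominates the distance.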
12. SMILES — a string representation
C1=CC=C(C=C1)Br
CC(F)F
COC(C(Cl)Cl)(F)F
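Because a SMILES string is plain text, simple information can be read from it directly. The sketch below counts heavy (non-hydrogen) atoms per element with a regular expression. It is a deliberately limited illustration: it assumes only unbracketed atoms from the organic subset, so bracket atoms such as `[NH4+]` or `[Fe]` are not handled, and a real project would use a cheminformatics toolkit instead.

```python
import re
from collections import Counter

# Two-letter symbols must come first so "Cl" is not read as "C".
# Lowercase letters are aromatic atoms in SMILES.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles):
    """Count heavy atoms per element; ring digits, bonds, branches are skipped."""
    return dict(Counter(sym.capitalize() for sym in ATOM_PATTERN.findall(smiles)))


for smiles in ["C1=CC=C(C=C1)Br", "CC(F)F", "COC(C(Cl)Cl)(F)F"]:
    print(smiles, count_heavy_atoms(smiles))
```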
13. InChI — one more approach
InChI (International Chemical Identifier): a standard developed by IUPAC
as a textual identifier for chemical substances
InChI: InChI=1S/C6H5Br/c7-6-4-2-1-3-5-6/h1-5H
InChIKey: QARVLSVVCXYDNA-UHFFFAOYSA-N
InChI: InChI=1S/C2H4F2/c1-2(3)4/h2H,1H3
InChIKey: NPNPZTNLOVBDOC-UHFFFAOYSA-N
InChI: InChI=1S/C3H4Cl2F2O/c1-8-3(6,7)2(4)5/h2H,1H3
InChIKey: RFKMCNOHBTXSMU-UHFFFAOYSA-N
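An InChI string is built from layers separated by "/": a version prefix, the molecular formula, then layers marked by a leading letter ("c" for atom connectivity, "h" for hydrogens). The sketch below splits a string into these parts. `parse_inchi` is a hypothetical helper for illustration; it assumes a formula layer is present and does not validate the contents of each layer.

```python
def parse_inchi(inchi):
    """Split an InChI string into version, formula and lettered layers."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("not an InChI string")
    version, formula, *rest = body.split("/")
    # Each remaining layer starts with a one-letter tag, e.g. "c" or "h".
    layers = {layer[0]: layer[1:] for layer in rest}
    return {"version": version, "formula": formula, "layers": layers}


parsed = parse_inchi("InChI=1S/C6H5Br/c7-6-4-2-1-3-5-6/h1-5H")
print(parsed["formula"], parsed["layers"])
```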