Chemoinformatics in Action

AACIMP 2009 Summer School lecture by Yuriy Sushko and Sergii Novotarskyi, "Environmental Chemoinformatics" course.

  1. Chemoinformatics in action: some questions for the audience. Yuriy Sushko, Sergii Novotarskyi
  2. Practical example. Story: a company that produces, or intends to produce, some particular compound (a drug, make-up, paint, glue, toilet freshener, etc.) is obliged to test whether this compound is toxic to humans and how toxic it is. What are the options for checking this? (Structures shown on the slide: Teuthrin, cyclopropanecarboxylic acid.)
  3. Practical example. Two options: • Bioassay: in vivo and in vitro assays • Computer modeling (in silico): using QSAR (QSPR) models based on machine learning to predict the properties of interest without direct experiments on mice, dogs, rats or other species.
  4. Option 1: Bioassay. The classical and still widely used method for measuring toxicity is a bioassay with mice, rats, dogs or other species. What are its advantages and disadvantages?
  5. Option 1: Bioassay. For a bioassay, typically: • dozens of mice are needed to check several concentrations of the tested compound • some assays require waiting for the next generation • several organisms (rat, mouse) and different administration routes (oral, skin, IV injection) may need to be tested • a test can take up to several months • a test can cost up to tens of thousands of dollars. What if we need to measure toxicity for 100 000 compounds?
  6. Option 2: Modeling. What are the steps required to build a predictive model for a physicochemical or biological property? • Prepare a dataset of experimental data • Choose and calculate molecular descriptors • Apply a machine learning method
  7. Molecular descriptors. What is a descriptor? What are the simplest examples? A descriptor is a numerical property of a chemical compound. • Simplest constitutional descriptors: MW, NA, nDB, ... • Molecular properties: LogP, hydrophilic factor, ... • Randić molecular profiles • Various topological and 3D indices and profiles
  8. Molecular descriptors (example values): 2.54 4.25 -5.71 3.26 0.57 -0.07 1.45 6.34 8.28 2.78 -5.67 -2.33 1.45 7.34 8.35 1.64 -5.56 -4.45
  9. Machine learning. What kinds of machine learning methods do you know? • Linear regression • k-nearest neighbors (kNN) • Partial least squares (PLS) regression • Neural networks • Support vector machines
  10. Some additional facts. Popular formats for representing molecules in databases: • SDF • SMILES • InChI
  11. SDF: a plain text file. Annotated example (the slide's callouts are shown in brackets):
      [header]
      benzene
      ACD/Labs0812062058

        6  6  0  0  0  0  0  0  0  0  1 V2000
      [atom information]
          1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
          1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
          0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
          0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
         -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
         -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
      [bond information]
        2  1  1  0  0  0  0
        3  1  2  0  0  0  0
        4  2  2  0  0  0  0
        5  3  1  0  0  0  0
        6  4  1  0  0  0  0
        6  5  2  0  0  0  0
      M  END
      $$$$
      [tags]
      > <Unique_ID>
      XCA3464366
      > <ClogP>
      5.825
      > <Vendor>
      Sigma
      > <Molecular Weight>
      499.611
  12. SMILES: a string representation. Examples: C1=CC=C(C=C1)Br • CC(F)F • COC(C(Cl)Cl)(F)F
  13. InChI: one more approach. InChI (International Chemical Identifier) is a standard developed by IUPAC for textual identifiers of chemical substances. • InChI: InChI=1S/C6H5Br/c7-6-4-2-1-3-5-6/h1-5H; InChIKey: QARVLSVVCXYDNA-UHFFFAOYSA • InChI: InChI=1S/C2H4F2/c1-2(3)4/h2H,1H3; InChIKey: NPNPZTNLOVBDOC-UHFFFAOYSA • InChI: InChI=1S/C3H4Cl2F2O/c1-8-3(6,7)2(4)5/h2H,1H3; InChIKey: RFKMCNOHBTXSMU-UHFFFAOYSA
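
The three model-building steps on slide 6 (prepare a dataset, calculate descriptors, apply a machine learning method) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' workflow: the single descriptor `toy_carbon_count` (which counts `C` characters in an unbranched alkane SMILES), the one-variable least-squares fit and the dataset values are all hypothetical and synthetic, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean

# Step 1 (hypothetical): a dataset of (structure, measured property) pairs.
# These numbers are synthetic, not experimental data.
DATASET = [("C", 3.0), ("CC", 5.0), ("CCC", 7.0)]


def toy_carbon_count(smiles):
    """Step 2: a hypothetical one-number descriptor.

    Counts 'C' characters, which equals the carbon count only for
    unbranched alkane SMILES such as 'CCC' (no Cl, rings or branches).
    """
    return smiles.count("C")


def fit_simple_linear(xs, ys):
    """Step 3: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def build_model(dataset, descriptor):
    """Chain the steps: compute descriptors, fit, return a predictor."""
    xs = [descriptor(structure) for structure, _ in dataset]
    ys = [value for _, value in dataset]
    slope, intercept = fit_simple_linear(xs, ys)
    return lambda structure: slope * descriptor(structure) + intercept


model = build_model(DATASET, toy_carbon_count)
print(model("CCCC"))  # the synthetic data follow y = 2x + 1, so this prints 9.0
```

In practice step 3 would be one of the methods from slide 9 applied to many descriptors at once; the sketch only shows how the three steps chain together.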
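
The simplest constitutional descriptors from slide 7 need nothing more than a molecular formula. A sketch, assuming the formula is written in flat notation without parentheses or charges; the atomic-weight table holds standard conventional values for a few common elements only, and `n_atoms` is our own name for a total atom count (we do not claim it matches the slide's NA exactly).

```python
import re

# Standard (conventional) atomic weights for a few common elements;
# extend the table for other elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "F": 18.998, "Cl": 35.45, "Br": 79.904}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H5Br' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def constitutional_descriptors(formula):
    """Molecular weight (MW) and total atom count (our own 'n_atoms')."""
    counts = parse_formula(formula)
    mw = sum(ATOMIC_WEIGHTS[symbol] * n for symbol, n in counts.items())
    return {"MW": round(mw, 2), "n_atoms": sum(counts.values())}


print(constitutional_descriptors("C6H5Br"))  # bromobenzene: {'MW': 157.01, 'n_atoms': 12}
```

Descriptors such as nDB (double bonds) or LogP need the structure itself, not just the formula, which is why real descriptor software works from SDF or SMILES input.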
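
Of the methods listed on slide 9, k-nearest neighbors is the easiest to write out: a compound's property is predicted as the average property of the k training compounds whose descriptor vectors are closest to it. A sketch assuming Euclidean distance and an unweighted mean; the training vectors and values in the usage lines are hypothetical.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a property for `query` from its k nearest training compounds.

    train: list of (descriptor_vector, property_value) pairs.
    query: descriptor vector of the compound to predict.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the training set size")
    # Sort training compounds by Euclidean distance in descriptor space.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Unweighted average of the neighbours' property values.
    return sum(value for _, value in nearest) / k


# Hypothetical two-descriptor training set (not real measurements).
train = [([0.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([5.0, 5.0], 10.0)]
print(knn_predict(train, [0.1, 0.4], k=2))  # averages 1.0 and 2.0 -> 1.5
```

Descriptors on very different scales (for example MW next to LogP) would normally be standardized first; otherwise the descriptor with the largest numbers dominates the distance.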
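
The SDF layout on slide 11 (header lines, counts line, atom block, bond block, `M  END`, then `> <name>` data items, with `$$$$` closing each record) is regular enough to read with a short parser. This is a sketch for the V2000 fixed-column layout only, not a replacement for a real toolkit: it ignores charges, stereo and other `M` property lines. The benzene record is the one from the slide; the `Molecular Weight` data item (78.11, benzene's molecular weight) is our own illustration, not one of the slide's tags.

```python
import re

# Benzene record from the slide in V2000 fixed-column layout; the data item
# at the end is an illustrative addition.
BENZENE_SDF = """benzene
ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
>  <Molecular Weight>
78.11

$$$$
"""


def split_records(text):
    """Yield the lines of each record; a '$$$$' line terminates a record."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record  # tolerate a missing final '$$$$'


def parse_record(lines):
    """Read name, atom symbols, bonds and data items from one V2000 record."""
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])  # counts line
    atom_block = lines[4:4 + n_atoms]
    bond_block = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    atoms = [line[31:34].strip() for line in atom_block]  # element symbol columns
    # Each bond as (first atom, second atom, bond type), with 1-based atom numbers.
    bonds = [(int(line[0:3]), int(line[3:6]), int(line[6:9])) for line in bond_block]

    data, current = {}, None
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    for line in lines[end + 1:]:
        tag = re.match(r">.*<([^>]+)>", line)
        if tag:
            current = tag.group(1)
            data[current] = []
        elif current is not None and line.strip():
            data[current].append(line)
        else:
            current = None  # a blank line ends the value
    return {"name": lines[0], "atoms": atoms, "bonds": bonds,
            "data": {key: "\n".join(value) for key, value in data.items()}}


def parse_sdf(text):
    return [parse_record(lines) for lines in split_records(text)]


benzene = parse_sdf(BENZENE_SDF)[0]
print(benzene["name"], len(benzene["atoms"]), len(benzene["bonds"]), benzene["data"])
# benzene 6 6 {'Molecular Weight': '78.11'}
```

Reading the counts and bond fields by column rather than by splitting on whitespace follows the fixed-width V2000 layout, where three-digit fields can run together once atom numbers reach 100.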
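
To show how much structure a SMILES string carries, the sketch below derives a molecular formula from the three SMILES on slide 12, adding implicit hydrogens from default valences. The results match the formula layers of the InChI strings on slide 13. It deliberately handles only a small subset of the syntax (upper-case organic-subset atoms, single/double/triple bonds, branches and ring-closure digits) and raises `ValueError` on anything else, such as aromatic lower-case atoms, bracket atoms or stereo marks; general SMILES needs a real toolkit.

```python
import re
from collections import Counter

# Tokens this sketch understands: 'Cl'/'Br' first so 'Cl' is not read as 'C' + 'l',
# then single-letter organic-subset atoms, bond symbols, branches, ring labels.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]|[()]|%\d\d|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Default valences used to add implicit hydrogens (lowest one that fits).
VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
            "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}


def smiles_formula(smiles):
    atoms, used = [], []           # element symbols and bond-order sums per atom
    branch_stack, rings = [], {}   # open branches; ring label -> (atom, bond order)
    prev, order, pos = None, 1, 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unsupported SMILES syntax at {smiles[pos:]!r}")
        tok, pos = match.group(), match.end()
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            if not branch_stack:
                raise ValueError("unbalanced ')'")
            prev = branch_stack.pop()
        elif tok[0].isdigit() or tok[0] == "%":
            if tok in rings:                       # closing a ring bond
                other, open_order = rings.pop(tok)
                bond = order if order != 1 else open_order
                used[other] += bond
                used[prev] += bond
            else:                                  # opening a ring bond
                rings[tok] = (prev, order)
            order = 1
        else:                                      # an atom
            atoms.append(tok)
            used.append(0)
            if prev is not None:
                used[prev] += order
                used[-1] += order
            prev, order = len(atoms) - 1, 1
    if rings or branch_stack:
        raise ValueError("unclosed ring or branch")

    counts = Counter(atoms)
    for symbol, bonded in zip(atoms, used):
        valence = next((v for v in VALENCES[symbol] if v >= bonded), bonded)
        counts["H"] += valence - bonded
    return hill_formula(counts)


def hill_formula(counts):
    """Carbon first, then hydrogen, then other elements alphabetically."""
    present = sorted(el for el, n in counts.items() if n > 0)
    if "C" in present:
        present = ["C"] + (["H"] if "H" in present else []) + \
            [el for el in present if el not in ("C", "H")]
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in present)


for s in ["C1=CC=C(C=C1)Br", "CC(F)F", "COC(C(Cl)Cl)(F)F"]:
    print(s, smiles_formula(s))  # formulas C6H5Br, C2H4F2, C3H4Cl2F2O
```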
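
An InChI string from slide 13 is a sequence of `/`-separated layers: a version prefix (`1S` marks a standard InChI), the molecular formula, then layers tagged by a leading letter, such as `c` for atom connections and `h` for hydrogen positions. The sketch below splits the slide's strings into those layers and turns the connection layer into a set of bonds between numbered heavy atoms. It handles single-component main layers only; multi-component (`;`) or multiplied (`*`) layers and the many optional layers are out of scope. The InChIKey, by contrast, is a hashed form of the InChI and cannot be decoded back into a structure.

```python
import re


def parse_inchi(inchi):
    """Split an InChI into version, formula and letter-prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len("InChI="):].split("/")
    parsed = {"version": version, "formula": formula}
    for layer in layers:
        if layer:
            parsed.setdefault(layer[0], layer[1:])  # keep the first layer per prefix
    return parsed


def connection_bonds(c_layer):
    """Read a connection layer such as '1-8-3(6,7)2(4)5' into a set of bonds.

    Atom numbers follow each other along a chain; '(' opens branches on the
    previous atom, ',' starts another branch on the same atom, ')' returns to it.
    """
    if ";" in c_layer or "*" in c_layer:
        raise ValueError("multi-component layers are not handled in this sketch")
    bonds, branch_roots, prev = set(), [], None
    for tok in re.findall(r"\d+|[(),]", c_layer):
        if tok == "(":
            branch_roots.append(prev)
        elif tok == ",":
            prev = branch_roots[-1]
        elif tok == ")":
            prev = branch_roots.pop()
        else:
            atom = int(tok)
            if prev is not None:
                bonds.add(frozenset((prev, atom)))
            prev = atom
    return bonds


layers = parse_inchi("InChI=1S/C3H4Cl2F2O/c1-8-3(6,7)2(4)5/h2H,1H3")
print(layers["formula"], sorted(tuple(sorted(b)) for b in connection_bonds(layers["c"])))
# C3H4Cl2F2O [(1, 8), (2, 3), (2, 4), (2, 5), (3, 6), (3, 7), (3, 8)]
```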
