Data analysis in QSAR        Noel O’Boyle  Dave Palmer, John Mitchell
Quantitative Structure-Activity             Relationship (QSAR)Also QSPR (Property)                       Perceive        ...
Quantitative Structure-Activity              Relationship (QSAR)Also QSPR (Property)                             Perceive ...
Quantitative Structure-Activity              Relationship (QSAR)                                           Need to use an ...
Quantitative Structure-Activity                   Relationship (QSAR)                                                     ...
Feature selection                  Number of descriptors →Features = descriptors                2n different subsets
Parameter optimisation– Models have parameters which should be tuned   • Support vector machines        – gamma, cost, eps...
QSAR: Feature selection and parameter             optimisation• Aim: To find a robust method of simultaneously  optimising...
Ant Colony Optimisation (ACO)•   Insipired by behaviour of ants foraging for food•   Ants lay down pheromones, which influ...
Ant Colony Optimisation (ACO)• population of ants (typically 50 to 100)    – each ant represents a model – i.e. a subset o...
Does it work?
Upcoming SlideShare
Loading in …5
×

Data Analysis in QSAR

1,019
-1

Published on

Unilever Centre seminar May 2006

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,019
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
50
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Data Analysis in QSAR

  1. 1. Data analysis in QSAR Noel O’Boyle Dave Palmer, John Mitchell
  2. 2. Quantitative Structure-Activity Relationship (QSAR)Also QSPR (Property) Perceive Physical Structure Predict property Propose
  3. 3. Quantitative Structure-Activity Relationship (QSAR)Also QSPR (Property) Perceive Physical Structure Predict property Propose Descriptors Molecular weight No.of H-bonding groups Surface area MOE: 180 descriptors
  4. 4. Quantitative Structure-Activity Relationship (QSAR) Need to use an estimate of the error Optimise model on training set (2/3) of prediction (CV or bootstrap, but not Test model on test set (1/3) resubstitution) orBuild model on training set (2/3 of 2/3) Optimise on test set (1/3 of 2/3) Using the error of prediction Test model on validation set (1/3)
  5. 5. Quantitative Structure-Activity Relationship (QSAR) Need to use an estimate of the error Optimise model on training set (2/3) of prediction (CV or bootstrap, but not Test model on test set (1/3) resubstitution) orBuild model on training set (2/3 of 2/3) Optimise on test set (1/3 of 2/3) Using the error of prediction Test model on validation set (1/3)Testing must be carried out on a hold-out set from the same distributionas the the set of molecules used for optimisation=> dont optimise model on diverse setAssessing Model Fit by Cross-Validation, JCICS, 2003, 43, 579.Beware of q2, J.Mol.Graph.Model., 2002, 20, 269.
  6. 6. Feature selection Number of descriptors →Features = descriptors 2n different subsets
  7. 7. Parameter optimisation– Models have parameters which should be tuned • Support vector machines – gamma, cost, epsilon
  8. 8. QSAR: Feature selection and parameter optimisation• Aim: To find a robust method of simultaneously optimising the parameters and performing feature selection• Prediction of solubility for drug-like molecules – David Palmer – Support vector machines • gamma, cost, epsilon – 127 descriptors• Large search space – stochastic algorithm required
  9. 9. Ant Colony Optimisation (ACO)• Insipired by behaviour of ants foraging for food• Ants lay down pheromones, which influence the path taken by other ants. Meanwhile pheromones are evaporating.• Ants’ trails converge to shortest path between nest and food• Ant Colony Optimisation (ACO) – Marco Dorigo, PhD Thesis, 1992.• ACO for feature selection – Shen et al, JCIM, 2005, 45, 1024.• We have extended it to perform simultaneous parameter optimisation
  10. 10. Ant Colony Optimisation (ACO)• population of ants (typically 50 to 100) – each ant represents a model – i.e. a subset of descriptors and values for the parameters – each ant has a fitness score, e.g. 10-fold cross validation rmse• it is more likely that an ant will choose a particular descriptor/parameter value in the next iteration if – many ants have chosen it in this iteration (local search), or – many ants have chosen it in their best models to date (global search)• for descriptors/parameter values that are not chosen in the current or best models, the probability that they will be chosen decreases (evaporation)
  11. 11. Does it work?
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×