Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Data analysis in QSAR

        Noel O’Boyle

  Dave Palmer, John Mitchell
Quantitative Structure-Activity
             Relationship (QSAR)
Also QSPR (Property)



                       Perceive
 ...
Quantitative Structure-Activity
              Relationship (QSAR)
Also QSPR (Property)



                             Per...
Quantitative Structure-Activity
              Relationship (QSAR)
                                           Need to use a...
Quantitative Structure-Activity
                   Relationship (QSAR)
                                                   ...
Feature selection




                  Number of descriptors →

Features = descriptors                2n different subsets
Parameter optimisation

– Models have parameters which should be tuned
   • Support vector machines
        – gamma, cost,...
QSAR: Feature selection and parameter
             optimisation
• Aim: To find a robust method of simultaneously
  optimis...
Ant Colony Optimisation (ACO)
•   Insipired by behaviour of ants foraging for food
•   Ants lay down pheromones, which inf...
Ant Colony Optimisation (ACO)

• population of ants (typically 50 to 100)
    – each ant represents a model – i.e. a subse...
Does it work?
You’ve finished this document.
Download and read it offline.
Upcoming SlideShare
Qsar lecture
Next
Upcoming SlideShare
Qsar lecture
Next
Download to read offline and view in fullscreen.

1

Share

Data Analysis in QSAR

Download to read offline

Unilever Centre seminar May 2006

Related Audiobooks

Free with a 30 day trial from Scribd

See all

Data Analysis in QSAR

  1. 1. Data analysis in QSAR Noel O’Boyle Dave Palmer, John Mitchell
  2. 2. Quantitative Structure-Activity Relationship (QSAR) Also QSPR (Property) Perceive Physical Structure Predict property Propose
  3. 3. Quantitative Structure-Activity Relationship (QSAR) Also QSPR (Property) Perceive Physical Structure Predict property Propose Descriptors Molecular weight No.of H-bonding groups Surface area MOE: 180 descriptors
  4. 4. Quantitative Structure-Activity Relationship (QSAR) Need to use an estimate of the error Optimise model on training set (2/3) of prediction (CV or bootstrap, but not Test model on test set (1/3) resubstitution) or Build model on training set (2/3 of 2/3) Optimise on test set (1/3 of 2/3) Using the error of prediction Test model on validation set (1/3)
  5. 5. Quantitative Structure-Activity Relationship (QSAR) Need to use an estimate of the error Optimise model on training set (2/3) of prediction (CV or bootstrap, but not Test model on test set (1/3) resubstitution) or Build model on training set (2/3 of 2/3) Optimise on test set (1/3 of 2/3) Using the error of prediction Test model on validation set (1/3) Testing must be carried out on a hold-out set from the same distribution as the the set of molecules used for optimisation => don't optimise model on diverse set Assessing Model Fit by Cross-Validation, JCICS, 2003, 43, 579. Beware of q2, J.Mol.Graph.Model., 2002, 20, 269.
  6. 6. Feature selection Number of descriptors → Features = descriptors 2n different subsets
  7. 7. Parameter optimisation – Models have parameters which should be tuned • Support vector machines – gamma, cost, epsilon
  8. 8. QSAR: Feature selection and parameter optimisation • Aim: To find a robust method of simultaneously optimising the parameters and performing feature selection • Prediction of solubility for drug-like molecules – David Palmer – Support vector machines • gamma, cost, epsilon – 127 descriptors • Large search space – stochastic algorithm required
  9. 9. Ant Colony Optimisation (ACO) • Insipired by behaviour of ants foraging for food • Ants lay down pheromones, which influence the path taken by other ants. Meanwhile pheromones are evaporating. • Ants’ trails converge to shortest path between nest and food • Ant Colony Optimisation (ACO) – Marco Dorigo, PhD Thesis, 1992. • ACO for feature selection – Shen et al, JCIM, 2005, 45, 1024. • We have extended it to perform simultaneous parameter optimisation
  10. 10. Ant Colony Optimisation (ACO) • population of ants (typically 50 to 100) – each ant represents a model – i.e. a subset of descriptors and values for the parameters – each ant has a fitness score, e.g. 10-fold cross validation rmse • it is more likely that an ant will choose a particular descriptor/parameter value in the next iteration if – many ants have chosen it in this iteration (local search), or – many ants have chosen it in their best models to date (global search) • for descriptors/parameter values that are not chosen in the current or best models, the probability that they will be chosen decreases (evaporation)
  11. 11. Does it work?
  • PoojiPOOJITHA7

    May. 11, 2019

Unilever Centre seminar May 2006

Views

Total views

1,918

On Slideshare

0

From embeds

0

Number of embeds

37

Actions

Downloads

60

Shares

0

Comments

0

Likes

1

×