Importance of PROCESS is not less than PRODUCT (5/27/2014)
Computer Aided Drug Design:
QSAR Related Methods
Jahan B Ghasemi
DDSLab K N Toosi Univ of Tech.
Tehran, Iran

Presented at 5th Bioinformatics Conf in Univ of Tehran May 2014


  1. Importance of PROCESS is not less than PRODUCT (5/27/2014)
  2. Computer Aided Drug Design: QSAR Related Methods. Jahan B Ghasemi, DDSLab, K N Toosi Univ of Tech., Tehran, Iran
  3. Topics in this Talk: General Introduction, and some of these QSAR steps:
     Molecular Structures: OC1=CC=CC=C1 (a SMILES string), 1D, 2D, 3D
     Molecular Descriptors: Constitutional, Electronic, Geometrical, Hydrophobic, Lipophilicity, Solubility, Steric, Quantum Chemical, Topological
     Data Pre-Processing: Normalization, Standardization, Variable Selection, Subset Selection, Outlier Detection
     Multivariate Analysis: MLR, PCA, PLS, SVM, ANN, CART
     Statistical Evaluation: R, R2, Q2, MSE, RMSE, PRESS
  4. General Introduction. "Well begun is half done" (Aristotle). René Descartes in 1619: Quantitative Measurement in Science. Research Types: Inductive Approach, Deductive Approach, Abductive Approach.
  5. Deductive approach: Theory -> Hypothesis -> Observation -> Confirmation. Inductive approach: Observation -> Pattern -> Hypothesis -> Theory.
     Induction is usually described as moving from the specific to the general, while deduction begins with the general and ends with the specific. Arguments based on laws, rules and accepted principles are generally used for deductive reasoning. Observations tend to be used for inductive arguments.
     The -Metrics methods, as soft computing or soft modeling, are inductive research approaches, so they carry uncertainty.
     Are humans natural logic reasoners? No!
  6. What do we need to know for successful QSAR modeling as a drug design tool?
  7. I. Math-Science (Informatique, or Informatics) Aspect:
     Linear Algebra: vectors, matrices, tensors; homogeneous and regular linear and nonlinear simultaneous equations
     Graph Theory: maximal subgraph, clique detection
     Multivariate Statistical Analysis: column space, row space
     Pattern Recognition: (dis)similarity; distance metrics (Euclidean, Manhattan, Mahalanobis); fingerprints (Tanimoto, Jaccard); supervised and unsupervised pattern recognition; clustering, agglomerative (bottom up) and divisive (top down); MLR, PCA, PLS
     Optimization: selection of the most informative variables (GA); selection of the most representative objects (KS); function minimization (Newton, Gauss-Newton, Levenberg-Marquardt)
     Computer: computer graphics, HPC
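The (dis)similarity measures named on this slide can be illustrated with a minimal sketch. The function names and the toy vectors are assumptions made for illustration only; fingerprints are represented here as sets of on-bit indices, which is one of several possible encodings.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)

# Hypothetical descriptor vectors and fingerprints for two molecules.
mol_a = [1.0, 2.0, 3.0]
mol_b = [4.0, 6.0, 3.0]
print(euclidean(mol_a, mol_b))            # 5.0
print(manhattan(mol_a, mol_b))            # 7.0
print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))  # 2 shared bits out of 5 distinct bits = 0.4
```

Distances grow with dissimilarity, while the Tanimoto coefficient runs from 0 (no shared bits) to 1 (identical fingerprints), so the two families are used in opposite directions.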
  8. II. Bio-Science Aspect:
     Chemistry: Organic Chemistry; Quantum/Molecular Mechanics (force field, conformer, bioactive conformer); Medicinal Chemistry
     Biology: Molecular Biology, Systems Biology
     Pharmacology: Pharmacokinetics, Pharmacodynamics, Toxicity, ADMET
  9. Combination of I and II:
     -OMICS: Bioinformatics, Proteomics, Metabolomics, Genomics
     -METRICS: Biometrics, Chemometrics, Technometrics
     Chem(o)informatics
     QSAR is related to most of the -OMICS and -METRICS routines.
  10. The Bio-Science Part Starts Here
  11. Chemical Space (gathering information from all involved species):
      Macromolecules: protein, receptor, host
      Small molecules: guest, ligand, inhibitor
      Aggregation: host-guest complex, receptor-inhibitor complex
  12. Chemical Space, Chemical Information: information due to macromolecule structure; information due to aggregation structure; information due to small-molecule structure.
  13. To have and use chemical space: extract chemical information and convert it to numerical values. We call these numerical values molecular descriptors.
  14. Descriptors should have the following desirable features: easy interpretation; correlation with a property; discrimination of isomers; independence; simplicity; not based on properties; not trivially related to other descriptors; efficient construction; use of familiar structural concepts; gradual change with gradual change in structures.
  15. End Points to Be Modeled, chemical properties: boiling point, retention time, dielectric constant, diffusion coefficient, dissociation constant, melting point, reactivity, solubility, stability, thermodynamic properties, viscosity.
  16. End Points to Be Modeled, biological properties: bioconcentration, biodegradation, carcinogenicity, drug metabolism and clearance, inhibition constant, mutagenicity, permeability (blood-brain barrier, skin), pharmacokinetics, receptor binding.
  17. There are more than 5500 molecular descriptors, BUT why do we need more and more of them?
      Each molecular descriptor takes into account a small part of the whole chemical information contained in the real molecule. As a consequence, the number of descriptors keeps increasing with the growing demand for deeper investigation of chemical and biological systems.
      Different descriptors view a molecule through independent methods or perspectives, taking into account the various features of chemical structure.
      Molecular descriptors have become some of the most important variables used in molecular modeling and are consequently handled by statistics, chemometrics, and chemoinformatics.
  18. Molecular Descriptors. Cost to generate: from cheap to expensive.
  19. Molecular Descriptors. How to calculate them? By hand, or by software: Dragon, SYBYL, PaDEL-Descriptor, AdrianaCode.
  20. Molecular Descriptors: Classes! Different classes? Yes. How many? Many. What are the bases of classification?
      Based on dimensionality (0D-4D): geometric, constitutional, topological, quantum chemical, etc.
      Based on origin: theoretical, experimental, or both.
  21. Molecular Descriptors. Do they have equal importance? 0D < 1D < 2D < 2.5D < 3D < 4D ... < nD, from low information content to high information content.
  22. Now we have molecular descriptors and a chemical, molecular or information space. But first define and introduce: objects = molecules; variables = descriptors. Object-to-variable ratio ≥ 4. Why? Least squares needs it!
  23. The Math-Science Part Starts Here: using a very efficient way to show chemical information, the matrix-vector form.
  24. [Figure: data matrix with objects as rows (1, 2, 3, ..., n) and variables as columns (1, 2, 3, ..., m), next to a vector with objects as rows (1, 2, 3, ..., n).]
  25. Preprocessing:
      On the end-point vector y: values in nM units; log transformation (to linearize the variation and to allow an LFER interpretation); mean centering; autoscaling.
      On the molecular descriptor matrix X: mean centering and autoscaling (each has its general purpose); outlier detection (AD); dimensionality reduction (PCA).
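A minimal sketch of the three transformations named on this slide follows. The function names and the IC50 values are hypothetical, and the sketch assumes the end point is a concentration in nM that is converted to a negative log10 molar scale; autoscaling here divides by the sample standard deviation, which is one common choice.

```python
import math
import statistics

def to_p_scale(values_nm):
    """Convert concentrations in nM to -log10(molar), e.g. IC50 -> pIC50."""
    return [-math.log10(v * 1e-9) for v in values_nm]

def mean_center(column):
    """Subtract the column mean so the column averages to zero."""
    m = statistics.fmean(column)
    return [v - m for v in column]

def autoscale(column):
    """Mean-center, then divide by the sample standard deviation (unit variance)."""
    m = statistics.fmean(column)
    s = statistics.stdev(column)  # raises if the column has fewer than two values
    return [(v - m) / s for v in column]

# Hypothetical activities spanning four orders of magnitude.
ic50_nm = [1.0, 10.0, 100.0, 1000.0, 10000.0]
y = to_p_scale(ic50_nm)            # roughly 9, 8, 7, 6, 5: evenly spaced after the log
descriptor = [120.5, 98.2, 150.1, 133.7, 101.0]
print([round(v, 2) for v in y])
print([round(v, 2) for v in autoscale(descriptor)])
```

After autoscaling, descriptors measured on very different scales contribute comparably to distance calculations and to least-squares fits.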
  26. Geometrical Interpretation of the Information Matrix. Spaces: row space; column space (object map). Metrics: distances (Euclidean and others). Classes, clusters, groups.
  27. Row Space! Is it informative? How? What does it mean? How can we use it? [Figure: axes O1, O2, ..., On.] Each point is a vector! m-dimensional space Sm; n-point pattern Pn.
  28. Column Space: the objects map that scientists (chemists, biologists, ...) are interested in! Is it informative? How? What does it mean? How can we use it? [Figure: axes V1, V2, ..., Vn, with Class I (Group I) and Class II (Group II).] Each point is a vector! n-dimensional space Sn; m-point pattern Pm.
  29. QSAR Model Building, based on molecular geometry: 2D-QSAR, 2.5D-QSAR, 3D-QSAR.
  30. QSAR Model Building, type of mapping function (a crucial decision):
      Linear: MLR, kNN, PLS
      Nonlinear: ANN, SVM
      Linear + nonlinear: DT and other tree and ensemble methods
  31. QSAR Model Building, object selection (data splitting into train and test sets), to obtain good (1) representativeness and (2) diversity:
      y-based methods: randomly, evenly
      X-based methods: random selection; kNN selection (similarity principle); KS, SOM, LMD, Duplex, MDC
  32. QSAR Model Building, variable selection:
      Filters (subjective): Uninformative Variable Elimination (UVE), Correlation Ranking (CR)
      Wrappers (objective): GA-PLS
      Embedded (selection and mapping integrated): stepwise selection; RM, ERM, FFD
  33. QSAR Model Building, model validation (there are different criteria in the literature):
      Residual analysis; analysis of variance
      Applicability domain: residual; leverage (good leverage, bad leverage); Q residual; Hotelling T2
      Model precision (confidence intervals of model parameters): bootstrap resampling, jackknife resampling
      Model accuracy (prediction error):
        Internal validation: cross validation (leave-one-out, leave-many-out); scrambling (X-randomization, y-randomization)
        External validation: an external and fully unseen or independent data set
      Final word on validation: the external, independent, unseen data set is mandatory for a successful QSAR model. Do you know why? Local-to-global, or inductive, research has uncertainty.
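Two of the internal validation tools listed on this slide, leave-one-out cross validation and y-randomization, can be sketched for the simplest case of one descriptor. The function names and the data are hypothetical, and Q2 is computed here as 1 - PRESS / SS_total using the mean of all activities, which is one common definition among several.

```python
import random

def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def q2_loo(x, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without object i, then predict object i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = sum(y) / len(y)
    ss_total = sum((b - my) ** 2 for b in y)
    return 1.0 - press / ss_total

# Hypothetical data: activity is roughly 2 * descriptor + 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1]
print("Q2 (real y):", round(q2_loo(x, y), 3))

# y-randomization: scramble the activities and rebuild the model.
shuffled = y[:]
random.Random(0).shuffle(shuffled)
print("Q2 (scrambled y):", round(q2_loo(x, shuffled), 3))
```

A model whose Q2 stays high after the activities are scrambled is fitting chance correlation, which is why the scrambled value is expected to drop well below the real one.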
  34. Purposes of QSAR:
      Rational identification of new leads with pharmacological, biocidal or pesticidal activity.
      Optimization of new leads with pharmacological, biocidal or pesticidal activity.
      The rational design of surface-active agents, perfumes, dyes, and fine chemicals.
  35. Purposes of QSAR:
      The selection of compounds with optimal pharmacokinetic properties.
      The prediction of a variety of physico-chemical properties of molecules.
      The prediction of the fate of molecules.
      The rationalization and prediction of the combined effects of molecules.
  36. Purposes of QSAR:
      The identification of hazardous compounds at early stages.
      The designing out of toxicity and side-effects in new compounds.
      The prediction of toxicity of compounds to humans.
      The prediction of toxicity to environmental species.
  37. The Most Rigorous and Currently Accepted QSAR Methodology (workflow):
      Original data set -> curated dataset -> split into training, test and external validation sets
      Multiple training sets -> combi-QSAR modeling, with Y-randomization
      Multiple test sets -> activity prediction
      Only retain models that pass both internal and external accuracy filters
      Validated predictive models with high internal and external accuracy
      External validation using applicability domain -> virtual screening using applicability domain -> experimental validation
  38. A small question! Why is QSAR alive in spite of very strong rivals like docking, MD, pharmacophore, SB and LB methods? Because modeling and taking into account all pharmacological phenomena is nearly or totally impossible, even in high-level and advanced research laboratories.
  39. Thank You All!
  40. KSA Graphical Algorithm. [Figure: points 1, 2, a, b, c, d with the distances 1a, 2a, 1b, 2b, 1c, 2c, 1d, 2d drawn.] Which one would be the third point: a, b, c or d?
      Points 1 and 2 have the largest distance, so they are selected first. Then the distances between every unselected point and every selected point are calculated:
      Calculate distances 1a and 2a; min(1a, 2a) = 2a.
      Calculate distances 1b and 2b; min(1b, 2b) = 2b.
      Calculate distances 1c and 2c; min(1c, 2c) = 1c.
      Calculate distances 1d and 2d; min(1d, 2d) = 1d.
      max(min(1a, 2a), min(1b, 2b), min(1c, 2c), min(1d, 2d)) = 1d
      So point d is selected as the third point, and so on.
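The max-min selection rule walked through on this slide can be sketched as follows. The function name and the coordinates are assumptions; the six points were chosen only so that the nearest selected neighbours match the slide (2 for a and b, 1 for c and d).

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    def dist(i, j):
        return math.dist(points[i], points[j])

    n = len(points)
    # Step 1: start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Step 2: for each candidate take the distance to its nearest selected
        # point, then pick the candidate for which that distance is largest.
        best = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical coordinates for points 1, 2, a, b, c, d (indices 0..5).
points = [(0, 0), (10, 0), (8, 1), (7, 3), (2, 2), (4, 5)]
print(kennard_stone(points, 3))  # [0, 1, 5]: points 1 and 2 first, then d
```

Because each new point is the one farthest from everything already chosen, the selected set spreads evenly over the descriptor space, which is the representativeness and diversity goal stated for data splitting.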
  41. Applicability Domain
  42. Q Residuals and Hotelling T2
  43. [Figure slide; no text recovered.]
  44. [Figure: original data compared with their log values.]
  45. Multiple Linear Regression (MLR)

      Activity   Descr 1   Descr 2   ...   Descr m
      Y1         X11       X12       ...   X1m
      Y2         X21       X22       ...   X2m
      ...        ...       ...       ...   ...
      Yn         Xn1       Xn2       ...   Xnm

      Yi = a0 + a1 Xi1 + a2 Xi2 + ... + am Xim
      MLR does not consider nonlinearity effects.
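As a minimal sketch of how the coefficients a0 ... am can be estimated, the following solves the least-squares normal equations with plain Gaussian elimination. The helper names and the data are hypothetical, and a production workflow would normally rely on a numerical library instead.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]  # fails if descriptors are collinear
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mlr(X, y):
    """Return [a0, a1, ..., am] minimizing the squared error of Yi = a0 + sum(aj * Xij)."""
    Z = [[1.0] + list(row) for row in X]  # prepend a column of ones for a0
    p = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [9.0, 8.0, 19.0, 18.0, 26.0]
print([round(a, 6) for a in fit_mlr(X, y)])  # approximately [1.0, 2.0, 3.0]
```

The normal equations become ill-conditioned or singular when descriptors are collinear or when there are too few objects per variable, which is one reason collinear data sets are usually handled with PLS.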
  46. Partial Least Squares (PLS)
      Y = t_1 q_1 + t_2 q_2 + ... + t_n q_n + F
      • t: latent variables or scores
      • q: loading vectors
      Robust with respect to collinear descriptors; only one model optimization parameter (the number of LVs); computationally fast.
  47. The k-Nearest Neighbor Method (kNN). Works on the similarity principle: for a compound in descriptor space, it finds the k nearest neighbor compounds in the training set and predicts the activity class that is most highly represented among these neighbors. The kNN scheme is sensitive to: (1) the distance metric; (2) the number of training compounds; (3) k, which can be optimized to yield the best results.
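The majority-vote prediction described on this slide can be sketched in a few lines. The function name, the Euclidean metric, and the training compounds are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_labels, query, k=3):
    """Predict the class most represented among the k nearest training compounds."""
    # Rank training compounds by Euclidean distance to the query compound.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor training set with two activity classes.
train_X = [(1.0, 1.0), (1.5, 0.8), (0.9, 1.4), (5.0, 5.2), (5.5, 4.8), (4.7, 5.1)]
train_labels = ["active", "active", "active", "inactive", "inactive", "inactive"]

print(knn_predict(train_X, train_labels, (1.2, 0.9)))  # active
print(knn_predict(train_X, train_labels, (5.1, 5.0)))  # inactive
```

Swapping `math.dist` for another metric or changing `k` alters which neighbours vote, which is the sensitivity the slide points out.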
  48. Artificial Neural Network (ANN). [Figure: descriptors or original space -> nonlinear or hidden space -> properties being predicted.]
  49. Support Vector Regression (SVR) and Support Vector Classification (SVC). ε-insensitive loss function:
      L_ε(y, f(x)) = 0 if |y - f(x)| ≤ ε, and |y - f(x)| - ε otherwise
      Only the points outside the ε-tube are penalized, in a linear fashion.
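A minimal sketch of the ε-insensitive loss in its standard form follows; the function name and the default ε are assumptions.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, growing linearly with the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Hypothetical observed and predicted activities.
print(eps_insensitive_loss(5.0, 5.05))  # inside the tube: 0.0
print(eps_insensitive_loss(5.0, 5.50))  # outside the tube: about 0.4
```

Because errors smaller than ε cost nothing, only the compounds on or outside the tube end up as support vectors in the fitted regression.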
  50. Non-linear SVMs:
      • Datasets that are linearly separable with some noise work out great.
      • But what are we going to do if the dataset is just too hard?
      • How about mapping the data to a higher-dimensional space?
      [Figure: points on the x axis, then the same points replotted on x versus x^2 axes.]
  51. Non-linear SVMs, feature spaces:
      • General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)
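The mapping idea can be illustrated with a one-dimensional toy set in which one class sits on both sides of the other, so no single threshold on x separates them. The map φ(x) = (x, x^2), the data, and the hand-picked weights are all assumptions for illustration; a real SVM would learn the separating hyperplane and would typically use a kernel instead of an explicit map.

```python
def phi(x):
    """Map a 1-D input to the 2-D feature space (x, x^2)."""
    return (x, x * x)

def linear_rule(features, w=(0.0, 1.0), b=-4.0):
    """A linear decision rule in feature space: sign of w . features + b."""
    score = w[0] * features[0] + w[1] * features[1] + b
    return 1 if score > 0 else -1

# Hypothetical data: class +1 lies on both sides of class -1 along x.
xs = [-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

predictions = [linear_rule(phi(x)) for x in xs]
print(predictions == labels)  # True: the line x^2 = 4 separates the classes
```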
  52. Decision Trees as a Greedy Algorithm. CART: Classification and Regression Tree, a binary recursive partitioning tree.
      • Best first
      • Left to right
      • Top down
      • Example: the variables are used to classify the audience. The first variable is "Biologist or not?" Why? We are in the Bio-Dept.
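The greedy step of a binary partitioning tree can be sketched as a search for the single threshold that gives the purest two child nodes. The Gini impurity criterion, the function names, and the data are assumptions; the slide does not state which impurity measure is used.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Greedy search for the threshold on one variable with the lowest weighted Gini."""
    best = None
    ordered = sorted(set(values))
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2  # candidate cut midway between neighbouring values
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical descriptor values and activity classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["inactive", "inactive", "inactive", "active", "active", "active"]
print(best_split(values, labels))  # (6.5, 0.0): a perfectly pure split
```

A full tree repeats this search over every variable at every node and recurses into the two children, never revisiting an earlier split, which is what makes the algorithm greedy.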
  53. 3D-QSAR
      Advantages over 2D-QSAR:
      • No reliance on experimental values
      • Can be applied to molecules with unusual substituents
      • Not restricted to molecules of the same structural class (in the pharmacophore 3D-QSAR case)
      • Predictive capability
      Notes:
      • No experimental constants or measurements are involved
      • Properties are known as 'fields'
      • Steric field: defines the size and shape of the molecule
      • Electrostatic field: defines electron rich/poor regions of the molecule
  54. 3D-QSAR method: comparative molecular field analysis (CoMFA), Tripos
      • Build each molecule using modelling software
      • Identify the active conformation for each molecule
      • Identify the pharmacophore
      [Figure: structure bearing NHCH3 and OH groups; build 3D model, active conformation, define pharmacophore.]
  55. 3D-QSAR method: comparative molecular field analysis (CoMFA), Tripos (same steps and figure as slide 54)
  56. 3D-QSAR method:
      • Place the pharmacophore into a lattice of grid points
      • Each grid point defines a point in space
  57. 3D-QSAR method:
      • Each grid point defines a point in space
      • Position the molecule to match the pharmacophore
  58. 3D-QSAR method:
      • A probe atom is placed at each grid point in turn
      • Probe atom = a proton or sp3 hybridised carbocation
  59. 3D-QSAR method:
      • A probe atom is placed at each grid point in turn
      • Measure the steric or electrostatic interaction of the probe atom with the molecule at each grid point
  60. 3D-QSAR method: tabulate fields for each compound at each grid point

      Compound   Biological activity   Steric fields (S) at grid points (001-998)   Electrostatic fields (E) at grid points (001-998)
                                       S001 S002 S003 S004 S005 etc                 E001 E002 E003 E004 E005 etc
      1          5.1
      2          6.8
      3          5.3
      4          6.4
      5          6.1

      Partial least squares analysis (PLS) gives the QSAR equation:
      Activity = a S001 + b S002 + ... + m S998 + n E001 + ... + y E998 + z
  61. 3D-QSAR method:
      • Define fields using contour maps round a representative molecule
  62. 2.5D-QSAR or GRIND methodology: a procedure based on the information included in the MIF (molecular interaction fields), generating a handful of informative variables that are independent of the location of the molecules within the grid. The transformation has two main steps:
      • Field filtering
      • Maximum auto- and cross-correlation (MACC2) encoding. The 2 refers to the distance between two points in space.
  63. MACC2 transform:
      • The MACC transform keeps the maximum value of the product of the two field values at nodes i and j, found at each different distance r_ij.
      • In the figure, the colors represent the activity of the compounds (blue inactive, red active).
      • 33 means the energy products produced by two N1 probes.
      • 8 means the 8th variable of the auto-correlogram.
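The encoding described on this slide can be sketched for the auto-correlogram case (one probe, all node pairs taken from the same field). The function name, the bin width, and the node coordinates and energies are hypothetical, and empty bins are reported as 0.0 by choice; this is an illustration of the max-product rule, not the GRIND implementation.

```python
import math

def macc2_auto(nodes, bin_width=1.0, n_bins=6):
    """For each distance bin keep the largest product of two node energies.

    nodes: list of ((x, y, z), energy) pairs from one filtered interaction field.
    Returns the correlogram and, per bin, the node pair that produced the value.
    """
    correlogram = [0.0] * n_bins
    source_pairs = [None] * n_bins
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            (pos_i, e_i), (pos_j, e_j) = nodes[i], nodes[j]
            k = int(math.dist(pos_i, pos_j) / bin_width)  # distance bin index
            product = e_i * e_j
            if k < n_bins and (source_pairs[k] is None or product > correlogram[k]):
                correlogram[k] = product
                source_pairs[k] = (i, j)
    return correlogram, source_pairs

# Hypothetical filtered nodes: favorable interactions have negative energies.
nodes = [((0.0, 0.0, 0.0), -2.0), ((3.5, 0.0, 0.0), -1.5), ((0.0, 4.5, 0.0), -3.0)]
correlogram, source_pairs = macc2_auto(nodes)
print(correlogram)   # [0.0, 0.0, 0.0, 3.0, 6.0, 4.5]
print(source_pairs)  # [None, None, None, (0, 1), (0, 2), (1, 2)]
```

Keeping the pair behind each maximum is what allows a correlogram variable to be traced back to the two field nodes that generated it. A cross-correlogram would follow the same rule with the two nodes drawn from fields of different probes.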
  64. GRID interaction fields calculated using the N1 probe: positive (yellow) interactions describe unfavorable interactions and negative (blue) interactions describe favorable ones. They should have low energy values (representing highly favorable interactions) and should be as far as possible from each other.
  65. [Figure slide; no text recovered.]
  66. Each number corresponds to a specific distance of the fields.
  67. [Figure slide; no text recovered.]
  68. [Figure slide; no text recovered.]
  69. [Figure slide; no text recovered.]
  70. [Figure slide; no text recovered.]
  71. One of the unique features of the MACC transform is that it is possible to trace back the variables that generated this "most intense" interaction. [Figure label: VRS]
