Presented at 5th Bioinformatics Conf in Univ of Tehran May 2014

Published in:
Technology

- 1. Importance of PROCESS is not less than PRODUCT (5/27/2014)
- 2. Computer Aided Drug Design: QSAR Related Methods. Jahan B Ghasemi, DDSLab, K N Toosi Univ of Tech., Tehran, Iran
- 3. Topics in this talk: General Introduction, followed by some of these QSAR steps. Data Pre-Processing: Normalization, Standardization, Variable Selection, Subset Selection, Outlier Detection. Multivariate Analysis: MLR, PCA, PLS, SVM, ANN, CART. Molecular Descriptors: Constitutional, Electronic, Geometrical, Hydrophobic, Lipophilicity, Solubility, Steric, Quantum Chemical, Topological. Molecular Structures: OC1=CC=CC=C1; 1D, 2D, 3D. Statistical Evaluation: R, R2, Q2, MSE, RMSE, PRESS.
- 4. General Introduction. "Well begun is half done" (Aristotle). René Descartes in 1619: Quantitative Measurement in Science. Research types: Inductive Approach, Deductive Approach, Abductive Approach.
- 5. Deductive approach: Theory → Hypothesis → Observation → Confirmation. Inductive approach: Observation → Pattern → Hypothesis → Theory. Induction is usually described as moving from the specific to the general, while deduction begins with the general and ends with the specific. Arguments based on laws, rules, and accepted principles are generally used for deductive reasoning. Observations tend to be used for inductive arguments. The -metrics disciplines, as soft-computing or soft-modeling, are inductive research approaches, and so they carry uncertainty. Are humans natural logic reasoners? No!
- 6. What do we need to know for successful QSAR modeling as a drug design tool?
- 7. I. Math-Science (Informatique or Informatics) Aspect. Linear Algebra: vectors, matrices, tensors; homogeneous and regular linear and nonlinear simultaneous equations. Graph Theory: maximal subgraph, clique detection. Multivariate Statistical Analysis: column space, row space. Pattern Recognition: (dis)similarity; distance metrics (Euclidean, Manhattan, Mahalanobis); fingerprints (Tanimoto, Jaccard); supervised and unsupervised pattern recognition; clustering, agglomerative (bottom-up) and divisive (top-down); MLR, PCA, PLS. Optimization: selection of the most informative variables (GA); selection of the most representative objects (KS); function minimization (Newton, Gauss-Newton, Marquardt-Levenberg). Computer: computer graphics, HPC.
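As a small illustration of the (dis)similarity measures named on this slide, the sketch below computes Euclidean and Manhattan distances between two descriptor vectors and the Tanimoto (Jaccard) similarity between two fingerprints. The function names, the example vectors, and the choice to represent a fingerprint as a set of "on" bit positions are assumptions made for this example, not part of the talk.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints.

    Each fingerprint is given as a collection of 'on' bit positions
    (an assumed representation for this sketch).
    """
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical descriptor vectors and fingerprints for two molecules.
mol_a = [1.0, 2.0, 3.0]
mol_b = [4.0, 6.0, 3.0]
d_euc = euclidean(mol_a, mol_b)
d_man = manhattan(mol_a, mol_b)
sim = tanimoto({1, 4, 7, 9}, {1, 4, 8})  # 2 shared bits out of 5 distinct bits
```

Distances grow as molecules become less alike, while Tanimoto similarity runs from 0 (no shared bits) to 1 (identical fingerprints).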
- 8. II. Bio-Science Aspect. Chemistry: Organic Chemistry; Quantum/Molecular Mechanics (force field, conformer, bioactive conformer); Medicinal Chemistry. Biology: Molecular Biology; Systems Biology. Pharmacology: Pharmacokinetics; Pharmacodynamics; Toxicity; ADMET.
- 9. Combination of I and II. -OMICS: Bioinformatics, Proteomics, Metabolomics, Genomics. -METRICS: Biometrics, Chemometrics, Technometrics, Chem(o)informatics. QSAR is related to most of the -OMICS and -METRICS routines.
- 10. Bio-Science part starts here:
- 11. Chemical Space (gathering information from all involved species). Macromolecules: protein, receptor, host. Small molecules: guest, ligand, inhibitor. Aggregation: host-guest complex, receptor-inhibitor complex.
- 12. Chemical Space, chemical information: information due to macromolecule structure; information due to aggregation structure; information due to small-molecule structure.
- 13. To have and use Chemical Space: extract and convert chemical information to numerical values. We call these numerical values Molecular Descriptors.
- 14. Descriptors should be associated with the following desirable features: easy interpretation; correlation with a property; discrimination of isomers; independence; simplicity; not based on properties; not trivially related to other descriptors; efficient construction; use of familiar structural concepts; gradual change with gradual change in structure.
- 15. End Points to Be Modeled, chemical properties: boiling point, retention time, dielectric constant, diffusion coefficient, dissociation constant, melting point, reactivity, solubility, stability, thermodynamic properties, viscosity.
- 16. End Points to Be Modeled, biological properties: bioconcentration, biodegradation, carcinogenicity, drug metabolism and clearance, inhibition constant, mutagenicity, permeability (blood-brain barrier, skin), pharmacokinetics, receptor binding.
- 17. There are more than 5500 molecular descriptors. But why do we need more and more molecular descriptors? Each molecular descriptor takes into account a small part of the whole chemical information contained in the real molecule and, as a consequence, the number of descriptors keeps increasing with the growing demand for deeper investigations of chemical and biological systems. Different descriptors offer independent methods or perspectives for viewing a molecule, taking into account the various features of chemical structure. Molecular descriptors have now become some of the most important variables used in molecular modeling and, consequently, are managed by statistics, chemometrics, and chemoinformatics.
- 18. Molecular Descriptors. Cost to generate: from cheap to expensive.
- 19. Molecular Descriptors. How to calculate molecular descriptors? By hand, or by software: Dragon, SYBYL, PaDEL-Descriptor, AdrianaCode.
- 20. Molecular Descriptors: classes. Different classes? Yes. How many? Many classes. What are the bases of classification? Based on dimensionality: 0D-4D. Geometric, constitutional, topological, quantum chemical, etc. Based on origin: theoretical, experimental, or both.
- 21. Molecular Descriptors. Do they have equal importance? 0D < 1D < 2D < 2.5D < 3D < 4D … < nD, from low information content to high information content.
- 22. Now we have molecular descriptors and a chemical, molecular, or information space. But first define and introduce: objects = molecules; variables = descriptors. Object-to-variable ratio ≥ 4. Why? Least squares needs it!
- 23. Math-Science part starts here: using a very efficient way to show chemical information, the matrix-vector form.
- 24. [Figure: data matrix with objects as rows (1, 2, 3, …, n) and variables as columns (1, 2, 3, …, m), next to a vector with objects as rows (1, 2, 3, …, n).]
- 25. Preprocessing. On the end point vector y: nM unit; log transformation, to linearize the variation and to have an LFER interpretation; mean centering; autoscaling. On the molecular descriptor matrix X: mean centering (has its general purpose); autoscaling (has its general purpose); outlier detection (AD); dimensionality reduction (PCA).
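The preprocessing steps on this slide can be sketched in a few lines. The example assumes the end point is a concentration in nM and converts it to a negative log10 molar scale (a common convention, but an assumption here), then mean-centers and autoscales one descriptor column. The function names and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def to_p_scale(conc_nM):
    """Convert concentrations in nM to -log10(mol/L) (assumed convention)."""
    return [-math.log10(c * 1e-9) for c in conc_nM]

def mean_center(column):
    """Subtract the column mean so the column is centered on zero."""
    m = mean(column)
    return [v - m for v in column]

def autoscale(column):
    """Mean-center, then divide by the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]

# Hypothetical end point values spanning four orders of magnitude.
y_nM = [10.0, 100.0, 1000.0, 10000.0]
y_log = to_p_scale(y_nM)          # roughly 8, 7, 6, 5: evenly spaced after the log

# One hypothetical descriptor column of the X matrix.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
x_centered = mean_center(descriptor)
x_auto = autoscale(descriptor)    # zero mean, unit standard deviation
```

After autoscaling, every descriptor has the same variance, so descriptors measured in large units no longer dominate distance-based or least-squares methods.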
- 26. Geometrical Interpretation of the Information Matrix. Spaces: row space; column space (object map). Metrics: distances, Euclidean and others. Classes, clusters, groups.
- 27. Row Space! Is it informative? How? What does it mean? How can we use it? Axes O1, O2, …, On. Each point is a vector! m-dimensional space Sm; n-points pattern Pn.
- 28. Column Space: the objects map that scientists (chemists, biologists, …) are interested in! Is it informative? How? What does it mean? How can we use it? Axes V1, V2, …, Vn, with points grouped into Class I (Group I) and Class II (Group II). Each point is a vector! n-dimensional space Sn; m-points pattern Pm.
- 29. QSAR Model Building Based on Molecular Geometry: 2D-QSAR, 2.5D-QSAR, 3D-QSAR.
- 30. QSAR Model Building: type of mapping function, a crucial decision. Linear: MLR, kNN, PLS. Nonlinear: ANN, SVM. Linear + nonlinear: DT and other tree and ensemble methods.
- 31. QSAR Model Building: object selection, that is, data splitting into train and test sets, to have good (1) representativeness and (2) diversity. y-based methods: randomly, evenly. X-based methods: random selection; kNN selection; similarity principle (KS, SOM, LMD, Duplex, MDC).
- 32. QSAR Model Building: variable selection. Filters (subjective): Uninformative Variable Elimination (UVE), Correlation Ranking (CR). Wrappers (objective): GA-PLS. Embedded (selection + mapping integrated): stepwise selection, RM, ERM, FFD.
- 33. QSAR Model Building: model validation. There are different criteria in the literature. Residual analysis: analysis of variance. Applicability domain: residual; leverage (good leverage, bad leverage); Q residual; Hotelling T2. Model precision (confidence intervals of model parameters): bootstrap resampling, jackknife resampling. Model accuracy (prediction error): internal validation by cross validation (leave one out, leave many out) and scrambling (X-randomization, y-randomization); external validation with an external and fully unseen or independent data set. Final word on validation: the external, independent, unseen data set is mandatory for a successful QSAR model. Do you know why? Local-to-global, or inductive, research has uncertainty.
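To make the leave-one-out idea concrete, the sketch below computes a cross-validated Q2 as 1 - PRESS / SS_tot, where PRESS sums the squared errors of predictions made for each object while it is held out. For brevity it assumes a one-descriptor least-squares line as the model and uses the mean of all y values in SS_tot; both are simplifying assumptions, and the data are made up.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without object i, then predict object i.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    y_mean = mean(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_tot

# Hypothetical, nearly linear data set of six objects.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = loo_q2(x, y)
```

A Q2 close to 1 only shows internal consistency; as the slide stresses, an external, unseen data set is still needed to judge real predictive ability.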
- 34. Purposes of QSAR: rational identification of new leads with pharmacological, biocidal, or pesticidal activity; optimization of new leads with pharmacological, biocidal, or pesticidal activity; the rational design of surface-active agents, perfumes, dyes, and fine chemicals.
- 35. Purposes of QSAR: the selection of compounds with optimal pharmacokinetic properties; the prediction of a variety of physicochemical properties of molecules; the prediction of the fate of molecules; the rationalization and prediction of the combined effects of molecules.
- 36. Purposes of QSAR: the identification of hazardous compounds at early stages; the designing out of toxicity and side effects in new compounds; the prediction of toxicity of compounds to humans; the prediction of toxicity to environmental species.
- 37. The most rigorous and currently accepted QSAR methodology: original data set → curated data set → split into training, test, and external validation sets → multiple training sets and multiple test sets → combi-QSAR modeling, with Y-randomization → activity prediction → only retain models that pass both internal and external accuracy filters → validated predictive models with high internal and external accuracy → external validation using applicability domain → virtual screening using applicability domain → experimental validation.
- 38. A small question! Why is QSAR alive in spite of the existence of very strong rivals like docking, MDs, pharmacophore, SB, and LB methods? Because modeling and taking into account all pharmacological phenomena is nearly or totally impossible, even in high-level and advanced research laboratories.
- 39. Thank You All!
- 40. KSA graphical algorithm, with points 1, 2, a, b, c, and d. Which one would be the third point: a, b, c, or d? Points 1 and 2 have the largest distance, so they are selected first. Then the distances between all unselected points and all selected points are calculated. Calculate distances 1a and 2a; then min(1a, 2a) = 2a. Calculate distances 1b and 2b; then min(1b, 2b) = 2b. Calculate distances 1c and 2c; then min(1c, 2c) = 1c. Calculate distances 1d and 2d; then min(1d, 2d) = 1d. Max(min(1a, 2a), min(1b, 2b), min(1c, 2c), min(1d, 2d)) = 1d. Then point d is selected as the third point, and so on.
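The max-min selection rule walked through on this slide can be sketched as follows. The coordinates are hypothetical, chosen only so that the nearest selected point for a and b is 2, for c and d is 1, and d wins the max-min comparison as in the slide; the function name and the use of Euclidean distance are likewise assumptions.

```python
import math
from itertools import combinations

def kennard_stone(points, n_select):
    """Return indices of n_select points picked by the max-min distance rule."""
    def dist(i, j):
        return math.dist(points[i], points[j])

    n = len(points)
    # Step 1: start with the two points that are farthest apart.
    first, second = max(combinations(range(n), 2), key=lambda pair: dist(*pair))
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    while len(selected) < n_select and remaining:
        # Step 2: for each candidate, find its distance to the nearest
        # selected point; pick the candidate where that distance is largest.
        best = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical 2D coordinates for the six points on the slide.
labels = ["1", "2", "a", "b", "c", "d"]
points = [(0, 0), (10, 0), (8, 1), (7, -2), (2, 1), (4, 3)]

chosen = [labels[i] for i in kennard_stone(points, 3)]
```

Because each new point is the one farthest from everything already chosen, the selected subset spreads evenly over the descriptor space, which is why the method is used to pick representative training objects.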
- 41. Applicability Domain
- 42. Q Residuals and Hotelling T2
- 44. [Figure: original data compared with their log values.]
- 45. Multiple Linear Regression (MLR). Data layout:

  | Activity | Descr 1 | Descr 2 | … | Descr m |
  |---|---|---|---|---|
  | Y1 | X11 | X12 | … | X1m |
  | Y2 | X21 | X22 | … | X2m |
  | … | … | … | … | … |
  | Yn | Xn1 | Xn2 | … | Xnm |

  Model: $Y_i = a_0 + a_1 X_{i1} + a_2 X_{i2} + \dots + a_m X_{im}$. MLR does not consider nonlinearity effects.
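A minimal sketch of fitting the MLR coefficients a0 … am by least squares, using the normal equations and a small Gaussian elimination so that only the standard library is needed. The helper names and the synthetic data are assumptions for illustration; in practice a numerical library would normally be used instead.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_mlr(X, y):
    """Return [a0, a1, ..., am] minimizing the sum of squared residuals."""
    A = [[1.0] + list(row) for row in X]  # leading 1 carries the intercept a0
    p = len(A[0])
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve_linear(AtA, Aty)

def predict_mlr(coeffs, row):
    """Evaluate Yi = a0 + a1*Xi1 + ... + am*Xim for one object."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], row))

# Synthetic data: 8 objects, 2 descriptors (object-to-variable ratio of 4),
# generated from known coefficients so the fit can be checked.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0],
     [1.0, 3.0], [3.0, 2.0], [4.0, 0.0], [0.0, 2.0]]
y = [2.0 + 0.5 * x1 - 1.5 * x2 for x1, x2 in X]
coeffs = fit_mlr(X, y)
```

The normal equations become ill-conditioned when descriptors are strongly correlated, which is one reason collinear descriptor sets are usually handled with PLS rather than MLR.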
- 46. Partial Least Squares (PLS): $Y = t_1 q_1 + t_2 q_2 + \dots + t_n q_n + F$, where the $t$ are latent variables or scores and the $q$ are loading vectors. Robust with respect to collinear descriptors. Only one model optimization parameter (the number of LVs). Computationally fast.
- 47. The k-Nearest Neighbor Method (kNN). Works on the similarity principle: a compound is located in descriptor space close to its k nearest neighbor compounds from the training set, and the method predicts the activity class that is most highly represented among these neighbors. The kNN scheme is sensitive to: (1) the distance metric; (2) the number of training compounds; (3) k, which can be optimized to yield the best results.
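The majority-vote rule described on this slide can be sketched in a few lines. Euclidean distance, k = 3, and the tiny two-descriptor training set are assumptions chosen for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_labels, query, k=3):
    """Predict the class most represented among the k nearest training compounds."""
    # Sort training compounds by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common returns the tied label that was counted first.
    return votes.most_common(1)[0][0]

# Hypothetical training set: two descriptors per compound, two activity classes.
train_X = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4),
           (2.0, 2.1), (2.2, 1.9), (1.8, 2.3)]
train_labels = ["inactive", "inactive", "inactive",
                "active", "active", "active"]

pred_a = knn_predict(train_X, train_labels, (0.2, 0.2), k=3)
pred_b = knn_predict(train_X, train_labels, (2.1, 2.0), k=3)
```

Swapping `math.dist` for another metric or changing `k` changes which neighbors vote, which is the sensitivity the slide points out.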
- 48. Artificial Neural Network (ANN): descriptors or original space → nonlinear or hidden space → properties being predicted.
- 49. Support Vector Regression (SVR) and Support Vector Classification (SVC). ε-insensitive loss function: $L_\varepsilon(y, f(x)) = 0$ if $|y - f(x)| \le \varepsilon$, and $L_\varepsilon(y, f(x)) = |y - f(x)| - \varepsilon$ otherwise. Only the points outside the ε-tube are penalized, in a linear fashion.
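The ε-insensitive loss fits in one line of code. The sketch below uses the standard form of the loss; the function name, the default ε, and the sample values are assumptions for illustration.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# A prediction inside the tube costs nothing; outside, the cost grows linearly.
loss_inside = eps_insensitive_loss(1.0, 1.05, eps=0.1)
loss_outside = eps_insensitive_loss(1.0, 1.5, eps=0.1)
```

Because small errors are ignored, only the points on or outside the tube end up shaping the regression function.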
- 50. Non-linear SVMs. Datasets that are linearly separable with some noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space, for example from x to (x, x²)?
- 51. Non-linear SVMs: feature spaces. General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x).
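A toy illustration of the mapping idea: one-dimensional points whose outer values belong to one class and inner values to the other cannot be split by a single threshold on x, but after mapping x to (x, x²) a threshold on the second coordinate separates them. The data, the map `phi`, and the helper name are assumptions made for this sketch; a real SVM would use a kernel rather than an explicit map.

```python
def phi(x):
    """Map a 1D input to the 2D feature space (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if one threshold on a single coordinate separates the two classes."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

# Hypothetical data: class +1 on the outside, class -1 in the middle.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

in_input_space = separable_by_threshold(xs, labels)

# In feature space the second coordinate x^2 alone splits the classes,
# which corresponds to a horizontal separating line in the (x, x^2) plane.
second_coord = [phi(x)[1] for x in xs]
in_feature_space = separable_by_threshold(second_coord, labels)
```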
- 52. Decision Trees as a Greedy Algorithm. CART: Classification and Regression Tree, a binary recursive partitioning tree. Best first; left/right; up/down. Here the task is to classify the audience! Here the first variable is "Biologist or not?" Why? We are in the Bio-Dept.
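The greedy step of binary recursive partitioning can be sketched as a search for the single best split. The example below scores every (variable, threshold) pair by weighted Gini impurity, which is the usual CART criterion for classification but is an assumption here since the slide does not name one; the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Greedy search for the (score, variable index, threshold) with lowest impurity."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send objects to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

# Hypothetical data: variable 0 separates the classes cleanly, variable 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
labels = ["A", "A", "B", "B"]
split = best_split(rows, labels)
```

A full tree repeats this search inside each branch; the algorithm is greedy because a split is never revisited once it has been chosen.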
- 53. 3D-QSAR Notes. Advantages over 2D-QSAR: no reliance on experimental values; can be applied to molecules with unusual substituents; not restricted to molecules of the same structural class (in the pharmacophore 3D-QSAR case); predictive capability. No experimental constants or measurements are involved. Properties are known as "fields": the steric field defines the size and shape of the molecule; the electrostatic field defines electron-rich and electron-poor regions of the molecule.
- 54. 3D-QSAR method: Comparative Molecular Field Analysis (CoMFA), Tripos. Build each molecule using modelling software; identify the active conformation for each molecule; identify the pharmacophore. [Figure: build 3D model → active conformation → define pharmacophore.]
- 56. 3D-QSAR method: place the pharmacophore into a lattice of grid points. Each grid point defines a point in space.
- 57. 3D-QSAR method: position the molecule to match the pharmacophore.
- 58. 3D-QSAR method: a probe atom is placed at each grid point in turn. Probe atom = a proton or an sp3-hybridised carbocation.
- 59. 3D-QSAR method: measure the steric or electrostatic interaction of the probe atom with the molecule at each grid point.
- 60. 3D-QSAR method: tabulate fields for each compound at each grid point.

  | Compound | Biological activity | Steric fields (S) at grid points 001-998 (S001, S002, S003, S004, S005, etc.) | Electrostatic fields (E) at grid points 001-998 (E001, E002, E003, E004, E005, etc.) |
  |---|---|---|---|
  | 1 | 5.1 | … | … |
  | 2 | 6.8 | … | … |
  | 3 | 5.3 | … | … |
  | 4 | 6.4 | … | … |
  | 5 | 6.1 | … | … |

  Partial least squares analysis (PLS) then gives the QSAR equation: Activity = aS001 + bS002 + … + mS998 + nE001 + … + yE998 + z.
- 61. 3D-QSAR method: define fields using contour maps around a representative molecule.
- 62. 2.5D-QSAR or GRIND methodology. A procedure based on the information included in the MIF (molecular interaction field) that generates a handful of informative variables, independent of the location of the molecules within the grid. Two main steps of the transformation procedure: field filtering, and maximum auto- and cross-correlation (MACC2) encoding. The 2 refers to the distance between two points in space.
- 63. MACC2 transform. For each distance r_ij, the MACC transform keeps the maximum value of the products of the two field values i and j found at that distance. Here the colors represent the activity of the compounds (blue inactive, red active). "33" means the energy products produced by two N1 probes; "8" means the 8th variable of the auto-correlogram.
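A minimal sketch of the maximum auto-correlation encoding described above: every pair of field nodes contributes the product of its two energies, pairs are binned by their distance, and only the largest product per bin is kept together with the pair that produced it. The node coordinates, energies, bin width, and function name are all assumptions for illustration and do not reproduce any particular GRIND implementation.

```python
import math
from itertools import combinations

def macc_auto(nodes, bin_width=1.0):
    """Maximum auto-correlogram of one field.

    nodes: list of ((x, y, z), energy) pairs.
    Returns {distance bin: (max energy product, (node i, node j))}.
    """
    correlogram = {}
    for (i, (pos_i, e_i)), (j, (pos_j, e_j)) in combinations(enumerate(nodes), 2):
        # Assign the pair to the nearest multiple of bin_width.
        b = round(math.dist(pos_i, pos_j) / bin_width)
        product = e_i * e_j
        if b not in correlogram or product > correlogram[b][0]:
            correlogram[b] = (product, (i, j))  # remember which pair won
    return correlogram

# Hypothetical filtered nodes: favorable interactions have negative energies,
# so the product of two strongly favorable nodes is large and positive.
nodes = [((0, 0, 0), -2.0), ((3, 0, 0), -1.0),
         ((0, 4, 0), -3.0), ((3, 4, 0), -0.5)]
correlogram = macc_auto(nodes)
```

Storing the winning pair next to each value is what makes it possible to go back from a correlogram variable to the two nodes that generated it.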
- 64. GRID interaction fields calculated using the N1 probe: positive (yellow) interactions are unfavorable and negative (blue) interactions are favorable. The nodes kept by field filtering should have low energy values (representing highly favorable interactions) and should be as far as possible from each other.
- 66. Each number corresponds to a specific distance of the fields.
- 71. One of the unique features of the MACC transform is that it is possible to trace back the variables that generated this "most intense" interaction. [Figure label: VRS]
