An Introduction to Chemoinformatics for the postgraduate students of Agriculture
Chemoinformatics and Applications in Agrochemical Discovery. C. Devakumar and Rajesh Kumar, Division of Agricultural Chemicals, IARI, New Delhi-110012
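Chemical Space: Stars versus Small Molecules
<table><tr><th></th><th>Stars</th><th>Small Molecules</th></tr><tr><td>Existing</td><td>10<sup>22</sup></td><td>10<sup>7</sup></td></tr><tr><td>Virtual</td><td>0</td><td>10<sup>60</sup> (?)</td></tr><tr><td>Access</td><td>Difficult</td><td>“Easy”</td></tr><tr><td>Mode</td><td>Real</td><td>Virtual</td></tr></table>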
Chemical Space: Small Molecules in Organic Chemistry <ul><li>Understanding chemical space </li></ul><ul><li>Small molecules are central to: </li></ul><ul><ul><li>chemical synthesis </li></ul></ul><ul><ul><li>drug design </li></ul></ul><ul><ul><li>chemical genomics </li></ul></ul><ul><ul><li>systems biology </li></ul></ul><ul><ul><li>nanotechnology </li></ul></ul><ul><ul><li>and others </li></ul></ul>
Cost to develop and time to market of various products
Registration of safer chemicals Proportion of pesticide active ingredients that are considered to be safer (biological chemicals and reduced-risk conventional chemicals) has steadily increased over the last several years. Source: EPA, 1999.
Plant biotechnology opens new markets / solutions
The development of the agrochemical in vivo screening
Overall Outline <ul><li>Introduction </li></ul><ul><li>Molecular Representations </li></ul><ul><li>Chemical Data and Databases </li></ul><ul><li>Molecular Similarity </li></ul><ul><li>Chemical Reactions </li></ul><ul><li>Machine Learning and Other Predictive Methods </li></ul><ul><li>Molecular Docking and Drug Discovery </li></ul>
What is Chemoinformatics? <ul><li>It encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information </li></ul><ul><li>It is the mixing of information resources to transform data into information and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization </li></ul>
What is Chemoinformatics? <ul><li>“The set of computer algorithms and tools to store and analyse chemical data in the context of drug discovery and design projects” </li></ul><ul><li>Chemoinformatics is the application of informatics methods to solve chemical problems </li></ul>
Resources <ul><li>Books: </li></ul><ul><ul><li>J. Gasteiger and T. Engel (Editors) (2003). Chemoinformatics: A Textbook. Wiley. </li></ul></ul><ul><ul><li>A.R. Leach and V. J. Gillet (2005). An Introduction to Chemoinformatics. Springer. </li></ul></ul><ul><li>Journal: </li></ul><ul><ul><li>Journal of Chemical Information and Modeling </li></ul></ul><ul><li>Web: </li></ul><ul><ul><li>http://cdb.ics.uci.edu </li></ul></ul><ul><ul><li>and many more </li></ul></ul>
History of Chemoinformatics <ul><li>The first, and still the core, journal for the subject, the Journal of Chemical Documentation, started in 1961 (the name changed to the Journal of Chemical Information and Computer Sciences in 1975). </li></ul><ul><li>The first book appeared in 1971 (Lynch, Harrison, Town and Ash, Computer Handling of Chemical Structure Information). </li></ul><ul><li>The first international conference on the subject was held in 1973 at Noordwijkerhout, and conferences have been held every three years since 1987. </li></ul>
Chemoinformatics <ul><li>Chemoinformatics encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information </li></ul><ul><li>Cheminformatics combines the scientific working fields of chemistry and computer science, for example in the area of chemical graph theory and in mining the chemical space. </li></ul><ul><li>The chemical space is expected to contain at least 10<sup>62</sup> molecules. </li></ul>
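Is it Cheminformatics or Chemoinformatics? <ul><li>Cheminformatics, molecular informatics, chemical informatics, or even chemobioinformatics </li></ul>
<table><tr><th>Year</th><th>Cheminformatics</th><th>Chemoinformatics</th><th>Ratio (Cheminformatics / Chemoinformatics)</th></tr><tr><td>2000</td><td>39</td><td>684</td><td>0.05</td></tr><tr><td>2001</td><td>8,010</td><td>2,910</td><td>2.75</td></tr><tr><td>2002</td><td>34,000</td><td>16,000</td><td>2.12</td></tr><tr><td>2003</td><td>58,143</td><td>32,872</td><td>1.77</td></tr><tr><td>2004</td><td>85,435</td><td>60,439</td><td>1.41</td></tr><tr><td>2005</td><td>6,58,298</td><td>2,72,096</td><td>2.41</td></tr><tr><td>2006</td><td>3,17,000+</td><td>1,63,000+</td><td>1.94</td></tr></table>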
Why Do We Need Chemoinformatics? 1) An enormous amount of data has to be stored and maintained. 2) Can we gain enough knowledge from the known data to make predictions for those cases where the required information is not available? 3) We need to find relationships between the structure of a compound and its biological activity, or to understand the influence of reaction conditions on chemical reactivity.
Advances in theoretical and computational chemistry now allow chemists to model chemical compounds “in silico” with ever-increasing accuracy. Molecular properties now becoming accessible through computation include molecular shape, electronic structure, physical properties, chemical reactivity, protein folding, structures of materials and surfaces, catalytic activity, and biochemical activities.
Chemoinformatics integrates a comprehensive knowledge of chemistry with an extensive understanding of information technology. The intersection of chemistry and information technology embraces an expanding territory: computational modeling of individual molecules, thermodynamic methods of estimating chemical properties, methods of predicting biological activity of hypothetical compounds, and organization and classification of chemical information.
Schematic representation of a crowded cell. An array of different molecules can function independently under extremely crowded conditions, partly because of judicious distributions of oppositely charged polar groups on the molecular surfaces. However, such systems are in some ways extremely fragile. For example, a mutation that alters just one amino acid in the haemoglobin molecule can stimulate massive aggregation and give rise to a fatal genetic disease, sickle-cell anaemia. More generally, many disorders of old age, most famously Alzheimer’s disease, result from the increasingly facile conversion of normally soluble proteins into intractable deposits, which occurs particularly as we get older. Many of these aggregation processes involve the reversion of the unique biologically active forms of polypeptide chains into a generic and non-functional ‘chemical’ form.
Additional computational challenges lie in indexing and classifying the infinite population of chemical compounds that could be synthesized or are already known. Specific indexing and search problems include how to find a compound that might block a specific biological target; how to predict the most efficient synthetic strategy for a desired compound from available precursors; and how to employ results of bioactivity tests on a family of molecules to design improved versions.
Currently, combinatorial chemists are developing new methods of synthesizing libraries of related compounds on an unprecedented scale. Such libraries can be used to produce huge arrays of materials for investigation of biochemical, catalytic, or material properties. Systems are required to design, catalog, and search these libraries, assess test results in a meaningful way, and integrate new information with existing chemical databases. Investigations into information storage at the molecular level are underway, bringing the link between chemistry and information technology full circle.
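The Scope of Chemoinformatics <ul><li>Representations and Structure Searching </li></ul><ul><li>Substructure Searching </li></ul><ul><li>Similarity Searching, Clustering, and Diversity Analysis </li></ul><ul><li>Searching Databases </li></ul><ul><li>Computer-aided Structure Elucidation </li></ul><ul><li>3D Substructure Searching </li></ul><ul><li>QSAR and Docking </li></ul>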
Structure and applications of chemoinformatics <ul><li>Database design and programming </li></ul><ul><li>Representation and searching of chemical structures </li></ul><ul><li>Structure, substructure & similarity searching in 2D & 3D </li></ul><ul><li>Markush and reaction searching </li></ul><ul><li>Representation and searching of biological databases </li></ul><ul><li>Chemoinformatics software </li></ul><ul><li>Data analysis techniques: clustering; evolutionary algorithms; graph theory; neural networks </li></ul><ul><li>Chemical information sources </li></ul><ul><li>Cheminformatics applications </li></ul><ul><li>Techniques used to design bioactive compounds </li></ul><ul><li>Molecular simulation and design </li></ul><ul><li>Drug discovery process; QSAR; Combi-chem; SBDD </li></ul><ul><li>Spectroscopy and crystallography in cheminformatics </li></ul>
Kinds of chemistry databases <ul><li>Small-molecule databases </li></ul><ul><ul><li>Databases of commercially available compounds (e.g. ACD, http://www.mdl.com/products/experiment/available_chem_dir/index.jsp) </li></ul></ul><ul><ul><li>Proprietary chemical structure databases </li></ul></ul><ul><ul><li>Literature databases </li></ul></ul><ul><ul><li>Patent databases </li></ul></ul><ul><ul><li>Small project-specific databases </li></ul></ul><ul><li>Protein databases </li></ul><ul><ul><li>Public, online databases (e.g. PDB, http://www.pdb.org) </li></ul></ul><ul><ul><li>Proprietary and project-specific databases </li></ul></ul>
Software Companies <ul><li>Accelrys - large chemoinformatics company </li></ul><ul><li>ACD/Labs - analytical informatics & predictions </li></ul><ul><li>BCI - 2D fingerprinting, clustering toolkits & software </li></ul><ul><li>Bioreason - HTS data analysis software </li></ul><ul><li>Cambridgesoft - 2D drawing tools & E-notebooks </li></ul><ul><li>CAS - produce Scifinder Scholar searching software </li></ul><ul><li>ChemAxon - Java based toolkits and software </li></ul><ul><li>Daylight - 2D representation & searching software </li></ul><ul><li>Leadscope - 2D structure and property tools </li></ul><ul><li>Lion Bioscience - produce LeadNavigator </li></ul><ul><li>MDL - large chemoinformatics company </li></ul><ul><li>Openeye - fast 3D docking, structure generation, toolkits </li></ul><ul><li>Quantum Pharmaceuticals - prediction, docking, screening </li></ul><ul><li>Sage Informatics - ChemTK 2D analysis software </li></ul><ul><li>Tripos - large chemoinformatics company </li></ul>
Journals & Magazines <ul><li>Journal of Chemical Information and Computer Sciences </li></ul><ul><li>Journal of Computer-Aided Molecular Design </li></ul><ul><li>Journal of Molecular Graphics and Modelling </li></ul><ul><li>Journal of Medicinal Chemistry </li></ul><ul><li>NetSci (online journal) </li></ul><ul><li>Scientific Computing World </li></ul><ul><li>Bio-IT World </li></ul><ul><li>Drug Discovery Today </li></ul>
Newsletters, Mailing Lists & Other Hubs <ul><li>Chemical Informatics Letters - monthly newsletter </li></ul><ul><li>CHMINF-L (Indiana) - email discussion list </li></ul><ul><li>Chemoinf Yahoo Group - email discussion list </li></ul><ul><li>Chemistry Software Yahoo Group </li></ul><ul><li>Cheminformatics.org - lots of links and QSAR datasets </li></ul><ul><li>Reactive Reports - chemistry web magazine </li></ul>
SMILES (Simplified Molecular Input Line Entry System) <ul><li>Aliphatic atoms: capital letters </li></ul><ul><li>Aromatic atoms: lowercase letters </li></ul><ul><li>Rings: closure marked by a matching pair of numbers </li></ul><ul><li>Double bonds: “=” sign </li></ul><ul><li>Parentheses: branching in the molecule </li></ul>Example: acetaminophen, SMILES representation c1c(O)ccc(NC(=O)C)c1
Sources of 3D structure information <ul><li>X-ray crystallography </li></ul><ul><li>NMR spectroscopy </li></ul>
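The SMILES rules on this slide can be checked against the acetaminophen string with a few lines of Python. This is only a minimal sketch: the function name `summarize_smiles` is hypothetical, and the tokenizer handles just the unbracketed organic subset (no charges, isotopes, stereochemistry or bracket atoms), so it is not a full SMILES parser.

```python
import re
from collections import Counter

# Organic-subset tokens only. Two-letter symbols come first so that
# "Cl" is not read as "C" followed by an unknown "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[0-9]|[=#()]")


def summarize_smiles(smiles):
    """Count the features named on the slide in a simple SMILES string."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax in {smiles!r}")
    atoms = [t for t in tokens if t[0].isalpha()]
    digits = [t for t in tokens if t.isdigit()]
    return {
        # Element counts, with aromatic (lowercase) atoms folded into their element.
        "elements": dict(Counter(a.capitalize() for a in atoms)),
        "aromatic_atoms": sum(a.islower() for a in atoms),
        "aliphatic_atoms": sum(not a.islower() for a in atoms),
        # Each ring is opened and closed by the same digit, so digits come in pairs.
        "ring_closures": len(digits) // 2,
        "double_bonds": tokens.count("="),
        "branches": tokens.count("("),
    }


summary = summarize_smiles("c1c(O)ccc(NC(=O)C)c1")
print(summary)
```

For acetaminophen the sketch finds six aromatic atoms, five aliphatic atoms, one ring closure, one double bond and three branches. Hydrogens are implicit in SMILES and are not counted here.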
Drawing and Depicting 2D Structures
Web-based drawing tools <ul><li>JME (http://www.molinspiration.com/cgi-bin/properties) is a clean, simple Java drawing tool. Draw your structure and click on the smiley face to show the SMILES. </li></ul><ul><li>Marvin Sketch is a Java applet that allows you to draw structures and export them as SMILES, MDL MOL files or other formats. </li></ul>
Web-based depiction tools <ul><li>Daylight Depiction Tool (http://www.daylight.com/daycgi/depict) is a very simple tool that takes a SMILES string and produces a 2D structure diagram from it. </li></ul><ul><li>CACTVS GIF generator has a more complex interface, but allows many more options for producing GIF picture files from SMILES or other structure formats. The quality of the images is superior to that of the Daylight tool. </li></ul><ul><li>MDL Chime (http://www.mdlchime.com) is a browser plug-in that can display both 2D and interactive 3D structures in web pages. </li></ul>
3D Structure Generation and Minimization <ul><li>Concord from Tripos, Inc. is one of the first 3D structure generation programs and is still being refined and developed. It generates single, minimal-energy structures from input 2D structures, and can read and write a variety of file formats. http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html </li></ul><ul><li>Corina from the Gasteiger group is similar to Concord. http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html </li></ul><ul><li>Omega from OpenEye is the most recently released of the three. It offers very fast generation of multiple low-energy conformers. http://www.eyesopen.com/products/applications/omega.html </li></ul>
Depiction Tools for 3D Structures <ul><li>MDL Chime is a web browser plug-in that allows 2D and 3D structures to be viewed in web pages. It can be used to visualize both proteins and small molecules, and includes some limited ability to create molecular surfaces. It is excellent for communicating structures via the web and for use in writing web-based chemoinformatics software. http://www.mdlchime.com </li></ul><ul><li>ArgusLab is a free molecular modeling program that has a fairly extensive set of options for 3D visualization, calculation of surfaces and properties, minimization, and molecular docking. http://www.arguslab.com </li></ul>
Data Analysis Methods <ul><li>Unsupervised - artificial neural networks, genetic algorithms </li></ul><ul><li>Supervised - inductive learning methods, statistics, pattern recognition methods </li></ul>Methods for Calculating Physical and Chemical Data <ul><li>Quantum mechanical calculations </li></ul><ul><li>Additive schemes </li></ul>
Chemistry-Based Data Mining and Exploration [Flowchart with the boxes: chemical(s) of concern; chemical-specific data; structural analogue; property analogue; biological or mechanistic analogue; structure-searchable databases; data mining; structure-activity relationships]
Chemometrics <ul><li>Quantitative analysis of chemical data initially relied exclusively on multilinear regression analysis. </li></ul><ul><li>Artificial neural networks are now used as well. </li></ul>An artificial neural network (ANN), commonly just neural network (NN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. [Figure: network with an input layer, two hidden layers and an output layer]
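The layered structure of an artificial neural network (input, two hidden layers, output) can be sketched as a plain forward pass. The weights, layer sizes and the sigmoid activation below are illustrative assumptions, not values from the slides, and no training step is shown.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: every neuron sums its weighted inputs,
    adds a bias and applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input vector through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.2]),
    ([[0.7, -0.3]], [0.05]),
]

prediction = forward([1.0, 0.5, -1.0], LAYERS)
print(prediction)
```

In practice the inputs would be molecular descriptors and the weights would be fitted to measured data. Multilinear regression corresponds to the special case of a single output neuron with no hidden layers and no squashing function.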
Computer-Assisted Structure Elucidation (CASE) <ul><li>The DENDRAL project, initiated in 1964 at Stanford University </li></ul><ul><li>A field of exercise for artificial intelligence techniques </li></ul>Computer-Assisted Synthesis Design (CASD) <ul><li>In 1969 Corey and Wipke began developing a synthesis design system. </li></ul>Searching methods <ul><li>Substructure searching </li></ul><ul><li>Similarity searching </li></ul>
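Similarity searching, listed on this slide, is commonly illustrated with the Tanimoto coefficient: the number of features two molecules share divided by the number of features present in either. The sketch below assumes each molecule is already described by a set of structural features. The feature names, molecule names and the helper `rank_by_similarity` are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two feature sets: shared / total distinct features."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_by_similarity(query, library):
    """Return (score, name) pairs for the library, most similar first."""
    return sorted(
        ((tanimoto(query, fp), name) for name, fp in library.items()),
        reverse=True,
    )


# Hypothetical fingerprints: each molecule is a set of structural features.
LIBRARY = {
    "mol_A": {"aromatic_ring", "hydroxyl", "amide"},
    "mol_B": {"aromatic_ring", "hydroxyl"},
    "mol_C": {"ester", "alkene"},
}
QUERY = {"aromatic_ring", "hydroxyl", "amide", "methyl"}

ranking = rank_by_similarity(QUERY, LIBRARY)
print(ranking)
```

Substructure searching asks a yes/no question (does the molecule contain this fragment?), whereas a similarity search such as this one returns a ranked list, which is why the two are usually treated as separate methods.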
Applications of Chemoinformatics <ul><li>1. Chemical Information </li></ul><ul><ul><li>Storage and retrieval of chemical structures and associated data to manage the flood of data </li></ul></ul><ul><ul><li>Dissemination of data on the internet </li></ul></ul><ul><ul><li>Cross-linking of data to information </li></ul></ul><ul><li>2. All fields of chemistry </li></ul><ul><ul><li>Prediction of the physical, chemical, or biological properties of compounds </li></ul></ul>
<ul><li>3. Bioactive molecules </li></ul><ul><ul><li>Identification of new lead structures </li></ul></ul><ul><ul><li>Optimization of lead structures </li></ul></ul><ul><ul><li>Establishment of quantitative structure-activity relationships </li></ul></ul><ul><ul><li>Comparison of chemical libraries </li></ul></ul><ul><ul><li>Definition and analysis of structural diversity </li></ul></ul><ul><ul><li>Planning of chemical libraries </li></ul></ul>
<ul><li>3. Bioactive molecules (contd.) </li></ul><ul><ul><li>Analysis of high-throughput data </li></ul></ul><ul><ul><li>Docking of a ligand into a receptor </li></ul></ul><ul><ul><li>Prediction of the metabolism of xenobiotics </li></ul></ul><ul><ul><li>Analysis of biochemical pathways </li></ul></ul><ul><li>4. Organic Chemistry </li></ul><ul><ul><li>Prediction of the course and products of organic reactions </li></ul></ul><ul><ul><li>Design of organic synthesis </li></ul></ul>
<ul><li>5. Analytical Chemistry </li></ul><ul><ul><li>Analysis of data from analytical chemistry to make predictions on the quality, origin, and age of the investigated objects </li></ul></ul><ul><ul><li>Elucidation of the structure of a compound based on spectroscopic data </li></ul></ul>Teaching Chemoinformatics <ul><li>Chemists have to become more efficient in planning their experiments and have to extract more knowledge from their data. </li></ul>
Toxicity Prediction for Chemical Q [Flowchart with the boxes: global toxicity model; supporting information; toxicity prediction; hypothesis generation; analogue search; chemical class assignment; class-based SAR model; weight-of-evidence toxicity presentation; data collection]
Institutes Offering Courses on Chemoinformatics <ul><li>University of Barcelona, Spain </li></ul><ul><li>University of Erlangen-Nürnberg, Germany </li></ul><ul><li>Bioinformatics Institute of India, Chandigarh </li></ul><ul><li>Georgia Institute of Technology </li></ul><ul><li>University of Sheffield (Willett) - MSc/PhD programs </li></ul><ul><li>University of Erlangen (Gasteiger) </li></ul><ul><li>UCSF (Kuntz) </li></ul><ul><li>University of Texas (Pearlman) </li></ul><ul><li>Yale (Jorgensen) </li></ul><ul><li>University of Michigan (Crippen) </li></ul><ul><li>Indiana University (Wiggins) - MSc program </li></ul><ul><li>Cambridge Unilever (Glen, Goodman, Murray-Rust) </li></ul><ul><li>Scripps - Molecular Graphics lab </li></ul>
SAR Application
Drug design: maximum activity, minimize toxicity <ul><li>Single therapeutic target </li></ul><ul><li>Drug-like chemical </li></ul><ul><li>Some toxicity anticipated </li></ul>
Environmental protection: prediction of toxicity <ul><li>Multiple unknown targets </li></ul><ul><li>Diverse structures </li></ul><ul><li>Human and ecosystems </li></ul>
DESIGN OF INSECTICIDE SYNERGISTS: FURAPIOLE ANALOGUES
$$\log SF = 0.319\,R_M + 0.445\,\sigma_R + 0.248\,B_1 + 0.034\,B_4 - 0.966$$
$$n = 14,\quad s = 0.057,\quad r = 0.950,\quad F = 21.04$$
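A regression equation of this kind is used by plugging in descriptor values for a candidate compound. The sketch below evaluates the furapiole-analogue equation, read as log SF = 0.319 R_M + 0.445 σ_R + 0.248 B_1 + 0.034 B_4 - 0.966. The function name and the descriptor values in the example are hypothetical, and treating the logarithm as base 10 is an assumption, since the slide does not state the base.

```python
def predict_log_sf(r_m, sigma_r, b1, b4):
    """Evaluate the furapiole-analogue regression equation from the slide.

    r_m, sigma_r, b1 and b4 are the descriptors R_M, sigma_R, B_1 and B_4.
    """
    return 0.319 * r_m + 0.445 * sigma_r + 0.248 * b1 + 0.034 * b4 - 0.966


# Hypothetical descriptor values for an untested analogue (not from the slides).
log_sf = predict_log_sf(r_m=0.5, sigma_r=-0.1, b1=1.5, b4=2.0)

# Assumption: the logarithm is base 10, so SF is recovered as 10 ** log SF.
sf = 10 ** log_sf

print(f"predicted log SF = {log_sf:.3f}, SF = {sf:.3f}")
```

With n = 14 compounds and four descriptors, predictions are only meaningful for analogues whose descriptor values lie within the range covered by the original data set.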
DESIGN OF INSECTICIDE SYNERGISTS: SESAMOL ETHERS
$$\log SF = 0.153\,D_2 + 0.240\,D_1 - 1.711\,\sigma_I - 0.429\,R_M + 0.070\,L - 0.384$$
$$n = 29,\quad s = 0.087,\quad r = 0.938,\quad F = 33.72$$
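DILLAPIOLE SIDE CHAIN ANALOGUES
$$\log SF = 0.467 - 0.105\,D - 1.537\,R_M^2 - 0.980\,\sigma_R \qquad (n = 17,\ s = 0.046,\ r = 0.948,\ F = 38.84)$$
$$\log SF = 0.305 - 0.110\,D - 1.114\,R_M^2 - 1.626\,\sigma_R + 0.012\,B_4 \qquad (n = 17,\ s = 0.045,\ r = 0.955,\ F = 31.37)$$
$$\log SF = 0.071 - 0.120\,D - 0.619\,R_M^2 - 2.066\,\sigma_R + 0.080\,B_4 - 0.003\,L^2 \qquad (n = 17,\ s = 0.045,\ r = 0.958,\ F = 24.86)$$
$$\log SF = 0.053 - 0.134\,D - 0.216\,R_M^2 - 1.290\,\sigma_R + 0.135\,B_4 + 0.006\,L^2 - 0.67\,\sigma_I \qquad (n = 17,\ s = 0.046,\ r = 0.961,\ F = 20.30)$$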
Unique research platform – Network of complementary technologies to meet the challenges in compound discovery
Discovery of the target proteins of novel fungicides
De novo target discovery by functional genomics and the steps aiming to develop and perform high throughput biochemical tests
Gene expression profiling, a revolutionary tool in herbicide discovery. Gene Expression Profiling (GEP) with DNA microarrays (chips) is a new technology used to measure changes in the entire transcriptome, i.e. the full complement of active genes, of an organism in a single experiment. A catalogue of genetic fingerprints of the plant Arabidopsis thaliana is created, each fingerprint being characteristic of a single herbicidal mode of action (MoA). The catalogue is then used to rapidly classify herbicidal compounds from UHTVS according to their MoA. GEP helps to identify the affected metabolic pathway and the MoA of pro-drugs, which cannot be elucidated by conventional biochemical methods. It provides insight into the interactions of any herbicidal compound with the entire plant metabolism with unprecedented accuracy and completeness.