II-PIC 2017: Drug Discovery of Novel Molecules using Chemical Data Mining tool

Muthukumarasamy Karthikeyan (CSIR-National Chemical Laboratory, India)
Surojit Sadhu (Advent Informatics, India)
Virtual screening (VS) and chemical data extracted from evidence-based sources are the backbone of the computational drug discovery workflow, an indispensable component of all drug design programs. VS involves a host of modelling techniques, from simple similarity search methods to advanced algorithms for finding the bioactive conformation in which a molecule binds to its target. Chemoinformatics supports virtual screening at multiple levels during the lead optimization stage: it suggests suitable filters for successive screens by integrating data from multiple sources and deriving the knowledge needed for decision support in drug discovery and development. It is therefore pertinent to develop tools, data and emerging methods in chemoinformatics, and to understand their role and applications in virtual screening. We have recently developed chemical informatics tools that assist drug discovery through chemical data extraction from literature, virtual library design, analysis and screening, applied to selected case studies.

Published in: Healthcare

  1. 1. Muthukumarasamy Karthikeyan, Ph.D CSIR-National Chemical Laboratory. Chem. Engg. and Proc. Devp. Div. & Digital Inf. Res. Centre, Pune – 411 008, INDIA November 2-3, 2017 – The International Indian Patent Information Conference, Bangalore, India Drug Discovery of Novel Molecules using Chemical Data Mining tool
  2. 2. Outline • Chemoinformatics methods to harvest chemical data from text and images in published literature (journals, patents and other scientific reports) • 50+ years of data, tools and methods • 30+ million publications (indexed by PubMed, CAS, etc.) • 10 million patents (source for chemical data mining for drug discovery) • Drug Discovery & Pharmaceuticals (NCEs, Side Effects & Risk, Toxicity Assessment)
  3. 3. Chemoinformatics for Drug Design • Identification of new lead structures • Optimization of lead structures • Establishment of quantitative structure-activity relationships • Comparison of chemical libraries • Definition and analysis of structural diversity • Planning of chemical libraries • Analysis of high-throughput data • Docking of a ligand into a receptor • de novo design of ligands • Modeling of ADME-Tox properties • Prediction of the metabolism of xenobiotics • Analysis of biochemical pathways
  4. 4. Powerful Molecular Mining Tools • Chemistry has produced an enormous amount of data, and the volume is increasing rapidly. • More than 70 million chemical compounds are known, and this number increases by several million each year.
  5. 5. Chemical Structure Representation & Challenges • Machine readable chemical structure representations • Building connection tables that represent molecules • Building databases of chemical structures and reactions. • Challenges: Search (Exact, Similar, Substructure, Superstructure) • Markush Representation of molecules for patent applications • Mapping the chemical-biological space
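The connection table named on slide 5 can be illustrated with a minimal sketch. It assumes ethanol (heavy atoms only) as the example molecule; the `Molecule` class and its methods are hypothetical names chosen for illustration, not part of any toolkit mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Minimal connection table: atom symbols plus (i, j, bond_order) tuples."""
    atoms: list
    bonds: list = field(default_factory=list)

    def neighbors(self, index):
        """Return the indices of atoms bonded to the atom at `index`."""
        result = []
        for i, j, _order in self.bonds:
            if i == index:
                result.append(j)
            elif j == index:
                result.append(i)
        return result

    def degree(self, index):
        """Number of explicit bonds on the atom at `index`."""
        return len(self.neighbors(index))


# Ethanol heavy-atom skeleton C-C-O (hydrogens left implicit in this sketch).
ethanol = Molecule(atoms=["C", "C", "O"], bonds=[(0, 1, 1), (1, 2, 1)])

for idx, symbol in enumerate(ethanol.atoms):
    print(idx, symbol, ethanol.neighbors(idx))
```

Real toolkits store the same information with more detail (charges, stereochemistry, aromaticity), and exact or substructure search then becomes a graph-matching problem over such tables.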
  6. 6. Chemical Mining Relevant Data for Drug Discovery • SPC: Structure Property Correlation • Intrinsic properties: Molar Volume, Connectivity Indices, Charge Distribution, Molecular Weight, Polar Surface Area • Chemical properties: pKa, Log P, Solubility, Stability • Biological properties: Activity, Toxicity, Biotransformation, Pharmacokinetics • Others: Synthesis & Processes, Spectral / Characterization
  7. 7. Molecular Descriptors (2D & 3D) • Physical, chemical, or biological properties cannot be calculated directly from the structure of a compound. • Instead, the structure is represented by structure descriptors or ‘Features’, and a relationship (models, equations, ...) is then established between the structure descriptors and the property. • A variety of structure descriptors have been developed that encode 1D, 2D, or 3D structure information (Practical Chemoinformatics).
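As a small illustration of a structure descriptor, the sketch below computes molecular weight and two simple counts from a molecular formula. It is a toy under stated assumptions: only the elements H, C, N and O are supported, formulas must be flat (no brackets), and the function names are hypothetical. The example formula C9H10N2 corresponds to 2-ethyl-1H-1,3-benzodiazole, the molecule shown on slide 9.

```python
import re

# Standard atomic weights, rounded to three decimals (subset only).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Parse a flat formula such as 'C9H10N2' into {element: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[s] * n for s, n in parse_formula(formula).items())


def descriptor_vector(formula):
    """Toy descriptor vector: [molecular weight, heavy atoms, heteroatoms]."""
    counts = parse_formula(formula)
    heavy = sum(n for s, n in counts.items() if s != "H")
    hetero = sum(n for s, n in counts.items() if s not in ("C", "H"))
    return [round(molecular_weight(formula), 3), heavy, hetero]


print(descriptor_vector("C9H10N2"))
```

Vectors of this kind are what a model-building step would consume; realistic 2D and 3D descriptors need the full structure, not just the formula.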
  8. 8. Structure Property Relationships • molecular structure → (representation) → Structure Descriptors (Features) → (model building) → property
  9. 9. 11/4/2017 • 2-ethyl-1H-1,3-benzodiazole • Substructure, Similar, Exact structure Search
  10. 10. Virtual Screening Efforts
  11. 11. Textmining and Datamining using Chemoinformatics • Text mining has every potential to transform the way we look at data. • It converts unstructured data to structured form. • It helps detect patterns and hidden connections in chemistry-specific data. • Automation can address the data explosion problem by collecting and consolidating data.
  12. 12. Textmining Workflow • Tokenization: remove non-scientific terms, stopwords and punctuation; sentence boundary and token detection • Stemming: finding the root of a word; reduction in word-space dimension • POS Tagging: the best part of speech is determined; often context dependent • Named Entity Recognition: recognition of entities, e.g. proteins, genes, drugs; methods: statistical or dictionary based
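The workflow on slide 12 can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the authors' pipeline: the stopword list and entity dictionary are tiny hypothetical samples, the stemmer is crude suffix stripping rather than a real stemming algorithm, and POS tagging is omitted because it needs a trained tagger.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOPWORDS = {"the", "of", "and", "in", "is", "a", "to", "by", "with"}
ENTITY_DICTIONARY = {"dpp4": "PROTEIN", "insulin": "PROTEIN", "sitagliptin": "DRUG"}


def split_sentences(text):
    """Sentence boundary detection: split after '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def tokenize(sentence):
    """Lowercase, drop punctuation, and remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Crude stemming: strip one common suffix if a stem of 3+ letters remains."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag_entities(tokens):
    """Dictionary-based named entity recognition on unstemmed tokens."""
    return [(t, ENTITY_DICTIONARY[t]) for t in tokens if t in ENTITY_DICTIONARY]


text = "Sitagliptin inhibits DPP4. The drug is used in diabetes."
for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    print([stem(t) for t in tokens], tag_entities(tokens))
```

Entity lookup runs before stemming here so that drug and protein names are not mangled; a statistical recognizer would replace `tag_entities` in a fuller system.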
  13. 13. Case study on Chemical Data-mining • Literature: 66,000 articles • Specific classes of terms: Protein, Gene, Disease, Drugs (common drugs, proteins, genes) • Sources: Binding DB, PharmGKB, CTD • Co-occurrence based connections • Corpus building • Document fingerprints using corpus (bit strings such as 10100010100010101010) • Document clustering using ED and TC • Chemical Text Mining
  14. 14. Scaffold space • 800,000 scaffolds (PubChem ~70 million)
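The document fingerprints on slide 13 can be sketched as bit vectors over a corpus vocabulary. The sketch assumes that "ED" and "TC" stand for Euclidean distance and Tanimoto coefficient, which the slide does not spell out; the vocabulary and documents are hypothetical toy data, and the function names are illustrative.

```python
import math


def fingerprint(document_terms, vocabulary):
    """One bit per vocabulary term: 1 if the term occurs in the document."""
    terms = set(document_terms)
    return [1 if term in terms else 0 for term in vocabulary]


def tanimoto(fp_a, fp_b):
    """Shared on-bits divided by on-bits in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0


def euclidean(fp_a, fp_b):
    """Euclidean distance; for bit vectors, sqrt of the number of differing bits."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


# Hypothetical vocabulary and term lists, for illustration only.
vocabulary = ["protein", "gene", "disease", "drug", "binding"]
doc1 = fingerprint(["drug", "protein", "binding"], vocabulary)
doc2 = fingerprint(["drug", "gene", "protein"], vocabulary)

print(doc1, doc2, tanimoto(doc1, doc2), euclidean(doc1, doc2))
```

Either measure can feed a standard clustering method: a high Tanimoto coefficient or a low Euclidean distance puts two documents in the same cluster.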
  15. 15. Scaffold Identification
  16. 16. Frequent Hitters
  17. 17. Scaffold Space Analysis (Opportunities in IP)
  18. 18. Lead Identification (Computer vs Medicinal Chemist) Opportunities in IP Scaffold Space
  19. 19. Scaffold Mining (Textmining Patents and Publications)
  20. 20. Framework of Active DPP4 inhibitors
  21. 21. Traditional Design Modern Design ChemScreener : Virtual Screening
  22. 22. ChemScreener : Virtual Screening • Input-A: all molecules (drugs, leads, any; ~50 entries (structures)) -> Output: scaffolds, FGs, linkers -> build (ph-model1) based on structural similarity • Input-B: scaffolds, FGs, linkers -> Output: VL ~150 m • QSAR-Binary: ML (Weka) / LR5 / Fingerprints (TPC) (filters) • QSAR-Continuous: Neural Network, PCA, PLS, PCR, kNN • Primary hits (2D->3D) (descriptor space) • Pharmacophore search (3D DB) (shape & charge, HBA, HBD, ...) • Secondary hits (ready for docking on validated targets) • Final hits (binding energy) -> Molecular Dynamics • Experimental: synthesis (if new), bioactivity studies on selected targets • Feedback mechanism -> improve predictive studies
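The Input-B step on slide 22 (scaffolds, functional groups and linkers combined into a virtual library, then passed through filters) can be sketched as a combinatorial enumeration. This is not ChemScreener's actual algorithm, which the slides do not describe; the building-block labels, the filter and the function names are all hypothetical placeholders.

```python
from itertools import product


def enumerate_library(scaffolds, linkers, functional_groups):
    """Yield every scaffold/linker/functional-group combination."""
    for scaffold, linker, group in product(scaffolds, linkers, functional_groups):
        yield {"scaffold": scaffold, "linker": linker, "functional_group": group}


def screen(library, filters):
    """Keep only entries that pass every filter function."""
    return [entry for entry in library if all(f(entry) for f in filters)]


# Placeholder labels standing in for real structures.
scaffolds = ["S1", "S2"]
linkers = ["L1", "L2", "L3"]
functional_groups = ["OH", "NH2"]

library = list(enumerate_library(scaffolds, linkers, functional_groups))


def not_linker_l3(entry):
    """Hypothetical filter: reject one linker, standing in for a QSAR filter."""
    return entry["linker"] != "L3"


hits = screen(library, [not_linker_l3])
print(len(library), len(hits))
```

Library size grows as the product of the building-block counts, which is how a few dozen input structures can expand into a very large virtual library and why cheap filters are applied before pharmacophore search and docking.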
  23. 23. Scaffold identification Scaffoldpedia Scaffolds Virtual Library Screening Filters Bioactive compound collection Ch 4 Tx 4 Ph 4 Pharmacophore Modeling Docking ChemScreener : Virtual Screening
  24. 24. [Figure: bar chart "Scaffold distribution per THC"; y-axis: No. of scaffolds (0 to 90); categories: Antianginals, Antidiabetics, Anti-hypertensives, Anti-psychotics, Anti-virals, Hypolipidemics, Anti-Diarrhoeals, Anti-mitotics, Hypnotics, Immunosuppressants] Novel compound Discovery (Virtual Screening)
  25. 25. [Figure: Pharmacophore FP (anti-diabetics); axis ticks 0 to 30 and 1 to 166]
  26. 26. [Figure: Toxicophores FP (Anti-diabetics); axis ticks 0 to 30 and 1 to 1004]
  27. 27. [Figure: Chemophores FP (Anti-diabetics); axis ticks 0 to 30 and 1 to 291]
  28. 28. Compute Cure for Cancer Initiative (BigData)
  29. 29. Textmining/Datamining (High performance Computing)
  30. 30. Text-mining 2 Network Analytics
  31. 31. Open source Chemoinformatics Tools • OpenBabel • CDK • RDKit • C++ • JAVA • Python • Perl • Ruby • ISBN 978-81-322-1780-0 • Chemoinformatics for Drug Discovery
  32. 32. Practical Cheminformatics (Do It Yourself) • Open Source Tools, Techniques and Data in Chemoinformatics • Chemoinformatics Approach for the Design and Screening of focused virtual libraries • Machine Learning Methods in Chemoinformatics for Drug Discovery • Docking and pharmacophore modeling for virtual screening • Active site directed pose prediction programs for efficient filtering of molecules • Representation, fingerprinting and modeling of chemical reactions • Predictive methods for Organic Spectral data Simulation • Chemical Text mining for Lead Discovery • Integration of Automated Work flow in Chemoinformatics for drug discovery • Cloud computing Infrastructure for Chemoinformatics
  34. 34. Acknowledgement • Director, CSIR-NCL-Pune • Collaborator: Prof. Renu Vyas (ADT University) • Conference Organizer & Sponsors
  35. 35. Questions? • Email: karthincl@gmail.com (send your queries) • Phone: +91 020-2590 2483 / +91 9767427981 (WhatsApp) • Skype: karthincl • http://moltable.ncl.res.in/ (please visit the link for updates on recent publications and patents)
