@giribio
Girinath G. Pillai, PhD @giribio
Mining small molecules
for Drug Discovery
Girinath G. Pillai, PhD
@giribio
Note & Disclaimer
● We are not yet completely ready for AI/ML in drug discovery (it takes time, like the Human Genome Project)
● Slides contain content, pictures and videos taken from the web, articles, lectures and tutorials; the respective authors own their copyrights.
● No conflict of interest
Technical Slides : slideshare.net/giribio
Hands-on Videos : youtube.com/giribio (AutoDock, Modeller, PaDEL, QSAR, MD, etc.)
Workflows & Notebooks : github.com/giribio
Agenda
What to expect?
➔ FDA & Drug Approvals
➔ Drug Discovery
➔ Chemical Representation
➔ Chemical Databases
➔ Herbal/Natural Compound Databases
FDA Approvals - Timeline
Molecular Weight (MW) Trends
Hydrogen Bond Donor (HBD) Trends
Hydrogen Bond Acceptor (HBA) Trends
How to reduce the high cost of drugs?
Cause of Failure - Oral Drugs (Early)
1960s to 1980s
RA Prentis et al., Br. J. Clin. Pharmacol. 1988
T Kennedy, Drug Discov. Today 1997
Cause of Failure - Oral Drugs (Late)
2000s to 2010s
Estimation of ADME at early stages
Generate models to predict ADME/PK
MJ Waring et al., Nat. Rev. Drug Discov. 2015
J Arrowsmith & P Miller, Nat. Rev. Drug Discov. 2013
The Objectives of Drug Discovery
Multi-parameter optimisation
➔ Identify chemistries with an optimal balance of properties
➔ Quickly identify situations when such a balance is not possible
➔ Fail fast, fail cheap
➔ Only when confident
➔ Avoid missed opportunities
Selected Success Stories of CADD
➔ Captopril (Capoten®, Bristol-Myers Squibb)
➔ Dorzolamide (Trusopt®, Merck)
➔ Zanamivir (Relenza®, GlaxoSmithKline)
➔ Aliskiren (Tekturna®, Novartis)
➔ Boceprevir (Schering-Plough)
➔ Nolatrexed dihydrochloride (Thymitaq®, Agouron)
➔ LY-517717 (Lilly/Protherics)
➔ Acetyl-CoA carboxylase inhibitor (Nimbus, US)
➔ Rupintrivir (AG7088, Agouron)
➔ NVP-AUY922 (Novartis)
➔ Vemurafenib (Plexxikon)
➔ Venetoclax (AbbVie, Genentech)
➔ Erdafitinib (Johnson & Johnson)
➔ Verubecestat (Merck)
Nat Rev Drug Discov. 2016 Sep;15(9):605-19
Curr Top Med Chem. 2010;10(1):127-41
http://practicalfragments.blogspot.com/
Representing Chemicals
➢ Trivial name, e.g. baking soda, aspirin, citric acid, etc.
○ Identifies the compound, but gives little or no information about what it consists of
➢ Chemical formula, e.g. C₆H₁₂O₆.
○ Specifies the type and quantity of the atoms in the compound, but not its structure (i.e. how the atoms are connected by bonds)
➢ Systematic name, e.g. 1,2-dibromo-3-chloropropane.
○ Identifies the atoms present and how they are connected by bonds.
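To make the point about chemical formulas concrete, here is a minimal Python sketch that reads a simple formula string and recovers the type and quantity of atoms, plus a molecular weight. The function names and the small atomic-weight table are illustrative assumptions, and the parser only handles plain formulas without brackets or charges.

```python
import re

# Rounded standard atomic weights; only the elements needed for the examples.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "Cl": 35.45, "Br": 79.904}


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a plain formula such as 'C6H12O6' (no brackets, no charges)."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic weights over the atom counts of the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


glucose = parse_formula("C6H12O6")
dbcp = parse_formula("C3H5Br2Cl")  # formula of 1,2-dibromo-3-chloropropane

print(glucose, round(molecular_weight("C6H12O6"), 3))
print(dbcp, round(molecular_weight("C3H5Br2Cl"), 3))
```

The atom counts say nothing about connectivity: glucose and fructose, for example, share the formula C6H12O6, which is why a systematic name or a structure-based representation is needed to tell them apart.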
What are Small Molecules?
➢ A small molecule is a low molecular weight organic compound.
➢ Most drugs are small molecules, which allows passage across cell membranes and oral bioavailability.
➢ They can also bind to proteins and enzymes, thereby altering function, which can lead to a therapeutic effect.
Digital Representations
➢ How do we communicate structural information between humans and the computer?
○ Line notations, e.g. Wiswesser Line Notation (and later SMILES)
➢ How do we represent the atoms and bonds in a molecule internally in a computer?
○ Atom lookup and connection tables
➢ Trivial name: proline
➢ Systematic name: pyrrolidine-2-carboxylic acid
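As an illustration of an atom lookup table and a connection table, the sketch below encodes proline (pyrrolidine-2-carboxylic acid) by hand in Python. The atom numbering, the simple valence model for implicit hydrogens and all names are assumptions made for this example; real toolkits store more information (charges, stereochemistry, aromaticity).

```python
from collections import Counter

# Atom lookup table: index -> element (heavy atoms only, hydrogens implicit).
ATOMS = ["N", "C", "C", "C", "C", "C", "O", "O"]

# Connection table: (atom index, atom index, bond order).
BONDS = [
    (0, 1, 1),  # ring N to alpha carbon
    (1, 2, 1),
    (2, 3, 1),
    (3, 4, 1),
    (4, 0, 1),  # closes the five-membered pyrrolidine ring
    (1, 5, 1),  # alpha carbon to carboxyl carbon
    (5, 6, 2),  # C=O
    (5, 7, 1),  # C-OH
]

# Simplified neutral valences, enough for this molecule.
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(atoms, bonds):
    """Hydrogens per heavy atom = valence minus the sum of its bond orders."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [VALENCE[el] - u for el, u in zip(atoms, used)]


def formula(atoms, bonds):
    """Molecular formula in Hill order (C, H, then other elements alphabetically)."""
    counts = Counter(atoms)
    counts["H"] = sum(implicit_hydrogens(atoms, bonds))
    order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


print(formula(ATOMS, BONDS))
```

Unlike the trivial name, this table carries the full connectivity, so the formula can be derived from it but not the other way round.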
Examples
Some common small molecules: amino acids
Molecular Representations
➢ 1D: SMILES Strings
➢ 2D: Graph of Bonds
➢ 2.5D: Surfaces
➢ 3D: Atomic Coordinates
➢ 4D: Temporal Evolution
Chemical - File Formats
➢ .mol
➢ .mol2
➢ .sd / .sdf
➢ .pdb
➢ .skc / .cdx / .mrv / …
➢ .smi (SMILES, e.g. CCCC)
Chemical Compounds for Machines
ML methods need a computer-friendly way to input the atomistic system: a structure that is easy for us has to become something that is easy for the CPU, e.g. 010110101010001011100100010001111110
Issues for ML:
● arbitrary size
● arbitrary order
Ideal features:
● general
● compact
● unique
● invariant *
● smooth
● fast
* invariants are determined by the physics of the quantity to predict from the descriptor!
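The two issues listed for ML (arbitrary size and arbitrary order) can be illustrated with a small sketch. The descriptor below, a fixed-length histogram of pairwise interatomic distances, is a choice made for this example and not a method taken from the slides; it ignores element types, so it is not unique, but it does not depend on atom order, translation or rotation. The water coordinates (in ångström) are approximate and used only as sample input.

```python
import math
from itertools import combinations


def distance_histogram(coords, r_max=5.0, n_bins=10):
    """Fixed-length histogram of all pairwise distances below r_max."""
    hist = [0] * n_bins
    width = r_max / n_bins
    for p, q in combinations(coords, 2):
        d = math.dist(p, q)
        if d < r_max:
            # min() guards against a rounding edge case at the last bin.
            hist[min(int(d / width), n_bins - 1)] += 1
    return hist


# Approximate water geometry: O, H, H.
water = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]

# Same molecule with the atoms listed in a different order.
shuffled = [water[2], water[0], water[1]]

# Same molecule translated by (1, 2, 3).
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in water]

print(distance_histogram(water))
print(distance_histogram(shuffled) == distance_histogram(water))
print(distance_histogram(shifted) == distance_histogram(water))
```

The output length is set by `n_bins` and not by the number of atoms, which addresses the size issue; the price is lost information, which is the usual trade-off between compact and unique descriptors.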
Descriptors for Chemistry
ML methods need a computer-friendly way to input the atomistic system:
● Global descriptor: 010110101010001011100100010001111110
● Local/atomic descriptors: 110100011110000110010111111110, 110100011110001011100001111110, 010110101010001011100001111110
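Once descriptors are bit strings, they can be compared numerically. The slides do not name a comparison metric; the sketch below assumes the Tanimoto coefficient, a common choice for binary fingerprints, and applies it to the three local descriptor strings shown above. The function name and the convention for two all-zero strings are assumptions.

```python
def tanimoto(bits_a: str, bits_b: str) -> float:
    """Tanimoto coefficient of two equal-length strings of '0' and '1'."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have the same length")
    both = sum(a == "1" and b == "1" for a, b in zip(bits_a, bits_b))
    either = sum(a == "1" or b == "1" for a, b in zip(bits_a, bits_b))
    # Convention chosen here: two all-zero strings count as identical.
    return both / either if either else 1.0


local_descriptors = [
    "110100011110000110010111111110",
    "110100011110001011100001111110",
    "010110101010001011100001111110",
]

# Pairwise similarity between the atomic descriptors.
for i in range(len(local_descriptors)):
    for j in range(i + 1, len(local_descriptors)):
        score = tanimoto(local_descriptors[i], local_descriptors[j])
        print(i, j, round(score, 3))
```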
Database?
A database is an “organized collection of information.”
Information in a database can be in any format, including text, numbers, images, audio, video, and many others (and combinations of these).
Information must be “organized” for efficient retrieval.
Database Categories
Primary databases contain experimentally derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data, and are also known as archival databases.
Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature.
Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. They are also known as curated databases or knowledgebases.
Data Provenance
“Data provenance” refers to a record trail that describes the origin or source of a piece of data and the process by which it entered a database.
Simply put, data provenance deals with the questions
“where the data came from” and
“how and why the data is in its present place”.
Although data provenance information is critical to the reliability of a data source (and its data), this information is not easy to manage.
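One lightweight way to keep a record trail is to store the source and a history of processing steps next to each data item. The Python sketch below is only an illustration of that idea: the class names, fields, dates and the agent names are all hypothetical, and a production system would more likely rely on a database or an established provenance model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ProvenanceStep:
    action: str  # what was done, e.g. "downloaded" or "standardised"
    agent: str   # who or what did it (person, script, tool)
    when: date   # when it happened


@dataclass
class Record:
    identifier: str  # identifier in the source database
    value: str       # the data itself, here a SMILES string
    source: str      # "where the data came from"
    history: list[ProvenanceStep] = field(default_factory=list)

    def log(self, action: str, agent: str, when: date) -> None:
        """Append one step describing how the data reached its present place."""
        self.history.append(ProvenanceStep(action, agent, when))


# Hypothetical example entry; the dates and agents are made up for illustration.
aspirin = Record(identifier="CID 2244",
                 value="CC(=O)OC1=CC=CC=C1C(=O)O",
                 source="PubChem")
aspirin.log("downloaded", "fetch_script.py", date(2024, 1, 15))
aspirin.log("standardised", "curation_step", date(2024, 1, 16))

for step in aspirin.history:
    print(step.when.isoformat(), step.action, "by", step.agent)
```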
Small Molecule Databases
➢ Chemical DBs: collection of molecules reported, published and proposed
➢ FDA Drugs: collection of approved drugs with their pharmacological and target details
➢ Natural Products: collection of compounds/extracts from plants, animals, etc.
➢ Commercial DBs: collection of purchasable as well as on-demand synthesis molecules
Approved Drug Database
➢ DrugBank: comprehensive information on drug molecules
○ Combines detailed drug data (i.e. chemical, pharmacological and pharmaceutical) with comprehensive drug target information (i.e. sequence, structure, and pathway). Allows searching for similar compounds.
➢ SuperDrug: 2500 3D structures of active ingredients of essential marketed drugs.
Chemical Database
➢ PubChem: chemical information repository at the U.S. NIH
○ Maintained by NCBI as three linked databases (Substance, Compound and BioAssay), with vendor details and pharmacology information
➢ ChEMBL: literature-extracted biological activity information
○ Curated database of small molecules, including the interactions and functional effects of small molecules binding to their macromolecular targets, and a series of drug discovery databases. 1 million bioactive (small drug-like) compounds with 8200 drug targets
➢ ZINC15
○ Curated collection of 21 million commercially available chemical compounds, with 3D coordinates in ready-to-dock formats
➢ BindingDB
○ Contains 910,836 binding data points for 6,263 protein targets and 378,980 small molecules.
➢ ChemSpider: a chemical database integrated with RSC’s publishing process
○ Collection of chemical compounds; includes conversion of chemical names to chemical structures, generation of SMILES and InChI strings, and prediction of many physicochemical parameters
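Several of these databases can be queried programmatically. As a hedged sketch, the Python snippet below builds a PubChem PUG REST URL for looking up compound properties by name; the helper names are hypothetical, and the URL pattern and property names should be checked against the current PubChem documentation. The network call is wrapped in a separate function and is not executed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: list[str]) -> str:
    """Build a PUG REST URL that requests properties for a compound name."""
    return (f"{PUG_REST}/compound/name/{quote(name, safe='')}"
            f"/property/{','.join(properties)}/JSON")


def fetch_properties(name: str, properties: list[str]) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urlopen(property_url(name, properties), timeout=30) as response:
        return json.load(response)


url = property_url("aspirin", ["MolecularFormula", "MolecularWeight", "InChIKey"])
print(url)
# To run the query itself, call fetch_properties("aspirin", [...]) with network access.
```

Keeping URL construction separate from the download makes the query easy to log, which also helps with data provenance.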
Chemical Databases (contd…)
➢ Mcule
○ Commercial database of small molecules
➢ Cambridge Structural Database (CSD)
○ Repository for small-molecule crystal structures in CIF format. The CSD is compiled and maintained by the CCDC.
➢ ChemIDplus
○ Database of compounds and structures by the US National Library of Medicine
➢ ChemBank
○ Public, web-based informatics environment created by the Broad Institute's Chemical Biology Program. Includes freely available data derived from small molecules and small-molecule screens, and resources for studying the data
➢ Ligand Expo
○ Formerly Ligand Depot. Provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank
Natural Product Database
➢ COCONUT
○ Open database of all natural products
➢ TCM
○ Free small-molecule database on traditional Chinese medicine, for virtual screening. It is currently the world's largest TCM database, and contains 170 000 compounds
➢ IMPPAT
○ Database of Indian Medicinal Plants, Phytochemistry and Therapeutics
➢ AromaDb
○ Database of aroma molecules from medicinal and aromatic plants, with phytochemistry and therapeutic potentials
➢ Dr. Duke's Phytochemical and Ethnobotanical Databases
○ Facilitate in-depth plant, chemical, bioactivity, and ethnobotany searches using scientific or common names
➢ CMAUP
○ Database of collective molecular activities of useful plants
Other Chemical Databases
➢ ChEBI (Chemical Entities of Biological Interest)
○ Freely available dictionary of molecular entities focused on ‘small’ chemical compounds, provided by the European Bioinformatics Institute
➢ KEGG DRUG
○ Comprehensive drug information resource for approved drugs in Japan, USA, and Europe, unified based on the chemical structures and/or the chemical components, and associated with target, metabolizing enzyme, and other molecular interaction network information. Provided by the Kyoto Encyclopedia of Genes and Genomes
Commercial DBs
➢ Enamine
➢ MolPort
➢ WuXi
➢ ChemDiv
➢ etc.
NCI
➢ Input - Search
➢ Filters
➢ Hit Lists
➢ Details of a Hit
➢ 3D Viewer
➢ Download NCI DB
➢ NCI – DTP – Order Compounds
➢ NCI – DTP – Cell Line Screening
CCDC - CSD
➢ Access Structures
➢ CCDC – CSD - Aspirin
PubChem
➢ Data Sources
➢ Summary
ZINC15
➢ Tranches
➢ Query
➢ Filters
➢ Summary
ChemIDplus : ToxNet
➢ Query
➢ Resources
➢ Data Sources
DrugBank
➢ Details
ChEMBL
➢ Filters / List
➢ Summary
BindingDB
➢ Details
➢ BindingDB - Browser Extension
➢ Data Link to BindingDB
KEGG Compound DB
➢ Compound Details
SuperDrug2
COCONUT Database
HIT - Herbal - Target
➢ Compound Summary
Super Natural II Database
➢ Summary
NPACT
Pharmaceutical Product Databases
Toxic Substance Databases
Metabolomic Databases
Metabolic Pathway Databases
Some Questions!
➢ Predict drug-like molecules? Toxicity?
○ New Strategies
➢ How can we search efficiently? Intelligently?
○ New data structures and algorithms
○ Optimizing old structures
➢ How can we understand this much data?
○ Cluster and visualize millions of data points
○ Define commercially accessible space.
➢ Are there other useful things we can do with this?
○ Discover new polymers, etc.
○ Wonder about the origin of life.
○ Combinatorially combine all known chemicals.
CONCLUSION
Data search is an art; dig for data as much as you can.
Initially weight selection as 25% score/quality and 75% diversity; as the size of the lead set reduces, shift to 75% score/quality and 25% diversity.
Consider enrichment factors.
Aim for good synergy between human expertise and computational tools.
Avoid missed opportunities.
Understand the significance of parameters/properties.
Evaluate and decide on the tool/approach.
Check the reliability of the data used.
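Two of these points lend themselves to a short worked example. The Python sketch below assumes the usual definition of the enrichment factor (hit rate in the top fraction of a ranked list divided by the hit rate in the whole list) and a simple linear blend of quality and diversity scores, both scaled to the range 0 to 1. The function names, the blending rule and the sample ranking are illustrative assumptions, not taken from the slides.

```python
def enrichment_factor(ranked_is_active: list[bool], fraction: float) -> float:
    """Hit rate in the top `fraction` of a ranked list over the overall hit rate."""
    n_total = len(ranked_is_active)
    actives_total = sum(ranked_is_active)
    if n_total == 0 or actives_total == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * fraction))
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)


def blended_score(quality: float, diversity: float, quality_weight: float) -> float:
    """Linear blend; quality_weight=0.25 early on, 0.75 for a small lead set."""
    return quality_weight * quality + (1.0 - quality_weight) * diversity


# Made-up ranking of ten compounds, best score first; True marks a known active.
ranking = [True, True, False, False, True, False, False, False, False, False]
print(round(enrichment_factor(ranking, 0.2), 2))

# Same candidate scored with the early-stage and late-stage weightings.
print(blended_score(0.8, 0.4, 0.25), blended_score(0.8, 0.4, 0.75))
```

An enrichment factor above 1 means the ranking places actives near the top more often than random selection would.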
Until you try yourself and train others, you cannot be an expert.
If you think you have finished collecting all the data, you are wrong. Go and find more data.
THANKS
Do you have any questions?
www.zastrain.com
@giribio
We do accept research interns


Mining Small Molecules for Drug Discovery

  • 1. @giribio Girinath G. Pillai, PhD @giribio Mining small molecules for Drug Discovery Girinath G. Pillai, PhD @giribio
  • 2. Note & Disclaimer
      ● We are not yet completely ready for AI/ML in drug discovery (it takes time, like the Human Genome Project).
      ● Slides contain content, pictures, and videos taken from the web, articles, lectures, and tutorials; the respective authors own the copyrights.
      ● No conflict of interest.
      Technical Slides: slideshare.net/giribio
      Hands-on Videos: youtube.com/giribio (AutoDock, Modeller, PaDEL, QSAR, MD, etc.)
      Workflows & Notebooks: github.com/giribio
  • 3. Agenda: What to expect?
      ➔ FDA & Drug Approvals
      ➔ Drug Discovery
      ➔ Chemical Representation
      ➔ Chemical Databases
      ➔ Herbal/Natural Compound Databases
  • 4. FDA Approvals - Timeline
  • 5. MW (Molecular Weight) Trends
  • 6. HBD (Hydrogen-Bond Donor) Trends
  • 7. HBA (Hydrogen-Bond Acceptor) Trends
  • 8. How to reduce the high cost of drugs?
  • 9. Cause of Failure - Oral Drugs (Early): 1960s to 1980s
      RA Prentis et al., Br. J. Clin. Pharm. 1988
      T Kennedy, Drug Discov. Today 1997
  • 10. Cause of Failure - Oral Drugs (Late): 2000s to 2010s
      Estimation of ADME at early stages
      Generate models to predict ADME/PK
      MJ Waring et al., Nat. Rev. Drug Discov. 2015
      J Arrowsmith & P Miller, Nat. Rev. Drug Discov. 2013
  • 11. The Objectives of Drug Discovery: Multi-parameter optimisation
      ➔ Identify chemistries with an optimal balance of properties
      ➔ Quickly identify situations when such a balance is not possible
      ➔ Fail fast, fail cheap
      ➔ Only when confident
      ➔ Avoid missed opportunities
  • 12. Selected Success Stories of CADD
      ➔ Captopril (Capoten®), Bristol-Myers Squibb
      ➔ Dorzolamide (Trusopt®), Merck
      ➔ Zanamivir (Relenza®), GlaxoSmithKline
      ➔ Aliskiren (Tekturna®), Novartis
      ➔ Boceprevir, Schering-Plough
      ➔ Nolatrexed dihydrochloride (Thymitaq®), Agouron
      ➔ LY-517717, Lilly/Protherics
      ➔ Acetyl-CoA carboxylase inhibitor, Nimbus, US
      ➔ Rupintrivir (AG7088), Agouron
      ➔ NVP-AUY922, Novartis
      ➔ Vemurafenib, Plexxikon
      ➔ Venetoclax, AbbVie, Genentech
      ➔ Erdafitinib, Johnson & Johnson
      ➔ Verubecestat, Merck
      Sources: Nat Rev Drug Discov. 2016 Sep;15(9):605-19; Curr Top Med Chem. 2010;10(1):127-41; http://practicalfragments.blogspot.com/
  • 13. Representing Chemicals
      ➢ Trivial name, e.g. baking soda, aspirin, citric acid.
          ○ Identifies the compound, but gives little or no information about what it consists of.
      ➢ Chemical formula, e.g. C6H12O6.
          ○ Specifies the type and number of atoms in the compound, but not its structure (i.e. how the atoms are connected by bonds).
      ➢ Systematic name, e.g. 1,2-dibromo-3-chloropropane.
          ○ Identifies the atoms present and how they are connected by bonds.
  • 14. What are Small Molecules?
      ➢ A small molecule is a low molecular weight organic compound.
      ➢ Most drugs are small molecules, which allows passage across cell membranes and oral bioavailability.
      ➢ They can also bind to proteins and enzymes, altering their function, which can lead to a therapeutic effect.
  • 15. Digital Representations
      ➢ How do we communicate structural information between humans and the computer?
          ○ Line notations, e.g. Wiswesser Line Notation (and later SMILES)
      ➢ How do we represent the atoms and bonds in a molecule internally in a computer?
          ○ Atom lookup and connection tables
      ➢ Trivial name: proline
      ➢ Systematic name: pyrrolidine-2-carboxylic acid
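To make the "atom lookup and connection table" idea concrete, here is a minimal sketch for proline, the example named on the slide. The table layout, the `DEFAULT_VALENCE` map, and the function names are illustrative assumptions, not a standard file format; hydrogens are left implicit and recovered from simple valence rules.

```python
from collections import Counter

# Atom lookup table: index -> element symbol (heavy atoms of proline only).
ATOMS = ["N", "C", "C", "C", "C", "C", "O", "O"]

# Connection table: (atom index, atom index, bond order).
BONDS = [
    (0, 1, 1),  # ring N to alpha carbon
    (1, 2, 1),
    (2, 3, 1),
    (3, 4, 1),
    (4, 0, 1),  # closes the pyrrolidine ring
    (1, 5, 1),  # alpha carbon to carboxyl carbon
    (5, 6, 2),  # C=O
    (5, 7, 1),  # C-OH
]

# Assumed neutral valences, sufficient for this example only.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(atoms, bonds):
    """Return the number of implicit hydrogens on each heavy atom."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [DEFAULT_VALENCE[el] - u for el, u in zip(atoms, used)]


def molecular_formula(atoms, bonds):
    """Build a formula string (C, H, then other elements alphabetically)."""
    counts = Counter(atoms)
    counts["H"] = sum(implicit_hydrogens(atoms, bonds))
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(
        f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order if counts[el]
    )


print(molecular_formula(ATOMS, BONDS))  # C5H9NO2
```

The same table answers both questions at once: the formula (what atoms are present) falls out of the atom list, while the structure (how they are connected) lives in the bond list.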
  • 16. Examples: some common small molecules (amino acids)
  • 17. Molecular Representations
      ➢ 1D: SMILES strings
      ➢ 2D: Graph of bonds
      ➢ 2.5D: Surfaces
      ➢ 3D: Atomic coordinates
      ➢ 4D: Temporal evolution
  • 18. Chemical - File Formats
      ➢ .mol
      ➢ .mol2
      ➢ .sd / .sdf
      ➢ .pdb
      ➢ .skc / .cdx / .mrv / …
      ➢ .smi, e.g. CCCC (butane)
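Of the formats listed, `.smi` is the simplest to read by hand: commonly one molecule per line, a SMILES string optionally followed by whitespace and a name. The sketch below assumes that layout (real files vary, and some carry header lines); `parse_smi` is a hypothetical helper, and it does not validate the SMILES itself.

```python
def parse_smi(text):
    """Parse .smi-style text into a list of (smiles, name) tuples.

    Assumes one record per line: a SMILES string, then an optional
    whitespace-separated name. Blank lines are skipped.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        records.append((smiles, name))
    return records


sample = "CCCC butane\nOC(=O)C1CCCN1 proline\n\nCCO\n"
for smiles, name in parse_smi(sample):
    print(smiles, name or "(unnamed)")
```

For anything beyond reading records, a cheminformatics toolkit should do the actual SMILES interpretation.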
  • 19. Chemical Compounds for Machines
      ML methods need a computer-friendly way to input the atomistic system: a structure drawing is easy for us, while a bit string such as 010110101010001011100100010001111110 is easy for the CPU.
      Issues for ML:
          ● arbitrary size
          ● arbitrary order
      Ideal features:
          ● general
          ● compact
          ● unique
          ● invariant *
          ● smooth
          ● fast
      * Invariants are determined by the physics of the quantity to predict from the descriptor!
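The "arbitrary order" issue and the "invariant" requirement can be illustrated with a toy descriptor: a sorted list of element-labelled interatomic distances. This is an illustrative choice rather than a descriptor named in the slides, and the water coordinates are approximate values assumed for the example. Sorting removes the dependence on atom order, and distances are unaffected by translation or rotation; the descriptor still grows with molecule size, so it does not address the "arbitrary size" issue.

```python
import math
from itertools import combinations


def sorted_distance_descriptor(atoms):
    """Return sorted (element pair, distance) features for a molecule.

    `atoms` is a list of (element, (x, y, z)) tuples. The result does not
    depend on the order of the atoms or on rigid translation of the molecule.
    """
    features = []
    for (el_a, pos_a), (el_b, pos_b) in combinations(atoms, 2):
        pair = "-".join(sorted((el_a, el_b)))
        features.append((pair, round(math.dist(pos_a, pos_b), 3)))
    return sorted(features)


# Approximate water geometry in angstroms (assumed values for illustration).
water = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]
reordered = [water[2], water[0], water[1]]
shifted = [(el, (x + 1.0, y + 2.0, z + 3.0)) for el, (x, y, z) in water]

print(sorted_distance_descriptor(water) == sorted_distance_descriptor(reordered))
print(sorted_distance_descriptor(water) == sorted_distance_descriptor(shifted))
```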
  • 20. Descriptors for Chemistry
      ML methods need a computer-friendly way to input the atomistic system:
          ● Global descriptor: one vector for the whole system, e.g. 010110101010001011100100010001111110
          ● Local/atomic descriptor: one vector per atom, e.g. 110100011110000110010111111110, 110100011110001011100001111110, 010110101010001011100001111110
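The difference between global and local descriptors can be sketched on a small connection table. Both descriptors here are deliberately simplistic assumptions for illustration: the global one counts elements in the whole molecule, the local one counts the neighbouring elements of each atom. The `ELEMENTS` ordering and function names are hypothetical.

```python
# Fixed element order so every vector has the same length and meaning.
ELEMENTS = ["C", "N", "O"]


def global_descriptor(atoms):
    """One vector for the whole molecule: count of each element."""
    return [atoms.count(el) for el in ELEMENTS]


def local_descriptors(atoms, bonds):
    """One vector per atom: count of each element among its bonded neighbours."""
    vectors = [[0] * len(ELEMENTS) for _ in atoms]
    for a, b in bonds:
        vectors[a][ELEMENTS.index(atoms[b])] += 1
        vectors[b][ELEMENTS.index(atoms[a])] += 1
    return vectors


# Ethanol heavy atoms (CCO): C0-C1-O2.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

print(global_descriptor(ethanol_atoms))
for index, vector in enumerate(local_descriptors(ethanol_atoms, ethanol_bonds)):
    print(index, ethanol_atoms[index], vector)
```

A global vector suits whole-molecule properties, while per-atom vectors suit properties that are attributed to individual atomic environments.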
  • 21. Database?
      A database is an “organized collection of information.”
      Information in a database can be in any format, including text, numbers, images, audio, video, and many others (and combinations of these).
      Information must be “organized” for efficient retrieval.
  • 22. Database Categories
      Primary databases contain experimentally derived data that are submitted directly by researchers (also called “primary data”). In essence, these databases serve as archives that keep the original data. They are also known as archival databases.
      Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data by using information from other databases and the scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews of primary data. They are also known as curated databases or knowledgebases.
  • 23. Data Provenance
      “Data provenance” refers to a record trail that describes the origin or source of a piece of data and the process by which it entered a database. Simply put, data provenance deals with the questions “where did the data come from?” and “how and why is the data in its present place?”. Although provenance information is critical to the reliability of a data source (and its data), it is not easy to manage.
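One way to picture a provenance "record trail" is a small record that stores the origin of a data point plus every processing step it went through. The class, field names, and example values below are all hypothetical; real databases track provenance in their own schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance trail for one data item."""

    identifier: str          # the data item this trail belongs to
    source: str              # where the data came from
    steps: List[str] = field(default_factory=list)  # how it reached its present place

    def add_step(self, description: str) -> None:
        """Append one processing step to the trail."""
        self.steps.append(description)

    def trail(self) -> str:
        """Return the full trail, origin first."""
        return " -> ".join([self.source] + self.steps)


record = ProvenanceRecord("example-001", "depositor submission")
record.add_step("structure standardised")
record.add_step("merged into compound entry")
print(record.trail())
```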
  • 24. Small Molecule Databases
      ➢ Chemical DBs: collection of molecules reported, published and proposed
      ➢ FDA Drugs: collection of approved drugs with their pharmacological and target details
      ➢ Natural Products: collection of compounds/extracts from plants, animals, etc.
      ➢ Commercial DBs: collection of purchasable as well as on-demand synthesis molecules
  • 25. Approved Drug Database
      ➢ DrugBank: comprehensive information on drug molecules
          ○ Combines detailed drug data (i.e. chemical, pharmacological and pharmaceutical) with comprehensive drug target information (i.e. sequence, structure, and pathway)
          ○ Allows searching for similar compounds
      ➢ SuperDrug: 2500 3D structures of active ingredients of essential marketed drugs
  • 26. Chemical Database
      ➢ PubChem: chemical information repository at the U.S. NIH
          ○ Maintained by NCBI as three linked databases (Substance, Compound, and BioAssay), with vendor details and pharmacology information
      ➢ ChEMBL: literature-extracted biological activity information
          ○ Curated database of small molecules, including the interactions and functional effects of small molecules binding to their macromolecular targets, alongside a series of drug discovery databases
          ○ 1 million bioactive (small, drug-like) compounds with 8200 drug targets
      ➢ ZINC15
          ○ Curated collection of 21 million commercially available chemical compounds with 3D coordinates, in ready-to-dock 3D formats
      ➢ BindingDB
          ○ Contains 910,836 binding data points for 6,263 protein targets and 378,980 small molecules
      ➢ ChemSpider: a chemical database integrated with RSC’s publishing process
          ○ Collection of chemical compounds; services include conversion of chemical names to chemical structures, generation of SMILES and InChI strings, and prediction of many physicochemical parameters
  • 27. Chemical Databases (contd…)
      ➢ MCule
          ○ Commercial database of small molecules
      ➢ Cambridge Structural Database (CSD)
          ○ Repository for small-molecule crystal structures in CIF format
          ○ Compiled and maintained by the CCDC
      ➢ ChemIDPlus
          ○ Database of compounds and structures by the US National Library of Medicine
      ➢ ChemBank
          ○ Public, web-based informatics environment created by the Broad Institute's Chemical Biology Program
          ○ Includes freely available data derived from small molecules and small-molecule screens, and resources for studying the data
      ➢ Ligand Expo
          ○ Formerly Ligand Depot
          ○ Provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank
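Several of the databases above can be queried programmatically as well as through their web pages. As a sketch, the helper below builds a PubChem PUG REST request URL for looking up compound properties by name. The URL layout follows PubChem's documented PUG REST pattern as understood here; the function name and default property list are assumptions, and the code only constructs the URL without sending any request.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that asks for properties of a compound by name.

    The name is percent-encoded so that spaces and punctuation are safe
    inside the URL path.
    """
    encoded_name = quote(name, safe="")
    property_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{property_list}/JSON"


print(pubchem_property_url("aspirin"))
print(pubchem_property_url("citric acid", properties=("MolecularFormula",)))
```

Fetching the URL (for example with `urllib.request.urlopen`) and decoding the JSON response is left out so the sketch stays runnable offline; check PubChem's usage policy before sending requests in bulk.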
  • 28. Natural Product Database
      ➢ COCONUT
          ○ Open database of all natural products
      ➢ TCM
          ○ Free small-molecule database on traditional Chinese medicine, for virtual screening
          ○ Currently the world's largest TCM database, containing 170,000 compounds
      ➢ IMPPAT
          ○ Database of Indian Medicinal Plants, Phytochemistry and Therapeutics
      ➢ AromaDb
          ○ Database of aroma molecules from medicinal and aromatic plants, with phytochemistry and therapeutic potential
      ➢ Dr. Duke's Phytochemical and Ethnobotanical Databases
          ○ Facilitate in-depth plant, chemical, bioactivity, and ethnobotany searches using scientific or common names
      ➢ CMAUP
          ○ Database of collective molecular activities of useful plants
  • 29. Other Chemical Databases
      ➢ ChEBI (Chemical Entities of Biological Interest)
          ○ Freely available dictionary of molecular entities focused on ‘small’ chemical compounds, provided by the European Bioinformatics Institute
      ➢ KEGG DRUG
          ○ Comprehensive drug information resource for approved drugs in Japan, the USA, and Europe, unified based on chemical structures and/or chemical components, and associated with target, metabolizing enzyme, and other molecular interaction network information
          ○ Provided by the Kyoto Encyclopedia of Genes and Genomes
  • 30. Commercial DBs
      ➢ Enamine
      ➢ MolPort
      ➢ WuXi
      ➢ ChemDiv
      ➢ etc.
  • 31. NCI
  • 32. Input - Search
  • 33. Filters
  • 34. Hit Lists
  • 35. Details of a Hit
  • 36. 3D Viewer
  • 37. Download NCI DB
  • 38. NCI - DTP - Order Compounds
  • 39. NCI - DTP - Cell Line Screening
  • 40. CCDC - CSD
  • 41. Access Structures
  • 42. CCDC - CSD - Aspirin
  • 43. PubChem
  • 44. Data Sources
  • 45. Summary
  • 46. ZINC15
  • 47. Tranches
  • 48. Query
  • 49. Filters
  • 50. Summary
  • 51. ChemIDPlus: ToxNet
  • 52. Query
  • 53. Resources
  • 54. Data Sources
  • 55. DrugBank
  • 56. Details
  • 57. ChEMBL
  • 58. Filters / List
  • 59. Summary
  • 60. BindingDB
  • 61. Details
  • 62. BindingDB - Browser Extension
  • 63. Data Link to BindingDB
  • 64. KEGG Compound DB
  • 65. Compound Details
  • 66. SuperDrug2
  • 67. COCONUT Database
  • 68. HIT - Herbal - Target
  • 69. Compound Summary
  • 70. Super Natural II Database
  • 71. Summary
  • 72. NPACT
  • 73. Pharmaceutical Product Databases
  • 74. Toxic Substance Databases
  • 75. Metabolomic Databases
  • 76. Metabolic Pathway Databases
  • 77. Some Questions!
      ➢ Predict drug-like molecules? Toxicity?
          ○ New strategies
      ➢ How can we search efficiently? Intelligently?
          ○ New data structures and algorithms
          ○ Optimizing old structures
      ➢ How can we understand this much data?
          ○ Cluster and visualize millions of data points
          ○ Define commercially accessible space
      ➢ Are there other useful things we can do with this?
          ○ Discover new polymers, etc.
          ○ Wonder about the origin of life
          ○ Combinatorially combine all known chemicals
  • 78. Conclusion
      ➢ Data search is an art; dig for as much data as you can.
      ➢ Initially weight selection 25% on score/quality and 75% on diversity; as the set of leads shrinks, shift to 75% score/quality and 25% diversity.
      ➢ Consider enrichment factors.
      ➢ Build good synergy between human expertise and computational tools.
      ➢ Avoid missed opportunities.
      ➢ Understand the significance of parameters/properties.
      ➢ Evaluate and decide on the tool/approach.
      ➢ Check the reliability of the data used.
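Two of the points above are easy to state as arithmetic. The sketch below assumes the usual definition of the enrichment factor (the hit rate in the selected subset divided by the hit rate in the whole library) and reads the 25/75 advice as a simple weighted sum of a quality score and a diversity score, both assumed to be scaled to the range 0 to 1. The function names and that weighted-sum reading are assumptions, not the author's stated method.

```python
def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Hit rate in the selected subset divided by hit rate in the full library.

    Written as one division, which is algebraically the same as
    (actives_selected / n_selected) / (actives_total / n_total).
    """
    if n_selected <= 0 or actives_total <= 0 or n_total <= 0:
        raise ValueError("counts must be positive")
    return (actives_selected * n_total) / (n_selected * actives_total)


def selection_score(quality, diversity, quality_weight):
    """Weighted sum of quality and diversity, both assumed to lie in [0, 1]."""
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")
    return quality_weight * quality + (1.0 - quality_weight) * diversity


# 10 actives among the top 100 picks, from a library of 10000 with 50 actives.
print(enrichment_factor(10, 100, 50, 10000))

# Early stage: 25% quality, 75% diversity. Late stage: the reverse.
print(selection_score(quality=0.8, diversity=0.4, quality_weight=0.25))
print(selection_score(quality=0.8, diversity=0.4, quality_weight=0.75))
```

An enrichment factor above 1 means the selection method finds actives more often than random picking would.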
  • 79. Until you try it yourself and train others, you cannot be an expert. If you think you have finished collecting all the data, you are wrong: go and find more data.
  • 80. Thanks. Do you have any questions? www.zastrain.com @giribio. We do accept research interns.