Mining Small Molecules for Drug Discovery

Girinath G. Pillai, PhD @giribio

Note & Disclaimer
● We are not yet completely ready with AI/ML in Drug Discovery (it takes time, like the Human Genome Project)
● Slides contain content, pictures and videos taken from the web, articles, lectures and tutorials; the respective authors own their copyrights.
● No conflict of interest
Technical Slides: slideshare.net/giribio
Hands-on Videos: youtube.com/giribio (AutoDock, Modeller, PaDEL, QSAR, MD, etc.)
Workflows & Notebooks: github.com/giribio

Agenda
What to expect?
➔ FDA & Drug Approvals
➔ Drug Discovery
➔ Chemical Representation
➔ Chemical Databases
➔ Herbal/Natural Compound Databases

FDA Approvals - Timeline

MW Trends

HBD Trends

HBA Trends

How to reduce the high cost of drugs?

Cause of Failure - Oral Drugs (Early)
1960s to 1980s
RA Prentis et al, Br. J. Clin. Pharmacol. 1988
T Kennedy, Drug Discov. Today 1997

Cause of Failure - Oral Drugs (Late)
2000s to 2010s
● Estimation of ADME at early stages
● Generate models to predict ADME/PK
MJ Waring et al, Nat. Rev. Drug Discov. 2015
J Arrowsmith & P Miller, Nat. Rev. Drug Discov. 2013

The Objectives of Drug Discovery
Multi-parameter optimisation
➔ Identify chemistries with an optimal balance of properties
➔ Quickly identify situations when such a balance is not possible
➔ Fail fast, fail cheap
➔ Only when confident
➔ Avoid missed opportunities
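
One common way to put "an optimal balance of properties" into numbers is a desirability score. The sketch below is only an illustration under stated assumptions: the property names, the target ranges in PROFILE and the linear fall-off are hypothetical and are not taken from the slides.

```python
from math import prod


def desirability(value, low, high):
    """Return 1.0 inside [low, high], falling linearly to 0.0 one range-width outside."""
    if low <= value <= high:
        return 1.0
    width = high - low
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / width)


# Hypothetical target ranges, for illustration only (not from the slides).
PROFILE = {
    "mol_weight": (200.0, 500.0),
    "logp": (1.0, 4.0),
    "h_bond_donors": (0.0, 5.0),
}


def mpo_score(properties, profile=PROFILE):
    """Geometric mean of per-property desirabilities (0 = no balance, 1 = ideal)."""
    scores = [desirability(properties[name], lo, hi)
              for name, (lo, hi) in profile.items()]
    return prod(scores) ** (1.0 / len(scores))


balanced = {"mol_weight": 350.0, "logp": 2.5, "h_bond_donors": 2}
too_heavy = {"mol_weight": 900.0, "logp": 2.5, "h_bond_donors": 2}
print(mpo_score(balanced), mpo_score(too_heavy))
```

Because the geometric mean is zero as soon as one desirability is zero, a compound that is hopeless on a single property is rejected early, which is one reading of "fail fast, fail cheap".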

Selected Success Stories of CADD
➔ Captopril: Capoten®, Bristol-Myers Squibb
➔ Dorzolamide: Trusopt®, Merck
➔ Zanamivir: Relenza®, GlaxoSmithKline
➔ Aliskiren: Tekturna®, Novartis
➔ Boceprevir: Schering-Plough
➔ Nolatrexed dihydrochloride: Thymitaq®, Agouron
➔ LY-517717: Lilly/Protherics
➔ Acetyl-CoA carboxylase inhibitor: Nimbus, US
➔ Rupintrivir: AG7088, Agouron
➔ NVP-AUY922: Novartis
➔ Vemurafenib: Plexxikon
➔ Venetoclax: AbbVie, Genentech
➔ Erdafitinib: Johnson & Johnson
➔ Verubecestat: Merck
Nat Rev Drug Discov. 2016 Sep;15(9):605-19
Curr Top Med Chem. 2010;10(1):127-41
http://practicalfragments.blogspot.com/

Representing Chemicals
➢ Trivial name, e.g. Baking Soda, Aspirin, Citric Acid, etc.
○ Identifies the compound, but gives no (or little) information about what it consists of
➢ Chemical formula, e.g. C₆H₁₂O₆.
○ Specifies the type and quantity of the atoms in the compound, but not its structure (i.e. how the atoms are connected by bonds)
➢ Systematic name, e.g. 1,2-dibromo-3-chloropropane.
○ Identifies the atoms present and how they are connected by bonds.
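
A chemical formula carries atom types and counts but no connectivity. The minimal sketch below shows this, assuming simple formulas without brackets or charges. The function names and the small atomic-weight table (approximate values) are introduced here for illustration only.

```python
import re

# Approximate standard atomic weights (g/mol) for a few common elements.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007,
                 "O": 15.999, "Cl": 35.453, "Br": 79.904}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights over the parsed atom counts."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


print(parse_formula("C6H12O6"), round(molecular_weight("C6H12O6"), 2))
```

Glucose and fructose both give exactly the same result here (C6H12O6), which is why a formula alone cannot identify a structure.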

What are Small Molecules?
➢ A small molecule is defined as a low molecular weight organic compound.
➢ Most drugs are small molecules, which allows passage across cell membranes and oral bioavailability.
➢ They are also able to bind to proteins and enzymes, thereby altering function, which can lead to a therapeutic effect.

Digital Representations
➢ How do we communicate structural information between humans and the computer?
○ Line notations, e.g. Wiswesser Line Notation (and later SMILES)
➢ How do we represent the atoms and bonds in a molecule internally in a computer?
○ Atom lookup and connection tables
➢ Trivial name: proline
➢ Systematic name: pyrrolidine-2-carboxylic acid
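
An atom lookup table and a connection table for proline can be sketched as below. This is a hand-built illustration: the atom numbering is arbitrary, only heavy atoms are stored, and hydrogens are inferred from assumed standard valences (C 4, N 3, O 2). It is not the layout of any particular file format.

```python
from collections import Counter

# Atom lookup table: index -> element (heavy atoms of proline only).
ATOMS = {1: "N", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "O", 8: "O"}

# Connection table: (atom_a, atom_b, bond_order).
BONDS = [
    (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 1, 1),  # pyrrolidine ring
    (2, 6, 1), (6, 7, 2), (6, 8, 1),                        # carboxylic acid on C2
]

VALENCE = {"C": 4, "N": 3, "O": 2}  # assumed standard valences


def neighbours(atom):
    """Indices of the atoms bonded to the given atom."""
    return sorted(b if a == atom else a for a, b, _ in BONDS if atom in (a, b))


def implicit_hydrogens(atom):
    """Hydrogens needed to fill the atom's valence."""
    used = sum(order for a, b, order in BONDS if atom in (a, b))
    return VALENCE[ATOMS[atom]] - used


def molecular_formula():
    """Element counts, including the inferred hydrogens."""
    counts = Counter(ATOMS.values())
    counts["H"] = sum(implicit_hydrogens(i) for i in ATOMS)
    return dict(counts)


print(neighbours(2), molecular_formula())
```

The same connectivity is what the SMILES string OC(=O)C1CCCN1 encodes in a single line.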

Examples
Some common small molecules: Amino Acids

Molecular Representations
➢ 1D: SMILES Strings
➢ 2D: Graph of Bonds
➢ 2.5D: Surfaces
➢ 3D: Atomic Coordinates
➢ 4D: Temporal Evolution

Chemical - File Formats
➢ .mol
➢ .mol2
➢ .sd / .sdf
➢ .pdb
➢ .skc / .cdx / .mrv / …
➢ .smi (e.g. CCCC)
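
Of the formats listed, .smi is the simplest to read by hand. The sketch below assumes the common convention of one record per line, with a SMILES string optionally followed by whitespace and a name. The function name is hypothetical and real files may carry extra columns.

```python
def parse_smi(text):
    """Parse .smi-style text: one 'SMILES [name]' record per line."""
    records = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else None
        records.append((smiles, name))
    return records


print(parse_smi("CCCC butane\nCCO\n"))
```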

Chemical Compounds for Machines
ML methods need a computer-friendly way to input the atomistic system: from a form that is easy for us to one that is easy for the CPU, e.g. 010110101010001011100100010001111110
Issues for ML:
● arbitrary size
● arbitrary order
Ideal features:
● general
● compact
● unique
● invariant *
● smooth
● fast
* invariants are determined by the physics of the quantity to predict from the descriptor!
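
The "arbitrary order" issue and the "invariant" requirement can be illustrated with a toy example. The two feature functions below are hypothetical, chosen only to contrast an order-dependent input with an order-invariant, fixed-length one. The coordinates for water are rough values for illustration.

```python
from collections import Counter


def naive_features(atoms):
    """Order-dependent: atomic numbers listed in input order."""
    return [z for z, _ in atoms]


def composition_descriptor(atoms, elements=(1, 6, 7, 8)):
    """Order-invariant and fixed-length: counts of H, C, N and O atoms."""
    counts = Counter(z for z, _ in atoms)
    return [counts.get(z, 0) for z in elements]


# Water as (atomic_number, (x, y, z)), listed in two different atom orders.
water_a = [(8, (0.0, 0.0, 0.0)), (1, (0.96, 0.0, 0.0)), (1, (-0.24, 0.93, 0.0))]
water_b = [water_a[1], water_a[0], water_a[2]]

print(naive_features(water_a), naive_features(water_b))
print(composition_descriptor(water_a), composition_descriptor(water_b))
```

The composition vector has the same length for any molecule and ignores atom order, but it is not unique, since isomers share it. That is the kind of trade-off the list of ideal features points at.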

Descriptors for Chemistry
ML methods need a computer-friendly way to input the atomistic system:
● Global Descriptor (one bit string for the whole system):
010110101010001011100100010001111110
● Local/Atomic Descriptor (one bit string per atom):
110100011110000110010111111110
110100011110001011100001111110
010110101010001011100001111110
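
The difference between local/atomic and global descriptors can be sketched on a tiny graph. The example below is hypothetical: it uses (element, number of heavy-atom neighbours) as a very crude local descriptor and pools those into a global one by counting. Real descriptors are far richer.

```python
from collections import Counter

# Heavy atoms of ethanol (C-C-O) as an element list plus a bond list.
ELEMENTS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]


def local_descriptors(elements, bonds):
    """One (element, degree) pair per atom."""
    degree = [0] * len(elements)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return list(zip(elements, degree))


def global_descriptor(elements, bonds):
    """One summary for the whole molecule: counts of each local environment."""
    return sorted(Counter(local_descriptors(elements, bonds)).items())


print(local_descriptors(ELEMENTS, BONDS))
print(global_descriptor(ELEMENTS, BONDS))
```

The local output has one entry per atom, so its length grows with the molecule. The pooled global output does not depend on atom order.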

Database?
A database is an “organized collection of information.”
Information in a database can be in any format, including text, numbers, images, audio, video, and many others (and combinations of these).
Information must be “organized” for efficient retrieval.
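
"Organized for efficient retrieval" can be made concrete with a tiny in-memory SQLite table. The schema, index name and helper function below are assumptions made for this sketch. The four compounds and their molecular weights are ordinary textbook values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound ("
    "id INTEGER PRIMARY KEY, name TEXT, smiles TEXT, mol_weight REAL)"
)
conn.executemany(
    "INSERT INTO compound (name, smiles, mol_weight) VALUES (?, ?, ?)",
    [
        ("butane", "CCCC", 58.12),
        ("proline", "OC(=O)C1CCCN1", 115.13),
        ("aspirin", "CC(=O)OC1=CC=CC=C1C(=O)O", 180.16),
        ("caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", 194.19),
    ],
)
# The index is the "organization": range queries on weight avoid a full scan.
conn.execute("CREATE INDEX idx_compound_mw ON compound (mol_weight)")


def compounds_in_weight_range(low, high):
    """Names of compounds with low <= mol_weight <= high, lightest first."""
    rows = conn.execute(
        "SELECT name FROM compound WHERE mol_weight BETWEEN ? AND ? "
        "ORDER BY mol_weight",
        (low, high),
    )
    return [name for (name,) in rows]


print(compounds_in_weight_range(100, 190))
```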

Database Categories
Primary databases contain experimentally derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. They are also known as archival databases.
Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature.
Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. They are also known as curated databases, or knowledgebases.

Data Provenance
“Data provenance” refers to a record trail that describes the origin or source of a piece of data and the process by which it entered a database.
Simply put, data provenance deals with the questions
“where the data came from” and
“how and why the data is in its present place”.
Although data provenance information is critical to the reliability of a data source (and its data), this information is not easy to manage.
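
A provenance trail can be as simple as a list of dated steps attached to each record. The classes, field names and example entries below are hypothetical, intended only to show "where the data came from" and "how it got here" as data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvenanceStep:
    date: str    # when the step happened
    actor: str   # who or what performed it
    action: str  # what was done


@dataclass
class DataRecord:
    value: str
    source: str                                   # where the data came from
    history: list = field(default_factory=list)   # how it reached its present place

    def log(self, date, actor, action):
        self.history.append(ProvenanceStep(date, actor, action))

    def trail(self):
        return [f"{s.date}: {s.actor} {s.action}" for s in self.history]


# Hypothetical example entries.
record = DataRecord(value="IC50 = 12 nM", source="journal article (hypothetical)")
record.log("2020-01-10", "depositor", "submitted measurement")
record.log("2020-02-03", "curator", "standardised units")
print(record.trail())
```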

Small Molecule Databases
➢ Chemical DBs: collection of molecules reported, published and proposed
➢ FDA Drugs: collection of approved drugs with their pharmacological and target details
➢ Natural Products: collection of compounds/extracts from plants, animals, etc.
➢ Commercial DBs: collection of purchasable as well as on-demand synthesis molecules

Approved Drug Database
➢ DrugBank: comprehensive information on drug molecules
○ Combines detailed drug data (i.e. chemical, pharmacological and pharmaceutical) with comprehensive drug target information (i.e. sequence, structure and pathway). Allows searching for similar compounds.
➢ SuperDrug: 2500 3D structures of active ingredients of essential marketed drugs.

Chemical Database
➢ PubChem: chemical information repository at the U.S. NIH
○ Maintained by NCBI as three linked databases (Substance, Compound and BioAssay), with vendor details and pharmacology information
➢ ChEMBL: literature-extracted biological activity information
○ Curated database of small molecules covering the interactions and functional effects of small molecules binding to their macromolecular targets, together with a series of drug discovery databases. 1 million bioactive compounds (small drug-like molecules) with 8200 drug targets
➢ ZINC15
○ Curated collection of 21 million commercially available chemical compounds with 3D coordinates, in ready-to-dock 3D formats
➢ BindingDB
○ Contains 910,836 binding data points for 6,263 protein targets and 378,980 small molecules.
➢ ChemSpider: a chemical database integrated with RSC’s publishing process
○ Collection of chemical compounds; supports conversion of chemical names to chemical structures, generation of SMILES and InChI strings, and prediction of many physicochemical parameters
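
Several of these databases can also be queried programmatically. As a hedged sketch, the helper below (a hypothetical function name) only builds a PubChem PUG REST request URL for looking up properties by compound name. It performs no network call, and the response handling is left out.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that requests the given properties as JSON."""
    return (
        f"{PUG_REST}/compound/name/{quote(name)}"
        f"/property/{','.join(properties)}/JSON"
    )


print(pubchem_property_url("aspirin"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Check the current PubChem usage policy before sending many requests.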

Chemical Databases (contd…)
➢ MCule
○ Commercial database of small molecules
➢ Cambridge Structural Database (CSD)
○ Repository for small-molecule crystal structures in CIF format. The CSD is compiled and maintained by the CCDC.
➢ ChemIDPlus
○ Database of compounds and structures by the US National Library of Medicine
➢ ChemBank
○ Public, web-based informatics environment created by the Broad Institute's Chemical Biology Program. Includes freely available data derived from small molecules and small-molecule screens, and resources for studying the data.
➢ Ligand Expo
○ Formerly Ligand Depot. Provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank.

Natural Product Database
➢ COCONUT
○ Open database of all natural products
➢ TCM
○ Free small-molecule database on traditional Chinese medicine, for virtual screening. It is currently the world's largest TCM database, and contains 170,000 compounds.
➢ IMPPAT
○ Database of Indian Medicinal Plants, Phytochemistry and Therapeutics
➢ AromaDb
○ Database of aroma molecules from medicinal and aromatic plants, with phytochemistry and therapeutic potentials
➢ Dr Duke's Phytochemical and Ethnobotanical Databases
○ Facilitate in-depth plant, chemical, bioactivity and ethnobotany searches using scientific or common names
➢ CMAUP
○ Database of collective molecular activities of useful plants

Other Chemical Databases
➢ ChEBI (Chemical Entities of Biological Interest)
○ Freely available dictionary of molecular entities focused on ‘small’ chemical compounds, provided by the European Bioinformatics Institute
➢ KEGG DRUG
○ Comprehensive drug information resource for approved drugs in Japan, USA and Europe, unified based on the chemical structures and/or the chemical components, and associated with target, metabolizing enzyme and other molecular interaction network information. Provided by the Kyoto Encyclopedia of Genes and Genomes.

Commercial DBs
➢ Enamine
➢ MolPort
➢ WuXi
➢ ChemDiv
➢ etc.

NCI

Input - Search

Filters

Hit Lists

Details of a Hit

3D Viewer

Download NCI DB

NCI – DTP – Order Compounds

NCI – DTP – Cell Line Screening

CCDC - CSD

Access Structures

CCDC – CSD - Aspirin

PubChem

Data Sources

Summary

ZINC15

Tranches

Query

Filters

Summary

ChemIDPlus : ToxNet

Query

Resources

Data Sources

Drug Bank

Details

ChEMBL

Filters / List

Summary

Binding DB

Details

Binding DB - Browser Extension

Data Link to Binding DB

KEGG Compound DB

Compound Details

SuperDrug2

COCONUT Database

HIT - Herbal - Target

Compound Summary

Super Natural II Database

Summary

NPACT

Pharmaceutical Product Databases

Toxic Substance Databases

Metabolomic Databases

Metabolic Pathway Databases

Some Questions!
➢ Predict drug-like molecules? Toxicity?
○ New Strategies
➢ How can we search efficiently? Intelligently?
○ New data structures and algorithms
○ Optimizing old structures
➢ How can we understand this much data?
○ Cluster and visualize millions of data points
○ Define commercially accessible space.
➢ Are there other useful things we can do with this?
○ Discover new polymers, etc.
○ Wonder about the origin of life.
○ Combinatorially combine all known chemicals.
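
One widely used building block for searching and clustering compound collections is fingerprint similarity. The slides do not name a specific measure. The sketch below assumes the Tanimoto (Jaccard) coefficient on sets of "on" bit positions, with a made-up three-compound library and an arbitrary threshold.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' bit positions."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: each compound is the set of bits that are set.
LIBRARY = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3, 5},
}


def most_similar(query, library, threshold=0.3):
    """Compounds with similarity >= threshold, most similar first."""
    hits = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(most_similar({1, 4, 7}, LIBRARY))
```

This linear scan is fine for a handful of compounds. Searching millions is where the "new data structures and algorithms" in the list above come in.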

CONCLUSION
Data search is an art; dig for as much data as you can.
Initially weight selection as 25% score/quality and 75% diversity; as the size of the lead set reduces, shift to 75% score/quality and 25% diversity.
Consider enrichment factors
Good synergy between human expertise & computational tools
Avoid Missed Opportunities
Understand the significance of parameters/properties
Evaluate and decide the tool/approach
Check reliability of data used
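
The two quantitative ideas in the conclusion can be sketched in a few lines. The enrichment factor below uses the usual definition (hit rate in the top fraction divided by the hit rate in the whole ranked list). The linear blend of quality and diversity is only one possible reading of the 25%/75% guideline, and both function names are hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction):
    """EF at a given fraction of a ranked list (True = active, best rank first)."""
    n_total = len(ranked_is_active)
    n_top = max(1, round(n_total * fraction))
    actives_total = sum(ranked_is_active)
    if actives_total == 0:
        return 0.0
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)


def selection_score(quality, diversity, quality_weight):
    """Blend quality and diversity, e.g. weight 0.25 early and 0.75 late."""
    return quality_weight * quality + (1.0 - quality_weight) * diversity


# Toy ranking: 10 compounds, 3 actives, 2 of them in the top 20%.
ranking = [True, True, False, True, False, False, False, False, False, False]
print(enrichment_factor(ranking, 0.2))
print(selection_score(0.8, 0.4, 0.25), selection_score(0.8, 0.4, 0.75))
```

An enrichment factor above 1 means the ranking puts actives near the top more often than random selection would.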

Until you try yourself and train others, you cannot be an expert.
If you think you have finished collecting all the data, you are wrong. Go and find more data.

THANKS
Do you have any questions?
www.zastrain.com
@giribio
We do accept research interns
