Details of several key, well-curated chemical, natural product and commercial databases for drug discovery studies. Importance of pharmacokinetics and ADME in drug candidate selection during hit-to-lead optimisation.
1. @giribio
Girinath G. Pillai, PhD @giribio
Mining small molecules
for Drug Discovery
Girinath G. Pillai, PhD
@giribio
2. @giribio
Note & Disclaimer
● We are not yet completely ready with AI/ML in Drug Discovery (it takes time,
like the Human Genome Project)
● Slides contain content, pictures and videos taken from the web, articles, lectures
and tutorials; their respective authors own the copyrights.
● No conflict of interest
Technical Slides : slideshare.net/giribio
Hands-on Videos : youtube.com/giribio (AutoDock, Modeller, PaDEL, QSAR, MD, etc.)
Workflows & Notebooks : github.com/giribio
3. @giribio
Agenda
What to expect?
➔ FDA & Drug Approvals
➔ Drug Discovery
➔ Chemical Representation
➔ Chemical Databases
➔ Herbal/Natural Compound Databases
9. @giribio
Cause of Failure - Oral Drugs (Early)
1960s to 1980s
RA Prentis et al., Br. J. Clin. Pharmacol. 1988
T Kennedy, Drug Discov. Today 1997
10. @giribio
Cause of Failure - Oral Drugs (Late)
2000s to 2010s
Estimation of ADME at early stages
Generate models to predict ADME/PK
MJ Waring et al., Nat. Rev. Drug Discov. 2015
J Arrowsmith & P Miller, Nat. Rev. Drug Discov. 2013
11. @giribio
The Objectives of Drug Discovery
Multi-parameter optimisation
➔ Identify chemistries with an optimal balance of properties
➔ Quickly identify situations when such a balance is not possible
➔ Fail fast, fail cheap
➔ Only when confident
➔ Avoid missed opportunities
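The idea of an "optimal balance of properties" can be illustrated with a small scoring sketch. This is only one possible approach, not the method used in the slides: the property names, the target ranges in `RANGES` and the linear fall-off in `desirability` are all hypothetical choices made for illustration.

```python
from math import prod

def desirability(value, low, high):
    """Return 1.0 inside [low, high], falling linearly to 0.0 one range-width outside."""
    if low <= value <= high:
        return 1.0
    width = high - low
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / width)

def mpo_score(properties, ranges):
    """Geometric mean of per-property desirabilities; one poor property lowers the whole score."""
    scores = [desirability(properties[name], *ranges[name]) for name in ranges]
    return prod(scores) ** (1.0 / len(scores))

# Hypothetical target ranges, chosen only to make the example concrete.
RANGES = {"logP": (1.0, 3.0), "mol_weight": (200.0, 500.0), "h_bond_donors": (0, 5)}

balanced = {"logP": 2.1, "mol_weight": 350.0, "h_bond_donors": 2}
lipophilic = {"logP": 4.0, "mol_weight": 350.0, "h_bond_donors": 2}

print(mpo_score(balanced, RANGES))
print(mpo_score(lipophilic, RANGES))
```

A geometric mean is used here so that a compound cannot compensate for one badly out-of-range property with good values elsewhere, which matches the "balance" wording of the slide.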
13. @giribio
Representing Chemicals
➢ Trivial name, e.g. Baking Soda, Aspirin, Citric Acid, etc.
○ Identifies the compound, but gives no (or little) information about
what it consists of
➢ Chemical formula, e.g. C6H12O6.
○ Specifies the type and quantity of the atoms in the compound, but
not its structure (i.e. how the atoms are connected by bonds)
➢ Systematic name, e.g. 1,2-dibromo-3-chloropropane.
○ Identifies the atoms present and how they are connected by bonds.
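The difference between a formula and a systematic name can be shown with a minimal sketch. The parser below handles only simple formulas without brackets or charges, and the formula strings attached to the two systematic names are supplied here for illustration rather than taken from the slides.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)

# Two different connectivities (systematic names) that share one formula.
isomers = {
    "1,2-dibromo-3-chloropropane": "C3H5Br2Cl",
    "1,3-dibromo-2-chloropropane": "C3H5Br2Cl",
}

for name, formula in isomers.items():
    print(name, parse_formula(formula))
```

Both names give the same atom counts, so the formula alone cannot tell the two structures apart, while the systematic names can.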
14. @giribio
What are Small Molecules?
➢ A small molecule is defined as a low molecular weight organic
compound.
➢ Most drugs are small molecules, because small size allows passage across cell
membranes and supports oral bioavailability.
➢ They are also able to bind to proteins, including enzymes, thereby
altering function, which can lead to a therapeutic effect.
15. @giribio
Digital Representations
➢ How do we communicate structural information between
humans and the computer?
○ Line notations, e.g. Wiswesser Line Notation (and later SMILES)
➢ How do we represent the atoms and bonds in a molecule
internally in a computer?
○ Atom lookup and connection tables
➢ Trivial name: proline
➢ Systematic name: pyrrolidine-2-carboxylic acid
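As a rough sketch of what an atom lookup table and a connection table might look like for proline, the example below stores heavy atoms and bonds by hand and derives the molecular formula from them. The atom numbering, the `VALENCE` table and the helper names are assumptions made for this illustration; real toolkits store considerably more (charges, stereochemistry, aromaticity).

```python
from collections import Counter

# Line notation for the same molecule (SMILES, stereochemistry omitted).
PROLINE_SMILES = "OC(=O)C1CCCN1"

# Atom lookup table: index -> element symbol (heavy atoms only).
ATOMS = {1: "N", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "O", 8: "O"}

# Connection table: (atom index, atom index, bond order).
BONDS = [
    (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 1, 1),  # pyrrolidine ring
    (2, 6, 1), (6, 7, 2), (6, 8, 1),                        # carboxylic acid on C2
]

# Typical neutral valences, enough for this molecule.
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(atoms, bonds):
    """Hydrogens per heavy atom = typical valence minus the sum of its bond orders."""
    used = Counter()
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return {index: VALENCE[element] - used[index] for index, element in atoms.items()}

def molecular_formula(atoms, bonds):
    """Formula in Hill order: C, H, then the remaining elements alphabetically."""
    counts = Counter(atoms.values())
    counts["H"] = sum(implicit_hydrogens(atoms, bonds).values())
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e])

print(molecular_formula(ATOMS, BONDS))
```

The connection table carries the bonding pattern, so the formula can be derived from it, but not the other way round.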
19. @giribio
Chemical Compounds for Machines
ML methods need a computer-friendly way to input the atomistic system:
easy for us → easy for CPU: 010110101010001011100100010001111110
Issues for ML:
● arbitrary size
● arbitrary order
Ideal features:
● general
● compact
● unique
● invariant *
● smooth
● fast
* invariants are determined by the physics of the quantity to predict from the descriptor!
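To make the "arbitrary size" and "arbitrary order" issues concrete, here is a minimal sketch of a fixed-length descriptor that does not change when atoms are reordered or the molecule is translated: a histogram of interatomic distances. The bin width, the number of bins and the approximate water-like coordinates are hypothetical, and the descriptor deliberately ignores element types, so it is not unique.

```python
from itertools import combinations
from math import dist

def distance_histogram(coords, bin_width=0.5, n_bins=6):
    """Fixed-length histogram of all pairwise distances; distances beyond the last bin are dropped."""
    histogram = [0] * n_bins
    for a, b in combinations(coords, 2):
        index = int(dist(a, b) / bin_width)
        if index < n_bins:
            histogram[index] += 1
    return histogram

# Approximate, illustrative coordinates for a three-atom, water-like system.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
reordered = [water[2], water[0], water[1]]                       # same atoms, different order
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in water]     # same geometry, translated

print(distance_histogram(water))
print(distance_histogram(reordered))
print(distance_histogram(shifted))
```

The output length is set by `n_bins` rather than by the number of atoms, which addresses the size issue, and pairwise distances do not depend on atom order or position, which addresses the ordering and invariance issues.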
20. @giribio
Descriptors for Chemistry
ML methods need a computer-friendly way to input the atomistic system:
Global Descriptor (one vector for the whole system):
010110101010001011100100010001111110
Local/Atomic Descriptor (one vector per atom):
110100011110000110010111111110
110100011110001011100001111110
010110101010001011100001111110
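The distinction between local and global descriptors can be sketched as follows. This is a toy illustration, not a descriptor from any particular package: each heavy atom gets a short vector (a one-hot element code plus its number of bonded neighbours), and the global descriptor is simply the column-wise sum. The element vocabulary and the ethanol example are assumptions made for the sketch.

```python
ELEMENTS = ("C", "N", "O")  # hypothetical element vocabulary

def local_descriptors(atoms, bonds):
    """One vector per atom: one-hot element code followed by the atom's degree."""
    degree = [0] * len(atoms)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return [[int(element == e) for e in ELEMENTS] + [degree[i]]
            for i, element in enumerate(atoms)]

def global_descriptor(atoms, bonds):
    """One vector per molecule: column-wise sum of the local vectors."""
    return [sum(column) for column in zip(*local_descriptors(atoms, bonds))]

# Heavy atoms of ethanol (C-C-O), hydrogens omitted.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

print(local_descriptors(ethanol_atoms, ethanol_bonds))
print(global_descriptor(ethanol_atoms, ethanol_bonds))
```

Local vectors suit per-atom properties, while the summed global vector has the same length for every molecule and so suits whole-molecule properties.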
21. @giribio
Database?
A database is an “organized collection of information.”
Information in a database can be in any format, including text, numbers, images,
audio, video, and many others (and combinations of these).
Information must be “organized” for efficient retrieval.
22. @giribio
Database Categories
Primary databases contain experimentally derived data that are directly
submitted by researchers (also called “primary data”). In essence, these
databases serve as archives that keep original data. They are also known as archival databases.
Secondary databases contain secondary data, which are derived from analyzing
and interpreting primary data. These databases often provide value-added
information related to the primary data, by using information from other
databases and scientific literature.
Essentially, secondary databases serve as reference libraries for the scientific
community, providing highly curated reviews about primary data. They are also known as curated
databases or knowledgebases.
23. @giribio
Data Provenance
“Data provenance” refers to a record trail that describes the origin or source of a
piece of data and the process by which it entered a database.
Simply put, data provenance deals with the questions
“where the data came from” and
“how and why the data is in its present place”.
Although data provenance information is critical to the reliability of a data
source (and its data), this information is not easy to manage.
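One simple way to picture such a record trail is a data entry that carries its source and an append-only list of processing steps. The class names, fields and the example values below are all hypothetical; real databases track provenance in their own, usually richer, schemas.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceStep:
    actor: str    # who or what touched the data
    action: str   # what was done to it

@dataclass
class DataRecord:
    value: str
    source: str                                  # "where the data came from"
    trail: list = field(default_factory=list)    # "how and why it is in its present place"

    def log(self, actor, action):
        """Append one step to the record trail."""
        self.trail.append(ProvenanceStep(actor, action))

# Hypothetical entry, for illustration only.
record = DataRecord(value="C6H12O6", source="hypothetical depositor submission")
record.log("curator", "checked formula against submitted structure")
record.log("import script", "loaded into compound table")

for step in record.trail:
    print(step.actor, "->", step.action)
```

Keeping the trail alongside the value answers both provenance questions for every entry, at the cost of extra bookkeeping on each update, which is part of why this information is hard to manage.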
24. @giribio
Small Molecule Databases
➢ Chemical DBs: collection of molecules reported, published and proposed
➢ FDA Drugs: collection of approved drugs with their pharmacological and target details
➢ Natural Products: collection of compounds/extracts from plants, animals, etc.
➢ Commercial DBs: collection of purchasable as well as on-demand synthesis molecules
25. @giribio
Approved Drug Databases
➢ DrugBank: comprehensive information on drug molecules
○ Combines detailed drug data (i.e. chemical, pharmacological and pharmaceutical) with
comprehensive drug target information (i.e. sequence, structure and pathway). Allows
searching for similar compounds
➢ SuperDrug: 2500 3D structures of active ingredients of essential marketed
drugs.
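Searching for similar compounds is commonly done by comparing structural fingerprints. The sketch below uses the Tanimoto coefficient on sets of feature identifiers as one widely used option; it is not a description of how DrugBank or SuperDrug implement their search, and the compound names and feature sets are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared features divided by all features present in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(query, library, top_n=3):
    """Rank library entries by similarity to the query fingerprint, highest first."""
    scored = [(name, tanimoto(query, fingerprint)) for name, fingerprint in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]

# Hypothetical fingerprints: each integer stands for one structural feature.
library = {
    "cpd_A": {1, 4, 7, 9},
    "cpd_B": {1, 4, 8},
    "cpd_C": {2, 3, 5},
}
query = {1, 4, 7}

print(most_similar(query, library))
```

A score of 1.0 means identical feature sets and 0.0 means no features in common; where to set a similarity cut-off depends on the fingerprint used.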
26. @giribio
Chemical Databases
➢ PubChem: chemical information repository at the U.S. NIH
○ Maintained by NCBI as three linked databases (Substance, Compound and BioAssay),
with vendor details and pharmacology information
➢ ChEMBL: literature-extracted biological activity information
○ Curated database of small molecules that includes the interactions and functional effects of small
molecules binding to their macromolecular targets, and a series of drug discovery databases. 1
million bioactive compounds (small drug-like molecules) with 8200 drug targets
➢ ZINC15
○ Curated collection of 21 million commercially available chemical compounds with 3D
coordinates, in ready-to-dock 3D formats
➢ BindingDB
○ Contains 910,836 binding data points for 6,263 protein targets and 378,980 small molecules.
➢ ChemSpider: a chemical database integrated with RSC’s publishing process
○ Collection of chemical compounds; services include the conversion of chemical names to chemical
structures, the generation of SMILES and InChI strings, and the prediction of many
physicochemical parameters
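Several of these resources can be queried programmatically. As a hedged sketch, the helper below only builds a request URL for PubChem's PUG REST interface (compound lookup by name, returning selected properties as JSON); it makes no network call. The function name and the default property list are choices made for this example, and current usage limits and available properties should be checked in the PubChem documentation.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def compound_property_url(name, properties=("MolecularFormula", "MolecularWeight", "InChIKey")):
    """Build a PUG REST URL that looks up a compound by name and requests the given properties."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"

print(compound_property_url("aspirin"))
print(compound_property_url("citric acid", properties=("MolecularFormula",)))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; names containing spaces are percent-encoded by `quote`.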
27. @giribio
Chemical Databases (contd…)
➢ Mcule
○ Commercial database of small molecules
➢ Cambridge Structural Database - CSD
○ Repository for small molecule crystal structures in CIF format. The CSD is compiled and
maintained by the CCDC.
➢ ChemIDplus
○ Database of compounds and structures by the US National Library of Medicine
➢ ChemBank
○ Public, web-based informatics environment created by the Broad Institute's Chemical Biology
Program. Includes freely available data derived from small molecules and small molecule
screens, and resources for studying the data
➢ Ligand Expo
○ Formerly Ligand Depot. Provides chemical and structural information about small molecules
within the structure entries of the Protein Data Bank
28. @giribio
Natural Product Databases
➢ COCONUT
○ Open database that aggregates natural products from open sources
➢ TCM
○ Free small-molecule database on traditional Chinese medicine, for virtual screening. It is
currently the world's largest TCM database, and contains 170,000 compounds
➢ IMPPAT
○ Database of Indian Medicinal Plants, Phytochemistry and Therapeutics
➢ AromaDb
○ Database of aroma molecules from medicinal and aromatic plants, with phytochemistry and
therapeutic potential
➢ Dr. Duke's Phytochemical and Ethnobotanical Databases
○ Facilitate in-depth plant, chemical, bioactivity and ethnobotany searches using
scientific or common names
➢ CMAUP
○ Database of the Collective Molecular Activities of Useful Plants
29. @giribio
Other Chemical Databases
➢ ChEBI (Chemical Entities of Biological Interest)
○ Freely available dictionary of molecular entities focused on ‘small’ chemical compounds, provided
by the European Bioinformatics Institute
➢ KEGG DRUG
○ Comprehensive drug information resource for approved drugs in Japan, USA and Europe,
unified based on the chemical structures and/or the chemical components, and associated
with target, metabolizing enzyme and other molecular interaction network information.
Provided by the Kyoto Encyclopedia of Genes and Genomes
77. @giribio
Some Questions!
➢ Predict drug-like molecules? Toxicity?
○ New Strategies
➢ How can we search efficiently? Intelligently?
○ New data structures and algorithms
○ Optimizing old structures
➢ How can we understand this much data?
○ Cluster and visualize millions of data points
○ Define commercially accessible space.
➢ Are there other useful things we can do with this?
○ Discover new polymers, etc.
○ Wonder about the origin of life.
○ Combinatorially combine all known chemicals.
78. @giribio
CONCLUSION
Data search is an art; dig for as much data as you can.
Initially weight selection 25% score/quality and 75% diversity; as the
size of the lead set reduces, shift to 75% score/quality and 25%
diversity.
Consider enrichment factors
Good synergy between human expertise and
computational tools
Avoid missed opportunities
Understand the significance of parameters/properties
Evaluate and decide the tool/approach
Check the reliability of the data used
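The shift from 25%/75% to 75%/25% between score/quality and diversity can be written as a simple weighting schedule. The slides give only the two end points; the linear interpolation between them, the `progress` parameter and the assumption that both inputs are scaled to 0..1 are choices made for this sketch.

```python
def blend_weights(progress):
    """Map progress (0.0 = large early set, 1.0 = small late set) to (quality, diversity) weights."""
    progress = min(1.0, max(0.0, progress))
    w_quality = 0.25 + 0.5 * progress   # 25% at the start, 75% at the end
    return w_quality, 1.0 - w_quality

def combined_score(quality, diversity, progress):
    """Weighted sum of a quality score and a diversity score, both assumed to lie in 0..1."""
    w_quality, w_diversity = blend_weights(progress)
    return w_quality * quality + w_diversity * diversity

# A high-scoring but redundant compound versus a novel, lower-scoring one.
for progress in (0.0, 0.5, 1.0):
    redundant = combined_score(quality=0.9, diversity=0.1, progress=progress)
    novel = combined_score(quality=0.5, diversity=0.9, progress=progress)
    print(progress, round(redundant, 2), round(novel, 2))
```

Early on, the novel compound ranks higher because diversity dominates; by the end the high-scoring compound overtakes it as the weight moves to score/quality.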
79. @giribio
Until you try yourself and train others, you cannot be an expert.
If you think you have finished collecting all the data, you are wrong. Go and find more data.
80. @giribio
THANKS
Do you have any questions?
www.zastrain.com
@giribio
We do accept research interns