Chemical database preparation ppt

Chemical Database Preparation
for Compound Acquisition or
Virtual Screening
Lalit Samant
Research Officer
B J WADIA HOSPITAL FOR CHILDREN

Virtual Screening
• AIM:-
1. HTS
2. Biologically active
3. Rapid
4. Effective

Cont.
• The progression HTS hits => HTS actives => lead
series => drug candidate => launched drug has
shifted the focus from good-quality candidate
drugs to good-quality leads (10).
• A set of simple property filters known as the “rule
of five” (Ro5) (11) is implemented in the
pharmaceutical industry to restrict small-molecule
synthesis to the property space defined by ClogP
(octanol/water partition coefficient), molecular
weight, etc.
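
A minimal sketch of an Ro5 check, assuming the four properties have already been computed by external software. The `Properties` class, the function names, the example values, and the "at most one violation" tolerance are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed properties for one compound (values come from external tools)."""
    mol_weight: float  # molecular weight, g/mol
    clogp: float       # calculated octanol/water partition coefficient
    hdo: int           # number of hydrogen-bond donors
    hac: int           # number of hydrogen-bond acceptors


def ro5_violations(p: Properties) -> int:
    """Count how many of the four Ro5 limits a compound exceeds."""
    checks = [
        p.mol_weight > 500,
        p.clogp > 5,
        p.hdo > 5,
        p.hac > 10,
    ]
    return sum(checks)


def passes_ro5(p: Properties, max_violations: int = 1) -> bool:
    """Assumption: tolerate at most one violation (a common reading of the rule)."""
    return ro5_violations(p) <= max_violations


# Illustrative values only, not measured data.
small_polar = Properties(mol_weight=180.2, clogp=1.2, hdo=1, hac=4)
large_greasy = Properties(mol_weight=720.0, clogp=7.5, hdo=2, hac=6)
print(passes_ro5(small_polar))   # True
print(passes_ro5(large_greasy))  # False (weight and ClogP both exceed the limits)
```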

Conditions to consider for Library
Design
• Many library design programs based on
combinatorial chemistry or compound
acquisition are now Ro5 compliant.
• Smaller compounds are easier to optimize
toward drug candidate status, and
leadlikeness has become an established
concept in drug discovery.

Materials
1. Software to convert chemical structures based on standard file
formats (e.g., SDF, mol2) into canonical isomeric SMILES (15,16), or
equivalent representations of chemical structures
2. Software to handle canonical isomeric SMILES (or equivalent)
and provide chemical fingerprints, e.g., Daylight (19), Unity (20), Mesa
Analytics and Computing (21), Barnard Chemical Information (22).
3. Software to compute chemical properties from structures, e.g., to
calculate the octanol/water partition coefficient, LogP, with CLogP,
KowWIN, or ALogPS.
4. Software to cluster chemical structures from fingerprints or from
computed properties.

Cont.
5. Software to convert SMILES (or equivalent)
into appropriate three-dimensional (3D)
coordinates, using CONCORD.
6. Software to appropriately handle D-optimal
design based on multidimensional spaces.

Methods
1. Assembling the Collection(s)
Large pharmaceutical companies have acquired
compound collections, the Reals, that contain a
significant number of molecules, including
marketed drugs and other high-activity
compounds. The Reals are a valuable resource
that is routinely screened against novel targets.

Cont. Assembling
• Such collections of structures must include existing sets
of commercially available chemicals, or Tangibles,
so termed because one can conceivably acquire
them or synthesize them in-house using tractable
chemistry.
• Thus, any collection prepared for virtual screening or HTS
would sample both the in-house and the “external” chemical
spaces. In addition to the Reals and the Tangibles, one
can also define the Virtuals: an extremely large set of
molecules (10^60 to 10^200) that cannot all be made, at
least with current chemistry, but that can essentially be
used as a “resource” for virtual screening.

Methods
2. Cleaning up the collection
There is no “perfect” chemical database, unless
it contains only rather simple molecules (e.g.,
NaCl, H2O) or a rather small number of molecules.
The user needs to spend significant effort
cleaning up the collection, whether it includes
Virtuals, Reals, or Tangibles.

Cleaning up Cont.
2.1 Removing Garbage From the Collection
2.2 Verifying Integrity of Molecular Structure
2.3 Generation of Unique, Normalized SMILES
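
A rough sketch of the cleanup pass, under stated assumptions: each record already carries a canonical isomeric SMILES from external software, "garbage" is reduced to records with no structure, and counterions are stripped by keeping the longest dot-separated fragment (a crude heuristic; a real workflow would re-canonicalize the parent with a cheminformatics toolkit). All function names are hypothetical.

```python
def largest_fragment(smiles: str) -> str:
    """Keep the longest dot-separated component (crude counterion stripping)."""
    return max(smiles.split("."), key=len)


def clean_collection(records):
    """Drop empty entries, strip small fragments, keep one record per structure.

    `records` is an iterable of (compound_id, smiles) pairs. The SMILES are
    assumed to be canonical already, so equal strings mean equal structures.
    """
    seen = set()
    cleaned = []
    for compound_id, smiles in records:
        smiles = (smiles or "").strip()
        if not smiles:
            continue  # "garbage": no structure attached to the record
        parent = largest_fragment(smiles)
        if parent in seen:
            continue  # duplicate structure; the first occurrence is kept
        seen.add(parent)
        cleaned.append((compound_id, parent))
    return cleaned


raw = [
    ("A1", "CC(=O)Oc1ccccc1C(=O)O"),
    ("A2", ""),                       # missing structure
    ("A3", "CC(=O)Oc1ccccc1C(=O)O"),  # duplicate of A1
    ("A4", "CC(=O)[O-].[Na+]"),       # salt: sodium acetate
]
print(clean_collection(raw))
# [('A1', 'CC(=O)Oc1ccccc1C(=O)O'), ('A4', 'CC(=O)[O-]')]
```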

3. Filtering for Lead-Likeness
• After cleanup, the collection can be processed
to remove compounds that do not have
leadlike properties.
• It is advisable to cluster the remaining
“nonleadlike” set and to include a
representative set of these compounds (up to
30%), because they are likely to capture
additional chemotypes.

Suggestions for exclusions according to
leadlikeness are as follows:
1. More than four rings.
2. More than three fused aromatic rings (avoid polyaromatic rings, because they
are likely to be processed by cytochrome P450 enzymes and yield epoxides and
other carcinogens).
3. More than four hydrogen-bond donors (HDO > 4); HDO ≤ 5 is one of the Ro5
criteria, but 80% of drugs have HDO less than 3.
4. More than four halogens, except fluorine (avoid “pesticides”). A notable
exception is the crop-protectant business; in such situations, the collection must
be processed with entirely different criteria.
5. More than two CF3 groups (avoid highly halogenated molecules).
6. Fragments responsible for cytotoxicity (remove compounds that contain
them).
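
The exclusion list above can be sketched as a simple rule table. This assumes the ring, donor, halogen, and CF3 counts come from an external structure toolkit; the dictionary keys and function names are hypothetical. Nonleadlike compounds are set aside, not discarded, so that a representative subset can still be sampled later.

```python
# Thresholds copied from the exclusion list above; the keys are hypothetical
# names for counts produced by an external structure toolkit.
RULES = [
    ("rings", 4, "more than four rings"),
    ("fused_aromatic_rings", 3, "more than three fused aromatic rings"),
    ("hdo", 4, "more than four hydrogen-bond donors"),
    ("halogens_non_f", 4, "more than four non-fluorine halogens"),
    ("cf3_groups", 2, "more than two CF3 groups"),
]


def exclusion_reasons(counts: dict) -> list:
    """Return the exclusion rules a compound triggers (empty list = leadlike)."""
    reasons = [label for key, limit, label in RULES if counts.get(key, 0) > limit]
    if counts.get("cytotoxic_fragment", False):
        reasons.append("contains a cytotoxicity-associated fragment")
    return reasons


def split_collection(collection: dict):
    """Split {compound_id: counts} into leadlike and nonleadlike id lists."""
    leadlike, nonleadlike = [], []
    for cid, counts in collection.items():
        (nonleadlike if exclusion_reasons(counts) else leadlike).append(cid)
    return leadlike, nonleadlike


# Illustrative counts only.
collection = {
    "C1": {"rings": 2, "hdo": 1},
    "C2": {"rings": 5, "halogens_non_f": 6},
    "C3": {"rings": 3, "cf3_groups": 3},
}
print(split_collection(collection))  # (['C1'], ['C2', 'C3'])
```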

Important Note:-
• A collection may require different processing
criteria for different targets and discovery
goals.
• E.g., targets located in the lung require a
different pharmacokinetic profile (for
inhalation therapy) compared with
targets located in the urinary tract, which may
require good aqueous solubility at pH = 5.0.

Methods cont.
3.4. Searching for Similarity If Known Active
Molecules are Available

3.5. Exploring Alternative Structures
The user should seek alternative structures by
modifying the canonical isomeric SMILES, because
these may occur in solution or at the ligand-
receptor interface:
a. Tautomerism
b. Acid/base equilibria
c. Chiral centers
Exploring alternative structures is advisable prior to
processing any collection with computational
means, such as for diversity analysis.
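
For the acid/base item, a small sketch based on the Henderson-Hasselbalch relation shows how the ionized fraction of a single titratable group shifts with pH. It assumes a known pKa (from experiment or an external predictor) and ignores multiple interacting sites; the function name and example pKa are illustrative.

```python
def fraction_ionized(pka: float, ph: float, kind: str) -> float:
    """Fraction of a single titratable group that is charged at a given pH.

    kind="acid": fraction deprotonated (A-).
    kind="base": fraction protonated (BH+).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


# Hypothetical carboxylic acid with pKa 4.8, compared at two pH values.
for ph in (5.0, 7.4):
    print(ph, round(fraction_ionized(4.8, ph, "acid"), 3))
```

When both forms are present in meaningful amounts at the pH of interest, it is reasonable to carry both protonation states forward.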

3.6 Generating 3D Structures
• Exploring one or more conformers per
molecule is essential.

3.7. Selecting Chemical Structure Representatives
Screening compounds that are similar to known actives
increases the likelihood of finding new active compounds, but
it may not lead to different chemotypes, which are highly
desirable in the industrial context. The problem is more
severe if the original actives are covered by
third-party patents or if the lead chemotype is toxic.
Clustering methods aim at grouping molecules into “families”
(clusters) of related structures that are perceived, at a given
resolution, to be different from other chemical families.
With clustering, the end user has the ability to select one or
more representatives from each family. Statistical molecular
design (SMD) methods aim at sampling various areas of
chemical space and selecting representatives from each area.

3.7.1 Chemical descriptors
• Chemical descriptors are used to encode
chemical structures and properties of
compounds: 2D/3D binary fingerprints or counts
of different substructural features, or
perhaps (computed) physicochemical properties
(e.g., molecular weight, CLogP, HDO, HAC), as
well as other types of steric, electronic,
electrostatic, topological, or hydrogen-bonding
descriptors.

3.7.2. Similarity (Dissimilarity)
Measure
• Chemical similarity is used to quantify the “distance”
between a pair of compounds (dissimilarity, or 1 −
similarity), or how related the two compounds are
(similarity).
• The basic tenet of chemical similarity is that molecules
exhibiting similar features are expected to have similar
biological activity (46).
• Similarity is, by definition, related to a particular
framework: that of a descriptor system (a metric by
which to judge similarity), as well as that of an object
or class of objects; a reference point with which objects
can be compared is needed (47).
• Similarity depends on the choice of molecular
descriptors (48), the choice of the weighting scheme(s), and
the similarity coefficient.
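
As a concrete example of a similarity coefficient, the sketch below uses the Tanimoto coefficient on binary fingerprints stored as sets of "on" bit positions. The slides do not name a specific coefficient, so this choice is an assumption, as are the function names and the convention that two empty fingerprints count as identical.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity: shared bits / (bits in A + bits in B - shared bits)."""
    if not fp_a and not fp_b:
        return 1.0  # assumption: two empty fingerprints are treated as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def dissimilarity(fp_a: set, fp_b: set) -> float:
    """Distance between two compounds, defined as 1 - similarity."""
    return 1.0 - tanimoto(fp_a, fp_b)


# Toy fingerprints: each integer is the index of a bit that is set.
fp1 = {1, 2, 3, 4}
fp2 = {3, 4, 5, 6}
print(round(tanimoto(fp1, fp2), 3))       # 0.333 (2 shared bits out of 6 distinct)
print(round(dissimilarity(fp1, fp2), 3))  # 0.667
```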

3.7.3. Clustering Algorithms
• Clustering algorithms can be classified using many criteria
and also implemented in different ways (29–32).
Hierarchical clustering methods have been traditionally
used to a greater extent, in part owing to computational
simplicity. More recently, nonhierarchical methods are
being examined for chemical structure classification.
• In practice, the individual choice of different factors
(descriptors, similarity measure, clustering algorithm)
depends also on the hardware and software resources
available, the size and diversity of the collection that
must be clustered, and, not least, on the user's experience
in producing a useful classification that has the ability
to predict property values.
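
To make the nonhierarchical option concrete, here is a minimal single-pass "leader" clustering sketch. It is one simple illustrative algorithm, not the method used in the slides; the Tanimoto measure, the 0.7 threshold, and all names are assumptions. The result depends on input order.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity between two sets of 'on' bit positions."""
    shared = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - shared
    return shared / total if total else 1.0


def leader_cluster(fingerprints: dict, threshold: float = 0.7) -> dict:
    """Assign each compound to the first cluster leader it resembles.

    Returns {leader_id: [member_ids]}. A compound that matches no existing
    leader at or above `threshold` becomes the leader of a new cluster.
    """
    clusters = {}
    for cid, fp in fingerprints.items():
        for leader in clusters:
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                clusters[leader].append(cid)
                break
        else:
            clusters[cid] = [cid]  # no leader was similar enough
    return clusters


# Toy fingerprints forming two obvious families.
fps = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 4, 5},
    "c": {10, 11, 12},
    "d": {10, 11, 12, 13},
}
print(leader_cluster(fps))  # {'a': ['a', 'b'], 'c': ['c', 'd']}
```

Picking the leader of each cluster then gives one representative per family.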

3.7.4. Statistical Molecular Design
• SMD can be applied to rationally select
collection representatives, as illustrated for
building block selection in combinatorial
synthesis planning (55).
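
SMD proper relies on experimental-design methods such as D-optimal design, which are beyond a short example. As a simpler stand-in that shares the goal of sampling distinct regions of chemical space, the sketch below uses greedy MaxMin dissimilarity selection. This is a swapped-in technique, not SMD itself; the fingerprints and names are hypothetical.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity between two sets of 'on' bit positions."""
    shared = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - shared
    return shared / total if total else 1.0


def maxmin_select(fingerprints: dict, n_pick: int, seed_id: str) -> list:
    """Greedy MaxMin: repeatedly pick the compound farthest from those picked."""
    picked = [seed_id]
    # Distance from each remaining candidate to its nearest picked compound.
    nearest = {
        cid: 1.0 - tanimoto(fp, fingerprints[seed_id])
        for cid, fp in fingerprints.items()
        if cid != seed_id
    }
    while len(picked) < n_pick and nearest:
        best = max(nearest, key=nearest.get)
        picked.append(best)
        del nearest[best]
        for cid in nearest:
            d = 1.0 - tanimoto(fingerprints[cid], fingerprints[best])
            if d < nearest[cid]:
                nearest[cid] = d
    return picked


fps = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 4, 5},
    "c": {10, 11, 12},
    "d": {10, 11, 12, 13},
}
print(maxmin_select(fps, 2, "a"))  # ['a', 'c']
```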

3.8. Assembling List of Compounds for
Acquisition or Virtual Screening
• Once provided with an output from one or
several methods for compound selection, the
now-selected collection representatives are
almost ready to be submitted for acquisition
or for virtual screening. The end user is
encouraged to allow nonleadlike molecules to
reenter the candidate pool.
• An additional random, perhaps nonleadlike
selection (up to 30%) can, and should, be
entered in the final list of compounds.
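
A sketch of the final assembly step, assuming "up to 30%" is measured relative to the size of the selected list (the slides do not say what the percentage refers to). The function name, the fixed random seed, and the compound identifiers are illustrative.

```python
import random


def assemble_final_list(selected, extra_pool, extra_percent=30, seed=0):
    """Append a random draw from `extra_pool` to the selected representatives.

    Assumption: the extra subset may be at most `extra_percent` percent of the
    size of `selected`. Compounds already selected are never drawn again.
    """
    already = set(selected)
    candidates = [cid for cid in extra_pool if cid not in already]
    n_extra = min(len(candidates), len(selected) * extra_percent // 100)
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    extras = rng.sample(candidates, n_extra)
    return list(selected) + extras


selected = [f"LEAD{i}" for i in range(10)]
nonleadlike = [f"NLL{i}" for i in range(5)]
final = assemble_final_list(selected, nonleadlike)
print(len(final))  # 13: ten representatives plus three random extras
```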

Summary
1. Assemble the collection starting from in-house and on-line databases.
2. Clean up the collection by removing “garbage,” verifying structural
integrity, and making sure that only unique structures are screened.
3. Perform property filtering to remove unwanted structures based on
substructures, property profiling, or various scoring schemes; the
collection can become the virtual screening set at this stage, or it can be
further subdivided in a target- and project-dependent manner.
4. Use similarity to given actives to seek compounds with related
properties.
5. Explore the possible stereoisomers, tautomers, and protonation states.
6. Generate the 3D structures in preparation for virtual screening, or for
computation of 3D descriptors.
7. Use clustering or SMD to select compound representatives for
acquisition.
8. Add a random subset to the final list of compounds. The final list can
now be submitted for compound acquisition or virtual screening.
