3. INTRODUCTION
Chemoinformatics is the combination of chemistry and
information technology; it is required for the processing
and analysis of chemical data.
It covers the design, creation, organization, management, analysis,
visualization and use of chemical information.
It is the application of informatics methods to solve
chemical problems.
It is relevant to biologists because chemistry data are
important in many areas of molecular biology, e.g., in the
study of protein interactions.
4. The term chemoinformatics was defined by F.K.
Brown in 1998.
It combines the scientific fields
of chemistry, computer science and information
science, for example in the areas of topology, chemical
graph theory, information retrieval and data
mining in chemical space.
It can also be applied to data analysis in various
industries, such as the paper, pulp and dye industries.
5. REPRESENTATION OF MOLECULES
1D – computed/experimental global properties
2D – the chemical structure diagram
3D – atomic coordinate data
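The three levels above can be sketched as fields of a single record. This is a minimal illustration only: the `Molecule` class, its field names and the `neighbors` helper are hypothetical, and the numeric values for water are approximate, not reference data.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Toy container holding the three representation levels side by side."""
    name: str
    properties: dict                             # 1D: global properties
    atoms: list                                  # element symbols, one per atom
    bonds: list                                  # 2D: (atom_i, atom_j, bond_order)
    coords: list = field(default_factory=list)   # 3D: (x, y, z) per atom, in angstroms


def neighbors(mol, index):
    """Return the indices of atoms bonded to the atom at `index` (2D connectivity)."""
    found = []
    for i, j, _order in mol.bonds:
        if i == index:
            found.append(j)
        elif j == index:
            found.append(i)
    return found


# Water, with approximate illustrative values (assumed for this sketch).
water = Molecule(
    name="water",
    properties={"formula": "H2O", "molecular_weight": 18.015},
    atoms=["O", "H", "H"],
    bonds=[(0, 1, 1), (0, 2, 1)],
    coords=[(0.000, 0.000, 0.117), (0.000, 0.757, -0.469), (0.000, -0.757, -0.469)],
)

print(water.properties["formula"])   # 1D view: a global property
print(neighbors(water, 0))           # 2D view: atoms bonded to the oxygen
print(water.coords[0])               # 3D view: position of the oxygen
```

The 1D view needs no structure at all, the 2D view needs only the bond list, and the 3D view adds one coordinate triple per atom.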
7. MOL FORMAT
An MDL Molfile is a file format for holding information
about the atoms, bonds, connectivity and coordinates of
a molecule.
The format was created by MDL Information Systems
(MDL) [Molecular Design Limited, Inc.].
It is also supported by some computational software
such as Mathematica.
A molfile consists of header information followed by the
Connection Table (CT), which contains the atom information
and then the bond connections and types.
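The layout described above (header, then atom block, then bond block) can be read with a few fixed-width slices. The sketch below assumes the common V2000 flavour of the format, where the fourth line holds the atom and bond counts in its first two 3-character fields. The function name `parse_molfile` and the sample record are made up for illustration, and only a small part of each line is interpreted.

```python
# A hand-written V2000-style record for water (illustrative, not from a real database).
SAMPLE_MOLFILE = "\n".join([
    "water",                                    # header line 1: molecule name
    "  example",                                # header line 2: program/timestamp (free text here)
    "",                                         # header line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",  # counts line: 3 atoms, 2 bonds
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",                    # bond: atom 1 - atom 2, single
    "  1  3  1  0  0  0  0",                    # bond: atom 1 - atom 3, single
    "M  END",
])


def parse_molfile(text):
    """Read name, atoms and bonds from a V2000-style molfile string."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the symbol follows after one space.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom, second atom (both 1-based) and bond type, 3 characters each
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    return {"name": name, "atoms": atoms, "bonds": bonds}


mol = parse_molfile(SAMPLE_MOLFILE)
print(mol["name"], [a[0] for a in mol["atoms"]], mol["bonds"])
```

Fixed-width slicing is used instead of splitting on whitespace because adjacent count fields can run together when a molecule has 100 or more atoms.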
8. SDF FORMAT
SDF is one of a family of chemical-data file formats
developed by MDL.
SDF stands for structure-data file.
SDF files actually wrap the molfile (MDL Molfile) format.
Multiple compounds are delimited by lines consisting of
four dollar signs ($$$$).
A feature of the SDF format is its ability to include
associated data.
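As a rough sketch of how the `$$$$` delimiter and the associated data can be handled, the code below splits a multi-record string and collects the data items of each record. It assumes the usual `> <FIELD>` header followed by value lines and a blank line. The helper names `split_sdf` and `read_data_items` and the two sample records are hypothetical.

```python
# Two tiny hand-written records (one heavy atom each), for illustration only.
SAMPLE_SDF = "\n".join([
    "methane", "  example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "methane", "",
    "> <FORMULA>", "CH4", "",
    "$$$$",
    "water", "  example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "water", "",
    "$$$$",
])


def split_sdf(text):
    """Split SDF text into records; each record is a list of lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":      # record delimiter
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def read_data_items(record_lines):
    """Collect '> <FIELD>' data items of one record into a dict."""
    data = {}
    i = 0
    while i < len(record_lines):
        line = record_lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1:line.rindex(">")]
            i += 1
            values = []
            # value lines run until the next blank line
            while i < len(record_lines) and record_lines[i].strip():
                values.append(record_lines[i])
                i += 1
            data[name] = "\n".join(values)
        i += 1
    return data


records = split_sdf(SAMPLE_SDF)
for record in records:
    print(record[0], read_data_items(record))
```

The molfile part of each record (everything up to `M  END`) is left untouched here and can be passed to any molfile reader.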
9. SMILES FORMAT
The Simplified Molecular Input Line Entry Specification
(SMILES) is a line notation for molecules.
SMILES strings include connectivity but do not include 2D
or 3D coordinates.
Hydrogen atoms are usually not written explicitly; they are
implied by the valence of the other atoms.
Other atoms are represented by their element symbols, e.g. C,
N, O, S, Cl, Br and I.
The symbol "=" represents a double bond and "#" represents
a triple bond; branching is indicated by parentheses "()" and
ring closures are indicated by pairs of matching digits.
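To make the notation concrete, here is a small tokenizer that recognizes the symbols described above and counts them. It is a simplified sketch, not a full SMILES parser: it does not check valences or that ring digits are correctly paired, and the names `TOKEN`, `tokenize` and `summarize` are made up for this example.

```python
import re

# One alternative per kind of symbol; two-letter elements are tried before one-letter ones.
TOKEN = re.compile(
    r"(?P<bracket>\[[^\]]+\])"                # bracket atom such as [Na+] or [nH]
    r"|(?P<atom>Cl|Br|[BCNOPSFI]|[bcnops])"   # plain atoms (lower case = aromatic)
    r"|(?P<bond>[-=#:/\\])"                   # bond symbols
    r"|(?P<branch>[()])"                      # branch open/close
    r"|(?P<ring>%\d{2}|\d)"                   # ring-closure digits
    r"|(?P<dot>\.)"                           # disconnected parts
)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def summarize(smiles):
    """Count heavy atoms, multiple bonds, branches and ring closures."""
    tokens = tokenize(smiles)
    ring_digits = sum(1 for kind, _ in tokens if kind == "ring")
    return {
        "atoms": sum(1 for kind, _ in tokens if kind in ("atom", "bracket")),
        "double_bonds": sum(1 for _, text in tokens if text == "="),
        "triple_bonds": sum(1 for _, text in tokens if text == "#"),
        "branches": sum(1 for _, text in tokens if text == "("),
        "rings": ring_digits // 2,            # each ring closure uses a pair of digits
    }


print(summarize("CC(=O)O"))                  # acetic acid
print(summarize("c1ccccc1"))                 # benzene
print(summarize("CC(=O)Oc1ccccc1C(=O)O"))    # aspirin
```

Note that no hydrogen appears in any of the three strings, and that none of them carries coordinates: the string only encodes which atoms are connected and how.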
11. PUBCHEM DATABASE
https://pubchem.ncbi.nlm.nih.gov/
PubChem is a database of chemical molecules and their
activities against biological assays.
It is maintained by the NCBI.
It contains substance descriptions and small molecules
with fewer than 1000 atoms and 1000 bonds.
It can be searched by chemical structure, name
fragments, chemical formula and molecular weight.
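Besides the web page, searches of this kind can be scripted through PubChem's PUG REST interface. The sketch below only builds request URLs and does not contact the server. The helper names `property_url` and `cids_for_smiles_url` are hypothetical, and the URL layout is an assumption that should be checked against the current PUG REST documentation before use.

```python
from urllib.parse import quote

# Base address of the PUG REST service (assumed layout: /<domain>/<input>/<operation>/<format>).
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """URL asking for selected properties of a compound looked up by name."""
    return f"{BASE}/compound/name/{quote(name, safe='')}/property/{','.join(properties)}/JSON"


def cids_for_smiles_url(smiles):
    """URL asking for the compound identifiers (CIDs) matching a SMILES string."""
    # SMILES characters such as '=', '#', '(' and '/' must be percent-encoded in a URL path.
    return f"{BASE}/compound/smiles/{quote(smiles, safe='')}/cids/JSON"


print(property_url("aspirin"))
print(cids_for_smiles_url("CC(=O)O"))

# To actually run a query, the URL could be fetched with urllib.request.urlopen(url)
# and the response decoded with json.load; this is left out so the sketch runs offline.
```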
13. CHEMBANK DATABASE
http://chembank.broadinstitute.org/
It is a public, web-based informatics environment.
It was created by the Broad Institute’s Chemical Biology
Program.
It stores information on small molecules and biomedically
relevant assays.
It is intended to guide chemists synthesizing novel
compounds or libraries and to assist biologists searching for
small molecules.
15. CHEMBL DATABASE
https://www.ebi.ac.uk/chembl/
ChEMBLdb is a manually curated chemical
database of bioactive molecules.
It is maintained by the European Bioinformatics
Institute (EBI), of the EMBL.
The ChEMBL database contains compound bioactivity
data against drug targets.
At the time of writing, the latest version was version 2 (ChEMBL_02).
17. DRUGBANK DATABASE
https://www.drugbank.ca/
The DrugBank database is a comprehensive, freely
accessible, online database containing information
on drugs and drug targets.
It is widely used by the drug industry, medicinal
chemists, pharmacists, physicians and students.
The latest release of the database (version 5.0.11)
contains 11,002 drug entries.
19. APPLICATIONS
Storage and retrieval - The primary application is the
storage, indexing and search of information relating to
compounds.
Virtual libraries - Chemical data can relate to real or
virtual molecules. Virtual libraries can cover classes of
compounds such as drugs, natural products and diversity-oriented
synthetic products.
Virtual screening - It involves computationally screening in
silico libraries of compounds.
Quantitative structure-activity relationship (QSAR) - Used
to predict the activity of compounds from their structures.
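One simple way to picture virtual screening is similarity searching: each compound is reduced to a set of structural features (a fingerprint), and library members are ranked by how similar their fingerprint is to that of a query compound. The sketch below uses the Tanimoto coefficient, a commonly used similarity measure that is chosen here as an assumption and is not named in the slides. The feature names, the library and the helper names are invented for illustration.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by all features present in either set."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def screen(query, library, threshold=0.5):
    """Return (name, similarity) for library members at or above the threshold, best first."""
    scored = [(name, tanimoto(query, features)) for name, features in library.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: sets of made-up feature labels, not real descriptors.
query = {"aromatic_ring", "carboxylic_acid", "ester"}
library = {
    "mol_a": {"aromatic_ring", "carboxylic_acid", "hydroxyl"},
    "mol_b": {"amine", "alkene"},
    "mol_c": {"aromatic_ring", "carboxylic_acid", "ester"},
}

for name, score in screen(query, library):
    print(name, score)
```

In practice the fingerprints would be computed from the stored structures by a cheminformatics toolkit; the ranking step stays the same.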