Presentation delivered at Lehigh University (Bethlehem, PA) on Friday, April 26, 2019.
This presentation begins with the history of the cheminformatics field. It then asks what makes cheminformatics different from bioinformatics, comparing the ways in which molecules are described and compared in the two fields.
Molecular Dynamics for Beginners: Detailed Overview (Girinath Pillai)
A detailed presentation covering what molecular dynamics is, how and why it is performed, its applications and limitations, and software resources for running the calculations.
Chemical risk assessment is often limited by the lack of experimental toxicity data for a large number of diverse chemicals. In the absence of experimental data, potential chemical hazard is often predicted using data gap filling techniques such as quantitative structure–activity relationship (QSAR) models. QSARs are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR tools are a widely utilized alternative to time-consuming clinical and animal testing methods, yet concerns over reliability and uncertainty limit application of QSAR models for regulatory chemical risk assessments. The reliability of a QSAR model depends on the quality and quantity of the experimental training data and on the applicability domain of the model. This talk will describe the basic concepts and best practices in QSAR modeling, the principles associated with validation of QSAR models, a summary of available QSAR tools, the limitations and challenges in the acceptance of QSAR models, and the current status and prospects of QSAR modeling methods in the medical devices community.
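At its simplest, a QSAR model is a fitted equation from molecular descriptors to an activity. A minimal sketch, with invented descriptor values and activities (the compounds, descriptor choices, and pIC50 values below are illustrative assumptions, not real data), using NumPy's least-squares solver in place of real model-building:

```python
# Minimal QSAR sketch: fit a linear model relating two hypothetical
# molecular descriptors (logP and molecular weight) to an activity.
# All numbers below are invented for illustration only.
import numpy as np

# rows: compounds; columns: [logP, molecular weight]
X = np.array([[1.2, 180.0],
              [2.5, 250.0],
              [3.1, 310.0],
              [0.8, 150.0]])
y = np.array([3.4, 4.75, 5.65, 2.9])  # e.g. pIC50 values (invented)

# add an intercept column and solve the least-squares problem
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(logp, mw):
    """Predict activity for a new compound from its two descriptors."""
    return coef[0] * logp + coef[1] * mw + coef[2]
```

A real QSAR workflow would add descriptor selection, validation on held-out compounds, and an applicability-domain check, exactly the concerns the abstract raises.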
Computer-aided drug design (CADD) is a widely used technology employing computational tools and resources for the storage, management, analysis and modeling of compounds. It relies on digital repositories for designing compounds with desired physicochemical characteristics and for predicting whether a given molecule will bind to the target, and if so how strongly. Computer-based methods can help us search for new hits in drug discovery, screen out many irrelevant compounds early, and study the structure–activity relationships of drug molecules.
Topics include finding PDB files of molecules, locating binding sites, positioning a ligand relative to a macromolecule, building the grid and grid parameter file, performing molecular docking, and analysing docking results by examining various energy parameters, along with uses in drug discovery technology.
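One small step in this workflow, building the grid box around the binding site, can be sketched as follows. The ligand coordinates and padding value are invented for illustration; a real run would parse coordinates from a PDB/PDBQT file and write them into the grid parameter file:

```python
# Sketch: derive a docking grid box (center and size) from the
# coordinates of a bound ligand, with padding on each side.
# Coordinates below are invented for illustration.

ligand_coords = [  # (x, y, z) of ligand heavy atoms, in angstroms
    (10.2, 4.1, -3.0),
    (11.5, 5.0, -2.2),
    (9.8, 6.3, -4.1),
]
PADDING = 4.0  # extra angstroms on each side of the ligand

def grid_box(coords, padding):
    """Return (center, size) of an axis-aligned box enclosing coords."""
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size

center, size = grid_box(ligand_coords, PADDING)
```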
CADD and Molecular Modeling for M.Pharm (Shikha Popali)
CADD is used for drug development; different strategies are covered, including QSAR and molecular docking, the different dimensional forms of QSAR, and advanced SAR.
Cheminformatics, concept by K. K. Sahu (Kaushal Sahu)
INTRODUCTION
THE NEED FOR CHEMOINFORMATICS
CHEMOINFORMATICS AND DRUG DISCOVERY
HISTORICAL EVOLUTION
BASIC CONCEPTS
Chemistry Space
Molecular Descriptors
High-Throughput Screening
The Similar-Structure, Similar-Property Principle
Graph theory and Chemoinformatics
CHEMOINFORMATICS TASKS
MOLECULAR REPRESENTATIONS
Topological Representations
Geometrical Representations
TYPES OF MOLECULAR DESCRIPTORS
IN SILICO DE NOVO MOLECULAR DESIGN
FREE CHEMISTRY DATABASE
FUTURE
CONCLUSION
REFERENCE
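The similar-structure, similar-property principle listed above is usually operationalized by comparing binary molecular fingerprints with the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. A minimal sketch, with invented fingerprints represented as sets of "on" bit positions (real fingerprints would come from a toolkit such as those named later in the deck):

```python
# Tanimoto similarity between two fingerprints, each given as a
# set of on-bit positions. The example bit sets are invented.

def tanimoto(fp_a, fp_b):
    """Return |A & B| / |A | B|, with 0.0 for two empty fingerprints."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

mol_a = {3, 17, 42, 101, 256}   # hypothetical fingerprint A
mol_b = {3, 17, 55, 101, 300}   # hypothetical fingerprint B

similarity = tanimoto(mol_a, mol_b)  # shared bits: 3, 17, 101
```

Compounds above some similarity threshold (often ~0.7 for Tanimoto on 2D fingerprints) are then assumed likely to share properties, which is the working assumption behind similarity searching and clustering.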
How we Built a Large Scale Matched Pair Analysis Engine (MCPairs) using OpenE... (Al Dossetter)
MCPairs performs Matched Molecular Pair Analysis at large scale to build databases of exploitable knowledge, accessible to drug discovery teams to accelerate research projects. The talk describes how we did this and some of the challenges.
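A matched-molecular-pair analysis of this kind can be sketched in miniature: group compounds by a shared core, enumerate pairs within each group, and record the substituent change together with the activity difference. The compound records below are invented, and real engines (presumably including MCPairs) derive the core/substituent split by systematically fragmenting bonds rather than taking it as given:

```python
from itertools import combinations

# (core, substituent, activity) records, invented for illustration.
compounds = [
    ("c1ccccc1C(=O)N[*]", "H",   6.2),
    ("c1ccccc1C(=O)N[*]", "CH3", 6.9),
    ("c1ccccc1C(=O)N[*]", "Cl",  5.8),
    ("c1ccc(O)cc1[*]",    "H",   4.5),
]

def matched_pairs(records):
    """Group by core; emit (core, sub_a, sub_b, delta_activity) tuples."""
    by_core = {}
    for core, sub, act in records:
        by_core.setdefault(core, []).append((sub, act))
    pairs = []
    for core, members in by_core.items():
        for (sub_a, act_a), (sub_b, act_b) in combinations(members, 2):
            pairs.append((core, sub_a, sub_b, act_b - act_a))
    return pairs

pairs = matched_pairs(compounds)  # three pairs from the first core
```

Aggregating many such (sub_a, sub_b, delta) observations across projects is what turns the pairs into the "database of exploitable knowledge" the abstract describes.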
Analysis of Protein Microarray Data Using Data Mining (ijbbjournal)
Recent progress in biology, medical science, bioinformatics, and biotechnology has produced tremendous amounts of biodata that demand in-depth analysis. At the same time, recent progress in data mining research has led to the development of numerous efficient and scalable methods for mining interesting patterns in large databases. This paper bridges the two fields, data mining and bioinformatics, for the successful mining of biological data. Microarrays constitute a new platform that allows the discovery and characterization of proteins.
Future Directions in Chemical Engineering and Bioengineering (Ilya Klabukov)
"Future Directions in Chemical Engineering and Bioengineering"
January 16-18, 2013
Austin, Texas
Chair: John G. Ekerdt, The University of Texas at Austin
Sponsored by Department of Defense,
Office of the Assistant Secretary of Defense for Research and Engineering
Chemical and biological engineers use math, physics, chemistry, and biology to develop chemical transformations and processes, creating useful products and materials that improve society. In recent years, the boundaries between chemical engineering and bioengineering have blurred as biology has become a molecular science, more seamlessly connecting with the historic focus of chemical engineering on molecular interactions and transformations.
This disappearing boundary creates new opportunities for the next generation of engineered systems – hybrid systems that integrate the specificity of biology with chemical and material systems to enable novel applications in catalysis, biomaterials, electronic materials, and energy conversion materials.
Basic research for the U.S. Department of Defense covers a wide range of topics such as metamaterials and plasmonics, quantum information science, cognitive neuroscience, understanding human behavior, synthetic biology, and nanoscience and nanotechnology. Future Directions workshops such as this one identify opportunities
for continuing and future DOD investment. The intent is to create conditions for discovery and transformation, maximize the discovery potential, bring balance and coherence, and foster connections. Basic research stretches the limits of today’s technologies and discovers new phenomena and know-how that ultimately lead to future technologies and enable military and societal progress.
ChemSpider was developed with the intention of aggregating and indexing available sources of chemical structures and their associated information into a single searchable repository and making it available to everybody, at no charge. There are many tens of chemical structure databases, covering literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data, etc., and no single way to search across them. Despite the diversity of databases available online, their inherent quality, accuracy and completeness is lacking in many regards. ChemSpider was established to provide a platform whereby the chemistry community could contribute to cleaning up the data, improving the quality of data online and expanding the information available to include data such as reaction syntheses, analytical data and experimental properties. ChemSpider has now grown into a database of well over 20 million chemical substances integrated with over 300 disparate data sources, many of them directly supporting the life sciences. This presentation will provide an overview of our efforts to improve the quality of data online, to provide a foundation for the semantic web for chemistry and to provide a set of online tools and services supporting access to these data. I will also discuss how ChemSpider is being used to enhance semantic publishing in chemistry at the RSC.
This is a presentation I gave at the FDA on December 1st, 2009 in Washington, DC as part of a symposium involving PubChem, ChemIDPlus, PillBox, DailyMed and other related systems. The focus was, as usual, on the quality of data online and how to clean up the information, with a specific focus on the quality of data in the FDA's DailyMed and our efforts to apply semantic markup to the DailyMed articles.
This was a presentation I gave to an audience at Nature Publishing Group in New York on May 7th 2009. It's a long presentation and over an hour in length. Not much new here relative to other presentations...just a knitting together of many of the others on here.
There is an increasing availability of free and open access resources for scientists to use on the internet. Coupled with an increasing number of Open Source software programs we are in the middle of a revolution in data availability and tools to manipulate these data. ChemSpider is a free access website built with the intention of providing a structure centric community for chemists. As an aggregator of chemistry related information from many sources, at present over 21.5 million unique chemical entities from over 190 separate data sources, ChemSpider has taken on the task of both robotically and manually integrating and curating publicly available data sources. ChemSpider has also provided an environment for users to deposit, curate and annotate chemistry-related information. This has allowed the community to enhance ChemSpider by adding analytical data, associating synthetic pathways and publications and connecting to social networking resources. I will discuss how ChemSpider is fast becoming the premier curated platform and centralized hub for resourcing information about chemical entities and how the platform provides the foundation data for services allowing the analysis of analytical data and collaborative science.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready, the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Observation of Io's Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io's surface using adaptive optics at visible wavelengths.
This PDF is about schizophrenia.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10^7–10^8 M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Cancer Cell Metabolism: Special Reference to the Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cell utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two molecules of a smaller chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to "burn" the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
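The glucose-uptake consequence described above follows from simple arithmetic on the two ATP yields; a short worked comparison (the total ATP demand figure is arbitrary, chosen only to make the numbers divide cleanly):

```python
# ATP bookkeeping from the text: ~2 ATP per glucose via glycolysis
# alone versus ~36 ATP via full respiration.

ATP_GLYCOLYSIS_ONLY = 2    # ATP per glucose: glycolysis alone
ATP_FULL_RESPIRATION = 36  # ATP per glucose: glycolysis + Krebs + Ox-Phos

atp_demand = 720  # arbitrary ATP requirement, for comparison only

glucose_healthy = atp_demand / ATP_FULL_RESPIRATION    # glucose needed by a healthy cell
glucose_glycolytic = atp_demand / ATP_GLYCOLYSIS_ONLY  # glucose needed by a glycolysis-only cell

uptake_factor = glucose_glycolytic / glucose_healthy   # 36/2 = 18x more glucose
```

So under these round-number yields, a cell running glycolysis alone must take up roughly 18 times as much glucose to meet the same energy demand, which is the "glucose addiction" the Warburg discussion below describes.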
Introduction to the Warburg Phenomenon:
Usually, cancer cells are highly glycolytic ("glucose addiction") and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 - 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme".
Warburg effect: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
5. 1. Introduction
2. Molecular Representations
3. Chemical Data and Databases
4. Molecular Similarity
5. Chemical Reactions
6. Machine Learning and Other Predictive Methods
7. Molecular Docking and Drug Discovery
6. • It encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information
• It is the mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization
7. • "The set of computer algorithms and tools to store and analyse chemical data in the context of drug discovery and design projects"
• Chemoinformatics is the application of informatics methods to solve chemical problems
8. Books:
J. Gasteiger and T. Engel (Editors) (2003). Chemoinformatics: A Textbook. Wiley.
A. R. Leach and V. J. Gillet (2005). An Introduction to Chemoinformatics. Springer.
Journal:
Journal of Chemical Information and Modeling
Web:
http://cdb.ics.uci.edu
and many more………
9. The first, and still the core, journal for the subject, the Journal of Chemical Documentation, started in 1961 (the name changed to the Journal of Chemical Information and Computer Sciences in 1975).
The first book appeared in 1971 (Lynch, Harrison, Town and Ash, Computer Handling of Chemical Structure Information).
The first international conference on the subject was held in 1973 at Noordwijkerhout, and every three years since 1987.
10. Chemoinformatics encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information.
Cheminformatics combines the scientific working fields of chemistry and computer science, for example in the areas of chemical graph theory and mining the chemical space. The chemical space is expected to contain at least 10^62 molecules.
11. Year    Cheminformatics    Chemoinformatics    Ratio
    2000    39                 684                 0.05
    2001    8,010              2,910               2.75
    2002    34,000             16,000              2.12
    2003    58,143             32,872              1.77
    2004    85,435             60,439              1.41
    2005    658,298            272,096             2.41
    2006    317,000+           163,000+            1.94
• Also known as: cheminformatics, molecular informatics, chemical informatics, or even chemo-bioinformatics
12. 1) An enormous amount of data, and maintenance of that data.
2) Can we gain enough knowledge from the known data to make predictions for those cases where the required information is not available?
3) Relationships between the structure of a compound and its biological activity, or the influence of reaction conditions on chemical reactivity.
13. Advances in theoretical and computational chemistry now allow chemists to model chemical compounds "in silico" with ever-increasing accuracy.
Molecular properties now becoming accessible through computation include molecular shape, electronic structure, physical properties, chemical reactivity, protein folding, structures of materials and surfaces, catalytic activity, and biochemical activities.
14. Integrates a comprehensive knowledge of chemistry with an extensive understanding of information technology. The intersection of chemistry and information technology embraces an expanding territory: computational modeling of individual molecules, thermodynamic methods of estimating chemical properties, methods of predicting biological activity of hypothetical compounds, and organization and classification of chemical information.
15. Schematic representation of a crowded cell. An array of different molecules can function independently under extremely crowded conditions, partly because of judicious distributions of oppositely charged polar groups on the molecular surfaces. However, such systems are in some ways extremely fragile. For example, a mutation that alters just one amino acid in the haemoglobin molecule can stimulate massive aggregation and give rise to a fatal genetic disease, sickle-cell anaemia. More generally, many disorders of old age, most famously Alzheimer's disease, result from the increasingly facile conversion of normally soluble proteins into intractable deposits that occur particularly as we get older. Many of these aggregation processes involve the reversion of the unique biologically active forms of polypeptide chains into a generic and non-functional 'chemical' form.
16. Additional computational challenges lie in indexing and classifying the infinite population of chemical compounds that could be synthesized or are already known.
Specific indexing and search problems include:
how to find a compound that might block a specific biological target;
how to predict the most efficient synthetic strategy for a desired compound from available precursors;
how to employ results of bioactivity tests on a family of molecules to design improved versions.
18. Currently, combinatorial chemists are developing new methods of synthesizing libraries of related compounds on an unprecedented scale.
Such libraries can be used to produce huge arrays of materials for investigation of biochemical, catalytic, or material properties.
Systems are required to design, catalog, and search these libraries, assess test results in a meaningful way, and integrate new information with existing chemical databases.
Investigations into information storage at the molecular level are underway, bringing full circle the link between chemistry and information technology.
19. Representations and Structure Searching
Substructure Searching
Similarity Searching, Clustering, and Diversity Analysis
Searching Databases
Computer-aided Structure Elucidation
3D Substructure Searching
QSAR and Docking
20. Structure and applications of chemoinformatics
Database design and programming
Representation and searching of chemical structures
Structure, substructure & similarity searching in 2D & 3D
Markush and reaction searching
Representation and searching of biological databases
chemoinformatics software
Data analysis techniques
Clustering;
Evolutionary algorithms;
Graph theory;
Neural networks;
Chemical information sources
Cheminformatics applications
Techniques used to design bioactive compounds
Molecular simulation and design
Drug discovery process; QSAR; Combi-chem; SBDD
Spectroscopy and crystallography in cheminformatics
21. • Small-molecule databases
–Databases of commercially-available compounds (e.g. ACD,
http://www.mdl.com/products/experiment/available_chem_dir/index.jsp)
– Proprietary chemical structure databases
– Literature databases
– Patent databases
– Small project-specific databases
• Protein databases
– Public, online databases (e.g. PDB, http://www.pdb.org)
– Proprietary and project-specific databases
22. Software Companies
Accelrys - large chemoinformatics company
ACD/Labs - analytical informatics & predictions
BCI - 2D fingerprinting, clustering toolkits & software
Bioreason - HTS data analysis software
CambridgeSoft - 2D drawing tools & e-notebooks
CAS - produces SciFinder Scholar searching software
ChemAxon - Java-based toolkits and software
Daylight - 2D representation & searching software
Leadscope - 2D structure and property tools
Lion Bioscience - produces LeadNavigator
MDL - large chemoinformatics company
OpenEye - fast 3D docking, structure generation, toolkits
Quantum Pharmaceuticals - prediction, docking, screening
Sage Informatics - ChemTK 2D analysis software
Tripos - large chemoinformatics company
23. Journal of Chemical Information and Computer Sciences
Journal of Computer-Aided Molecular Design
Journal of Molecular Graphics and Modelling
Journal of Medicinal Chemistry
NetSci (online journal)
Scientific Computing World
Bio-IT World
Drug Discovery Today
Newsletters, Mailing Lists & Other Hubs
Chemical Informatics Letters- Monthly newsletter
CHMINF-L (Indiana)- Email discussion list
Chemoinf Yahoo Group -Email discussion list
Chemistry Software Yahoo Group
Cheminformatics.org Lots of links and QSAR datasets
Reactive Reports Chemistry Web Magazine
25. Web-based drawing tools
JME (http://www.molinspiration.com/cgi-bin/properties) is a clean, simple Java drawing tool. Draw your structure and click on the smiley face to show the SMILES.
Marvin Sketch is a Java applet that allows you to draw structures and export them as SMILES, MDL MOL files or other formats.
Web-based depiction tools
Daylight Depiction Tool (http://www.daylight.com/daycgi/depict) is a very simple tool: enter a SMILES string and it will produce a 2D structure diagram from it.
CACTVS GIF generator has a more complex interface, but allows many more options for producing GIF picture files from SMILES or other structure formats. The quality of the images is superior to the Daylight tool.
MDL Chime (http://www.mdlchime.com) is a browser plugin that can display both 2D and interactive 3D structures in web pages.
27. Concord, from Tripos, Inc., was one of the first 3D structure generation programs and is still being refined and developed. It generates single, minimal-energy structures from input 2D structures, and can read and write a variety of file formats.
http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html
Corina, from the Gasteiger group, is similar to Concord.
http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html
Omega, from OpenEye, is the latest release. It offers very fast generation of multiple low-energy conformers.
http://www.eyesopen.com/products/applications/omega.html
28. MDL Chime is a web browser plug-in that allows 2D and 3D structures to be viewed in web pages. It can be used to visualize both proteins and small molecules, and includes some limited ability to create molecular surfaces. It is excellent for communicating structures via the web and for use in writing web-based chemoinformatics software. http://www.mdlchime.com
ArgusLab is a free molecular modeling program with a fairly extensive set of options for 3D visualization, calculation of surfaces and properties, minimization, and molecular docking. http://www.arguslab.com
31. • Quantitative analysis of chemical data once relied exclusively on multilinear regression analysis.
• Artificial neural networks: an artificial neural network (ANN), or commonly just neural network (NN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.
(Figure: a layered network with input, hidden, and output layers.)
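The layered network just described (inputs feeding hidden neurons feeding an output) can be written out in a few lines. The weights below are invented; a real QSAR/QSPR network would learn them from training data:

```python
import math

# Minimal feed-forward sketch: two inputs, one hidden layer of two
# sigmoid neurons, one sigmoid output neuron. Weights are invented.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: inputs -> hidden layer -> single output in (0, 1)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# two descriptors in, two hidden neurons, one predicted property out
hidden_weights = [[0.5, -0.2],   # weights into hidden neuron 1
                  [0.8, 0.3]]    # weights into hidden neuron 2
output_weights = [1.0, -1.0]     # hidden layer -> output neuron
prediction = forward([1.0, 2.0], hidden_weights, output_weights)
```

Unlike multilinear regression, stacking such nonlinear units lets the model capture structure–property relationships that are not linear in the descriptors, which is why ANNs displaced pure regression for many chemoinformatics tasks.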
32. • A field of exercise for artificial intelligence techniques.
• The DENDRAL project, initiated in 1964 at Stanford University.
Computer-Assisted Synthesis Design (CASD): in 1969, Corey and Wipke began development of a synthesis design system.
1. Substructure searching
2. Similarity searching
33. 1. Chemical Information
Storage and retrieval of chemical structures and associated data, to manage the flood of data
Dissemination of data on the internet
Cross-linking of data to information
2. All fields of chemistry
• Prediction of the physical, chemical, or biological properties of compounds
34. • identification of new lead structures
• optimization of lead structures
• establishment of quantitative structure-activity relationships
• comparison of chemical libraries
• definition and analysis of structural diversity
• planning of chemical libraries
35. Contd……
• analysis of high-throughput data
• docking of a ligand into a receptor
• prediction of the metabolism of xenobiotics
• analysis of biochemical pathways
4. Organic Chemistry
• Prediction of the course and products of organic reactions
• Design of organic syntheses
36. • Analysis of data from analytical chemistry to make predictions on the quality, origin, and age of the investigated objects
• Elucidation of the structure of a compound based on spectroscopic data
Teaching Chemoinformatics
Chemists have to become more efficient in planning their experiments, and have to extract more knowledge from their data.
39. University of Barcelona, Spain
University of Erlangen-Nürnberg, Germany
Bioinformatics Institute of India, Chandigarh
Georgia Institute of Technology
University of Sheffield (Willett) - MSc/PhD programs
University of Erlangen (Gasteiger)
UCSF (Kuntz)
University of Texas (Pearlman)
Yale (Jorgensen)
University of Michigan (Crippen)
Indiana University (Wiggins) - MSc program
Cambridge Unilever (Glen, Goodman, Murray-Rust)
Scripps - Molecular Graphics Lab
40. DRUG DESIGN                      ENVIRONMENTAL PROTECTION
    Maximum activity                 Prediction of toxicity
    Minimize toxicity
    • Single therapeutic target      • Multiple unknown targets
    • Drug-like chemical             • Diverse structures
    • Some toxicity anticipated      • Human and ecosystems