Chemoinformatics
P. Baldi, J. Chen, and S. J. Swamidass
School of Information and Computer Sciences
Institute for Genomics and Bioinformatics
University of California, Irvine
2
Overall Outline
1. Introduction
2. Molecular Representations
3. Chemical Data and Databases
4. Molecular Similarity
5. Chemical Reactions
6. Machine Learning and Other Predictive
Methods
7. Molecular Docking and Drug Discovery
3
1. Introduction
• What is Chemoinformatics
• Resources
• Brief Historical Perspective
• Chemical Space: Small Molecules
• Overview of Problems and Methods
4
What is Chemoinformatics?
• Chemoinformatics encompasses the
design, creation, organisation,
management, retrieval, analysis,
dissemination, visualization and use of
chemical information
5
What is Chemoinformatics?
• "the mixing of information resources to
transform data into information and
information into knowledge, for the
intended purpose of making better
decisions faster in the arena of drug
lead identification and optimization"
6
What is Chemoinformatics?
• “the set of computer algorithms and
tools to store and analyse chemical
data in the context of drug discovery
and design projects”
• However: drug design/discovery is to
chemoinformatics as DNA/RNA/protein
sequencing is to bioinformatics
7
Resources
Books:
J. Gasteiger and T. Engel (Editors) (2003).
Chemoinformatics: A Textbook. Wiley.
A.R. Leach and V. J. Gillet (2005). An Introduction to
Chemoinformatics. Springer.
Journal:
Journal of Chemical Information and Modeling
Web:
http://cdb.ics.uci.edu
and many more…
8
Brief Historical Perspective
• Historical perspective: physics, chemistry
and biology
• Theorem:
computers/biology or computers/physics >>
computers/chemistry
• Proof:
GenBank, Swissprot, PDB, Web (CERN),
etc.
9
Caveat: Long Tradition
• Quantum Mechanics
• Docking
• Beilstein
• ACS
• Etc…
Gasteiger, J. (2006). "Chemoinformatics: a new
field with a long tradition." Anal Bioanal
Chem 384: 57-64.
10
Possible Causes
• Alchemy
• Industrial age and early commercial
applications of chemistry
• Concurrent development of modern
computers and modern biology
• Scientific differences (theory/process)
• Psychological perceptions (life/inert)
• ACM
11
Chemical Space: Small Molecules
in Organic Chemistry
• Understanding chemical space
• Small molecules:
– chemical synthesis
– drug design
– chemical genomics
– systems biology
– nanotechnology
– etc
12
“A mathematician is a machine that converts coffee into theorems”
P. Erdos
13
Cholesterol
14
Aspirin
15
“A chemoinformatician is a machine …..…”
16
Chemical Space
            Stars       Small Mol.
Existing    10^22       10^7
Virtual     0           10^60 (?)
Mode        Real        Virtual
Access      Difficult   “Easy”
17
Chemoinformatics
• Historical perspective: physics, chemistry and biology
• Understanding chemical space
• Small molecules (chemical synthesis, drug design,
chemical genomics, systems biology, nanotechnology)
• Predict physical, chemical, biological properties
(classification/regression)
• Build filters/tools to efficiently navigate chemical space to
discover new drugs, new reactions, new “galaxies”, etc.
18
Chemo/Bio Informatics
Two Key Ingredients
1. Data
2. Similarity Measures
Bioinformatics analogy and differences:
– Data (GenBank, Swissprot, PDB)
– Similarity (BLAST)
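To make the "similarity measure" ingredient concrete, here is a minimal sketch in Python. It assumes molecules have already been encoded as sets of "on" fingerprint bits and uses the Tanimoto (Jaccard) coefficient, a common choice that this slide does not itself name. The fingerprints below are hypothetical values made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: indices of the bits that are set.
fp_query = {1, 4, 7, 9}
fp_hit = {1, 4, 8, 9, 12}

# Shared bits divided by the total number of distinct bits.
score = tanimoto(fp_query, fp_hit)
```

Ranking a database by such a score against a query molecule plays roughly the role that a BLAST search plays for sequences.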
19
Computational/Predictive Methods
• Spectrum of methods:
– Quantum Mechanics
– ….
– Molecular Mechanics
– ….
– Machine Learning
20
Quantum Mechanics
Schrödinger’s Equation (time independent)
$H\psi = E\psi$
$H = -\frac{h^2}{8\pi^2 m}\nabla^2 + V$ = Hamiltonian operator
$E$ = energy
$V$ = external potential (time independent)
$\Psi(x,t)$ = (complex) wave function = $\psi(x)T(t)$
(time-independent case)
$|\Psi|^2 = \Psi^*\Psi$ = probability density function (particle at
position x)
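As a worked instance of the eigenvalue equation, consider the standard textbook case of one particle of mass m in a one-dimensional box of length L, with V = 0 inside the box and infinite walls. This example is not from the slides; it is only meant to show what an eigenfunction and its energy look like, using the same constant h as above.

```latex
\begin{align*}
H &= -\frac{h^2}{8\pi^2 m}\,\frac{d^2}{dx^2}, \qquad 0 < x < L,\\
\psi_n(x) &= \sqrt{\tfrac{2}{L}}\,\sin\!\left(\frac{n\pi x}{L}\right), \qquad n = 1, 2, 3, \dots\\
H\psi_n &= \frac{h^2}{8\pi^2 m}\left(\frac{n\pi}{L}\right)^{2}\psi_n
         = \frac{n^2 h^2}{8 m L^2}\,\psi_n,\\
E_n &= \frac{n^2 h^2}{8 m L^2}.
\end{align*}
```

Each allowed wave function is an eigenfunction of H, and the corresponding eigenvalue is its energy, so the energies are quantized by the integer n.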
21
Schrödinger Equation
• Partial differential eigenvalue equation
• Where are the electrons and nuclei of a molecule in
space?
• Under a given set of conditions, what are their energies?
• Difficult to solve exactly as the number of particles grows
(electron-electron interactions, etc.)
• Approximate methods
– Ab initio
– Semiempirical
• 3D structures
• Reaction mechanisms, rates
22
Ab Initio
• Limited to tens of atoms and best performed
using a cluster or supercomputer
• Can be applied to organics, organo-metallics,
and molecular fragments (e.g. catalytic
components of an enzyme)
• Vacuum or implicit solvent environment
• Can be used to study ground, transition, and
excited states (certain methods)
• Specific implementations include: GAMESS,
GAUSSIAN, etc.
23
Semiempirical Methods
• Semiempirical methods use parameters that compensate
for neglecting some of the time-consuming mathematical
terms in Schrödinger's equation, whereas ab initio
methods include all such terms.
• The parameters used by semiempirical methods can be
derived from experimental measurements or by
performing ab initio calculations on model systems.
• Limited to hundreds of atoms
• Can be applied to organics, organo-metallics, and small
oligomers (peptide, nucleotide, saccharide)
• Can be used to study ground, transition, and excited
states (certain methods).
• Specific implementations include: AMPAC, MOPAC, and
ZINDO.
24
Molecular Mechanics
• Force field approximation
• Ignore electrons
• Calculate energy of a system as a function
of nuclear positions
25
Molecular Mechanics
Energy = Stretching Energy + Bending Energy + Torsion
Energy + Non-Bonded Interactions Energy
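The four-term sum above can be sketched in Python as follows. The functional forms are assumptions, since the slides that presumably showed them survive only as titles: harmonic stretching and bending, a cosine torsion term, and a 12-6 Lennard-Jones plus Coulomb non-bonded term (the form mentioned in the editor's notes). All function names, parameter names, and the unit convention are hypothetical.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); assumes energies in kcal/mol,
# distances in Angstrom, and charges in units of the elementary charge.
COULOMB_CONST = 332.06


def stretch_energy(r, r0, k):
    """Harmonic bond stretching around the reference length r0."""
    return 0.5 * k * (r - r0) ** 2


def bend_energy(theta, theta0, k):
    """Harmonic angle bending around the reference angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion_energy(phi, v_n, n, gamma=0.0):
    """Periodic torsion term with barrier v_n, periodicity n, and phase gamma."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(r, epsilon, sigma, qi, qj):
    """12-6 Lennard-Jones plus Coulomb energy for one atom pair at distance r."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_CONST * qi * qj / r
    return lj + coulomb


def total_energy(bonds, angles, torsions, pairs):
    """Sum the four contributions; each argument is a list of parameter tuples.

    'pairs' should contain only atom pairs separated by at least three bonds,
    or atoms that belong to different molecules.
    """
    return (
        sum(stretch_energy(*b) for b in bonds)
        + sum(bend_energy(*a) for a in angles)
        + sum(torsion_energy(*t) for t in torsions)
        + sum(nonbonded_energy(*p) for p in pairs)
    )
```

Because the energy depends only on nuclear positions and tabulated parameters, it is cheap to evaluate, which is what lets molecular mechanics handle much larger systems than the quantum methods.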
26
Stretching Energy
27
Bending Energy
28
Torsion Energy
29
Non-Bonded Energy
30
Statistical/Machine Learning
Methods
NNs and recursive NNs
GA
SGs
Graphical Models
Kernels
………
Representations are essential. Must either (1) deal
with non-standard data structures of variable
size; or (2) represent the data in a standard
vector format.
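Option (2) can be illustrated with a small Python sketch that hashes the atoms and bonds of a variable-size molecular graph into a fixed-length bit vector. This is an illustrative assumption, not the representation used by the authors: the feature definition, the function name, and the 64-bit length are all hypothetical, and collisions between features are possible by design.

```python
import zlib


def hashed_fingerprint(atoms, bonds, n_bits=64):
    """Map a molecular graph to a fixed-length list of 0/1 bits.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples indexing into atoms
    """
    bits = [0] * n_bits
    # One feature per atom type.
    for symbol in atoms:
        bits[zlib.crc32(symbol.encode()) % n_bits] = 1
    # One feature per bond; sorting the two symbols makes it direction-independent.
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        feature = f"{a}{order}{b}"
        bits[zlib.crc32(feature.encode()) % n_bits] = 1
    return bits


# Two molecules of different size map to vectors of the same length.
ethanol = hashed_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = hashed_fingerprint(
    ["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
)
```

Once every molecule is a vector of the same length, standard learning methods that expect fixed-size inputs can be applied directly. Option (1) instead keeps the graph and uses methods such as recursive neural networks or kernels that accept variable-size structures.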

ISMB2006_Intro10099999999999999999999.ppt


Editor's Notes

  • #29
    – 6-12 Lennard-Jones + Coulomb
    – 10-12 Lennard-Jones for hydrogen bonding
    – VdW forces arise from temporal aspect and concerted fluctuations
    – C_ij missing in the Coulomb term
    – Non-bonded term applies only to atoms that are at least 3 bonds away in a molecule, or between different molecules
    – Electrons have a non-even distribution in a molecular system
    – When the distribution is summarized by one number located at the nucleus: partial atomic charges or net atomic charges
    – Non-bonded energy: sum of interactions between point charges
    – Accurate with enough point charges
    – Assigning point charges to nuclei only is not enough
    – No consensus on how to derive the perfect set of point charges
    – Solvent: Poisson-Boltzmann approximation
    – There is no such thing as a chemical bond!