+
Computational Structural
Biology
Methods & Applications
Antonio E. Serrano PhD MT
@xideral
xideral.com
2012
+
3D Structures of Proteins
 Site directed mutagenesis
 Mapping of disease-related
mutations
 The structure-based design of
specific inhibitors
 X-ray crystallography
 High-resolution electron
microscopy
 Nuclear magnetic resonance
spectroscopy
Methods Techniques
+
Protein Data Bank Holdings (2007)
• The remaining 3.3 million sequences exceed the number of known
3D structures by more than two orders of magnitude.
• The gap in structural knowledge must be bridged by computation.
+
Modeling Methods
1. Comparative: Homology modeling that uses related protein
family members as templates to model the structure of the
protein of interest. Can only be employed when a detectable
template of known structure is available.
2. Fold recognition: Threading methods are used that have low or
statistically insignificant sequence similarity to proteins of known
structure
3. De novo: Ab initio methods aim to predict purely from its primary
sequence, using principles of physics that govern protein folding
and/or using information derived from known structures but
without relying on any evolutionary relationship to known folds.
4. Integrative: Hybrid methods that combine information from a
varied set of computational and experimental sources.
+
Comparative Protein Structure
Method
a. Identification of suitable template structures related to the
target protein and the alignment of the target and template(s)
sequences
b. Modeling of the structurally conserved regions and the
prediction of structurally variable regions
c. Refinement of the initial model
d. Evaluation of the resulting model(s).
+
a. Identification of suitable template
 The sequence identity of the target-template alignment is the
most commonly used metric to quantify the similarity
 Is a good predictor of the quality of the resulting model
 It is thus crucial to consider the target-template sequence
identity level when selecting template structures as this will
have a critical impact on the quality of the resulting model and
hence, its potential applications.
 The overall accuracy of 40% or higher is almost always good,
deviate by less than 2Å RMSD from the experimentally
determined structure
+
a. Identification of suitable template
 Alignment based:
 PSI-Blast
 Hidden Markov Models
 SAM
 HMMER
 Profile-profile alignment
 FFAS03
 Profile.scan
 HHsearch
+
b. Modeling of the structurally
conserved regions
 All-atom model
 Two approaches for model building:
 Rigid fragment assembly approach: An initial model is constructed
from structurally conserved core regions of the template and from
structural fragments obtained from either aligned or unrelated
structures. Optimization procedure to refine its geometry and
stereochemistry
 Single optimization strategy: Attempts to maximize the satisfaction
of spatial restraints obtained from the target-template alignment,
known protein structures, and molecular mechanics force-fields.
May not require a separate refinement step. Specialized protocols to
enhance the accuracy of the non-conserved regions of the
alignment such as loops and/or side chains.
+
c. Model Refinement
 Energy minimization molecular mechanics force fields
 Molecular dynamics
 Monte Carlo
 Genetic algorithm-based sampling methods
d. Model evaluation
 Fold assessment: ” that seeks to ensure the calculated models possess
the correct fold and helps in detecting errors in template selection, fold
recognition, and target-template alignment
 Identify the model that is closest to the native structure out of a number of
alternative models. A combination of such assessments is usually
employed to select the most accurate model from amongst a set of
alternative models, generated based on different templates and/or
alignments..
+
Acurracy & Limitations of Comparative
Protein Structure Modeling
1. The availability of suitable
template structures
2. The ability of alignment
methods to calculate an
accurate alignment between
the target and template
sequences, even when the
relationship between them is
remote
3. The structural and functional
divergence between the
target and the template
 More than 50% sequence identity
: high accuracy models (1 Å root
mean square deviation, RMSD).
Inaccuracies are mainly found in
the packing of side chains and
loop regions.
 30 to 50% sequence identity:
Medium accuracy models where
the most frequent errors include
side-chain packing errors, slight
distortions of the protein core,
inaccurate loop modeling, and
sporadic alignment mistakes.
 Less than 30%: Low accuracy
models. Alignment errors
increase rapidly,
+
De novo Modeling Techniques
 Do not explicitly rely on whole known structures as templates.
Thus, the structure of any protein can be predicted by these de
novo methods.
 ab initio prediction: Subset of de novo methods that rely on
energy functions based solely on physicochemical interactions, not
on the PDB.
 Most of the successful de novo prediction methods that are
applicable to larger protein segments (up to ~150 residues). use
information from known protein structures.
 Assume that the native state of a protein is at the global free
energy minimum and carry out a large-scale search of
conformational space for protein tertiary structures that are
particularly low in free energy for the given amino acid sequence
+
De novo Modeling Techniques
 Rosetta method developed by Baker and coworkers uses an
ensemble of short structural fragments extracted from the PDB
 The Rosetta fragment assembly strategy has been successfully
applied to de novo structure prediction, as well as to modeling of
structurally variable regions (loops, inser- tions) in comparative
protein structure models.
 Have made tremendous progress over the last decade, and
several individual examples of highly accurate predictions have
been reported
 There are still significant limitations that restrict their application for
routine use:
 Computational demand is immense and therefore limits these methods
to relatively small systems.
 The overall quality of the resulting models decreases with the increasing
size of the protein
three-dimensional
atomic-level
experimental
protein
structure
determination
Comparative
protein
structure
modeling
+
Structural Genomics
 Is a worldwide initiative aimed at rapidly determining a large
number of protein structures using a high-throughput mode
 X-ray crystallography
 NMR spectroscopy.
 Each step of experimental structure determination has become:
 More efficient
 Less expensive
 More likely to succeed
 Structural genomics initiatives are making significant con- tribution
to both the scope and depth of our structural knowledge about
protein families.
 Although worldwide structural genomics initiatives only account for
~20% of the new structures,
+
Integrative (Hybrid) Modeling
Techniques
 Biological interactions remain
uncharacterized by traditional
structural biology techniques
such as X-ray crystallography
and NMR spectroscopy
 This gap is being bridged by
several approaches
 Stoichiometry and composition of
protein:
 Quantitative immunoblotting
 Mass spectrometry.
 The shape of the assembly can
be revealed by
 Electron microscopy
 Small angle X-ray scattering.
 The positions of the components
can be elucidated by:
 Cryoelectron microscopy
 Labeling techniques.
 Components interaction:
 Mass spectrometry
 Yeast two-hybrid
 Affinity purification.
 Relative orientations :
 Cryoelectron microscopy
 Hydrogen/ deuterium
exchange,
 Hydroxyl radical footprinting
 Chemical- crosslinking
+
Integrative structure determination
Electron
microscopy
Immunoelectron
microscopy
Affinity
purification
+
Applications
 Designing experiments for site-directed mutagenesis
 Protein engineering
 Predicting ligand binding sites
 Docking small molecules
 Structure-based drug discovery
 Studying the effect of mutations and SNPs
 Phasing X-ray diffraction data in molecular replacement
+
Protein Modeling Servers and
Software
+
Protein Model Databases
+
Cryoelectron microscopy
 Is emerging as a key technique for studying 3D structures of
multi-component macromolecular complexes with masses
>250 kDa
 Membrane proteins
 Cytoskeletal complexes
 Ribosomes
 Quasi spherical viruses
 Molecular chaperones
 Flagella
 Ion channels
 Oligomeric enzymes.

Modelling Proteins By Computational Structural Biology

  • 1.
    + Computational Structural Biology Methods &Applications Antonio E. Serrano PhD MT @xideral xideral.com 2012
  • 2.
    + 3D Structures ofProteins  Site directed mutagenesis  Mapping of disease-related mutations  The structure-based design of specific inhibitors  X-ray crystallography  High-resolution electron microscopy  Nuclear magnetic resonance spectroscopy Methods Techniques
  • 3.
    + Protein Data BankHoldings (2007) • The remaining 3.3 million sequences exceed the number of known 3D structures by more than two orders of magnitude. • The gap in structural knowledge must be bridged by computation.
  • 4.
    + Modeling Methods 1. Comparative:Homology modeling that uses related protein family members as templates to model the structure of the protein of interest. Can only be employed when a detectable template of known structure is available. 2. Fold recognition: Threading methods are used that have low or statistically insignificant sequence similarity to proteins of known structure 3. De novo: Ab initio methods aim to predict purely from its primary sequence, using principles of physics that govern protein folding and/or using information derived from known structures but without relying on any evolutionary relationship to known folds. 4. Integrative: Hybrid methods that combine information from a varied set of computational and experimental sources.
  • 5.
    + Comparative Protein Structure Method a.Identification of suitable template structures related to the target protein and the alignment of the target and template(s) sequences b. Modeling of the structurally conserved regions and the prediction of structurally variable regions c. Refinement of the initial model d. Evaluation of the resulting model(s).
  • 6.
    + a. Identification ofsuitable template  The sequence identity of the target-template alignment is the most commonly used metric to quantify the similarity  Is a good predictor of the quality of the resulting model  It is thus crucial to consider the target-template sequence identity level when selecting template structures as this will have a critical impact on the quality of the resulting model and hence, its potential applications.  The overall accuracy of 40% or higher is almost always good, deviate by less than 2Å RMSD from the experimentally determined structure
  • 7.
    + a. Identification ofsuitable template  Alignment based:  PSI-Blast  Hidden Markov Models  SAM  HMMER  Profile-profile alignment  FFAS03  Profile.scan  HHsearch
  • 8.
    + b. Modeling ofthe structurally conserved regions  All-atom model  Two approaches for model building:  Rigid fragment assembly approach: An initial model is constructed from structurally conserved core regions of the template and from structural fragments obtained from either aligned or unrelated structures. Optimization procedure to refine its geometry and stereochemistry  Single optimization strategy: Attempts to maximize the satisfaction of spatial restraints obtained from the target-template alignment, known protein structures, and molecular mechanics force-fields. May not require a separate refinement step. Specialized protocols to enhance the accuracy of the non-conserved regions of the alignment such as loops and/or side chains.
  • 9.
    + c. Model Refinement Energy minimization molecular mechanics force fields  Molecular dynamics  Monte Carlo  Genetic algorithm-based sampling methods d. Model evaluation  Fold assessment: ” that seeks to ensure the calculated models possess the correct fold and helps in detecting errors in template selection, fold recognition, and target-template alignment  Identify the model that is closest to the native structure out of a number of alternative models. A combination of such assessments is usually employed to select the most accurate model from amongst a set of alternative models, generated based on different templates and/or alignments..
  • 10.
    + Acurracy & Limitationsof Comparative Protein Structure Modeling 1. The availability of suitable template structures 2. The ability of alignment methods to calculate an accurate alignment between the target and template sequences, even when the relationship between them is remote 3. The structural and functional divergence between the target and the template  More than 50% sequence identity : high accuracy models (1 Å root mean square deviation, RMSD). Inaccuracies are mainly found in the packing of side chains and loop regions.  30 to 50% sequence identity: Medium accuracy models where the most frequent errors include side-chain packing errors, slight distortions of the protein core, inaccurate loop modeling, and sporadic alignment mistakes.  Less than 30%: Low accuracy models. Alignment errors increase rapidly,
  • 11.
    + De novo ModelingTechniques  Do not explicitly rely on whole known structures as templates. Thus, the structure of any protein can be predicted by these de novo methods.  ab initio prediction: Subset of de novo methods that rely on energy functions based solely on physicochemical interactions, not on the PDB.  Most of the successful de novo prediction methods that are applicable to larger protein segments (up to ~150 residues). use information from known protein structures.  Assume that the native state of a protein is at the global free energy minimum and carry out a large-scale search of conformational space for protein tertiary structures that are particularly low in free energy for the given amino acid sequence
  • 12.
    + De novo ModelingTechniques  Rosetta method developed by Baker and coworkers uses an ensemble of short structural fragments extracted from the PDB  The Rosetta fragment assembly strategy has been successfully applied to de novo structure prediction, as well as to modeling of structurally variable regions (loops, inser- tions) in comparative protein structure models.  Have made tremendous progress over the last decade, and several individual examples of highly accurate predictions have been reported  There are still significant limitations that restrict their application for routine use:  Computational demand is immense and therefore limits these methods to relatively small systems.  The overall quality of the resulting models decreases with the increasing size of the protein
  • 13.
  • 14.
    + Structural Genomics  Isa worldwide initiative aimed at rapidly determining a large number of protein structures using a high-throughput mode  X-ray crystallography  NMR spectroscopy.  Each step of experimental structure determination has become:  More efficient  Less expensive  More likely to succeed  Structural genomics initiatives are making significant con- tribution to both the scope and depth of our structural knowledge about protein families.  Although worldwide structural genomics initiatives only account for ~20% of the new structures,
  • 15.
    + Integrative (Hybrid) Modeling Techniques Biological interactions remain uncharacterized by traditional structural biology techniques such as X-ray crystallography and NMR spectroscopy  This gap is being bridged by several approaches  Stoichiometry and composition of protein:  Quantitative immunoblotting  Mass spectrometry.  The shape of the assembly can be revealed by  Electron microscopy  Small angle X-ray scattering.  The positions of the components can be elucidated by:  Cryoelectron microscopy  Labeling techniques.  Components interaction:  Mass spectrometry  Yeast two-hybrid  Affinity purification.  Relative orientations :  Cryoelectron microscopy  Hydrogen/ deuterium exchange,  Hydroxyl radical footprinting  Chemical- crosslinking
  • 16.
  • 17.
    + Applications  Designing experimentsfor site-directed mutagenesis  Protein engineering  Predicting ligand binding sites  Docking small molecules  Structure-based drug discovery  Studying the effect of mutations and SNPs  Phasing X-ray diffraction data in molecular replacement
  • 18.
  • 19.
  • 20.
    + Cryoelectron microscopy  Isemerging as a key technique for studying 3D structures of multi-component macromolecular complexes with masses >250 kDa  Membrane proteins  Cytoskeletal complexes  Ribosomes  Quasi spherical viruses  Molecular chaperones  Flagella  Ion channels  Oligomeric enzymes.