HOMOLOGY MODELING
CONTENTS
 INTRODUCTION
 HISTORY
 Steps for homology modeling
1. Template selection
2. Sequence alignment
3. Backbone modeling
4. Loop modeling
5. Side chain refinement
6. Model refinement using energy function
7. Model evaluation
8. Comprehensive modeling programs
 Homology based databases
 Applications
INTRODUCTION
 There are three computational approaches to protein 3D structure
prediction:
 Ab initio prediction
 Threading
 Homology modeling
 Homology modeling predicts the 3D structure of a target (unknown) protein
from the known structure of a homologous protein.
 A model is desirable when neither X-ray crystallography nor NMR can
determine the structure of a protein in time, or at all.
 Of the three approaches, homology modeling generally gives the most
accurate models.
HOMOLOGY MODELING
 Also called comparative modeling or knowledge-based modeling
 Predicts the 3D structure of a protein on the basis of sequence homology
 Principle:
Homology modeling is based on the reasonable assumption that two
homologous proteins share very similar structures, because a protein
fold is evolutionarily more conserved than its amino acid sequence.
 A target protein can therefore be modeled with reasonable accuracy even
on a distantly related template.
[Figure: the target sequence is compared with a homologous template sequence whose structure was determined by NMR or X-ray crystallography; the aim is to predict the target structure.]
HISTORY
 The first homology modelling studies were done using wire and plastic models of
bonds and atoms as early as the 1960s.
 The models were constructed by taking the coordinates of a known protein structure
and modifying them by hand for those amino acids that did not match the structure.
 In 1969 David Phillips, Browne and co-workers published the first paper on
homology modelling.
 They modelled α-lactalbumin based on the structure of hen egg-white lysozyme.
 The sequence identity between these two proteins was 39%. In addition, both
proteins contained an identical pattern of cysteines, suggesting a similar arrangement
of disulphide bonds.
 When the structure of α-lactalbumin was solved by X-ray crystallography, it was
compared to the model and analysed.
 The model was essentially correct apart from the C-terminal end.
7 STEPS OF HOMOLOGY MODELING
Template selection
Sequence alignment
Backbone model building
Loop modeling
Side chain refinement
Model refinement using energy function
Model evaluation
Template recognition and initial alignment
 Two proteins can be regarded as homologous if their percentage sequence
identity falls in the safe zone: at least 30%.
 Template selection involves searching the PDB for homologous proteins.
 The search can be performed using BLAST, FASTA, SCANPS or SSEARCH.
 This gives a set of probable templates, from which we select the one
with the highest identity.
 If several templates share the highest identity, choose the one with the
highest resolution and similar cofactors.
 If no similar sequence is found, search by:
1. PSI-BLAST
2. Threading
3. Structurally conserved regions
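The selection rule above (identity first, then resolution) can be sketched in a few lines of Python. This is only an illustration: the `Hit` record, the identifiers `tmplA` to `tmplC` and all numbers are hypothetical, and a real search would take these values from BLAST or FASTA output and the PDB entry headers. Cofactor similarity is not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    pdb_id: str        # hypothetical template identifier
    identity: float    # percent identity to the target
    resolution: float  # in angstroms; a smaller value means higher resolution

def pick_template(hits, min_identity=30.0):
    """Return the preferred template, or None if no hit is in the safe zone."""
    candidates = [h for h in hits if h.identity >= min_identity]
    if not candidates:
        return None  # caller would fall back to PSI-BLAST or threading
    # Highest identity wins; ties are broken by the best (lowest) resolution.
    return min(candidates, key=lambda h: (-h.identity, h.resolution))

hits = [
    Hit("tmplA", 42.0, 2.5),
    Hit("tmplB", 42.0, 1.8),
    Hit("tmplC", 28.0, 1.2),  # below the 30% threshold, so it is ignored
]
best = pick_template(hits)
print(best.pdb_id if best else "no template in the safe zone")
```

Returning `None` rather than the best sub-threshold hit keeps the fallback to PSI-BLAST or threading an explicit decision of the caller.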
Sequence alignment correction
1. The full-length template and target sequences are realigned to
obtain an optimal alignment.
2. An incorrect alignment at this stage leads to an incorrect model.
3. Multiple sequence alignment programs such as PRALINE and T-COFFEE
may be used.
4. Even then, manual adjustment may be needed to improve the alignment.
5. Insertions and deletions can be adjusted until the gaps are as
small as possible.
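When comparing alternative alignments by hand, two simple numbers help: the percent identity over aligned columns and the number of separate gap runs. The sketch below assumes the two sequences are already aligned to equal length with `-` as the gap character. The example sequences are made up for illustration, and other conventions for the identity denominator exist.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

def count_gap_runs(seq, gap="-"):
    """Count contiguous stretches of gap characters in one aligned sequence."""
    runs = 0
    in_gap = False
    for ch in seq:
        if ch == gap and not in_gap:
            runs += 1
        in_gap = (ch == gap)
    return runs

# Hypothetical aligned pair, not taken from the slides.
template = "ACDEFGHIK"
target = "ACD--GHLK"
print(round(percent_identity(template, target), 1), count_gap_runs(target))
```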
Alignment correction
[Figure: a template segment, the best-aligned target with gaps, and the target after shifting residues.]
CPISRTGASIFRCW (TEMPLATE)
CPISRTA----FRCW (TARGET) OPTION A
CPISRTGASIFRCW (TEMPLATE)
CPISRT----AFRCW (TARGET) OPTION B
BACKBONE REGENERATION
 Modeling structurally conserved regions
 Here the coordinates of the template are copied onto the target.
1. If two aligned residues are identical:
The side-chain coordinates are copied along with the main chain.
2. If two aligned residues differ:
Only the backbone coordinates are transferred; the side-chain atoms are
replaced automatically with those of the target residue.
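A minimal sketch of this copying rule, assuming a very simple data layout: each template residue is a dict with an amino-acid code and a name-to-coordinate map, and the alignment is a list of `(target_residue, template_index_or_None)` pairs. These structures and the coordinates are hypothetical; real programs read them from PDB files.

```python
BACKBONE = ("N", "CA", "C", "O")

def build_core(template, alignment):
    """Copy template coordinates onto the target for aligned positions."""
    model = []
    for target_aa, t_idx in alignment:
        if t_idx is None:
            # Target residue opposite a gap: left empty for loop modeling.
            model.append({"aa": target_aa, "atoms": {}})
            continue
        res = template[t_idx]
        if res["aa"] == target_aa:
            # Identical residues: copy main chain and side chain.
            atoms = dict(res["atoms"])
        else:
            # Different residues: keep the backbone only; the side chain
            # is rebuilt later for the target residue type.
            atoms = {n: xyz for n, xyz in res["atoms"].items() if n in BACKBONE}
        model.append({"aa": target_aa, "atoms": atoms})
    return model

# Made-up coordinates for two template residues (serine, alanine).
template = [
    {"aa": "S", "atoms": {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0),
                          "C": (2.0, 1.4, 0.0), "O": (1.2, 2.3, 0.0),
                          "CB": (2.0, -0.8, 1.2), "OG": (3.4, -0.9, 1.1)}},
    {"aa": "A", "atoms": {"N": (3.3, 1.6, 0.0), "CA": (3.9, 2.9, 0.0),
                          "C": (5.4, 2.8, 0.0), "O": (6.0, 1.7, 0.0),
                          "CB": (3.5, 3.7, 1.2)}},
]
alignment = [("S", 0), ("G", 1), ("P", None)]
model = build_core(template, alignment)
print([sorted(r["atoms"]) for r in model])
```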
Loop modeling
 Modeling of variable regions
 Gaps produced during sequence alignment leave holes in the model unless
they are modeled.
 There are two approaches to closing gaps, both suited to small gaps:
1. Database searching
2. Ab initio methods
Database searching
 Find "spare parts" from known protein structures that fit onto the stems
(the anchoring residues on either side of the gap).
 A suitable match is searched for in the PDB.
 Lastly, the coordinates are copied from the match onto the anchoring points.
Loop modelling
[Figure: 1. Searching the database for a suitable fragment. 2. Copying the fragment coordinates onto the loop.]
Ab initio method
 Generate random loops.
 Keep those whose phi and psi angles fall in the allowed regions of the
Ramachandran plot
 and which do not clash with neighboring atoms.
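The random-sampling idea can be sketched as rejection sampling of (phi, psi) pairs. The rectangular regions below are crude stand-ins chosen for illustration only, not real Ramachandran boundaries, and the clash check against neighboring atoms is left out because it needs 3D coordinates.

```python
import random

# ASSUMPTION: very rough rectangles (degrees) standing in for allowed regions,
# given as (phi_min, phi_max, psi_min, psi_max).
ALLOWED = [
    (-160.0, -45.0, -70.0, -10.0),  # roughly helical
    (-180.0, -45.0, 90.0, 180.0),   # roughly extended (beta)
]

def is_allowed(phi, psi):
    """True if (phi, psi) lies inside one of the illustrative rectangles."""
    return any(p0 <= phi <= p1 and s0 <= psi <= s1
               for p0, p1, s0, s1 in ALLOWED)

def sample_loop(n_residues, rng, max_tries=100000):
    """Draw random (phi, psi) pairs, keeping only those in allowed regions."""
    angles = []
    tries = 0
    while len(angles) < n_residues:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not sample an allowed conformation")
        phi = rng.uniform(-180.0, 180.0)
        psi = rng.uniform(-180.0, 180.0)
        if is_allowed(phi, psi):
            angles.append((phi, psi))
    return angles

loop = sample_loop(4, random.Random(0))
print(len(loop))
```

In practice many such loops would be generated, and each one would also be tested for closure onto the anchoring residues before being scored.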
Side Chain Conformation Search
Now it is time to add side chains to the backbone of the model.
Side chains of conserved residues were already copied completely.
3 methods:
1. Searching every possible conformation of the side-chain torsion angles.
• The conformation with the lowest energy is selected.
2. Using a rotamer library
• Rotamers are the favoured side-chain conformations, arising from restricted rotation around single bonds, extracted from known
structures.
• In the library they are ranked by their frequency of occurrence.
• The rotamer with the lowest energy is selected.
3. Using the correlation between rotamers and backbone conformation.
• Many backbone conformations prefer specific rotamers only.
• Lastly, optimization is done to minimize steric overlaps with the rest of the model.
• Special program: SCWRL
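A toy version of rotamer selection, assuming each rotamer has a library frequency and that the model supplies a clash penalty for it. The energy used here (negative log frequency plus penalty) is an illustrative assumption and not the scoring function of SCWRL or any other program; the frequencies and penalties are made up.

```python
import math

def pick_rotamer(rotamers, clash_penalty):
    """Return the name of the rotamer with the lowest toy energy.

    rotamers: list of (name, frequency) pairs from a rotamer library.
    clash_penalty: function mapping a rotamer name to a steric penalty.
    """
    def energy(rotamer):
        name, freq = rotamer
        # Frequent rotamers get a low energy; clashes raise it.
        return -math.log(freq) + clash_penalty(name)
    return min(rotamers, key=energy)[0]

# Hypothetical chi1 rotamers with made-up frequencies.
library = [("g-", 0.5), ("t", 0.3), ("g+", 0.2)]
# Hypothetical penalties: the most frequent rotamer clashes with a neighbor.
penalties = {"g-": 5.0, "t": 0.0, "g+": 0.0}

choice = pick_rotamer(library, lambda name: penalties[name])
print(choice)
```

Without any clash penalty the most frequent rotamer is chosen, which matches the ranking by frequency of occurrence described above.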
[Figure: preferred rotamers of a tyrosine (colored sticks); the real side chain (cyan) fits one of them.]
Refinement of model using Molecular Mechanics
Many structural artifacts can be introduced while the model protein is being built:
 Substitution of large side chains for small ones
 Strained peptide bonds between segments taken from different reference proteins
 Non-optimal conformation of loops
 These lead to irregularities such as
 unfavourable bond lengths,
 unfavourable bond angles,
 close atomic contacts.
Strategies of refinement
1. Energy minimization
2. Molecular dynamics simulation
Energy minimization
 Produces a model with a lower overall potential energy
 This relieves collisions and strain without significantly changing the
overall structure
[Figure: crambin in a simulation cell during a molecular dynamics simulation.]
Molecular dynamics simulation
1. Energy minimization moves atoms only downhill, toward the nearest energy
minimum, which often leads to suboptimal structures.
2. To search for the global energy minimum, atoms are moved uphill
as well as downhill on the energy scale.
3. This is done through thermodynamic calculations that simulate atomic motion.
4. The simulation mimics the protein folding process and can help find the true
structure.
5. GROMOS: a molecular dynamics simulation program
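Why minimization alone can end in a suboptimal structure is easy to see on a one-dimensional toy energy function with two wells. The function below is an arbitrary assumption chosen for illustration, not a molecular force field. Steepest descent simply follows the gradient downhill and stops in whichever well it started above.

```python
def energy(x):
    """Toy double-well energy: global minimum near x = -1.04, local near x = 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    """Derivative of the toy energy with respect to x."""
    return 4.0 * x ** 3 - 4.0 * x + 0.3

def steepest_descent(x, step=0.01, n_steps=5000):
    """Move downhill along the negative gradient with a fixed step size."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

x_right = steepest_descent(1.5)   # starts above the shallower well
x_left = steepest_descent(-1.5)   # starts above the deeper well
print(round(x_right, 2), round(x_left, 2))
print(energy(x_right) > energy(x_left))
```

The run started at 1.5 never crosses the barrier near x = 0, which is the situation molecular dynamics addresses by allowing uphill moves.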
Model Validation
 Every homology model contains errors. Their number depends mainly on two factors:
 the percentage sequence identity between template and target
 the number of errors in the template
 Hence it is essential to check the correctness of the overall fold and
structure, errors in localized regions, and stereochemical
parameters: bond lengths, angles and geometries.
The final model is evaluated to make sure that its
structural features are compatible with
physicochemical rules:
phi and psi angles
bond lengths
spatial arrangement
energy profile
Differences between the model's values and the statistical values
derived from experimental structures reveal errors.
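Checking phi and psi angles starts with computing a dihedral angle from four atom positions. For residue i, phi uses C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1). The sketch below is a plain-Python version of the usual projection formula; the test points are made-up coordinates, not atoms from a real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Made-up points: the last atom is rotated a quarter turn about the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Angles computed this way for every residue can then be compared against the allowed regions of the Ramachandran plot.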
Model Evaluation
 WHAT IF http://www.cmbi.kun.nl/gv/servers/WIWWWI/
 SOV http://predictioncenter.llnl.gov/local/sov/sov.html
 PROVE http://www.ucmb.ulb.ac.be/UCMB/PROVE/
 ANOLEA http://www.fundp.ac.be/pub/ANOLEA.html
 ERRAT http://www.doe-mbi.ucla.edu/Services/ERRATv2/
 VERIFY3D http://shannon.mbi.ucla.edu/DOE/Services/Verify_3D/
 BIOTECH http://biotech.embl-ebi.ac.uk:8400/
 ProsaII http://www.came.sbg.ac.at
 WHATCHECK http://www.sander.embl-heidelberg.de/whatcheck/
Automated Web-Based Homology Modeling
 SWISS-MODEL : http://www.expasy.org/swissmod/SWISS-MODEL.html
 WHAT IF : http://www.cmbi.kun.nl/swift/servers/
 The CPHModels Server : http://www.cbs.dtu.dk/services/CPHmodels/
 3D Jigsaw : http://www.bmm.icnet.uk/~3djigsaw/
 SDSC1 : http://cl.sdsc.edu/hm.html
 EsyPred3D : http://www.fundp.ac.be/urbm/bioinfo/esypred/
Comparative Modeling Servers & Programs
 COMPOSER http://www.tripos.com/sciTech/inSilicoDisc/bioInformatics/matchmaker.html
 MODELLER http://salilab.org/modeler
 InsightII http://www.msi.com/
 SYBYL http://www.tripos.com/
Challenges
 To model proteins with lower similarity (e.g. < 30% sequence
identity)
 To increase the accuracy of models and to make modeling fully automated
 Improvements may include simultaneous optimization techniques
in side-chain modeling and loop modeling
 Developing better optimizers and potential functions, which can
lead the model structure away from the template towards the correct
structure
 Although comparative modelling needs significant improvement, it
is already a mature technique that can be used to address many
practical problems
APPLICATIONS
(1) Studying the effect of mutations
(2) Identifying active and binding sites on proteins
(useful for ligand design)
(3) Searching for ligands of a given binding site (database mining)
(4) Designing novel ligands for a given binding site
(5) Modeling substrate specificity
(6) Predicting antigenic epitopes
(7) Protein-protein docking simulations
(8) Molecular replacement in X-ray structure refinement
(9) Rationalizing known experimental observations
(10) Planning new computational experiments with the provided models
 Typical applications of a homology model in drug discovery
require a very high accuracy of the local side chain positions in
the binding site.
Thank you
