L1Protein_Structure_Analysis.pptx

Learning Objectives
• At the end of this topic the student should be
able to:
(i) Perform computer-aided protein structure
determination and validation
(ii) Design docking experiments
(iii) Demonstrate understanding of protein
interactions and simulations

Computer-Aided Protein Structure
Prediction
Protein
Sequence +
Structure

Recap
Structure prediction aims to build models of the 3D structures of proteins that
can be useful for understanding structure-function relationships.

STAT115 5
Protein Structure Levels
Folded structure:
lowest energy state

Watch Video
http://www.youtube.com/watch?v=fvBO3TqJ6FE&feature=fvw

Protein Structure Determination &
Prediction
• Experimental Techniques
– X-ray Crystallography
– NMR
• Limitations of Current Experimental
Techniques
– Protein Data Bank (PDB) -> 30,000 protein structures
– Unique structures number only 4,000 to 5,000
– Non-redundant (NR) -> 10,000,000 protein sequences
• Importance of Structure Prediction
– Fill gap between known sequence and structures
– Protein engineering: altering the function of a protein
– Rational Drug Design

Protein Structure Prediction
• Since AA sequence determines structure, can we
predict protein structure from its AA sequence?
– predicting the three angles, unlimited DoF!
• Physical properties that determine fold
– Rigidity of the protein backbone
– Interactions among amino acids, including
• Electrostatic interactions
• van der Waals forces
• Volume constraints
• Hydrogen, disulfide bonds
– Interactions of amino acids with water

Why Protein Modeling
• The goal of research in the area of structural genomics is to
provide the means to characterize and identify the large
number of protein sequences that are being discovered
• Knowledge of the three-dimensional structure
– helps in the rational design of site-directed mutations
– can be of great importance for the design of drugs
– greatly enhances our understanding of how proteins function
and how they interact with each other, for example by explaining
antigenic behaviour, DNA binding specificity, etc.

Computational methods for Protein Structure
Prediction
• Homology or Comparative Modeling
• Fold Recognition or Threading Methods
• Ab initio methods that utilize knowledge-based information
• Ab initio methods without the aid of knowledge-based
information

Why do we need computational approaches?
• Structural information from X-ray crystallography or NMR is
obtained much more slowly than sequence data
– The techniques involve elaborate technical procedures
– Many proteins fail to crystallize at all and/or cannot be
obtained or dissolved in large enough quantities for NMR
measurements
– The size of the protein is also a limiting factor for NMR
• With a good computational method, a structure can be obtained far faster

Tertiary Structure Prediction Methods
Flowchart:
• Start: any given protein sequence
• Compare sequence identity with proteins that have solved structures
– > 35%: Homology Modeling
– < 35%: Fold Recognition
– < 35%: Ab initio Folding
• Structure selection
• Structure refinement
• Final structure
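The branching in the flowchart above can be written as a small decision function. This is only a sketch: the 35% cut-off comes from the slide, while the `fold_match_found` flag (used to separate the two "< 35%" branches) and the function name are assumptions made for illustration.

```python
def choose_method(best_identity_percent, fold_match_found, threshold=35.0):
    """Pick a prediction route following the flowchart on the slide.

    best_identity_percent: highest sequence identity to any solved structure.
    fold_match_found: hypothetical flag saying whether a fold library search
    returned a convincing hit (the slide does not say how the two "< 35%"
    branches are told apart).
    """
    if best_identity_percent > threshold:
        return "homology modeling"
    if fold_match_found:
        return "fold recognition"
    return "ab initio folding"


for identity, fold_hit in [(62.0, True), (22.0, True), (22.0, False)]:
    print(identity, fold_hit, "->", choose_method(identity, fold_hit))
```

Whatever route is chosen, the resulting model still goes through structure selection and refinement before it is treated as final.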

Homology Modeling
• Predicts the three-dimensional structure of a
given protein sequence (TARGET) based on an
alignment to one or more known protein
structures (TEMPLATES)
• If similarity between the TARGET sequence and
the TEMPLATE sequence is detected, structural
similarity can be assumed.
• In general, 30% sequence identity is required for
generating useful models.
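Percent sequence identity is computed from an alignment of TARGET and TEMPLATE. The sketch below counts identical residues over aligned (non-gap) positions. Other definitions divide by the shorter sequence length or by the full alignment length, so the denominator here is an assumption, and the two sequences are made up for illustration.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned_pairs = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # gap columns are ignored under this definition
        aligned_pairs += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned_pairs if aligned_pairs else 0.0


# Hypothetical aligned sequences (not from any real protein).
target = "MKT-AYIAKQR"
template = "MKTWAYLAKHR"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity")
print("useful model expected" if identity >= 30.0 else "expect serious errors")
```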

Homology Modeling Process
• Template recognition
• Alignment
• Determining structurally conserved regions
• Backbone generation
• Building loops or variable regions
• Conformational search for side chains
• Refinement of structure
• Validating structures

Template Recognition
• First, search a structural database of proteins for sequences (templates)
related to the target sequence
• The accuracy of the model depends on selecting a proper template
• FASTA and BLAST (PSI-BLAST) from EMBL-EBI and NCBI can be used
• This gives a probable set of templates, but the final one is not yet decided
• After initial alignments and identification of structurally conserved regions
among the templates, the final template is chosen

How good can homology modeling be?
Sequence identity and what a model at that level is good for:
• 60-100%: Comparable to medium resolution NMR; substrate specificity
• 30-60%: Molecular replacement in crystallography; support site-directed
mutagenesis through visualization
• <30%: Serious errors!!!

Protein Structural Databases
• Templates can be found using the TARGET sequence as a
query for searching using FASTA or BLAST
– PDB (http://www.rcsb.org/pdb)
– MODELLER (http://guitar.rockefeller.edu/modeller/modeller.html)
– ModBase (http://pipe.rockefeller.edu/modbase/general-info.html)
– 3DCrunch (http://www.expasy.ch/swissmod/SM_3DCrunch.html)

Target-Template Alignment
• No current comparative modeling method can
recover from an incorrect alignment
• Use multiple sequence alignments as initial guide.
• Consider slightly alternative alignments in areas of
uncertainty, build multiple models
Note: the sequence from a sequence database and the sequence in the
actual PDB entry are not always identical
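A pairwise target-template alignment can be sketched with Needleman-Wunsch global alignment. The scoring values below (match +1, mismatch -1, gap -2) are illustrative assumptions. Real tools use substitution matrices such as BLOSUM, affine gap penalties, and profile or structural information.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the dynamic programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_target, aligned_template, total = needleman_wunsch("ACGT", "AGT")
print(aligned_target)
print(aligned_template)
print("score:", total)
```

Because a model cannot recover from a wrong alignment, it is worth rerunning with slightly different gap penalties in uncertain regions and building a model from each alternative.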

Target-Multiple Template Alignment
• Alignment is prepared by superimposing all template
structures
• Add target sequence to this alignment
• Compare with multiple sequence alignment and adjust

Gaining confidence in template searching
• Once a suitable template is found, it is a good idea to do a
literature search (PubMed) on the relevant fold to
determine what biological role(s) it plays.
• Does this match the biological/biochemical function that
you expect?

Other factors to consider in selecting
templates
• Template environment
– pH
– Ligands present?
• Resolution of the templates
• Family of proteins
– Phylogenetic tree construction can help find the
subfamily closest to the target sequence
• Multiple templates?

Determining Structurally Conserved Regions
(SCRs)
• When two or more reference protein structures are available
• Establish structural guidelines for the family of proteins under consideration
• First step in building a model protein by homology is determining what
regions are structurally conserved or constant among all the reference
proteins
• Target protein is supposed to assume the same conformation in conserved
regions

Structurally Conserved Regions
• SCRs are regions, present in all proteins of a particular family, that
are nearly identical in structure
• They tend to lie in the inner cores of the proteins
• They usually contain alpha-helices and beta-sheets
• No SCR can span more than one secondary structure element

Alignment of Target Protein with SCR
• After aligning the reference proteins and finding the SCRs, we align the
unknown sequence with the aligned reference proteins, using knowledge of
the SCRs
• Since SCRs cannot contain insertions or deletions, an enhanced version of
the standard alignment algorithm is used
• After a suitable template has been found and the unknown sequence aligned
with it, coordinates are assigned within the conserved regions

Assignment of coordinates within conserved
region
• Once the correspondence between amino acids in the reference and model
sequences has been made, the coordinates for an SCR can be assigned
• The reference proteins' coordinates are used as a basis for this assignment
• Where the side chains of the reference and model proteins are the same at
corresponding locations along the sequence, all the coordinates for the amino
acid are transferred
• Where they differ, the backbone coordinates are transferred, but the side
chain atoms are automatically replaced to preserve the model protein's
residue types
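The transfer rule above can be sketched as follows. The data layout (a residue name plus a dictionary of atom coordinates), the function name, and the `rebuild_side_chain` flag are assumptions made for illustration, and the coordinates are made up.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def transfer_scr_coordinates(template_residues, target_residue_names):
    """Copy coordinates for one gap-free structurally conserved region.

    template_residues: list of (residue_name, {atom_name: (x, y, z)}).
    target_residue_names: residue names of the model, aligned one to one.
    """
    if len(template_residues) != len(target_residue_names):
        raise ValueError("an SCR must align without insertions or deletions")
    model = []
    for (template_name, atoms), target_name in zip(template_residues,
                                                   target_residue_names):
        if template_name == target_name:
            copied = dict(atoms)  # same residue type: transfer every atom
            rebuild = False
        else:
            # Different residue type: keep the backbone only and mark the
            # side chain to be built later for the model's residue type.
            copied = {k: v for k, v in atoms.items() if k in BACKBONE_ATOMS}
            rebuild = True
        model.append({"residue": target_name, "atoms": copied,
                      "rebuild_side_chain": rebuild})
    return model


# Made-up coordinates for a two-residue conserved region.
template = [
    ("ALA", {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
             "O": (1.3, 2.4, 0.0), "CB": (2.0, -0.8, 1.2)}),
    ("SER", {"N": (3.3, 1.5, 0.0), "CA": (4.0, 2.8, 0.0), "C": (5.5, 2.6, 0.0),
             "O": (6.0, 1.5, 0.0), "CB": (3.6, 3.6, 1.2), "OG": (4.1, 4.9, 1.1)}),
]
for residue in transfer_scr_coordinates(template, ["ALA", "THR"]):
    print(residue["residue"], sorted(residue["atoms"]),
          residue["rebuild_side_chain"])
```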

Assignment of coordinates in loop or variable
region
Two main methods
• Finding similar peptide segments in other proteins
• Generating a segment de novo

Finding similar peptide segment in other proteins
• Advantage: all loops found are guaranteed to have reasonable
internal geometries and conformations
• Disadvantage: may not fit properly into the given model protein's
framework
– In this case, the de novo method is advisable

Selection Of Loops
• Check the loops on the basis of steric overlaps
• A specified degree of overlap can be tolerated
• Check the atoms within the loop against each other
• Then check loop atoms against the rest of the protein's atoms
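A minimal sketch of the two-stage overlap check, using one point per residue (for example the Cα atom). The 3.0 Å minimum distance, the `max_clashes` tolerance, and the rule of skipping directly bonded neighbours are illustrative assumptions. A real check would use all atoms and element-specific radii.

```python
import math
from itertools import combinations


def count_clashes_within(coords, min_distance, skip_neighbors=1):
    """Count close contacts inside the loop, ignoring sequence neighbours."""
    clashes = 0
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if j - i <= skip_neighbors:
            continue  # consecutive residues are bonded and always close
        if math.dist(a, b) < min_distance:
            clashes += 1
    return clashes


def count_clashes_between(coords_a, coords_b, min_distance):
    """Count close contacts between the loop and the rest of the protein."""
    return sum(1 for a in coords_a for b in coords_b
               if math.dist(a, b) < min_distance)


def loop_is_acceptable(loop, rest, min_distance=3.0, max_clashes=0):
    """First check the loop against itself, then against the framework."""
    internal = count_clashes_within(loop, min_distance)
    if internal > max_clashes:
        return False
    external = count_clashes_between(loop, rest, min_distance)
    return internal + external <= max_clashes


# Made-up coordinates in angstroms.
loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
framework_near = [(3.8, 2.0, 0.0)]
framework_far = [(3.8, 10.0, 0.0)]
print(loop_is_acceptable(loop, framework_far))
print(loop_is_acceptable(loop, framework_near))
print(loop_is_acceptable(loop, framework_near, max_clashes=1))
```

Raising `max_clashes` corresponds to tolerating a specified degree of overlap, on the expectation that later refinement relieves small clashes.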

Sidechains on surface of protein
• Exposed sidechains on surface can be highly flexible without a
single dominant conformation
• So ultimately, if these solvent-exposed sidechains do not form
binding interactions with other molecules and are not involved in, say, a
catalytic reaction, then accuracy may not be critical

Side Chain Conformation Search
• With bond lengths, bond angles and two rotatable backbone bonds per
residue (φ and ψ) to account for, it is very difficult to find the best
conformation of a side chain
• In addition, the side chains of many residues have one or more degrees of
freedom of their own
• Hence a side chain conformational search in loop regions is a must
• Side chains replaced during coordinate transfer should
also be checked
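The backbone angles φ and ψ and the side chain angles are all dihedral (torsion) angles defined by four consecutive atoms. The sketch below computes one from Cartesian coordinates with the standard atan2 formula. The function name and the test points are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle about the p1-p2 bond, in degrees, range (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points in a cis arrangement give a torsion of 0 degrees.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

A side chain conformational search evaluates this angle for each rotatable side chain bond and scores the resulting conformations.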

Selection Of Good Rotamer
• Fortunately, statistical studies show that side chains adopt only a small
number of the many possible conformations
• The correct rotamer of a particular residue is mainly determined by its local
environment
• Side chains generally adopt conformations in which they are closely packed

Refinement of model using Molecular
Mechanics
Many structural artifacts can be introduced while the model protein is
being built
• Substitution of large side chains for small ones
• Strained peptide bonds between segments taken from different
reference proteins
• Non-optimal conformation of loops
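Molecular mechanics refinement relieves such strain by minimizing an energy function. As a toy sketch, the code below relaxes one stretched bond under a harmonic bond term using steepest descent. The force constant, step size, iteration count, and the 1.33 Å ideal peptide C-N length are illustrative values. A real force field adds angle, torsion, van der Waals, and electrostatic terms over all atoms.

```python
def bond_energy(distance, ideal, k):
    """Harmonic bond term: E = 0.5 * k * (d - d0)^2."""
    return 0.5 * k * (distance - ideal) ** 2


def minimize_bond(distance, ideal=1.33, k=1.0, step=0.1, iterations=200):
    """Steepest descent on a single bond length (one-dimensional toy case)."""
    for _ in range(iterations):
        gradient = k * (distance - ideal)  # dE/dd
        distance -= step * gradient        # move downhill
    return distance


strained = 1.80  # a stretched peptide bond joining two template segments
relaxed = minimize_bond(strained)
print(f"before: d={strained:.2f}  E={bond_energy(strained, 1.33, 1.0):.4f}")
print(f"after:  d={relaxed:.2f}  E={bond_energy(relaxed, 1.33, 1.0):.4f}")
```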

Model Validation
• Every homology model contains errors. The two main causes are
– the % sequence identity between reference and model
– the number of errors in the templates
• Hence it is essential to check the correctness of the overall fold/structure,
errors in localized regions, and stereochemical parameters: bond
lengths, angles, geometries
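One simple stereochemical check is that consecutive Cα atoms joined by a trans peptide bond lie about 3.8 Å apart. The sketch below flags pairs that deviate. The 0.2 Å tolerance and the coordinates are illustrative assumptions, and tools such as PROCHECK or WHAT IF run many more checks than this.

```python
import math


def flag_ca_distance_outliers(ca_coords, expected=3.8, tolerance=0.2):
    """Return (i, i + 1, distance) for consecutive CA pairs outside the range."""
    outliers = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            outliers.append((i, i + 1, round(d, 2)))
    return outliers


# Made-up trace: the last pair is stretched, as at a badly joined segment.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 6.0)]
print(flag_ca_distance_outliers(trace))
```

Flagged pairs point to localized errors, often at loop junctions or misaligned regions, that need rebuilding or further refinement.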

Structural Prediction by Homology Modeling
Workflow (methods or tools for each step in parentheses):
• Structural Databases -> Reference Proteins (SeqFold, Profiles-3D, PSI-BLAST,
BLAST & FASTA, fold-recognition methods)
• Reference Proteins -> Conserved Regions (Cα matrix matching)
• Conserved Regions + Protein Sequence -> Predicted Conserved Regions
(sequence alignment)
• Predicted Conserved Regions -> Initial Model (coordinate assignment,
loop searching/generation)
• Initial Model -> Refined Model (sidechain rotamers and/or MM/MD)
• Structure Analysis (WHAT IF, PROCHECK, PROSAII, ...)
• Program: MODELER

Errors in Homology Modeling
a) Side chain packing b) Distortions and shifts c) No template
d) Misalignments e) Incorrect template
Marti-Renom et al., Ann. Rev. Biophys. Biomol. Struct., 2000, 29:291-325.

Automated Web-Based Homology Modelling
• SWISS Model : http://www.expasy.org/swissmod/SWISS-MODEL.html
• WHAT IF : http://www.cmbi.kun.nl/swift/servers/
• The CPHModels Server : http://www.cbs.dtu.dk/services/CPHmodels/
• 3D Jigsaw : http://www.bmm.icnet.uk/~3djigsaw/
• SDSC1 : http://cl.sdsc.edu/hm.html
• EsyPred3D : http://www.fundp.ac.be/urbm/bioinfo/esypred/
• HHpred server: https://toolkit.tuebingen.mpg.de/tools/hhpred

Tools for Model Evaluation
• WHAT IF http://www.cmbi.kun.nl/gv/servers/WIWWWI/
• SOV http://predictioncenter.llnl.gov/local/sov/sov.html
• PROVE http://www.ucmb.ulb.ac.be/UCMB/PROVE/
• ANOLEA http://www.fundp.ac.be/pub/ANOLEA.html
• ERRAT http://www.doe-mbi.ucla.edu/Services/ERRATv2/
• VERIFY3D http://shannon.mbi.ucla.edu/DOE/Services/Verify_3D/
• BIOTECH http://biotech.embl-ebi.ac.uk:8400/
• ProsaII http://www.came.sbg.ac.at
• WHATCHECK http://www.sander.embl-heidelberg.de/whatcheck/

Challenges
• Modeling proteins with lower similarity (e.g. < 30% sequence identity)
• Increasing the accuracy of models and making the process fully automated
• Improvements may include simultaneous optimization techniques in
side chain modeling and loop modeling
• Developing better optimizers and potential functions, which can lead
the model structure away from the template and towards the correct
structure
• Although comparative modelling needs significant improvement,
it is already a mature technique that can be used to address
many practical problems

Comparative Modeling Server & Program
• COMPOSER
http://www.tripos.com/sciTech/inSilicoDisc/bioInformatics/matchmaker.html
• MODELER http://salilab.org/modeler
• InsightII http://www.msi.com/
• SYBYL http://www.tripos.com/

Fold Recognition
• Also called protein threading
• Given a new sequence and a library of known
folds, find the best alignment of the sequence to each
fold and return the most favorable one

Protein Folding with
Dynamic Programming
• D. Eisenberg 1994
• Align sequence to each fold (a string of
environmental classes)
• Advantages: fast and works pretty well
• Disadvantages: does not consider AA contacts
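Threading by dynamic programming can be sketched by scoring each residue against the environment class at each fold position and aligning with a gap penalty. Everything below is a toy assumption: two environment classes (B = buried, E = exposed), a hydrophobic/polar score of +1 or -1, a gap penalty of -2, and a made-up fold library. The actual method uses more environment classes and statistically derived scores.

```python
HYDROPHOBIC = set("AVLIMFWC")


def env_score(residue, env):
    """Toy compatibility score of a residue with an environment class."""
    hydrophobic = residue in HYDROPHOBIC
    if env == "B":                      # buried position
        return 1 if hydrophobic else -1
    return -1 if hydrophobic else 1     # exposed position


def thread_score(sequence, fold_envs, gap=-2):
    """Best global alignment score of a sequence against one fold string."""
    m = len(fold_envs)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(sequence) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            curr[j] = max(prev[j - 1] + env_score(sequence[i - 1], fold_envs[j - 1]),
                          prev[j] + gap,
                          curr[j - 1] + gap)
        prev = curr
    return prev[m]


def best_fold(sequence, fold_library):
    """Return the name of the fold with the most favorable threading score."""
    return max(fold_library, key=lambda name: thread_score(sequence, fold_library[name]))


# Hypothetical fold library: each fold is a string of environment classes.
library = {"alternating": "BEBEBE", "core_then_surface": "BBBEEE"}
query = "LKVDIE"
for name, envs in library.items():
    print(name, thread_score(query, envs))
print("best:", best_fold(query, library))
```

Each position is scored independently of its neighbours, which is what makes the method fast and also why it ignores amino acid contacts.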

Fold Recognition
• Each predictor can submit N top hits
• Every predictor does well on something
• Common folds (more examples) are easier to
recognize

Fold Recognition
• Alignment (seq to fold) is a big problem

ab initio
• Predict interresidue contacts and then
compute structure (mild success)
• Simplified energy term + reduced search space
(phi/psi or lattice) (moderate success)
• Creative ways to memorize sequence-structure
correlations in short segments from
the PDB, and use these to model new
structures: ROSETTA

Ab initio prediction of protein structure
• Sample conformational space such that native-like conformations are found
– Astronomically large number of conformations:
5 states per residue over 100 residues gives 5^100 ≈ 10^70
• Select: hard to design functions
that are not fooled by
non-native conformations
(“decoys”)
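The size of the search space can be checked directly: 5 states per residue over 100 residues gives 5^100, which is close to 10^70. The sampling rate in the sketch (10^12 conformations per second) is a hypothetical figure, chosen only to show that exhaustive enumeration is out of reach.

```python
import math


def conformation_count(states_per_residue, residues):
    """Number of conformations if every residue picks a state independently."""
    return states_per_residue ** residues


def years_to_enumerate(count, conformations_per_second):
    """Time to visit every conformation once at a fixed sampling rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return count / conformations_per_second / seconds_per_year


total = conformation_count(5, 100)
print(f"5^100 is about 10^{math.log10(total):.1f}")
# Hypothetical rate of 10^12 conformations per second.
print(f"years to enumerate: {years_to_enumerate(total, 1e12):.1e}")
```

This is why ab initio methods reduce the search space (lattices, discrete phi/psi states, fragment libraries) instead of enumerating it.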

Programs for Model Protein Prediction
https://molbiol-tools.ca/Protein_tertiary_structure.htm
