Computational Prediction Of Protein-1.pptx

Computational Prediction Of Protein
Anas Saifi
M.Pharm (Pharmacology) 2nd Sem.
School of Pharmaceutical Education And Research
Jamia Hamdard

Presentation Outline
• Introduction Of Protein
• Types Of Protein
• Methods For Protein Structure Prediction
• Homology Modeling
• Fold Recognition and Threading
• Ab Initio Protein Structure Prediction
• Rosetta
• References

Introduction
• Proteins are the most abundant organic molecules of the living system.
• They occur in every part of the cell and constitute about 50% of the cellular dry weight.
• Proteins form the fundamental basis of structure and function of life.
• Origin Of The Word ‘Protein’
• The term protein is derived from a Greek word proteios, meaning holding the first
place.
• Berzelius (Swedish chemist) suggested the name proteins for the group of organic
compounds that are of utmost importance to life.
• Mulder (Dutch chemist) in 1838 used the term proteins for the high molecular weight
nitrogen-rich and most abundant substances present in animals and plants.

Introduction
• Proteins are polymers: polypeptide chains formed from a sequence of amino acids.
• Amino acids join through condensation reactions. Each reaction releases one water molecule and forms one peptide bond, and the resulting chain is the protein.
• Non-covalent interactions such as hydrogen bonding, van der Waals forces, ionic interactions and hydrophobic packing drive a protein into one or more specific spatial conformations.
• Adopting these conformations is what enables proteins to perform their biological functions.
• The alternative structures of the same protein are called conformational isomers, or simply conformations, and the transitions between them are called conformational changes.
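The condensation arithmetic above can be sketched in a few lines: a linear chain of n residues has n - 1 peptide bonds, and one water molecule is lost per bond. The function names are hypothetical, and the masses are approximate average molecular masses for three amino acids chosen only for illustration.

```python
# Approximate average molecular masses (g/mol) of a few free amino acids.
# Illustrative values only; a real table would cover all 20 residues.
FREE_AMINO_ACID_MASS = {"G": 75.067, "A": 89.094, "S": 105.093}
WATER_MASS = 18.015


def peptide_bonds(sequence: str) -> int:
    """A linear chain of n residues contains n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Sum of free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - peptide_bonds(sequence) * WATER_MASS


if __name__ == "__main__":
    # Gly-Ala: two residues, one peptide bond, one water lost.
    print(peptide_bonds("GA"), round(peptide_mass("GA"), 3))
```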

Introduction
• Functions of proteins:
• Proteins perform a great variety of specialized and essential functions in the living cells.
• These functions may be broadly grouped as static (structural) and dynamic.
• Structural functions:
• Certain proteins perform brick-and-mortar roles and are primarily responsible for the structure and strength of the body.
• These include collagen and elastin, found in bone matrix, the vascular system and other organs, and α-keratin, present in epidermal tissues.
• Dynamic functions:
• The dynamic functions of proteins are more diverse in nature.
• These include proteins acting as enzymes, hormones, blood clotting factors, immunoglobulins, membrane receptors and storage proteins, as well as proteins involved in genetic control, muscle contraction, respiration, etc.
• Proteins performing dynamic functions are appropriately regarded as the workhorses of the cell.

Introduction
• Elemental Composition Of Proteins
• Proteins are predominantly constituted by five major elements in the following proportion.
Element    Proportion
Carbon     50 – 55%
Hydrogen   6 – 7.3%
Oxygen     19 – 24%
Nitrogen   13 – 19%
Sulfur     0 – 4%
Fig. 1: Elemental composition of protein

Types Of Protein Structures
• Primary structure – The linear sequence of amino acid residues in the polypeptide chain
• Secondary structure – The local folding pattern produced by hydrogen bonds between the amino hydrogen and carbonyl oxygen atoms of the peptide backbone
• Two main kinds: alpha helices and beta strands
• Tertiary structure – The overall 3D structure of a polypeptide chain, in which alpha helices and beta strands pack together into a globular structure. The fold is driven and stabilized by hydrophobic interactions, disulphide bonds, salt bridges and hydrogen bonds
• Quaternary structure – The assembly of two or more polypeptide chains into dimeric or multimeric molecules, stabilized mainly by non-covalent bonds.

Types Of Protein Structures
Fig. 2: Types of protein structure

Method For Protein Structure Prediction
1) Experimental Methods
2) Computational Methods
Fig. 3: Method For Protein Structure Prediction

Experimental Protein Structure Determination
X-ray crystallography
• most accurate
• in vitro
• needs crystals
• ~$100-200K per structure
• time consuming and expensive.
NMR
• fairly accurate
• performed in solution
• no need for crystals
• limited to small proteins
• time consuming and difficult.
Electron microscopy
• imaging technology
• traditionally low resolution
• fine structural detail is often not observable.
Experimental Protein Structure Determination
Table-1: PDB data distribution (protein only structures)

Homology Modeling
• Also called comparative modeling
• Predicts protein structure based on sequence homology with a protein of known structure
• If two proteins share high enough sequence similarity, they are likely to have very similar three-dimensional structures
• Modeling servers: ModBase, SWISS-MODEL, etc.
• Fails in the absence of a homologous template
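Since homology modeling hinges on how similar the target and template sequences are, a minimal sketch of a percent-identity calculation may help. It assumes the two sequences are already aligned (equal length, "-" for gaps) and counts identity over columns where neither sequence has a gap; other denominators are also in use, so the exact definition here is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns with an insertion or deletion
        compared += 1
        if residue_a == residue_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Five gap-free columns, four of them identical.
    print(percent_identity("ACDE-G", "ACDQFG"))
```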

Steps Involved In Homology Modeling
1. Template recognition and Initial alignment
2. Alignment correction
3. Backbone generation
4. Loop modeling
5. Side chain modelling
6. Model optimization
7. Model Validation

1) Template Recognition and Initial Alignment
• Search the PDB, which contains experimentally determined structures, for proteins homologous to the target
• BLAST search (Basic Local Alignment Search Tool)
It is a search algorithm that compares a query sequence with the sequences in a library or database and returns those that match the query.
It can search protein amino acid sequences as well as DNA and RNA nucleotide sequences.
• FASTA search (Fast-A)
It is a software package used primarily for DNA and protein sequence alignment.
It executes quickly because it first marks potential matches using the pattern of word hits (word-to-word matches of a given length).
It is also known for its FASTA format, which has become widely used in bioinformatics.
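The two FASTA ideas on this slide, the file format and the word-hit stage, can be illustrated with a small sketch. This is not the FASTA program itself: `parse_fasta` and `word_hits` are hypothetical helpers, the word length `k` is an arbitrary choice, and the later scoring and alignment stages are omitted.

```python
from collections import defaultdict


def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA-formatted text into {identifier: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier is the first token
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records


def word_hits(query: str, subject: str, k: int = 2) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where identical k-letter words start."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    records = parse_fasta(">q demo\nMKT\nAY\n>s\nGMKTA\n")
    print(word_hits(records["q"], records["s"]))
```

Hits that share the same offset (subject position minus query position) lie on one diagonal, which is the kind of pattern a word-based search uses to flag a candidate match.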

2) Alignment Correction
• Aligning two protein sequences can be difficult when their percentage sequence identity is low
• In that case, other sequences from homologous proteins can be used to guide the alignment
• Example: the sequence LTLTLTLT is nearly impossible to align with YAYAYAYAYA, but a third sequence, TYTYTYTYT, aligns easily with both of them and so links the two
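The three-sequence example can be checked with a very simple alignment score. The sketch below uses the length of the longest common subsequence, which is equivalent to a global alignment scored as match = 1 with mismatches and gaps costing nothing; this is a deliberate simplification, not the substitution-matrix scoring real alignment tools use.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length via dynamic programming."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    first, second, bridge = "LTLTLTLT", "YAYAYAYAYA", "TYTYTYTYT"
    print(lcs_length(first, second))   # no shared residues
    print(lcs_length(first, bridge))   # shares the T residues
    print(lcs_length(second, bridge))  # shares the Y residues
```

The first pair has no residues in common, while the third sequence matches each of the other two, which is why it can anchor an alignment between them.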

3) Loop Modelling
• After sequence alignment, insertions and deletions often leave gaps in the alignment. Loop modeling is used to fill in these gaps, but it is comparatively less accurate.
• Two main techniques are used to approach the problem:
• The database searching method: search a database of known protein structures for loops that fit, then superimpose these loops onto the two stem regions of the target protein.
• The ab initio method: generate many loops randomly, then search among them for a loop with reasonably low energy.
• ModLoop is a web server for automated modeling of loops in protein structures.
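The ab initio loop idea (generate many random loops, keep a low-energy one) can be sketched as below. Everything here is a toy under stated assumptions: a loop is represented only by backbone torsion angles, `toy_loop_energy` is a made-up stand-in for a real energy function, and the requirement that the loop actually closes onto the two stem regions is not modelled.

```python
import random


def toy_loop_energy(torsions):
    """Hypothetical energy: penalise deviation from an arbitrary preferred angle."""
    return sum((angle + 60.0) ** 2 for angle in torsions) / len(torsions)


def sample_loops(n_residues, n_candidates, seed=0):
    """Generate random loops as lists of phi/psi angles (two per residue)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-180.0, 180.0) for _ in range(2 * n_residues)]
        for _ in range(n_candidates)
    ]


def best_loop(candidates, energy=toy_loop_energy):
    """Pick the candidate loop with the lowest energy."""
    return min(candidates, key=energy)


if __name__ == "__main__":
    candidates = sample_loops(n_residues=4, n_candidates=200)
    chosen = best_loop(candidates)
    print(round(toy_loop_energy(chosen), 2))
```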

4) Side-chain Modeling
• It is important because it is used to evaluate protein–ligand interactions at active sites and protein–protein interactions at the contact interface.
• By searching every possible combination of side-chain torsion angles, we can select the one that has the lowest interaction energy with neighbouring atoms.

5) Model Optimization
• Model optimization adjusts the relative positions of the atoms so that the overall conformation of the molecule has the lowest possible energy. This gives a better chance of finding the true structure.
• Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
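The energy expression on this slide names four terms without giving their form. As an illustration only, the block below writes them in the functional forms commonly used by molecular-mechanics force fields; the specific forms and every symbol (force constants k_b and k_θ, reference values r_0 and θ_0, torsion parameters V_n, n, γ, Lennard-Jones parameters ε_ij and σ_ij, partial charges q_i) are assumptions, not taken from the slides.

```latex
\begin{aligned}
E_{\text{total}} &= E_{\text{stretch}} + E_{\text{bend}} + E_{\text{torsion}} + E_{\text{non-bonded}} \\
E_{\text{stretch}} &= \sum_{\text{bonds}} k_b \,(r - r_0)^2 \\
E_{\text{bend}} &= \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2 \\
E_{\text{torsion}} &= \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr] \\
E_{\text{non-bonded}} &= \sum_{i<j} \left[ 4\varepsilon_{ij}\left( \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right) + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right]
\end{aligned}
```

Optimization then moves the atoms to reduce the total energy, which is what the slide means by adjusting relative atomic positions.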

6) Model Validation
• Every homology model contains errors
• Two main factors determine how large they are:
• The first is the percentage sequence identity between the template and the target protein
• If the sequence identity is more than 90%, the accuracy of the model can be compared with that of structures determined by crystallography
• If it is less than 30%, major errors can occur
• The second is the errors present in the template itself

Advantages
• It can locate the alpha carbons of key residues inside the folded protein
• It can be used to hypothesize structure–function relationships
• It can guide mutagenesis experiments
• Putative active sites, binding pockets and ligands can be identified from the positions of conserved regions on the protein surface.
Disadvantages
• Homology models cannot reliably predict side-chain positions or the conformations of insertions and deletions.
• Homology models are useful for drug design and development only if the sequence identity between the target protein and the template is more than 70%, because models built on templates with lower identity are not reliable enough for the modeling and ligand docking studies that this work requires.

Error Sources In Homology Modelling
• Incorrect sequence alignment is among the most damaging errors in homology modelling
• An incorrect template may be chosen, especially for multidomain proteins
• Loop regions can be built incorrectly; servers normally build loops automatically
• The person doing the modelling can also introduce errors. These can be of any kind and are hard to predict; a basic knowledge of the principles of protein structure helps to minimize them
• Errors may be present in the template itself, and these are difficult to deal with: a model can hardly be better than its template
Homology Modelling Software
IntFOLD, RaptorX, Biskit, MODELLER, SWISS-MODEL

2) Fold recognition and Threading Methods
• Folding is the physical process by which a polypeptide folds from a random coil into its characteristic, functional 3D structure
• Fold recognition matches a protein sequence to a fold library using a so-called threading algorithm, which assigns each amino acid to a position in the three-dimensional structure of a particular fold class available in the PDB
• Threading is the process of comparing the target protein sequence with a library of templates
• The templates consist of the structure and fold data of various proteins
• This type of modeling is used when the target protein is expected to share a fold with a protein of known structure, but the two proteins show no detectable sequence homology

1. Pairwise Energy Based Method (Threading)
• Searches a structural fold database for the best-matching fold using energy-based criteria
• Uses dynamic programming and heuristic approaches to align the sequence to each fold
• Calculates the energy of each raw model
• The lowest-energy fold corresponds to the structurally most compatible fold
2. Profile Based Method (Fold Recognition)
• A profile is constructed for a group of related protein structures
• The profile is generated by superimposing the structures to expose corresponding residues
• It records properties such as secondary structure type, polarity and hydrophobicity
• If the protein fold to be predicted does not exist in the fold library, the method will fail

Steps Involved
1) Construction of structure template database
• Select protein structures from a protein structure database as templates
• When selecting, avoid including structures that have high sequence similarity to one another
• Databases: PDB, FSSP, SCOP and CATH
2) The design of the scoring function
• A scoring function should measure the fitness between the template and the target sequence
• Its main components are mutation potential, pairwise potential, secondary structure compatibility and gap penalties

Steps Involved
3) Threading alignment
• Align the target sequence with each structure template by optimizing the scoring function
• This step is one of the major tasks of threading-based structure prediction programs that take the pairwise contact potential into account
4) Threading prediction
• Select the threading alignment that is statistically most probable as the threading prediction
• Construct a structural model by placing the backbone atoms of the target sequence at their aligned backbone positions in the selected structural template.
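The four threading steps can be sketched end to end with a toy scoring function. Under the assumptions here, each template is reduced to a string of per-position environments ("B" for buried, "E" for exposed), the score rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the target is slid along the template without gaps. Real threading programs add pairwise potentials, secondary structure terms and gap penalties; the residue set, names and library below are all hypothetical.

```python
HYDROPHOBIC = set("AVLIMFWC")  # illustrative choice of hydrophobic residues


def position_score(residue, environment):
    """+1 when residue type suits the environment, -1 otherwise."""
    buried = environment == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def thread(target, template_env):
    """Best gapless placement of target on a template: (score, offset) or None."""
    best = None
    for offset in range(len(template_env) - len(target) + 1):
        window = template_env[offset:offset + len(target)]
        score = sum(position_score(r, e) for r, e in zip(target, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def rank_templates(target, library):
    """Score the target against every template and sort best first."""
    results = []
    for name, env in library.items():
        placement = thread(target, env)
        if placement is not None:  # template shorter than target is skipped
            results.append((placement[0], name, placement[1]))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    library = {"foldA": "BBEEBBEE", "foldB": "EEEEEEEE"}
    for score, name, offset in rank_templates("LVKDIL", library):
        print(name, score, offset)
```

The top-ranked entry plays the role of the threading prediction; a model would then be built by placing the target backbone at the aligned template positions.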

Advantages
• It is moderately successful
• It is good for proteins with fewer than 100 residues
Disadvantages
• The method assumes that the current structural library contains every possible conformation that could be determined experimentally
• Fewer than 30% of the predicted first hits are true remote homologs
• Screening a library of thousands of possible folds has a high computational cost
• Energy functions are simplified for efficient calculation, which compromises the results
Threading software
HHpred, RaptorX, Phyre and Phyre2, FALCON, MUSTER

3) Ab Initio Protein Structure Prediction
• Ab initio protein structure prediction methods build protein 3D structures from sequence, based on physical principles
• They determine the tertiary structure of a protein in the absence of an experimentally solved structure of a similar or homologous protein, building the structure under the guidance of an energy function
• Ab initio methods are important even though they are computationally demanding
• Because they predict protein structure from physical models, they are indispensable complements to knowledge-based approaches
• Knowledge-based approaches fail under the following conditions:
• Structural homologues are not available
• The protein may have a new, undiscovered fold
• Anfinsen’s theory: the native structure of a protein corresponds to the state with the lowest free energy of the protein–solvent system

ROSETTA
• Web server for protein 3D structure prediction
• Rosetta development began in the laboratory of Dr. David Baker at the University of Washington as a structure prediction tool, and it has since been adapted to solve common computational macromolecular problems.
• Uses a mini-threading method
• Breaks the query sequence down into many short segments (3 to 9 residues)
• Predicts the secondary structure of the short segments using HMMSTR
• Segments with assigned secondary structure are subsequently assembled into a 3D configuration
• Through random combination of fragments, a large number of models are built and their overall energy potentials are calculated
• The conformation with the lowest free energy is chosen as the best model
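The fragment-assembly loop described above (insert short fragments at random, score each model, keep the lowest-energy conformation) can be sketched as a small Monte Carlo search. This is not Rosetta code and does not use its API: the conformation is just a list of torsion angles, `toy_energy` is a hypothetical stand-in for a real energy function, the fragment library is random rather than derived from known structures, and the Metropolis acceptance rule is an assumed choice of search strategy.

```python
import math
import random


def toy_energy(conformation):
    """Hypothetical energy: lower when angles sit near -60 degrees."""
    return sum((angle + 60.0) ** 2 for angle in conformation) / 1000.0


def random_fragments(count, seed=1):
    """Stand-in fragment library: random fragments of 3 to 9 residues."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-180.0, 180.0) for _ in range(rng.randint(3, 9))]
        for _ in range(count)
    ]


def assemble(initial, fragments, steps=2000, temperature=1.0, seed=0):
    """Monte Carlo fragment insertion; returns the lowest-energy model seen."""
    rng = random.Random(seed)
    current = list(initial)
    current_energy = toy_energy(current)
    best, best_energy = list(current), current_energy
    for _ in range(steps):
        fragment = rng.choice(fragments)
        start = rng.randrange(len(current) - len(fragment) + 1)
        trial = list(current)
        trial[start:start + len(fragment)] = fragment  # overwrite one window
        trial_energy = toy_energy(trial)
        delta = trial_energy - current_energy
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_energy = trial, trial_energy
            if current_energy < best_energy:
                best, best_energy = list(current), current_energy
    return best, best_energy


if __name__ == "__main__":
    rng = random.Random(42)
    start_model = [rng.uniform(-180.0, 180.0) for _ in range(20)]
    model, energy = assemble(start_model, random_fragments(100))
    print(round(toy_energy(start_model), 2), round(energy, 2))
```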

Application And Limitations Of Methods For
Structure Prediction
• Each of the methods discussed provides structural detail to a different extent.
• Homology modeling can provide atomic-level detail of the target protein, whereas threading can only help to judge the fold of the protein
• High- and medium-accuracy homology models, with sequence identity > 30%, are useful in refining functional predictions such as ligand binding
• Folds predicted by threading can be used to support site-directed mutagenesis experiments, to design stable crystallizable variants and to refine NMR structures
• Models produced by ab initio methods are in general less accurate, and therefore less widely applicable, than models obtained from either homology modeling or threading
• Predicted models are also useful for inferring functional relationships from structural similarity and for identifying patches of conserved surface residues

References
• Jones, D. T., Singh, T., Kosciolek, T. & Tetchner, S. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 31, 999–1006 (2015).
• Davey, J. A., Damry, A. M., Goto, N. K. & Chica, R. A. Rational design of proteins that exchange on functional timescales. Nat. Chem. Biol. 13, 1280–1285 (2017).
• Jisna, V. A. & Jayaraj, P. B. Protein structure prediction: conventional and deep learning perspectives. The Protein Journal 40(4), 522–544 (2021). https://doi.org/10.1007/s10930-021-10003-y
• Satyanarayana and Chakrapani. Biochemistry, Fourth Edition, “Proteins and Amino Acids”, pp. 43–69 (2013).
