1. Topic: Computational Study of Protein from Scratch: From Structure to Function
Kinza Irshad
Soil and Ecosystem Ecology Lab,
COMSATS University Islamabad,
Abbottabad
5/28/2019 1
2. What is a Protein?
Long-chain polypeptides
Made up of 20 naturally occurring amino acids
Adjacent amino acids are joined by a peptide bond, which has partial double-bond character
Functional proteins are folded into a 3D structure containing helices, beta sheets, and loops
Helices and sheets are the rigid parts, while loops are flexible
In the 3D structure, hydrophobic amino acids are located in the core, while hydrophilic ones are oriented toward the surface.
5. First Step: Protein Sequence
We start from the protein sequence.
We can retrieve the protein sequence from a database or translate a gene sequence into protein.
In both cases, we should ultimately have the sequence in FASTA format.
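The second route above (translating a gene sequence into protein and saving it as FASTA) can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code, reading frame 1, and a coding sequence with no introns; the function names and the example header are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), amino acids listed in
# TCAG codon order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence (frame 1) up to the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_fasta(header, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Hypothetical toy gene: ATG GCC AAA TAA
print(to_fasta("example_protein (hypothetical)", translate("ATGGCCAAATAA")))
```

For real genes, the correct reading frame and any introns have to be handled first; online translation tools do this more robustly.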
6. What is FASTA Format?
>MH08765.1 Serine_protease, Human
ELPDFTPLVEQASPAVVNISTRQKLPDRAMARGQLSIPDLEGLPPMFRDFLERSIPQVPRNPRGQQREAQSLGSGFIISN
DGYITNNHVVADADEILVRLSDRSEHKAKLIGADPRSDVAVLKIEAKNLPTLKLGDSNKLKVGEWVLAIGSPFGFDHSVTA
GIVSAKGRSLPNESYVPFIQTDVAINPGNSGGPLLNLQGEVVGINSQIFTRSGGFMGLSFAIPIDVALNVADQLKKAGKVS
RGWLGVVIQEVNKDLAESFGLDKPSGALVAQLVEDGPAAKGGLQVGDVILSLNGQSINESADLPHLVGNMKPGDKINL
DVIRNGQRKSLSMAVGSLPDDDEEIASMGAPGAERSSNRLGVTVADLTAEQRKSLDIQGGVVIKEVQDGPAAVIGLRPG
DVITHLDNKAVTSTKVFADVAKALPKNRSVSMRVLRQGRASFITFKLA
Header region: the first line, starting with ">"
Protein sequence in one-letter code: the remaining lines
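To make the two parts of a FASTA record concrete, here is a small parser sketch in Python. It assumes plain text input where each record starts with a ">" header line; the function name `parse_fasta` is illustrative, and the example sequence is a truncated copy of the record shown on this slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Truncated example based on the record shown above.
example = """>MH08765.1 Serine_protease, Human
ELPDFTPLVEQASPAVVNISTRQKLPDRAMARGQ
LSIPDLEGLPPMFRDFLERSIPQVPRNPRGQQRE
"""
for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```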
7. How to Get a Protein Sequence from a Database?
1. As an example, we take the NCBI nucleotide and protein databases, which are among the largest and most frequently used
https://www.ncbi.nlm.nih.gov/
2. Sequences are available in two different formats
◦ GenBank
◦ FASTA
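Besides the web interface, NCBI records can also be downloaded programmatically through the E-utilities EFetch service. The sketch below only builds the request URL and defines a download helper; the accession used is a placeholder, and the function names are illustrative rather than part of any NCBI library.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein", rettype="fasta"):
    """Build an EFetch URL.

    rettype "fasta" requests FASTA; "gp" (protein) or "gb" (nucleotide)
    requests the GenBank-style flat file instead.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_record(accession, db="protein", rettype="fasta"):
    """Download one record as text (requires network access)."""
    with urlopen(build_efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


# "YOUR_ACCESSION" is a placeholder, not a real identifier.
print(build_efetch_url("YOUR_ACCESSION"))
```

For a nucleotide accession, `db="nuccore"` would be used instead of `db="protein"`.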
8. After Having the Sequence, What to Do?
In the next step, we will determine the physicochemical properties of our target protein, including molecular weight, isoelectric point, stability, etc.
We will use a very popular tool here, Expasy ProtParam.
https://web.expasy.org/protparam/
The input is the protein sequence in FASTA format.
GenBank format is not accepted.
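As an illustration of one of these properties, the molecular weight can be estimated by summing average residue masses and adding one water molecule for the free termini. This is a simplified sketch, not ProtParam's own code; the function name is illustrative, and only the 20 standard amino acids are supported.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def molecular_weight(sequence):
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError("Unsupported residues: " + ", ".join(sorted(unknown)))
    # Residue masses plus one water for the N- and C-terminal ends.
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


print(round(molecular_weight("ELPDFTPLVEQASPAVVNISTRQK"), 1))
```

The sequence must be passed without the FASTA header line.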
9. Prediction of Secondary Structure Elements
Protein structure is described at several levels, including primary, secondary, and tertiary structure.
Secondary structure contains three types of element: alpha-helices, beta-sheets, and loops.
Two types of algorithm are used to predict secondary structure elements:
I. Ab initio-based
II. Homology-based
10. Prediction of Secondary Structure Elements
Ab initio algorithms are stand-alone algorithms that identify secondary structure elements using the intrinsic tendency of each amino acid to adopt a particular conformation. For example, glycine and proline strongly favor loops.
Homology-based algorithms make predictions based on the secondary structure of homologous proteins.
Structures are more conserved than sequences.
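The idea behind propensity-based (ab initio) prediction can be sketched as follows. The helix propensities are approximate values in the spirit of the Chou-Fasman scale and should be treated as illustrative; the window size, the threshold, and the function name are assumptions, and the real method uses more elaborate nucleation and extension rules.

```python
# Approximate helix propensities (illustrative, Chou-Fasman style).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}


def predict_helix(sequence, window=6, threshold=1.03):
    """Mark residues 'H' if they fall in any window whose mean propensity
    reaches the threshold; all other residues are marked '-'."""
    seq = sequence.upper()
    scores = [HELIX_PROPENSITY.get(aa, 1.0) for aa in seq]  # 1.0 = neutral
    state = ["-"] * len(seq)
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            for i in range(start, start + window):
                state[i] = "H"
    return "".join(state)


print(predict_helix("AEMLAEMLGPGSGPGS"))
```

Glycine and proline have the lowest values in the table, which is why stretches rich in them are not assigned to helices.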
11. PSIPRED: A Homology-Based Tool
We will use PSIPRED, a homology-based tool, to predict secondary structure
http://bioinf.cs.ucl.ac.uk/psipred/
The submitted protein sequence is searched against a protein database using BLAST to find homologous sequences, based on query coverage and E-value.
The tool aligns the homologous sequences (multiple sequence alignment, MSA) to get information about conservation.
Conserved regions should ideally have the same secondary structure elements.
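To show what "information about conservation" from an MSA can look like, the sketch below scores each alignment column by the fraction of sequences sharing the most common residue. This is a deliberately simple measure, not the profile PSIPRED actually uses; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, for each column, the fraction of sequences that carry the
    most common residue (gaps '-' never count as a residue)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("All aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical toy alignment of three homologous fragments.
toy_alignment = ["MKVLA", "MRVLA", "M-VIA"]
print(column_conservation(toy_alignment))
```

Columns with scores close to 1.0 are the conserved positions.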
13. Signal Peptide Prediction
Signal peptides help define the localization of a protein in the cell.
Present at the N-terminus of the newly synthesized protein
Consist predominantly of hydrophobic amino acids
Important to predict, especially if we want to clone the gene into an expression system.
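The hydrophobic nature of the N-terminal region can be checked with a very rough calculation. The sketch below only measures the fraction of hydrophobic residues in the first residues of a sequence; the residue set, the window of 25 residues, and the function name are assumptions, and this is not a substitute for a dedicated predictor.

```python
# Residues treated as hydrophobic here (an illustrative choice).
HYDROPHOBIC = set("AVLIMFWC")


def n_terminal_hydrophobic_fraction(sequence, n=25):
    """Fraction of hydrophobic residues among the first n residues."""
    region = sequence.upper()[:n]
    if not region:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in region) / len(region)


# Hypothetical sequences: one with a hydrophobic N-terminus, one without.
print(n_terminal_hydrophobic_fraction("MKLLLLALAVLLAFSSAQADEKRSTDE"))
print(n_terminal_hydrophobic_fraction("MSDEKRKSTDEEKRQNSTDEKRKSTDE"))
```

A high fraction only hints at a possible signal peptide; cleavage sites and charged flanking regions are not considered here.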
14. Signal Peptide Prediction
A commonly used tool for signal peptide prediction is SignalP 4.1
http://www.cbs.dtu.dk/services/SignalP/
The input sequence is in FASTA format
15. Is Our Protein Transmembrane?
In the cell, proteins are broadly either globular or transmembrane.
Transmembrane proteins can be
i. Transmembrane helical
ii. Beta-barrels
To predict the transmembrane topology of a protein, the TMHMM (TransMembrane Hidden Markov Model) algorithm is used.
http://www.cbs.dtu.dk/services/TMHMM/
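TMHMM itself is a hidden Markov model, but the underlying intuition (membrane-spanning helices are long hydrophobic stretches) can be illustrated with a classical sliding-window hydropathy profile on the Kyte-Doolittle scale. This is a different, much simpler technique than TMHMM; the window of 19 residues and the threshold of 1.6 follow the commonly quoted Kyte-Doolittle suggestion, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every window; index i is the 0-based window start."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """0-based start positions of windows hydrophobic enough to span a membrane."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


# Hypothetical sequence: a leucine-rich stretch flanked by charged residues.
toy = "K" * 20 + "L" * 20 + "K" * 20
print(candidate_tm_windows(toy))
```

Only the 20 standard one-letter codes are supported; any other character raises a `KeyError`.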
16. Prediction of Domains & Motifs in Protein Structure
The InterPro online server will be used to identify the domains
https://www.ebi.ac.uk/interpro/
The input sequence will be in FASTA format.
InterPro uses homology search to identify the domains and motifs.
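Short sequence motifs can also be searched directly as patterns. As a small illustration, the sketch below scans for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} using a regular expression. InterPro relies on richer signatures from its member databases, so this only illustrates the pattern idea; the function name is hypothetical.

```python
import re

# PROSITE pattern N-{P}-[ST]-{P} written as a regular expression.
# The lookahead lets overlapping matches be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched text) for every motif occurrence."""
    return [
        (match.start() + 1, match.group(1))
        for match in pattern.finditer(sequence.upper())
    ]


# First line of the example FASTA record shown earlier in the slides.
fragment = "ELPDFTPLVEQASPAVVNISTRQKLPDRAMARGQLSIPDLEGLPPMFRDFLERSIPQVPRNPRGQQREAQSLGSGFIISN"
for position, text in find_motif(fragment):
    print(position, text)
```

A pattern match shows only that the motif is present in the sequence, not that the site is actually modified in the cell.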
18. 3D Structure Prediction of a Protein
Protein structure prediction is the inference of the three-dimensional structure of
a protein from its amino acid sequence, that is, the prediction of its folding and its secondary
and tertiary structure from its primary structure.
Structure prediction is fundamentally different from the inverse problem of protein design.
Protein structure prediction is one of the most important goals pursued by bioinformatics
and theoretical chemistry; it is highly important in medicine (for example, in drug design)
and biotechnology (for example, in the design of novel enzymes).
19. 3D Structure Prediction of Protein
3D structure prediction is one of the most complicated computational processes
An important step in protein structure analysis
There are three experimental techniques to determine the 3D structure of proteins
i. X-ray Crystallography
ii. NMR
iii. Cryo-Electron Microscopy
5/28/2019 DEPARTMENT OF BIOCHEMISTRY & BIOTECHNOLOGY 19
21. End of Session No 1
Let's do some practical work
Accession # MH045598
https://www.ncbi.nlm.nih.gov/
https://web.expasy.org/protparam/
http://bioinf.cs.ucl.ac.uk/psipred/
http://www.cbs.dtu.dk/services/SignalP/
https://www.ebi.ac.uk/interpro/
https://zhanglab.ccmb.med.umich.edu/I-TASSER/