Protein Threading

Protein structure prediction using protein threading
Sanjana Pandey
MSc. Bioinformatics

Overview
Protein threading definition
Why do we need protein threading
Basic principles
Workflow
Description of each step of the workflow
Importance of the assessment step
Software/tools
Advantages/applications
Limitations
References

“Threading”?
o Threading = placing/aligning
o An amino acid sequence is threaded “into” the template structure using “statistical principles”, and the aligned regions are stitched together.
o The sequence is placed into the template by force.
o Given a protein sequence and a template library, return the best sequence-structure alignment.
o In threading, a new sequence is mounted on a series of known folds with the goal of finding the fold (a sequence-structure alignment) that provides the best score (lowest energy).

Need for threading
o Sequence identity to proteins of known structure is < 20%
o Fold recognition
o No 3D structure similarity detectable from sequence
o Computational limitations

Protein threading v/s Fold recognition
o Protein threading: a structure prediction method that involves the process of “fold recognition”
o Fold recognition: identification of “folds”
Each distinct topology of alpha helices and beta sheets makes up a fold, e.g. all-alpha, all-beta, alpha/beta.

History
1. Bowie et al. (1991), on "the inverse protein folding problem": the foundation.
Used simple measures of the fitness of different amino acid types for local structural environments, in terms of solvent accessibility and protein secondary structure.
2. The work by Jones, Taylor, and Thornton (1992) on a new approach to protein fold recognition.

Principles
o A limited number of basic folds is found in nature (~1500).
o Amino acid preferences for different structural environments provide sufficient information to choose among folds.
o The core secondary structure regions (helices and sheets) can be modelled and the structure predicted.
Given the large number of amino acid sequences in databases like UniProt, one would expect a high number of folds. In reality the number is limited: nature appears to have re-used the same folds again and again to perform new functions.

Requirements
1. Query sequence
2. Library of core fold templates
3. Objective function (evaluates any particular placement of a sequence in a core template)
4. Method for searching over the space of alignments between the sequence and each core template
5. Method for choosing the best template given alignments

Workflow
1. Inputs: fold library and query sequence
2. Thread the query residues, in sequential order (gaps allowed), into the structural positions of a template structure
3. Optimize by fitness scores
4. Sequence-structure alignments
5. Statistical assessment
6. Prediction of the backbone atoms of the query protein

1. Sequence selection
o The query sequence (“target”)
o Sequence identity < 20%
2. Fold library
o The result depends on the size and details of the library.
o For better results, the library must be large and sufficiently complete.
o Remove homologous structures
o Examples: FSSP, PDB-Select, PISCES

3. Threading
o Different proteins fold into similar 3D shapes owing to similar interaction patterns among their residues and between the residues and their environment.
o These interaction patterns can possibly be captured using simple statistics-based energy models.
o Threading/placing the (backbone atoms of the) residues of a query protein into the correct structural positions of the correct structural fold needs:
(a) an energy function whose global minimum corresponds to the correct placement of residues into the correct structural template
(b) an algorithm to find the global minimum of the given energy function.

4. Optimization
o Possible sequence-template alignments are scored using a specified objective function.
o The objective function scores the sequence-structure compatibility between
1. the sequence amino acids
2. their corresponding positions in a core template
o The terms taken into account are
1. amino acid preferences for solvent accessibility
2. amino acid preferences for particular secondary structures
3. interactions among spatially neighbouring amino acids
“The objective function includes interactions between neighboring (in 3D) amino acids”

The energy of every alignment is given by the sum of pairwise residue-residue interactions.
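As an illustration of how such an objective function could be evaluated for one alignment, here is a minimal Python sketch. The `Template` class, the table names and all numeric values are hypothetical and chosen only to make the example runnable; they are not taken from any published potential.

```python
from dataclasses import dataclass

@dataclass
class Template:
    burial: list      # per-position burial state, e.g. "buried" / "exposed"
    sec_struct: list  # per-position secondary structure, e.g. "helix" / "strand"
    contacts: list    # pairs (i, j) of template positions that are neighbours in 3D

def alignment_energy(seq, placed, tmpl, e_burial, e_ss, e_pair):
    """Energy of one alignment; placed[pos] is the sequence index put at template position pos."""
    energy = 0.0
    for pos, idx in enumerate(placed):
        aa = seq[idx]
        energy += e_burial[tmpl.burial[pos]][aa]    # solvent accessibility preference
        energy += e_ss[tmpl.sec_struct[pos]][aa]    # secondary structure preference
    for i, j in tmpl.contacts:
        pair = tuple(sorted((seq[placed[i]], seq[placed[j]])))
        energy += e_pair.get(pair, 0.0)             # pairwise term, 0 if not tabulated
    return energy

# Made-up illustrative tables for four residue types (L, K, V, E).
E_BURIAL = {"buried":  {"L": -1.0, "V": -1.0, "K": 1.0, "E": 1.0},
            "exposed": {"L": 0.5, "V": 0.5, "K": -0.5, "E": -0.5}}
E_SS = {"helix":  {"L": -0.5, "K": -0.5, "V": 0.5, "E": -0.5},
        "strand": {"L": 0.0, "K": 0.5, "V": -1.0, "E": 0.5}}
E_PAIR = {("L", "V"): -1.0, ("E", "K"): -0.5}

tmpl = Template(burial=["buried", "exposed", "buried"],
                sec_struct=["helix", "helix", "strand"],
                contacts=[(0, 2)])
seq = "LKVE"
print(alignment_energy(seq, [0, 1, 2], tmpl, E_BURIAL, E_SS, E_PAIR))
print(alignment_energy(seq, [0, 1, 3], tmpl, E_BURIAL, E_SS, E_PAIR))
```

With these toy tables, placing the hydrophobic V at the buried strand position gives a lower energy than placing the charged E there, which is the kind of discrimination the objective function is meant to provide.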

Solvent accessibility
o Residue solvent accessibility (RSA) is defined as the extent of the accessible surface area of a given residue.
o It depends on the spatial arrangement and packing of the residues.
o Important in the fold recognition process
o RSA prediction is done by:
1. Two-state (exposed/buried) and three-state (exposed/intermediate/buried) models
2. Prediction of the relative solvent accessibility value
o Algorithms:
1. Neural network
2. Nearest neighbour
3. Support vector regression
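The two-state and three-state models can be sketched as simple thresholds on the relative accessibility. This is only an illustration: the function names are hypothetical, and the cutoffs (25% for two states, 9% and 36% for three states) are assumed values, since the thresholds used differ between methods.

```python
def relative_accessibility(asa, max_asa):
    """Relative accessibility: accessible surface area divided by the maximum
    accessible area for that residue type (both in square angstroms), capped at 1."""
    return min(asa / max_asa, 1.0)

def two_state(rsa, threshold=0.25):
    # threshold is an assumed cutoff, not a fixed standard
    return "exposed" if rsa >= threshold else "buried"

def three_state(rsa, low=0.09, high=0.36):
    # low/high are assumed cutoffs, not a fixed standard
    if rsa < low:
        return "buried"
    if rsa < high:
        return "intermediate"
    return "exposed"

rsa = relative_accessibility(30.0, 120.0)
print(rsa, two_state(rsa), three_state(rsa))
```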

Search space
o If interaction terms between amino acids are not allowed
- dynamic programming finds the optimal alignment efficiently
- deterministic
o If interaction terms are allowed
- heuristic methods (fast, might not find the optimal alignment)
- exact methods (optimal, might take exponential time, might fail due to time or space limits), e.g. branch and bound
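For the case without interaction terms, the dynamic programming search can be sketched as a global alignment between the sequence and the template positions. This is a minimal sketch under assumptions: a linear gap penalty, a hypothetical `SCORE` table with made-up values, and a two-letter residue alphabet (H = hydrophobic, P = polar).

```python
def thread_by_dp(seq, env, score, gap=1.0):
    """Lowest-energy alignment of seq to template environments env when each
    position is scored independently (no pairwise interaction terms)."""
    n, m = len(seq), len(env)
    # dp[i][j] = lowest energy for aligning seq[:i] with env[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, "seq_gap"
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, "tmpl_gap"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (dp[i - 1][j - 1] + score[env[j - 1]][seq[i - 1]], "match"),
                (dp[i - 1][j] + gap, "seq_gap"),    # residue i left unaligned
                (dp[i][j - 1] + gap, "tmpl_gap"),   # template position j left empty
            ]
            dp[i][j], back[i][j] = min(options)
    # Trace back the optimal path to recover (sequence index, template position) pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "seq_gap":
            i -= 1
        else:
            j -= 1
    return dp[n][m], pairs[::-1]

# Made-up illustrative scores: hydrophobic residues prefer buried positions.
SCORE = {"buried": {"H": -1.0, "P": 1.0}, "exposed": {"H": 1.0, "P": -1.0}}
print(thread_by_dp("HPPH", ["buried", "exposed", "buried"], SCORE))
```

The table has (n+1) x (m+1) cells and each is filled in constant time, which is why this case is efficient and deterministic. Once pairwise terms are added, the score of a cell depends on choices made elsewhere and this recurrence no longer applies.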

Branch and bound
o Objective function definition
o Lower bound setup
o Splitting of threading sets:
• split the segment having the widest interval
• choose as the split point the value that results in the lower bound for the set
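The idea of pruning with a lower bound can be illustrated with a much simplified search. The sketch below is not the interval-splitting scheme listed above: it is a plain depth-first branch and bound over a toy model in which every template position receives one residue, in sequence order. All names, tables and values are hypothetical.

```python
from itertools import combinations

def threading_energy(assign, seq, singleton, contacts, pair):
    """Singleton terms plus pairwise terms for the template contacts."""
    e = sum(singleton[i][seq[a]] for i, a in enumerate(assign))
    return e + sum(pair[frozenset((seq[assign[i]], seq[assign[j]]))] for i, j in contacts)

def brute_force(seq, singleton, contacts, pair):
    """Exhaustive reference: try every ordered choice of residues."""
    return min((threading_energy(a, seq, singleton, contacts, pair), a)
               for a in combinations(range(len(seq)), len(singleton)))

def branch_and_bound(seq, singleton, contacts, pair):
    m, n = len(singleton), len(seq)
    # Optimistic singleton energy of positions k..m-1 (best residue type at each).
    suffix = [0.0] * (m + 1)
    for k in range(m - 1, -1, -1):
        suffix[k] = suffix[k + 1] + min(singleton[k].values())
    min_pair = min(pair.values())
    # closed_by[k]: earlier partners of position k; their contacts are scored when k is placed.
    closed_by = [[] for _ in range(m)]
    for i, j in contacts:
        closed_by[max(i, j)].append(min(i, j))
    # open_after[k]: number of contacts still unscored once positions 0..k-1 are placed.
    open_after = [sum(len(closed_by[j]) for j in range(k, m)) for k in range(m + 1)]
    best = [float("inf"), None]

    def extend(assign, energy):
        k = len(assign)
        if k == m:
            if energy < best[0]:
                best[0], best[1] = energy, tuple(assign)
            return
        start = assign[-1] + 1 if assign else 0
        for a in range(start, n - (m - k) + 1):   # leave room for the remaining positions
            e = energy + singleton[k][seq[a]]
            e += sum(pair[frozenset((seq[assign[i]], seq[a]))] for i in closed_by[k])
            lower_bound = e + suffix[k + 1] + open_after[k + 1] * min_pair
            if lower_bound < best[0]:             # otherwise prune this branch
                assign.append(a)
                extend(assign, e)
                assign.pop()

    extend([], 0.0)
    return best[0], best[1]

BURIED = {"H": -1.0, "P": 0.5}
EXPOSED = {"H": 0.5, "P": -0.5}
SEQ = "PHHPPHPH"
SINGLETON = [BURIED, EXPOSED, BURIED, BURIED]
CONTACTS = [(0, 2), (0, 3)]
PAIR = {frozenset("H"): -1.0, frozenset("HP"): 0.25, frozenset("P"): 0.0}
print(branch_and_bound(SEQ, SINGLETON, CONTACTS, PAIR))
```

Because the bound never overestimates the best energy reachable from a partial placement, pruning cannot discard the optimum, so the result should match the exhaustive search while visiting fewer placements.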

Scoring function
o The score function recognizes correct arrangements of protein residues.
o It is usually more coarse-grained than the functions used in a real energy calculation.
o The residues are placed on the backbone of the template structure, and from there one can calculate ideal coordinates for the Cβ atom.
o This is useful since most of the chemical identity of a residue can be represented by an interaction site located at the Cβ atom.
How to build a scoring function?
(i) Contact potentials
(ii) Quasi-chemical approximation
If we know the concentrations of two particles A and B, we can calculate how often they will be observed at a certain distance from each other by chance. Comparing this with the observed frequency gives an energy G, which is a function of the distance $r_{AB}$ between particles of types A and B:
$$G(r_{AB}) = -kT \ln \frac{\rho(r_{AB})}{\rho^{0}(r_{AB})}$$
where
$k$ = Boltzmann's constant
$T$ = temperature
$\rho(r_{AB})$ = the observed frequency of AB pairs at distance r
$\rho^{0}(r_{AB})$ = the frequency of AB pairs at distance r expected by chance
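A distance-dependent potential of this kind could be derived from pair counts as in the following sketch. The function names and the counts are made up, the chance frequency is assumed to come from the distance distribution of all residue pairs, and the energy is returned in units of kT.

```python
import math

def frequencies(counts):
    """Turn raw counts per distance bin into frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def pair_potential(observed, expected, kT=1.0):
    """G(r) = -kT ln(rho_observed / rho_chance) for each distance bin.
    Assumes every frequency is positive (no empty bins)."""
    return [-kT * math.log(o / e) for o, e in zip(observed, expected)]

# Made-up counts in three distance bins (short, medium, long range).
counts_ab = [20, 10, 10]      # pairs of types A and B
counts_all = [100, 100, 200]  # all residue pairs, used as the chance reference
g = pair_potential(frequencies(counts_ab), frequencies(counts_all))
print(g)
```

A bin where AB pairs occur more often than expected by chance gets a negative (favourable) energy, a bin where they occur less often gets a positive one, and equal frequencies give zero.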

(a) Contact potential
Potential models:
o Solvation Hamiltonian
o Two-body Hamiltonian
o Three-body Hamiltonian
Inter-residue potential: amino acids interact if they are spatially located within a certain distance of each other (a contact).
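The contact definition can be sketched as a distance test between interaction sites. In this illustration the 8 Å cutoff, the use of Cβ coordinates, and the rule that sequence neighbours are skipped are all assumptions; the coordinates are made up.

```python
import math

def contact_pairs(cb_coords, cutoff=8.0, min_separation=2):
    """Pairs (i, j) of positions whose interaction sites lie within `cutoff`
    angstroms; positions closer than `min_separation` in sequence are skipped."""
    pairs = []
    for i in range(len(cb_coords)):
        for j in range(i + min_separation, len(cb_coords)):
            if math.dist(cb_coords[i], cb_coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs

# Made-up coordinates along a line, for illustration only.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_pairs(coords))
```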

[Figure: excerpt from Jones et al. (1992), Ala-Ala and Cys-Cys]

(b) Quasi-chemical approximation
o An approximation method for deriving pairwise contact potentials from the number of residue-residue contacts found
o Quite successful
o Finds the interaction parameters for amino acids
o By measuring ∆G (experimentally) for mutated proteins, we obtain the differences in contact energies, which can then be used in the potential models
∆G = H_mut - H_wild, where H is the potential energy

Assessment
1. Based on the energy function (the lower the energy, the better the sequence-structure alignment)
2. Identification of "reliable" versus "unreliable" parts of a threaded structure by quantitative assessment of the structural deviations, in terms of RMSD, for regions of the predicted structures.
3. Calculation of z-scores. The aim is to find the score function that gives the greatest z-score.
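A z-score for a threading result can be computed as in this short sketch. It assumes the background is a list of energies from alternative alignments (for example other templates or shuffled sequences); the function name and the numbers are made up.

```python
import statistics

def z_score(score, background):
    """Number of standard deviations by which `score` differs from the
    mean of the background scores."""
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    if sigma == 0:
        raise ValueError("background scores have zero spread")
    return (score - mu) / sigma

print(z_score(-4.0, [0.0, 2.0, 4.0, 6.0, 8.0]))
```

Note the sign convention: when the score is an energy, a good alignment lies far below the background mean, so its z-score is strongly negative and it is the magnitude that indicates significance.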

Loop modeling
o A threading program can provide a reasonably accurate structure for the backbone atoms in the core secondary structures, while predictions for the loop regions are often not accurate.
o Secondary structures among homologous proteins are generally "well" conserved, but loops often are not. Hence, template-based loop predictions are generally not accurate.
o Tools such as MODELLER run a protocol of energy minimization and molecular dynamics simulation to refine a structural model.
o After a structure model is generated, one can apply structure assessment tools such as WHATIF and PROCHECK.
o Based on this assessment, a user can pick the best among the multiple structures derived from an alignment.

Variations
1D-3D profile methods
Prepare a profile first, describing for each residue:
1. How buried it is
2. Its environment (polar/non-polar)
3. Its local secondary structure (helix/sheet)
Then:
4. Calculate the score for a sequence by dynamic programming (DP)
5. Calculate the significance by z-score
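The profile preparation step can be sketched as assigning each template position to an environment class built from the three properties above. The function names, the fractional inputs and the cutoffs (0.7, 0.3, 0.5) are hypothetical choices for illustration, not the published class definitions.

```python
def environment_class(burial, polar_fraction, sec_struct):
    """Environment class of one template position.
    burial and polar_fraction are fractions in [0, 1]; cutoffs are assumed values."""
    if burial >= 0.7:
        burial_label = "buried"
    elif burial >= 0.3:
        burial_label = "partial"
    else:
        burial_label = "exposed"
    polarity = "polar" if polar_fraction >= 0.5 else "nonpolar"
    return (burial_label, polarity, sec_struct)

def structure_profile(positions):
    """Convert a 3D structure, given as one (burial, polar_fraction, sec_struct)
    tuple per position, into a 1D string of environment classes."""
    return [environment_class(*p) for p in positions]

profile = structure_profile([(0.9, 0.2, "helix"), (0.5, 0.6, "helix"), (0.1, 0.8, "sheet")])
print(profile)
```

Each class would then be paired with a table of amino acid scores, so that a sequence can be scored against the profile position by position with dynamic programming.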

Limitations
1. Threading requires searching over a large set of possible alignments for the one that delivers the minimum "energy". Such a search is an NP-complete problem (i.e. there is an apparent "Levinthal" paradox in threading).
2. The search in threading is biased by the energy function, so a related key issue is the precision of the energy function.
3. Fold recognition for structural analogues and some remote homologues is still challenging: modeling techniques such as protein threading can be applied, but the predictions typically have a low confidence level.
4. Even when a correct fold is identified, the accuracy of the threading alignment has been about 60-90% for proteins with less than 30% sequence identity to their template structures.
5. The current energy functions are generally coarse-grained, mainly to achieve fast predictions.
6. There is still significant room for improving the computational efficiency of threading programs.

References
1. Jones, D. T., Taylor, W. R. and Thornton, J. M. (1992). A new approach to protein fold recognition.
2. Xu, Y., Liu, Z., Cai, L. and Xu, D. Protein Structure Prediction by Protein Threading.
3. https://web.stanford.edu/class/cs273/refs/torda_chapter_proteomics.pdf
4. http://www.mit.edu/~leonid/publications/Mirny_Shakhnovich_ProtStucPredThread.pdf
5. https://biostat.wisc.edu/bmi776/lectures/threading.pdf
6. Rost, B., Schneider, R. and Sander, C. Protein Fold Recognition by Prediction-based Threading.
7. COS 597c: Topics in Computational Molecular Biology. Lecturer: Larry Brown. Scribe: Jessica Bessler.
