Protein structure prediction using protein threading
Sanjana Pandey
MSc. Bioinformatics
Overview
Protein threading definition
Why do we need protein threading
Basic principles
Workflow
Description of each step of the workflow
Assessment (an important step)
Software/tools
Advantages/applications
Limitations
References
“Threading”?
o Threading = placing/aligning
o The amino acid sequence is “threaded” into the template structure according to statistical principles, and the aligned regions are stitched together.
o “Threaded” = placed into the structure by force
o Given a protein sequence and a template library, return the best sequence-structure alignment.
o In threading, a new sequence is mounted on a series of known folds with the goal of finding a fold (a sequence-structure alignment) that provides the best score (lowest energy).
Protein structure prediction
Need for threading
o Sequence identity < 20%
o Fold recognition
o No 3D structure similarity
o Computational limitations
Protein threading v/s Fold recognition
Protein threading:
o Structure prediction method
o Involves the process of “fold recognition”
Fold recognition:
o Identification of “folds”
Each distinct topology of alpha helices and beta sheets makes up a fold: all-alpha, all-beta, alpha/beta, etc.
History
1. Bowie et al. (1991), on "the inverse protein folding problem", laid the foundation. They used simple measures of the fitness of different amino acid types for local structural environments, in terms of solvent accessibility and protein secondary structure.
2. The work by Jones, Taylor, and Thornton (1992)
Principles
o A limited number of basic folds is found in nature (~1500)
o Amino acid preferences for different structural environments provide sufficient information to choose among folds
The core secondary structure regions (helices and sheets) can be modelled and the structure predicted.
Taking into account the large number of amino acid sequences in databases like UniProt, one would expect a high number of folds. In reality the number is limited: nature appears to have re-used the same folds again and again to perform new functions.
Requirements
1. Query sequence
2. Library of core fold templates
3. Objective function (evaluates any particular placement of a sequence in a core template)
4. Method for searching over space of alignments between sequence and each core template
5. Method for choosing the best template given alignments
Workflow
1. Inputs: fold library and query sequence
2. Thread the query residues in sequential order (gaps allowed) into the structural positions of a template structure
3. Optimize by fitness scores
4. Sequence-structure alignments
5. Statistical assessment
6. Prediction of the backbone atoms of the query protein
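The workflow above can be sketched end to end with a toy example. Everything in it is an illustrative assumption rather than part of the slides: the two-template library, the encoding of template positions as buried (B) or exposed (E), the hydrophobic residue set, and the gapless sliding placement. Real threading allows gaps and uses statistically derived potentials.

```python
# Toy threading workflow: score a query against every template, keep the best.
# Assumptions (not from the slides): templates are strings of 'B' (buried) and
# 'E' (exposed) positions, and placement is gapless for brevity.

HYDROPHOBIC = set("AVLIMFWC")  # illustrative hydrophobic set

def position_energy(residue, environment):
    """Lower is better: hydrophobic residues prefer buried positions."""
    matches = (residue in HYDROPHOBIC) == (environment == "B")
    return -1.0 if matches else 1.0

def best_placement(query, template):
    """Slide the query along the template; return (lowest energy, offset)."""
    best = (float("inf"), -1)
    for offset in range(len(template) - len(query) + 1):
        energy = sum(position_energy(r, template[offset + i])
                     for i, r in enumerate(query))
        best = min(best, (energy, offset))
    return best

def thread(query, library):
    """Return (template name, energy, offset) of the lowest-energy placement."""
    results = []
    for name, template in library.items():
        energy, offset = best_placement(query, template)
        if offset >= 0:  # skip templates shorter than the query
            results.append((energy, name, offset))
    energy, name, offset = min(results)
    return name, energy, offset

library = {"foldA": "EEBBEEBB", "foldB": "BEBEBEBE"}
print(thread("LLKK", library))
```

Here the query LLKK fits foldA best, because only foldA contains two buried positions followed by two exposed ones.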
1. Sequence selection
o The query sequence (“target”)
o Sequence identity < 20%
o The result depends on the size and level of detail of the library.
o For better results, the library must be sufficiently large.
o Remove homologous structures.
2. Fold library
FSSP, PDB-Select, PISCES
3. Threading
o Different proteins fold into similar 3D shapes, with similar interaction patterns among their residues and between residues and their environment.
o These interaction patterns can possibly be captured using simple statistics-based energy models.
o Threading (placing) the backbone atoms of the residues of a query protein into the correct structural positions of the correct structural fold needs:
(a) an energy function whose global minimum corresponds to the correct placement of residues into the correct structural template
(b) an algorithm to find the global minimum of the given energy function.
4. Optimization
o Possible sequence-template alignments are scored using a specified objective function.
o The objective function scores the sequence-structure compatibility between
1. the sequence amino acids
2. their corresponding positions in a core template
o It takes into account:
1. amino acid preferences for solvent accessibility
2. amino acid preferences for particular secondary structures
3. interactions among spatially neighboring amino acids
“The objective function includes interactions between neighboring (in 3D) amino acids”
Energy Function
The energy of every alignment is given by the sum of pairwise residue-residue interactions.
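A minimal sketch of such a pairwise sum is given below. The names `alignment_energy`, `contacts` and `PAIR_ENERGY` are hypothetical, and the energy values are made up for illustration; they are not a published potential.

```python
# Sum of pairwise contact energies for one sequence-structure alignment.
# Assumptions: `contacts` lists template position pairs that are close in 3D,
# `alignment[k]` is the query index placed at template position k (None = gap),
# and PAIR_ENERGY holds made-up values, not a published potential.

PAIR_ENERGY = {
    frozenset("LL"): -1.0,   # hydrophobic-hydrophobic: favourable
    frozenset("LK"): 0.5,
    frozenset("KK"): 1.0,    # like charges: unfavourable
}

def alignment_energy(query, alignment, contacts):
    total = 0.0
    for p, q in contacts:
        i, j = alignment[p], alignment[q]
        if i is None or j is None:
            continue  # an unoccupied template position contributes nothing
        total += PAIR_ENERGY.get(frozenset((query[i], query[j])), 0.0)
    return total

query = "LKLK"
alignment = [0, 1, 2, 3]             # query residue k sits at template position k
contacts = [(0, 2), (1, 3), (0, 3)]  # template positions in contact
print(alignment_energy(query, alignment, contacts))
```

Because each term depends on two aligned residues at once, changing the placement of one residue changes several terms, which is what makes the search harder than ordinary sequence alignment.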
Solvent accessibility
o Residue solvent accessibility (RSA) is defined as the extent of the accessible surface area of a given residue.
o It results from the spatial arrangement and packing of the residues.
o It is important in the fold recognition process.
o RSA prediction is done by:
1. Two-state (exposed/buried) and three-state (exposed/intermediate/buried) models
2. Models based on relative RSA
o Algorithms:
1. Neural networks
2. Nearest neighbour
3. Support vector regression
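The two- and three-state labels can be illustrated by converting an accessible surface area into a relative value (observed area divided by the maximum for that residue type) and thresholding it. This is only a sketch: the maximum areas below are rounded, illustrative values, and the cut-offs are example choices; published methods differ on both.

```python
# Relative solvent accessibility and two-/three-state labels.
# Assumptions: MAX_ASA values are rounded, illustrative per-residue maxima in
# square angstroms, and the thresholds are example cut-offs.

MAX_ASA = {"A": 129.0, "G": 104.0, "L": 201.0, "K": 236.0}

def relative_accessibility(residue, asa):
    """Observed accessible surface area / maximum for that residue type, capped at 1."""
    return min(asa / MAX_ASA[residue], 1.0)

def two_state(rel, threshold=0.25):
    return "exposed" if rel >= threshold else "buried"

def three_state(rel, low=0.09, high=0.36):
    if rel < low:
        return "buried"
    return "intermediate" if rel < high else "exposed"

rel = relative_accessibility("L", 40.2)  # a leucine with 40.2 square angstroms exposed
print(rel, two_state(rel), three_state(rel))
```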
Search space
o If interaction terms between amino acids are not allowed:
– dynamic programming finds the optimal alignment efficiently (deterministic)
o If interaction terms are allowed:
– heuristic methods (fast, might not find the optimal alignment)
– exact methods (optimal, might take exponential time, might fail due to time or space limits), e.g. branch and bound
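The interaction-free case can be sketched as a standard global alignment by dynamic programming, minimizing energy instead of maximizing similarity. The per-site energy and the linear gap penalty below are illustrative assumptions, not values from the slides.

```python
# Global sequence-to-template alignment by dynamic programming.
# Valid only when the score has no residue-residue interaction terms, so each
# (residue, template position) pair can be scored independently.
# Assumptions: `site_energy` and the linear GAP penalty are illustrative.

GAP = 1.0

def site_energy(residue, environment):
    """-1 if a hydrophobic residue is buried or a polar one exposed, else +1."""
    hydrophobic = residue in "AVLIMFWC"
    return -1.0 if hydrophobic == (environment == "B") else 1.0

def dp_thread(query, template):
    """Return the minimum total energy of aligning query to template."""
    n, m = len(query), len(template)
    # cost[i][j] = best energy aligning query[:i] with template[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP
    for j in range(1, m + 1):
        cost[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + site_energy(query[i - 1], template[j - 1]),
                cost[i - 1][j] + GAP,   # query residue left unaligned
                cost[i][j - 1] + GAP,   # template position skipped
            )
    return cost[n][m]

print(dp_thread("LLKK", "BBEE"))   # every residue matches its environment
print(dp_thread("LLKK", "BBBEE"))  # one template position must be skipped
```

The table has (n+1)(m+1) cells and each is filled in constant time, which is why this case is efficient; once pairwise terms are added, the cost of a cell depends on choices made elsewhere and this recurrence no longer applies.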
Branch and bound
o Objective function definition
o Lower bound setup
o Splitting of threading sets:
• split the segment having the widest interval
• choose the split point as the value that results in the lower bound for the set
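A small best-first branch-and-bound sketch of these three steps follows. It makes several simplifying assumptions that are not in the slides: `g1` and `g2` are toy singleton and pairwise cost functions supplied by the caller, the lower bound is the sum of per-term minima, and the split point is the interval midpoint rather than the value that yields the lower bound.

```python
# Best-first branch and bound over "threading sets".
# A threading places core segment i at template start position t[i]; a set of
# threadings is a list of intervals (lo_i, hi_i), one per segment.
# Assumptions: g1/g2 are toy cost functions, and the split point is the
# interval midpoint (a simplification).
import heapq
from itertools import count

def lower_bound(intervals, g1, g2):
    """Sum of per-term minima: never exceeds the cost of any member threading."""
    n = len(intervals)
    ranges = [range(lo, hi + 1) for lo, hi in intervals]
    bound = sum(min(g1(i, t) for t in ranges[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            bound += min(g2(i, j, ti, tj) for ti in ranges[i] for tj in ranges[j])
    return bound

def branch_and_bound(intervals, g1, g2):
    """Return (cost, placement) of a minimum-cost threading."""
    tie = count()  # tie-breaker so the heap never compares interval lists
    heap = [(lower_bound(intervals, g1, g2), next(tie), intervals)]
    while heap:
        bound, _, current = heapq.heappop(heap)
        if bound == float("inf"):
            break  # every remaining set is infeasible
        widths = [hi - lo for lo, hi in current]
        if max(widths) == 0:
            # Each interval is a single position, so the bound is the exact cost.
            return bound, [lo for lo, _ in current]
        k = widths.index(max(widths))  # segment with the widest interval
        lo, hi = current[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = current[:k] + [part] + current[k + 1:]
            heapq.heappush(heap, (lower_bound(child, g1, g2), next(tie), child))
    return float("inf"), None

# Two segments of length 2 on a 6-position template; segment order is enforced
# by an infinite pairwise cost when segments overlap or swap.
SEG_LEN = 2
site_cost = [[3, 1, 4, 1, 5], [9, 2, 6, 5, 0]]  # site_cost[i][t], made-up numbers

def g1(i, t):
    return site_cost[i][t]

def g2(i, j, ti, tj):
    return 0 if tj >= ti + SEG_LEN else float("inf")

print(branch_and_bound([(0, 4), (0, 4)], g1, g2))
```

Because sets are expanded in order of increasing lower bound, the first set that contains a single threading is optimal: every other set still on the heap has a bound at least as large.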
o The score function recognizes correct arrangements of protein residues.
o Score functions are usually more coarse-grained than those used in a real energy calculation.
o The residues are placed on the backbone of the template structure; from there, one can calculate ideal coordinates for the Cβ atom,
o since most of the chemical identity of a residue comes from an interaction site located at the Cβ atom.
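As an illustration of placing an ideal Cβ, the sketch below builds it from the backbone N, Cα and C coordinates of one residue. The three coefficients are commonly used ideal-geometry constants for a virtual Cβ; they are an assumption here and are not taken from the slides, and the example coordinates are an idealized backbone frame with Cα at the origin.

```python
# Ideal C-beta position from backbone N, C-alpha and C coordinates.
# Assumption: the three coefficients are commonly used ideal-geometry
# constants for placing a virtual C-beta; they are not from the slides.

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def ideal_cbeta(n, ca, c):
    b = sub(ca, n)        # N -> CA bond vector
    c_vec = sub(c, ca)    # CA -> C bond vector
    a = cross(b, c_vec)   # normal to the N-CA-C plane
    return tuple(
        -0.58273431 * a[k] + 0.56802827 * b[k] - 0.54067466 * c_vec[k] + ca[k]
        for k in range(3)
    )

# Idealized backbone frame (angstroms), CA at the origin.
n, ca, c = (-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0)
cb = ideal_cbeta(n, ca, c)
distance = sum((x - y) ** 2 for x, y in zip(cb, ca)) ** 0.5
print(cb, distance)
```

The resulting CA-Cβ distance comes out close to a typical carbon-carbon single bond length, which is a quick sanity check on the geometry.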
How to build a scoring function?
(i) Contact potentials
(ii) Quasi-chemical approximation
If we know the concentrations of two particle types A and B, we can calculate how often they will be observed at a certain distance from each other by chance.
The free energy G is a function of the distance r_AB between particles of types A and B:
$G(r_{AB}) = -kT \ln \frac{\rho(r_{AB})}{\rho^{0}(r_{AB})}$
k = Boltzmann's constant
T = temperature
ρ(r_AB) is the observed frequency of AB pairs at distance r
ρ⁰(r_AB) is the frequency of AB pairs at distance r expected by chance
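The quantities defined above combine in the inverse Boltzmann relation, G = -kT ln(ρ_observed / ρ_chance). A minimal sketch for a single distance bin is given below; the function name, the default temperature and the choice of kcal/mol units are assumptions made for the example.

```python
# Inverse Boltzmann relation for one distance bin:
#   G = -k * T * ln(rho_observed / rho_chance)
# Assumptions: energies in kcal/mol and a default temperature of 300 K.
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pair_free_energy(observed, expected, temperature=300.0):
    """Negative when a pair is seen more often than chance, positive when rarer."""
    return -K_B * temperature * math.log(observed / expected)

print(pair_free_energy(0.02, 0.01))   # pair over-represented: favourable
print(pair_free_energy(0.005, 0.01))  # pair under-represented: unfavourable
```

A pair observed exactly as often as expected by chance gets an energy of zero, so the potential only rewards or penalizes deviations from random mixing.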
(a) Contact potential
Potential models:
o Solvation Hamiltonian
o Two-body Hamiltonian
o Three-body Hamiltonian
Inter-residue potential
Amino acids interact if they are spatially located within a certain distance of each other (a contact).
Excerpt from Jones et al. (1992): Ala-Ala and Cys-Cys
(b) Quasi-chemical
o An approximation method
o Derives pairwise contact potentials from the number of residue-residue contacts observed
o Quite successful
o Finds the interaction parameters for amino acids
o By measuring ∆G experimentally for mutated proteins, we obtain the differences in contact energies, which can then be used in the potential models
∆G = H_mut - H_wild, where H = potential energy
Assessment
1. Based on the energy function (the lower the energy, the better the sequence-structure alignment)
2. Identification of "reliable" versus "unreliable" parts of a threaded structure by quantitative assessment of the structural deviations, in terms of RMSD, for regions of predicted structures.
3. Calculation of z-scores. The aim is to find the score function that gives the greatest z-score.
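A z-score of this kind can be sketched as the distance, in standard deviations, between the energy of the chosen alignment and the mean energy of a set of alternative (decoy) alignments. The sign convention, under which a larger z-score means a better separated alignment, and the decoy energies below are assumptions for illustration.

```python
# z-score of one alignment energy against a background of decoy energies.
# Assumption: lower energy is better, so z is defined as (mean - energy) / std
# and a larger z means the alignment stands out more from the decoys.
from statistics import mean, pstdev

def z_score(energy, decoy_energies):
    return (mean(decoy_energies) - energy) / pstdev(decoy_energies)

decoys = [-2.0, 0.0, 2.0, 4.0, -4.0]  # made-up energies of alternative alignments
print(z_score(-10.0, decoys))
```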
Loop modeling
o A threading program can provide a reasonably accurate structure for the backbone atoms in the core secondary structures, while predictions for the loop regions are often not accurate.
o Secondary structures among homologous proteins are generally well conserved, but loops often are not. Hence, template-based loop predictions are generally not accurate.
o MODELLER runs a protocol of energy minimization and molecular dynamics simulation to refine a structural model.
o After a structure model is generated, one can apply structure assessment tools such as WHATIF and PROCHECK.
o Based on this assessment, a user can pick the best among the multiple structures derived from an alignment.
Tools/Software
Variations
1D-3D profile methods
First prepare a profile for each residue position:
1. How buried it is
2. Environment (polar/non-polar)
3. Local secondary structure (helix/sheet)
Then:
4. Calculate the score for a sequence by dynamic programming (DP)
5. Calculate the significance by z-score
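The profile-building part can be sketched as follows: each template position is reduced to an environment class, and each class maps to a row of residue scores. The burial cut-off, the three-residue alphabet and every number in `PREFERENCE` are invented for illustration. The polarity of the environment is left out for brevity, and scoring is gapless here rather than by DP.

```python
# Build a 3D-1D profile: one row of residue scores per template position.
# Assumptions: the burial cut-off, the three-letter alphabet and the scores in
# PREFERENCE are invented; real tables are derived from known structures.
# In this profile convention a higher score means a better fit.

PREFERENCE = {
    # (burial class, secondary structure) -> score for each residue type
    ("buried", "helix"):  {"L": 1.2, "K": -1.0, "G": -0.8},
    ("buried", "sheet"):  {"L": 1.0, "K": -1.2, "G": -0.5},
    ("exposed", "helix"): {"L": -0.4, "K": 0.9, "G": -0.6},
    ("exposed", "sheet"): {"L": -0.3, "K": 0.5, "G": -0.2},
}

def environment(buried_fraction, secondary_structure):
    burial = "buried" if buried_fraction >= 0.5 else "exposed"
    return (burial, secondary_structure)

def build_profile(positions):
    """positions: list of (buried_fraction, secondary_structure) per template site."""
    return [PREFERENCE[environment(b, ss)] for b, ss in positions]

def gapless_score(sequence, profile):
    return sum(row[res] for res, row in zip(sequence, profile))

profile = build_profile([(0.9, "helix"), (0.1, "helix"), (0.8, "sheet")])
print(gapless_score("LKL", profile), gapless_score("KLK", profile))
```

Once the structure has been turned into such a profile, the problem is one-dimensional on both sides, which is why dynamic programming suffices for this family of methods.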
Limitations
1. Threading requires searching over a large set of possible alignments for the one that delivers the minimum "energy". With pairwise interaction terms, such a search is an NP-complete problem (i.e. there is an apparent "Levinthal paradox" in threading).
2. The search in threading is biased by the energy function, so the related key issue is the precision of the energy function.
3. Fold recognition for structural analogues and some remote homologues is still challenging: modeling techniques such as protein threading typically give predictions with a low confidence level.
4. Even when a correct fold is identified, the accuracy of the threading alignment has been about 60-90% for proteins with less than 30% sequence identity to their template structures.
5. Current energy functions are generally coarse-grained, mainly to achieve fast predictions.
6. There is still significant room for improving the computational efficiency of threading programs.
References
1. D. T. Jones (1992). A new approach to protein fold recognition.
2. Ying Xu, Zhijie Liu, Liming Cai, and Dong Xu. Protein Structure Prediction by Protein Threading.
3. https://web.stanford.edu/class/cs273/refs/torda_chapter_proteomics.pdf
4. http://www.mit.edu/~leonid/publications/Mirny_Shakhnovich_ProtStucPredThread.pdf
5. https://biostat.wisc.edu/bmi776/lectures/threading.pdf
6. Burkhard Rost, Reinhard Schneider, and Chris Sander. Protein Fold Recognition by Prediction-based Threading.
7. COS 597c: Topics in Computational Molecular Biology. Lecturer: Larry Brown. Scribe: Jessica Bessler.
Thank you

More Related Content

What's hot (20)

Ab Initio Protein Structure Prediction
Ab Initio Protein Structure PredictionAb Initio Protein Structure Prediction
Ab Initio Protein Structure Prediction
 
Homology modeling: Modeller
Homology modeling: ModellerHomology modeling: Modeller
Homology modeling: Modeller
 
Homology Modelling
Homology ModellingHomology Modelling
Homology Modelling
 
Homology modelling
Homology modellingHomology modelling
Homology modelling
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Protein Structure Prediction
Protein Structure PredictionProtein Structure Prediction
Protein Structure Prediction
 
Protien Structure Prediction
Protien Structure PredictionProtien Structure Prediction
Protien Structure Prediction
 
Protein structure visualization tools-RASMOL
Protein structure visualization tools-RASMOLProtein structure visualization tools-RASMOL
Protein structure visualization tools-RASMOL
 
Homology modeling of proteins (ppt)
Homology modeling of proteins (ppt)Homology modeling of proteins (ppt)
Homology modeling of proteins (ppt)
 
threading and homology modelling methods
threading and homology modelling methodsthreading and homology modelling methods
threading and homology modelling methods
 
PAM : Point Accepted Mutation
PAM : Point Accepted MutationPAM : Point Accepted Mutation
PAM : Point Accepted Mutation
 
HOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAYHOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAY
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Blast
BlastBlast
Blast
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
Motif & Domain
Motif & DomainMotif & Domain
Motif & Domain
 
Protein structure prediction (1)
Protein structure prediction (1)Protein structure prediction (1)
Protein structure prediction (1)
 
Kegg databse
Kegg databseKegg databse
Kegg databse
 
Protein database
Protein databaseProtein database
Protein database
 

Similar to Protein Threading

HOMOLOGY MODELLING.pptx
HOMOLOGY MODELLING.pptxHOMOLOGY MODELLING.pptx
HOMOLOGY MODELLING.pptxMO.SHAHANAWAZ
 
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Melissa Moody
 
L1Protein_Structure_Analysis.pptx
L1Protein_Structure_Analysis.pptxL1Protein_Structure_Analysis.pptx
L1Protein_Structure_Analysis.pptxkigaruantony
 
Presentation1
Presentation1Presentation1
Presentation1firesea
 
So sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparisonSo sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparisonbomxuan868
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfH K Yoon
 
homology modellign lecture .pdf
homology modellign lecture .pdfhomology modellign lecture .pdf
homology modellign lecture .pdfAliAhamd7
 
homology modellign lecture .pdf
homology modellign lecture .pdfhomology modellign lecture .pdf
homology modellign lecture .pdfAliAhamd7
 
Homology Modeling.pptx
Homology Modeling.pptxHomology Modeling.pptx
Homology Modeling.pptxAmnaAkram29
 
IJBB-51-3-188-200
IJBB-51-3-188-200IJBB-51-3-188-200
IJBB-51-3-188-200sankar basu
 
Structure based drug design- kiranmayi
Structure based drug design- kiranmayiStructure based drug design- kiranmayi
Structure based drug design- kiranmayiKiranmayiKnv
 
Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Robin Gutell
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
Exploiting tertiary structure through local folds for crystallographic phasing
Exploiting tertiary structure through local folds for crystallographic phasingExploiting tertiary structure through local folds for crystallographic phasing
Exploiting tertiary structure through local folds for crystallographic phasingxrbiotech
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...journal ijrtem
 

Similar to Protein Threading (20)

HOMOLOGY MODELLING.pptx
HOMOLOGY MODELLING.pptxHOMOLOGY MODELLING.pptx
HOMOLOGY MODELLING.pptx
 
Drug discovery presentation
Drug discovery presentationDrug discovery presentation
Drug discovery presentation
 
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
 
acs.jpca.9b08723.pdf
acs.jpca.9b08723.pdfacs.jpca.9b08723.pdf
acs.jpca.9b08723.pdf
 
L1Protein_Structure_Analysis.pptx
L1Protein_Structure_Analysis.pptxL1Protein_Structure_Analysis.pptx
L1Protein_Structure_Analysis.pptx
 
Presentation1
Presentation1Presentation1
Presentation1
 
So sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparisonSo sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparison
 
Bs4201462467
Bs4201462467Bs4201462467
Bs4201462467
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
homology modellign lecture .pdf
homology modellign lecture .pdfhomology modellign lecture .pdf
homology modellign lecture .pdf
 
homology modellign lecture .pdf
homology modellign lecture .pdfhomology modellign lecture .pdf
homology modellign lecture .pdf
 
Homology Modeling.pptx
Homology Modeling.pptxHomology Modeling.pptx
Homology Modeling.pptx
 
IJBB-51-3-188-200
IJBB-51-3-188-200IJBB-51-3-188-200
IJBB-51-3-188-200
 
Structure based drug design- kiranmayi
Structure based drug design- kiranmayiStructure based drug design- kiranmayi
Structure based drug design- kiranmayi
 
Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
Exploiting tertiary structure through local folds for crystallographic phasing
Exploiting tertiary structure through local folds for crystallographic phasingExploiting tertiary structure through local folds for crystallographic phasing
Exploiting tertiary structure through local folds for crystallographic phasing
 
NMR Chemical Shift Prediction by Atomic Increment-Based Algorithms
NMR Chemical Shift Prediction by Atomic Increment-Based AlgorithmsNMR Chemical Shift Prediction by Atomic Increment-Based Algorithms
NMR Chemical Shift Prediction by Atomic Increment-Based Algorithms
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 

More from SANJANA PANDEY

Tools for Transcriptome Data Analysis
Tools for Transcriptome Data AnalysisTools for Transcriptome Data Analysis
Tools for Transcriptome Data AnalysisSANJANA PANDEY
 
biological membranes pdf
 biological membranes pdf biological membranes pdf
biological membranes pdfSANJANA PANDEY
 
Blood functions and composition pdf
Blood functions and composition pdfBlood functions and composition pdf
Blood functions and composition pdfSANJANA PANDEY
 
tissue engineering by sanjana pandey
tissue engineering by sanjana pandeytissue engineering by sanjana pandey
tissue engineering by sanjana pandeySANJANA PANDEY
 
CRISPR/CAS9 ppt by sanjana pandey
CRISPR/CAS9 ppt by sanjana pandeyCRISPR/CAS9 ppt by sanjana pandey
CRISPR/CAS9 ppt by sanjana pandeySANJANA PANDEY
 

More from SANJANA PANDEY (7)

Tools for Transcriptome Data Analysis
Tools for Transcriptome Data AnalysisTools for Transcriptome Data Analysis
Tools for Transcriptome Data Analysis
 
Transcriptome project
Transcriptome projectTranscriptome project
Transcriptome project
 
Forms of DNA
Forms of DNAForms of DNA
Forms of DNA
 
biological membranes pdf
 biological membranes pdf biological membranes pdf
biological membranes pdf
 
Blood functions and composition pdf
Blood functions and composition pdfBlood functions and composition pdf
Blood functions and composition pdf
 
tissue engineering by sanjana pandey
tissue engineering by sanjana pandeytissue engineering by sanjana pandey
tissue engineering by sanjana pandey
 
CRISPR/CAS9 ppt by sanjana pandey
CRISPR/CAS9 ppt by sanjana pandeyCRISPR/CAS9 ppt by sanjana pandey
CRISPR/CAS9 ppt by sanjana pandey
 

Recently uploaded

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 

Recently uploaded (20)

PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 

Protein Threading

  • 1. Protein structure prediction using protein threading Sanjana Pandey MSc. Bioinformatics
  • 2. Overview Protein threading definition:DONE Why do we need protein threading:DONE Basic principles:DONE Workflow:DONE Describe each step of the workflow:DOING Assesment step is important Softwares/tools Advantages/application Limitation References thankyou
  • 3. “Threading”? o Threading=placing/aligning o Aminoacidsequence is beingthreaded“into” the templatestructure by “statisticalprinciples” and stitchthe aligned regions together. Placed into by force o Given a Protein sequence and a template library return the best sequence-structure alignment o In threading, a newsequence is mounted on a series of known folds withthe goal of findinga fold(a sequence-structure alignment)that providesthe best score (lowest energy).
  • 5. Need for threading  Sequence Homology <20%  Fold recognition  No 3D structure similarity  Computational limitations
  • 6. Protein threading v/s Fold recognition o Structure prediction method o Identification of “folds” o Involves the process of “fold recognition” Each different topology of alpha helices and beta sheets make up the folds. All-alpha, All-beta, Alpha/Beta etc.
  • 7. History 1. (Bowie et aI., 1991) on "the inverse protein folding problem" - foundation Used simple measures for fitness of different amino acid types to local structural environments in terms of solvent accessibility and protein secondary structure. 2. The work by Jones, Taylor, and Thornton
  • 8. Principles  limitednumber of basic folds found in nature ~1500  aminoacidpreferences for different structural environments provide sufficient informationto choose amongfolds Thecoresecondarystructureregions(helix& sheets)couldbemodelledandstructureis predicted Taking into account the large number of amino acid sequences in databases like UniProt, one would expect a high number of folds. But in reality it is limited, it appears that nature has re-used the same fold again and again for performing new functions.
  • 9. Requirements 1. Query sequence 2. Library of core fold templates 3. Objective function (evaluate any particular placement of a sequence in a core template ) 4. Method for searching over space of alignments between sequence and each core template 5. Method for choosing the best template given alignments
  • 10. Workflow Fold library Query sequence Thread it by their sequential order (gaps allowed) into structural positions of a template structure Optimize by fitness scores Sequence structure alignments Statistical assessment Prediction of backbone atoms of the query protein
  • 11. 1.Sequence selection  The query sequence “target  Homology<20%  Result depends on the size and details of the library.  For better results, library must be large and sufficient  Remove homologous structures 2.Fold library FSSP PDB-Select PISCES
  • 12.
  • 13. 3.Threading  Different proteins fold into similar 3D shapes -similar interaction patterns among and between their residues and environment.  Interaction patterns could possibly be captured using simple statistics-based energy models.  Threading/Placing the (backbone atoms of) residues of a query protein into the correct structural positions in a correct structural fold needs: (a) an energy function whose global minimum will correspond to the correct placement of residues into the correct structural template (b) an algorithm to find the global minimum of the given energy function.
  • 14.
  • 15. 4.Optimization  Possible sequence-template alignments are scored using a specified objective function  Objective function scores the sequence-structure compatibility between 1. sequence amino acids 2. their corresponding positions in a core template 1. aminoacidpreferences for solvent accessibility 2. aminoacid preferences for particular secondarystructures 3. interactionsamong spatiallyneighboringaminoacids “objective function includes interactions between neighboring (in 3D) amino acids”
  • 17. The energy of every alignment is given by the sum of pairwise residue-residue interactions.
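The pairwise sum on this slide can be sketched as follows. This is a minimal illustration, not the energy function of any particular threading program: the function name `threading_energy`, the alignment encoding (template position to sequence index, `-1` for a gap), and the toy potential values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def threading_energy(
    sequence: str,
    alignment: List[int],
    contacts: List[Tuple[int, int]],
    pair_potential: Dict[Tuple[str, str], float],
) -> float:
    """Sum pairwise energies over template contacts with both positions occupied.

    alignment[k] is the index in `sequence` of the residue placed at template
    position k, or -1 if that position is left empty (gap). This encoding is
    an assumption of the sketch.
    """
    total = 0.0
    for i, j in contacts:
        a, b = alignment[i], alignment[j]
        if a < 0 or b < 0:
            continue  # a gapped position contributes no interaction
        # The potential is symmetric, so look up the sorted residue pair.
        key = tuple(sorted((sequence[a], sequence[b])))
        total += pair_potential.get(key, 0.0)
    return total


# Hypothetical toy potential and template contact list.
toy_potential = {("A", "A"): -0.1, ("A", "C"): 0.2, ("C", "C"): -1.0}
toy_contacts = [(0, 2), (1, 3)]
energy = threading_energy("CACA", [0, 1, 2, 3], toy_contacts, toy_potential)
```

With this toy input the two contacts pair Cys with Cys and Ala with Ala, so the alignment energy is the sum of those two table entries.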
  • 18. Solvent accessibility: Residue solvent accessibility (RSA) is the extent of the accessible surface area of a given residue. It results from the spatial arrangement and packing of the residues. It is important in the fold recognition process. RSA prediction is done with: 1. two-state (exposed/buried) and three-state (exposed/intermediate/buried) models 2. approaches based on relative RSA. Algorithms: 1. neural networks 2. nearest neighbour 3. support vector regression
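The two-state and three-state models can be illustrated by thresholding a relative accessibility value. The sketch below is only an illustration: the cut-offs (0.25 for two states, 0.09 and 0.36 for three states) and the function names are assumptions chosen for the example, and the slides do not state which thresholds were used.

```python
def relative_accessibility(asa: float, max_asa: float) -> float:
    """Accessible surface area divided by a reference maximum for that residue type."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa


def two_state(rsa: float, threshold: float = 0.25) -> str:
    """Exposed/buried classification; the threshold is an assumed value."""
    return "exposed" if rsa >= threshold else "buried"


def three_state(rsa: float, low: float = 0.09, high: float = 0.36) -> str:
    """Buried/intermediate/exposed classification; both cut-offs are assumed values."""
    if rsa < low:
        return "buried"
    if rsa < high:
        return "intermediate"
    return "exposed"
```

For example, `three_state(relative_accessibility(40.0, 200.0))` classifies a residue with 20% relative accessibility as intermediate under these assumed cut-offs.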
  • 19. Search space: If interaction terms between amino acids are not allowed, dynamic programming finds the optimal alignment efficiently and deterministically. If interaction terms are allowed, one uses either heuristic methods (fast, but might not find the optimal alignment) or exact methods (optimal, but might take exponential time and might fail due to time or space limits), e.g. branch and bound.
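The first case, with no interaction terms, can be sketched as a standard global alignment by dynamic programming in which each residue is scored only against the template position it occupies. The function `thread_by_dp`, the linear gap penalty, and the toy hydrophobic/polar fitness function are assumptions for illustration, not the scoring scheme of a specific program.

```python
from typing import Callable, List


def thread_by_dp(
    sequence: str,
    n_positions: int,
    fitness: Callable[[str, int], float],
    gap: float = 1.0,
) -> float:
    """Minimum alignment energy when each residue is scored independently.

    fitness(residue, position) is the energy of placing `residue` at template
    `position`; `gap` is an assumed linear penalty for skipping a residue or
    a template position.
    """
    n, m = len(sequence), n_positions
    # dp[i][j]: best energy aligning the first i residues to the first j positions.
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + fitness(sequence[i - 1], j - 1),  # place residue
                dp[i - 1][j] + gap,  # residue left unaligned
                dp[i][j - 1] + gap,  # template position left empty
            )
    return dp[n][m]


# Hypothetical environment-based fitness: hydrophobic residues prefer buried positions.
toy_environment = ["buried", "exposed", "buried"]


def toy_fitness(residue: str, position: int) -> float:
    hydrophobic = residue in "AVLIFM"
    buried = toy_environment[position] == "buried"
    return -1.0 if hydrophobic == buried else 1.0


best_energy = thread_by_dp("LKL", len(toy_environment), toy_fitness)
```

The table has (n+1) x (m+1) cells and each is filled in constant time, which is why this case is efficient. Once the score of a residue depends on which residues occupy neighbouring positions, this decomposition no longer holds.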
  • 20. Branch and bound: Objective function definition. Lower bound setup. Splitting of threading sets: split the segment having the widest interval; choose the split point as the value that results in the lower bound for the set.
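The three ingredients above (objective function, lower bound, splitting) can be sketched as a small best-first branch and bound. This is a simplified illustration under several assumptions: a threading set is a list of inclusive position intervals, one per core segment; the lower bound is the sum of the minima of each score term; and the widest interval is split at its midpoint, which is a simpler rule than the split-point choice named on the slide. All names are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # inclusive range of allowed positions for one segment
Single = Callable[[int, int], float]           # single(i, t): segment i at position t
Pair = Callable[[int, int, int, int], float]   # pair(i, j, s, t): segments i, j at s, t


def lower_bound(box: List[Interval], single: Single, pair: Pair) -> float:
    """Sum of per-term minima: never larger than the score of any threading in the box."""
    total = 0.0
    for i, (lo_i, hi_i) in enumerate(box):
        total += min(single(i, t) for t in range(lo_i, hi_i + 1))
        for j in range(i + 1, len(box)):
            lo_j, hi_j = box[j]
            total += min(
                pair(i, j, s, t)
                for s in range(lo_i, hi_i + 1)
                for t in range(lo_j, hi_j + 1)
            )
    return total


def branch_and_bound(box: List[Interval], single: Single, pair: Pair):
    """Return (best score, positions) by always expanding the set with the lowest bound."""
    heap = [(lower_bound(box, single, pair), box)]
    while heap:
        bound, current = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in current]
        widest = max(range(len(current)), key=widths.__getitem__)
        if widths[widest] == 0:
            # Every interval is a single point, so the bound is the exact score,
            # and no remaining set can contain a better threading.
            return bound, [lo for lo, _ in current]
        lo, hi = current[widest]
        mid = (lo + hi) // 2  # midpoint split: an assumption of this sketch
        for part in ((lo, mid), (mid + 1, hi)):
            child = current[:widest] + [part] + current[widest + 1:]
            heapq.heappush(heap, (lower_bound(child, single, pair), child))
    raise ValueError("empty search space")


# Hypothetical toy scores; an infinite pair score forbids out-of-order segments.
def toy_single(i: int, t: int) -> float:
    return float((t - i) ** 2)


def toy_pair(i: int, j: int, s: int, t: int) -> float:
    return float("inf") if t <= s else 0.5 * abs((t - s) - 2)


best_score, best_positions = branch_and_bound([(0, 3), (0, 3), (0, 3)], toy_single, toy_pair)
```

The search is exact but can still visit exponentially many sets in the worst case, which matches the caveat about exact methods on the previous slide.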
  • 21.
  • 22. The score function recognizes correct arrangements of protein residues. It is usually more coarse-grained than the functions used in a real energy calculation. The residues are placed on the backbone of the template structure, and from there one can calculate ideal coordinates for the Cβ atom, since most of the chemical identity of a residue comes from an interaction site located at the Cβ atom. How to build a scoring function? (i) Contact potentials (ii) Quasi-chemical approximation. If we know the concentrations of two particle types A and B, we can calculate how often they will be observed at a certain distance from each other by chance. The potential $G$ is a function of the distance $r_{AB}$ between particles of types A and B: $G(r_{AB}) = -kT \ln \frac{\rho(r_{AB})}{\rho^{0}(r_{AB})}$, where $k$ is Boltzmann's constant, $T$ is the temperature, $\rho(r_{AB})$ is the observed frequency of AB pairs at distance $r_{AB}$, and $\rho^{0}(r_{AB})$ is the frequency of AB pairs at that distance expected by chance.
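A potential of this kind is commonly computed with the inverse Boltzmann relation, G = -kT ln(observed / expected). The sketch below assumes that form; the function name, the choice kT = 1 (energies in units of kT), and the toy frequencies per distance bin are assumptions made for illustration.

```python
import math


def pair_potential(observed: float, expected: float, kT: float = 1.0) -> float:
    """Inverse Boltzmann energy for one residue-pair type in one distance bin.

    `observed` is the frequency of AB pairs seen at this distance in known
    structures, `expected` the frequency expected by chance.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return -kT * math.log(observed / expected)


# Hypothetical frequencies for three distance bins of one residue pair.
toy_observed = [0.02, 0.10, 0.05]
toy_expected = [0.05, 0.05, 0.05]
toy_profile = [pair_potential(o, e) for o, e in zip(toy_observed, toy_expected)]
```

Pairs seen more often than chance get a negative (favourable) energy, pairs seen less often get a positive one, and a ratio of one gives zero.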
  • 23.
  • 24. (a) Contact potential. Potential models: solvation Hamiltonian, two-body Hamiltonian, three-body Hamiltonian, inter-residue potential. Amino acids interact (are in contact) if they are spatially located within a certain distance of each other.
  • 25. Excerpt from Jones et al. (1992): Ala-Ala and Cys-Cys
  • 26. (b) Quasi-chemical: An approximation method for deriving pairwise contact potentials from the number of residue-residue contacts found. It has been quite successful. It finds the interaction parameters for amino acids. By measuring $\Delta G$ experimentally for mutated proteins, we obtain the differences in contact energies, which can then be used in the potential models: $\Delta G = H_{mut} - H_{wild}$, where $H$ is the potential energy.
  • 27. Assessment 1. Based on the energy function (the lower the energy, the better the sequence-structure alignment). 2. Identification of "reliable" versus "unreliable" parts of a threaded structure by quantitative assessment of the structural deviations, in terms of RMSD, for regions of the predicted structures. 3. Calculation of z-scores. The aim is to find the score function that gives the greatest z-score.
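The z-score in step 3 can be sketched as the distance, in standard deviations, between the energy of the candidate alignment and the mean energy of a set of alternative (decoy) alignments. The sign convention below, where a larger z-score means a better separated candidate, and the use of the population standard deviation are assumptions of this sketch.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(candidate_energy: float, decoy_energies: Sequence[float]) -> float:
    """How many standard deviations the candidate lies below the decoy mean.

    Lower energy is better, so the sign is chosen to make larger z-scores
    better (an assumed convention).
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (mean(decoy_energies) - candidate_energy) / sigma


# Hypothetical energies: one candidate alignment against two decoys.
toy_z = z_score(-1.0, [1.0, 3.0])
```

A score function that assigns the correct alignment an energy far below the bulk of the decoys yields a large z-score, which is the property the slide asks for.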
  • 28. Loop modeling: A threading program can provide a somewhat accurate structure for the backbone atoms in the core secondary structures, while predictions for the loop regions are often not accurate. Secondary structures among homologous proteins are generally "well" conserved, but loops often are not; hence template-based loop predictions are generally not accurate. MODELLER runs a protocol of energy minimization and molecular dynamics simulation to refine a structural model. After a structure model is generated, one can apply structure assessment tools such as WHATIF and PROCHECK. Based on this assessment, a user can pick the best among the multiple structures derived from an alignment.
  • 30. Variations: 1D-3D profile methods. First prepare a profile for each residue position: 1. how buried it is 2. its environment (polar/non-polar) 3. its local secondary structure (helix/sheet). Then calculate the score for a sequence by dynamic programming (DP) and calculate its significance by z-score.
  • 31. Limitations 1. Threading requires searching over a large set of possible alignments for the one that delivers the minimum "energy". Such a search is an NP-complete problem (i.e. there is an apparent "Levinthal" paradox in threading). 2. The search in threading is biased by the energy function, so the related key issue is the precision of the energy function. 3. Fold recognition for structural analogues and some remote homologues is still challenging: modeling techniques such as protein threading can be applied, but the predictions typically have a low confidence level. 4. Even when the correct fold is identified, the accuracy of the threading alignment has been about 60-90% for proteins with less than 30% sequence identity to their template structures. 5. Current energy functions are generally coarse-grained, mainly to achieve fast predictions. 6. There is still significant room for improving the computational efficiency of threading programs.
  • 32. References 1. D. T. Jones et al. (1992). A new approach to protein fold recognition. 2. Ying Xu, Zhijie Liu, Liming Cai, and Dong Xu. Protein Structure Prediction by Protein Threading. 3. https://web.stanford.edu/class/cs273/refs/torda_chapter_proteomics.pdf 4. http://www.mit.edu/~leonid/publications/Mirny_Shakhnovich_ProtStucPredThread.pdf 5. https://biostat.wisc.edu/bmi776/lectures/threading.pdf 6. Burkhard Rost, Reinhard Schneider, and Chris Sander. Protein Fold Recognition by Prediction-based Threading. 7. COS 597c: Topics in Computational Molecular Biology. Lecturer: Larry Brown. Scribe: Jessica Bessler.