Protein Structure, Databases and Structural Alignment



    1. Protein Structure, Databases and Structural Alignment. Saramita De Chakravarti, Research Scientist II, Chembiotech Research Laboratories
    2. Basics of Protein Structure
    3. Why Protein Structure? Proteins are fundamental components of all living cells, performing a variety of biological tasks. Each protein has a particular 3D structure that determines its function. Protein structure is more conserved than protein sequence, and more closely related to function.
    4. Protein Structure. The hydrophobic protein core is usually conserved; the surface loops are variable regions. [Figure: hydrophobic core and surface loops]
    5. Supersecondary structures: assemblies of secondary structures that are shared by many structures. Examples: beta hairpin, beta-alpha-beta unit, helix hairpin.
    6. Fold: the general structure, composed of sets of supersecondary structures. [Figure: hemoglobin (1bab)]
    7. How Many Folds Are There?
    8. Structure-Sequence Relationships
        • Two similar sequences → similar structures.
        • Two similar structures → similar sequences?
        There are cases of proteins with the same structure but no clear sequence similarity.
    9. Principles of Protein Structure
        • Today's proteins reflect millions of years of evolution.
        • 3D structure is better conserved than sequence during evolution.
        • Similarities among sequences or among structures may reveal information about shared biological functions of a protein family.
    10. The Levinthal paradox. Assume a protein is composed of 100 amino acids (AAs) and that each AA can adopt 10 different conformations. Altogether we get $10^{100}$ (i.e., a googol) conformations. If each conformation were sampled in the shortest possible time (the time of a molecular vibration, ~$10^{-13}$ s), it would take an astronomical amount of time ($10^{87}$ s, on the order of $10^{79}$ years) to sample all possible conformations in order to find the native state.
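The arithmetic behind the paradox can be checked in a few lines. This is a minimal sketch that takes the slide's assumptions as given (100 residues, 10 conformations per residue, one conformation sampled every $10^{-13}$ s) and adds one assumption of its own, a 365.25-day year. It works with base-10 exponents, which gives about $10^{87}$ s, or roughly $3 \times 10^{79}$ years.

```python
import math

residues = 100              # chain length assumed on the slide
states_per_residue = 10     # conformations per residue assumed on the slide
seconds_per_sample = 1e-13  # about one molecular vibration
seconds_per_year = 365.25 * 24 * 3600  # assumption: Julian year, 31,557,600 s

# Work with base-10 exponents so the huge numbers stay readable.
log10_conformations = residues * math.log10(states_per_residue)
log10_seconds = log10_conformations + math.log10(seconds_per_sample)
log10_years = log10_seconds - math.log10(seconds_per_year)

print(f"conformations: 10^{log10_conformations:.0f}")
print(f"search time:   10^{log10_seconds:.0f} s, about 10^{log10_years:.1f} years")
```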
    11. The Levinthal paradox. Luckily, nature copes with these sorts of numbers, and a protein reaches its correct conformation within seconds.
    12. How is the 3D Structure Determined? Experimental methods (best approach):
        • X-ray crystallography.
        • NMR.
        • Others (e.g., neutron diffraction).
    13. How is the 3D Structure Determined? In-silico methods: ab-initio structure prediction, given only the sequence as input; not always successful.
    14. A note on ab-initio predictions: the current state is that "failure can no longer be guaranteed"…
    15. A note on ab-initio secondary structure prediction: success ~70%.
    16. How is the 3D Structure Determined? In-silico methods: threading = sequence-structure alignment. The idea is to search existing databases of 3D structures for a structure that fits the sequence, and to use sequence similarity plus information on the structures to find the best predicted structures.
    17. Comments
        • X-ray crystallography is the most widely used method.
        • The quaternary structure of large assemblies (ribosomes, virus particles, etc.) can be determined by electron microscopy (cryo-EM).
    18. Protein Databases
    19. PDB: Protein Data Bank
        • Holds 3D models of biological macromolecules (protein, RNA, DNA).
        • All data are available to the public.
        • Obtained by X-ray crystallography (84%) or NMR spectroscopy (16%).
        • Submitted by biologists and biochemists from around the world.
    20. PDB: Protein Data Bank
        • Founded in 1971 by Brookhaven National Laboratory, New York.
        • Transferred to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998.
        • Currently it holds > 49,426 released structures (61,695).
    21. PDB: model
        • A model defines the 3D positions of atoms in one or more molecules.
        • There are models of proteins, protein complexes, proteins and DNA, protein segments, etc.
        • The models also include the positions of ligand molecules, solvent molecules, metal ions, etc.
    22. PDB: Protein Data Bank
    23. The PDB file: text format
    24. The PDB file: text format. Fields of each coordinate record, left to right:
        • Record type: ATOM (usually protein or DNA) or HETATM (usually ligand, ion, water)
        • Atom number
        • Atom identity
        • Residue identity
        • Chain
        • Residue number
        • X Y Z: the coordinates of each atom in the structure
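Because PDB coordinate records are fixed-width text, the fields above can be read by column position. A minimal sketch, assuming the column layout of the wwPDB format (7-11 atom serial number, 13-16 atom name, 18-20 residue name, 22 chain ID, 23-26 residue number, 31-54 X, Y, Z). The `Atom` class, the function names, and the two example lines are hypothetical, made up for illustration, and the lines are truncated after the Z column; real files also carry occupancy, B-factor, and element columns. For real work a dedicated parser such as Biopython's `Bio.PDB` is a safer choice.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    serial: int    # atom number
    name: str      # atom identity, e.g. "N", "CA"
    res_name: str  # residue identity, e.g. "MET", "HOH"
    chain: str     # chain identifier
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by its fixed columns (0-based Python slices)."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def parse_pdb(text: str) -> list[Atom]:
    """Keep only coordinate records; skip HEADER, END, and other record types."""
    return [parse_atom_line(line) for line in text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]


# Hypothetical two-record example, truncated after the Z column.
EXAMPLE_PDB = "\n".join([
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   MET A   1      38.198  19.582  28.752",
    "HETATM  601  O   HOH A 201      10.123  -5.456   3.789",
    "END",
])

for atom in parse_pdb(EXAMPLE_PDB):
    print(atom.record, atom.serial, atom.name, atom.res_name,
          atom.chain, atom.res_seq, atom.x, atom.y, atom.z)
```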
    25. Structural Alignment
    26. Why structural alignment?
        • Structural similarity can point to a remote evolutionary relationship.
        • Shared structural motifs among proteins suggest similar biological function.
        • Getting insight into sequence-structure mapping (e.g., which parts of the protein structure are conserved among related organisms).
    27. As in any alignment problem, we can search for a GLOBAL ALIGNMENT or for a LOCAL ALIGNMENT.
    28. Human myoglobin (pdb:2mm1) vs. human hemoglobin alpha-chain (pdb:1jebA). Sequence identity: 27%. Structural identity: 90%.
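A sequence-identity figure such as the 27% above depends on how identity is counted. The sketch below uses one common convention (identical residues divided by aligned columns where neither sequence has a gap); other tools divide by the alignment length or by the shorter sequence, so the same pair can give different percentages. The function name and the toy alignment are hypothetical and do not reproduce the myoglobin/hemoglobin alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over gap-free aligned columns.

    One common convention only; other tools use other denominators.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap ("-").
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Toy, made-up alignment: 4 identical residues out of 6 gap-free columns.
print(percent_identity("ACDE-FG", "ACKEQFH"))
```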
    29. What is the best transformation that superimposes the unicorn on the lion?
    30. Solution: regard the shapes as sets of points and try to "match" these sets using a transformation.
    31. This is not a good result.
    32. Good result:
    33. Kinds of transformations:
        • Rotation
        • Translation
        • Scaling
        and more.
    34. Translation [Figure: X-Y axes]
    35. Rotation [Figure: X-Y axes]
    36. Scale [Figure: X-Y axes]
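The three transformations can be written as small functions on lists of (x, y, z) points. In this pure-Python sketch the function names are hypothetical, and rotation is shown only about the z-axis for brevity (a general 3D rotation uses a full 3×3 rotation matrix). The usage lines show why protein superposition normally uses only rotation and translation: those two moves preserve interatomic distances, while scaling changes them.

```python
import math

Point = tuple[float, float, float]


def translate(points: list[Point], dx: float, dy: float, dz: float) -> list[Point]:
    """Shift every point by the same vector (dx, dy, dz)."""
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def rotate_z(points: list[Point], angle_deg: float) -> list[Point]:
    """Rotate every point about the z-axis by angle_deg degrees (counter-clockwise)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def scale(points: list[Point], factor: float) -> list[Point]:
    """Multiply every coordinate by the same factor (uniform scaling about the origin)."""
    return [(factor * x, factor * y, factor * z) for x, y, z in points]


# Two points 3.8 units apart (roughly the spacing of consecutive C-alpha atoms, in angstroms).
pts = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
moved = translate(rotate_z(pts, 90.0), 1.0, 2.0, 3.0)

print(math.dist(*moved))            # still about 3.8: rotation + translation preserve distances
print(math.dist(*scale(pts, 2.0)))  # about 7.6: scaling changes distances
```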
    37. We represent a protein as a geometric object in space. The object consists of points represented by coordinates (x, y, z). [Figure: residues Thr, Lys, Met, Gly, Glu, Ala]
    38. The aim: given two proteins, find the transformation that produces the best superimposition of one protein onto the other.
    39. Correspondence is unknown. Given two configurations of points in three-dimensional space: [Figure: two point sets]
    40. Find those rotations and translations of one of the point sets which produce "large" superimpositions of corresponding 3D points.
    41. The best transformation: T
    42. Simple case: two closely related proteins with the same number of amino acids. Question: how do we assess the quality of the transformation?
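    43. Scoring the alignment. Two point sets: $A = \{a_i\}$, $i = 1, \dots, n$; $B = \{b_j\}$, $j = 1, \dots, m$.
        Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$
        (1) Bottleneck: $\max_i \lVert a_{k_i} - b_{t_i} \rVert$
        (2) RMSD (root mean square distance): $\sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert a_{k_i} - b_{t_i} \rVert^2}$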
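Once a correspondence is fixed, both scores take one line each. In this sketch the corresponding pairs $(a_{k_i}, b_{t_i})$ are passed directly as a list of point tuples; the function names and the toy coordinates are made up for illustration. The example shows the practical difference between the two scores: a single badly matched pair sets the bottleneck score outright, while the RMSD averages it over all $N$ pairs.

```python
import math

Point = tuple[float, float, float]


def bottleneck(pairs: list[tuple[Point, Point]]) -> float:
    """Largest distance over all corresponding pairs (a_ki, b_ti)."""
    return max(math.dist(a, b) for a, b in pairs)


def rmsd(pairs: list[tuple[Point, Point]]) -> float:
    """Square root of the mean squared distance over the N corresponding pairs."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


# Made-up coordinates: the first two pairs match exactly, the third is off by 3 units.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
pairs = list(zip(A, B))

print(bottleneck(pairs))  # 3.0: set entirely by the worst pair
print(rmsd(pairs))        # sqrt(3), about 1.732: the error is averaged over N = 3
```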
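    44. RMSD: Root Mean Square Deviation. Given two sets of 3D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$:
        $\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert p_i - q_i \rVert^2}$
        Find a 3D transformation $T^*$ such that:
        $\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert T(p_i) - q_i \rVert^2}$
        Goal: find the highest number of aligned atoms with the lowest RMSD.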
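The slide does not say how $T^*$ is found. When $T$ is restricted to rigid moves (rotation plus translation) and the correspondence between $p_i$ and $q_i$ is already known, a standard closed-form solution is the Kabsch algorithm: center both point sets, take the SVD of their 3×3 cross-covariance matrix, and build the rotation from the singular vectors, flipping one axis if needed so the result is a rotation rather than a reflection. This sketch assumes NumPy is available and that row $i$ of `P` corresponds to row $i$ of `Q`; the function names are hypothetical, and the harder problem of finding the correspondence itself is not addressed.

```python
import numpy as np


def rmsd(P, Q):
    """Root-mean-square deviation between paired rows of two (n, 3) arrays."""
    diff = np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def kabsch(P, Q):
    """Rotation R and translation t minimizing sum_i ||R p_i + t - q_i||^2.

    Assumes row i of P corresponds to row i of Q (correspondence is known).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def superposed_rmsd(P, Q):
    """RMSD after optimally superimposing P onto Q with a rigid transformation."""
    R, t = kabsch(P, Q)
    P_fit = np.asarray(P, dtype=float) @ R.T + t
    return rmsd(P_fit, Q)


# Usage: Q is a random "structure"; P is the same structure rotated 40 degrees
# about z and shifted, so the optimal superposition should bring RMSD to ~0.
rng = np.random.default_rng(0)
Q = rng.normal(size=(10, 3))
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
P = Q @ Rz.T + np.array([5.0, -2.0, 1.0])

print(rmsd(P, Q))             # large: same shape, different position and orientation
print(superposed_rmsd(P, Q))  # close to 0 after superposition
```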
    45. Pitfalls of RMSD
        • All atoms are treated equally (residues on the surface have a higher degree of freedom than those in the core).
        • The best alignment does not always mean minimal RMSD.
        • It does not take into account the attributes of the amino acids.
    46. Flexible alignment vs. rigid alignment [Figure: rigid alignment; flexible alignment]
    47. Some more issues
    48. Does the fact that many proteins contain alpha-helices indicate that they are all evolutionarily related? No. Alpha-helices reflect physical constraints, as do beta-sheets. For structures, it is sometimes difficult to separate convergent evolution from evolutionary relatedness.
    49. Structural genomics: solve or predict the 3D structures of all proteins of a given organism (X-ray, NMR, and homology modeling). Unlike in traditional structural biology, the 3D structure is often solved before anything is known about the protein in question. A new challenge has emerged: predicting a protein's function from its 3D structure.
    50. CASP: a competition for predicting 3D structures. Instead of rushing to publish a newly solved 3D structure, the amino acid sequence is published and each group is invited to submit its predictions.
    51. CAPRI: the same as CASP, but for docking.
    52. Homology modeling: predicting the structure from a closely related known structure. This can be important, for example, to predict how a mutation influences the structure.