  • Also drugs against invaders: bacteria, viruses, and so on. Trypanosoma (the cause of sleeping sickness) is a eukaryotic life form, so many of its proteins are just like ours. If we make a medicine that, for example, knocks out glycolysis in trypanosomes, we must be very careful not to knock out glycolysis in the host as well. So we have to study the differences between the trypanosome and host proteins. The picture shows the active site of the trypanosome enzyme. We can then design a ligand that blocks the active site of the trypanosome protein but not that of the human one.
  • Neuraminidase inhibitors. Neuraminidase helps to release the viral particle from the infected cell so it can infect another cell. It breaks sialic acid bonds in the membrane so that a membrane vesicle with virus inside can travel to another cell. So the neuraminidase binds sialic acid.
  • Smart as we are, we made molecules that look like sialic acid (Relenza and Tamiflu) to neutralize the neuraminidase and prevent the virus from being mobile.
  • But at some point Tamiflu lost its action against neuraminidase.
  • On binding Tamiflu (oseltamivir), the conformation of the Glu 277 side chain of the wild-type enzyme is altered such that it exposes a hydrophobic site with which the pentyloxy group of oseltamivir interacts. In the mutant enzyme, the bulkier Tyr residue at 275 displaces the carboxyl group of Glu 277 into the binding site, such that it disrupts the hydrophobic pocket and causes a change in conformation of the pentyloxy substituent of oseltamivir (Fig. 1a), with a consequent reduction in binding affinity of some 300-fold or greater. Luckily for us, Relenza’s side group isn’t that big and still fits in the pocket of the mutant neuraminidase. The question is, what will we do if the virus also beats Relenza?
  • Variations in time: x-ray experiment takes seconds. Motions happen in femtoseconds. So motions can be the source of uncertainty. In space: different molecules in the crystal have a different conformation of a sidechain
  • The resolution for a given crystal depends on the ordering of the molecules in the crystal, that is, how close the unit cells throughout the crystal are to being identical copies of one another. Rule of thumb: the larger the molecule, the lower the resolution of the data. At low resolution (4 Å) only the overall shape of the molecule is visible, no interactions. At 3 Å the path of the chain can be traced. At 2 Å side chains become visible. At 1.2-0.9 Å hydrogen atoms become visible and occupancies are easily detectable. These high-resolution structures require fewer geometric constraints during refinement and give a better indication of the true geometry of protein structures.
  • X-rays are scattered by the crystal, generating a diffraction pattern from which an initial model of the structure is built based on the electron density map. This initial model can also be a homology model of the new structure, based on a close homolog that is expected to adopt more or less the same structure. Using scattering theory it is possible to compute the expected diffraction pattern of the model. Usually this will differ from the observed one. Refinement involves iteratively modifying the model until the computed diffraction pattern fits the observed diffraction pattern as well as possible. A cautionary example: ferredoxin was incorrectly solved due to a wrong space group assignment, but all the water molecules made up for it (344 waters on 117 residues). Refinable parameters: the structural model describes a collection of scattering centres (atoms), each located at a fixed position in the crystal lattice, and with some degree of mobility or extension around that locus. In adjusting the structural model to improve the fit between calculated and observed diffraction patterns, the crystallographer may vary these and other parameters. Refinable parameters are those that may be varied in order to improve the fit. Usually they comprise atomic coordinates, atomic displacement parameters, and a scale factor to bring the observed and calculated amplitudes or intensities to the same scale. They may also include extinction parameters, occupancy factors, twin component fractions, and even the assigned space group. Relations between the refinable parameters may be expressed as constraints or restraints that modify the function to be minimized.
  • Deposition procedure: an experimentalist choosing the best 20 structures from a much larger ensemble can produce very misleading statistics! For example, the best 20 models may all be the best solution with only small variations, so the RMSD will be small, while further down the original list there are alternative solutions that are less consistent with the data but radically different.
  • Bits protein structure

    1. 1. BITS Training Protein Structure Joost Van Durme VIB Switch Laboratory Vrije Universiteit Brussel http://www.bits.vib.be/training
    2. 2. Topics for today <ul><li>Exploring the protein structure databank (PDB) </li></ul><ul><li>Viewing and analyzing protein structures with YASARA </li></ul><ul><li>Comparing similar protein structures </li></ul><ul><li>In silico mutagenesis with FoldX </li></ul><ul><li>Homology modeling with FoldX </li></ul>
    3. 3. Sequences and structures <ul><li>PDB contains 65,000 structures </li></ul><ul><li>EMBL-Bank contains 114,475,051 sequences or 215,540,553,360 nucleotides! </li></ul>
    4. 4. The sequence-structure gap
    5. 5. Structures can be solved by <ul><li>X-ray crystallography (crystals) </li></ul><ul><li>Nuclear Magnetic Resonance (NMR) (in solution) </li></ul><ul><li>Electron microscopy (in native tissue) </li></ul>
    6. 6. But ... <ul><li>Solving structures is a lot of work (6 months to years) </li></ul><ul><li>It needs lots of material/reagents. Solubility is a problem. </li></ul><ul><li>Some protein structures are really difficult to solve: membrane proteins, extremely large proteins, protein complexes </li></ul><ul><li>The field evolves fast: techniques improve, software becomes more user-friendly, and more steps are automated (X-ray infrastructure, crystal growth) </li></ul><ul><li>Despite this progress it is not expected that the sequence-structure gap will ever be closed. </li></ul>
    7. 7. What can we learn from models/structures? <ul><li>Active site structure: structure-based drug design </li></ul><ul><li>Protein-protein interactions </li></ul><ul><li>Function </li></ul><ul><li>Antigenic behavior / vaccine development </li></ul><ul><li>Stabilising proteins using structural knowledge </li></ul>
    8. 8. Human vs parasite Parasite Active site
    9. 9. 1918 Influenza Epidemic Influenza Virus
    10. 10. NEURAMINIDASE POCKET SIALIC ACID
    11. 11. RELENZA SIALIC ACID
    12. 12. RELENZA TAMIFLU 5.000.000+ doses in NL
    13. 13. Trouw – 3 maart 2009
    14. 14. RELENZA: WT K<sub>i</sub> = 1.0, H274Y K<sub>i</sub> = 1.9. TAMIFLU: WT K<sub>i</sub> = 1.0, H274Y K<sub>i</sub> = 265.
    15. 15. PDB structures come from ... <ul><li>X-Ray crystallography experiments </li></ul><ul><li>NMR structure determination </li></ul><ul><li>The PDB no longer contains: </li></ul><ul><li>EM structures (too low resolution) </li></ul><ul><li>Models (too unreliable) </li></ul>
    16. 16. Principle of X-Ray crystallography initial model electron densities
    17. 17. X-Ray structure
    18. 18. X-ray model components <ul><li>x, y, z coordinates: define the mean atom position </li></ul><ul><li>disorder about this mean: B-factor and occupancy </li></ul><ul><ul><li>variations in time and space </li></ul></ul><ul><li>B-factor: </li></ul><ul><ul><li>models the ‘smearing out’ of disorder around the mean atom position (ellipsoids) </li></ul></ul><ul><ul><li>a higher B-factor means more uncertainty about the position </li></ul></ul><ul><li>Occupancy: </li></ul><ul><ul><li>accounts for alternative conformations of the same side chain </li></ul></ul><ul><ul><li>how often we find this side chain in one conformation and how often in the other conformation </li></ul></ul>
    19. 19. Occupancy (ATOM records from PDB entry 2VWC)
        ATOM    625  C   ILE A  77     -11.322  28.374  -1.179  1.00 28.77           C
        ATOM    626  O   ILE A  77     -11.946  29.453  -1.112  1.00 28.84           O
        ATOM    627  CA AILE A  77     -11.432  27.329  -0.087  0.70 28.15           C
        ATOM    628  CB AILE A  77     -12.918  26.874   0.087  0.70 28.64           C
        ATOM    629  CG1AILE A  77     -13.042  25.758   1.141  0.70 26.75           C
        ATOM    630  CG2AILE A  77     -13.516  26.421  -1.241  0.70 28.13           C
        ATOM    631  CD1AILE A  77     -13.378  26.302   2.501  0.70 26.47           C
        ATOM    632  CA BILE A  77     -11.423  27.327  -0.082  0.30 28.50           C
        ATOM    633  CB BILE A  77     -12.874  26.775   0.117  0.30 28.79           C
        ATOM    634  CG1BILE A  77     -13.519  26.423  -1.227  0.30 28.62           C
        ATOM    635  CG2BILE A  77     -13.748  27.739   0.916  0.30 28.40           C
        ATOM    636  CD1BILE A  77     -14.720  25.518  -1.100  0.30 28.69           C
        ATOM    637  N   ARG A  78     -10.521  28.048  -2.183  1.00 28.70           N
        ATOM    638  CA  ARG A  78     -10.258  28.952  -3.268  1.00 28.47           C
        ATOM    639  C   ARG A  78     -10.857  28.469  -4.584  1.00 28.22           C
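The occupancy column of these records can be read programmatically. The following is a minimal Python sketch, assuming the standard fixed-column ATOM layout (alternate-location label in column 17, occupancy in columns 55-60, B-factor in columns 61-66). The helper names `make_atom_line`, `parse_atom` and `occupancy_by_altloc` are hypothetical, and the four records are rebuilt from the CA and CB atoms of ILE 77 on the slide.

```python
def make_atom_line(serial, name, altloc, resname, chain, resseq, x, y, z, occ, bfac):
    """Build an ATOM record in the fixed-column PDB layout (illustrative helper)."""
    return (
        f"ATOM  {serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}"
    )


def parse_atom(line):
    """Slice the fields we need out of a fixed-column ATOM record."""
    return {
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),      # empty string when there is no alternate location
        "resname": line[17:20],
        "resseq": int(line[22:26]),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }


def occupancy_by_altloc(lines, atom_name):
    """Map each alternate-location label to the occupancy of one named atom."""
    result = {}
    for line in lines:
        atom = parse_atom(line)
        if atom["name"] == atom_name and atom["altloc"]:
            result[atom["altloc"]] = atom["occupancy"]
    return result


# CA and CB of ILE 77 in conformations A and B, values taken from the slide.
records = [
    make_atom_line(627, " CA ", "A", "ILE", "A", 77, -11.432, 27.329, -0.087, 0.70, 28.15),
    make_atom_line(628, " CB ", "A", "ILE", "A", 77, -12.918, 26.874, 0.087, 0.70, 28.64),
    make_atom_line(632, " CA ", "B", "ILE", "A", 77, -11.423, 27.327, -0.082, 0.30, 28.50),
    make_atom_line(633, " CB ", "B", "ILE", "A", 77, -12.874, 26.775, 0.117, 0.30, 28.79),
]

ca_occupancy = occupancy_by_altloc(records, "CA")
total = sum(ca_occupancy.values())
print(ca_occupancy, round(total, 2))
```

In this example the two conformations of the side chain carry occupancies 0.70 and 0.30, which together account for the whole population of molecules in the crystal.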
    20. 20. Atomic B-factors <ul><li>A value that indicates the precision of an atom’s given position </li></ul><ul><li>Atoms with the largest B-factors have the largest positional uncertainty </li></ul><ul><li>Also an indication of the mobility of an atom </li></ul><ul><li>0 < B < 20 : Atom is most likely OK </li></ul><ul><li>20 < B < 40 : Atom is probably OK, but positional errors up to 0.5 Ångström are normal </li></ul><ul><li>40 < B < 60 : Atom is probably reasonably OK, but be careful, because positional errors up to 1.0 Ångström can be observed </li></ul><ul><li>B > 60 : Atom is not likely to be within 1.0 Ångström of where you see it </li></ul><ul><li>B around 100 : Atom is guaranteed not to be within 1.0 Ångström of where you see it </li></ul>
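The rule-of-thumb ranges above can be turned into a small lookup. This is a minimal Python sketch: the function names are hypothetical, the treatment of the boundary values (20, 40, 60) is an assumption because the slide leaves them open, and the "around 100" case is folded into the last class. It also uses the standard isotropic relation B = 8π²⟨u²⟩, which is not on the slide, to show how a B-factor translates into an RMS displacement.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement in Å for an isotropic B-factor in Å^2, from B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))


def classify_b_factor(b_factor):
    """Apply the slide's rule-of-thumb ranges (boundary handling is an assumption)."""
    if b_factor <= 20:
        return "most likely OK"
    if b_factor <= 40:
        return "probably OK, errors up to 0.5 A are normal"
    if b_factor <= 60:
        return "reasonably OK, errors up to 1.0 A possible"
    return "not likely within 1.0 A of the modelled position"


# B-factors of a few atoms (28.77 is the ILE 77 carbonyl carbon from the occupancy slide).
for b in (13.79, 28.77, 55.0, 100.0):
    print(f"B = {b:6.2f}  u_rms = {rms_displacement(b):.2f} A  -> {classify_b_factor(b)}")
```

Under this relation a B-factor of 20 corresponds to an RMS displacement of about 0.5 Å, in line with the error sizes quoted in the list.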
    21. 21. B-factor www.YASARA.org Low High
    22. 22. Resolution (Ångström) <ul><li>Level of detail that can be observed in the electron density map </li></ul><ul><li>The greater the disorder in the crystal, the lower the resolution (that is, the larger the value in Å); larger proteins tend to give lower-resolution data </li></ul>Figure panels: 3.0 Å, 2.0 Å, 1.2 Å
    23. 23. R-factor <ul><li>The difference between the observed and computed diffraction pattern </li></ul><ul><li>A measure of how well the refined structure predicts the observed data </li></ul><ul><li>Higher values mean less agreement </li></ul><ul><li>0.40-0.60: very unreliable </li></ul><ul><li>0.20 seems to be the standard threshold </li></ul>electron density map
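As a rough illustration of "the difference between the observed and computed diffraction pattern", the conventional crystallographic R-factor is R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|. The Python sketch below assumes the calculated amplitudes are already on the same scale as the observed ones; the function name and the amplitude values are made up for illustration and do not come from a real data set.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor from observed and calculated structure-factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up amplitudes for three reflections (illustrative only).
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 55.0, 30.0]

r = r_factor(observed, calculated)
print(f"R = {r:.3f}")
```

A perfect model would give R = 0; the larger the mismatch between the two amplitude lists, the higher the value.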
    24. 24. NMR Structure determination
    25. 25. NMR model components <ul><li>In solution </li></ul><ul><ul><li>study protein dynamics </li></ul></ul><ul><ul><li>solve protein structures that are difficult to crystallize </li></ul></ul><ul><li>Nuclear Overhauser Effect or NOE </li></ul><ul><ul><li>intensities of signal peaks correspond to short inter-atomic distances between spatially close protons (NOE distances) </li></ul></ul><ul><li>NOE constraints are known with low precision, e.g. NOEs are binned into 2.5-4.0, 4.0-5.5, and 5.5-7.0 Ångström </li></ul><ul><li>Multiple models that are consistent with the distance and angle constraints are generated using e.g. molecular dynamics: the NMR ensemble </li></ul><ul><li>Take the average or best model from the ensemble for PDB deposition, or just deposit a selected ensemble of superposed structures </li></ul>
    26. 26. Structure superposition: RMSD (root mean square distance). RMSD = sqrt( (1/n) · Σ d<sub>i</sub>² ), where n = number of atoms and d<sub>i</sub> = distance between the 2 corresponding atoms i in the 2 structures. The better the atoms superpose on each other, the lower the RMSD <ul><li>Unit of RMSD => Ångström </li></ul><ul><li>identical structures => RMSD = “0” </li></ul><ul><li>similar structures => RMSD is small (1 – 3 Å) </li></ul><ul><li>distant structures => RMSD > 3 Å </li></ul><ul><li>However, care has to be taken, as RMSD is length dependent and dominated by outliers: </li></ul><ul><li>comparison of two short peptide structures can result in a small RMSD even if their structures are visibly different. </li></ul><ul><li>very similar structures can have a bad RMSD due to a short part of the structures that is very different (loops) </li></ul><ul><li>Insertions and deletions are not accounted for in the RMSD calculation, since we only look at equivalent atoms/residues (see figure) </li></ul>
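The RMSD definition on this slide is short enough to write out. The following is a minimal Python sketch, assuming the two structures have already been superposed and that atom i in one list corresponds to atom i in the other; the function name and the toy coordinates are hypothetical. The second comparison shows the outlier effect mentioned above: three atoms match perfectly and a single atom is displaced by 4 Å.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy coordinates in Å (illustrative only).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
one_outlier = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 4.0)]

print(rmsd(reference, reference))    # identical structures
print(rmsd(reference, one_outlier))  # one displaced atom dominates the value
```

The sketch does not perform the superposition itself; finding the optimal rotation and translation is a separate step that tools such as YASARA carry out before the distances are measured.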
    27. 27. NMR ensemble RMSD (root mean square distance) <ul><li>Superpose the NMR models </li></ul><ul><li>Calculate RMSD of local regions and also whole models </li></ul><ul><li>Regions with high RMSD are less well defined by the data </li></ul>
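One way to see which regions are "less well defined by the data" is to measure, for each atom, how far the models scatter around their mean position. The Python sketch below is an assumption about how this could be done, not the procedure used in the course: it takes already superposed models, and it reports the RMS distance from the mean position rather than pairwise RMSD values. The function name and the three toy models are hypothetical.

```python
import math


def per_atom_spread(models):
    """RMS distance of each atom from its mean position across superposed models."""
    n_models = len(models)
    n_atoms = len(models[0])
    spreads = []
    for i in range(n_atoms):
        # Mean position of atom i over all models.
        mean = [sum(model[i][k] for model in models) / n_models for k in range(3)]
        # Mean squared distance of atom i from that mean position.
        msd = sum(
            sum((model[i][k] - mean[k]) ** 2 for k in range(3)) for model in models
        ) / n_models
        spreads.append(math.sqrt(msd))
    return spreads


# Three toy models of two atoms each (Å): atom 0 is well defined, atom 1 is not.
ensemble = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, -2.0, 0.0)],
]

spreads = per_atom_spread(ensemble)
print([round(s, 2) for s in spreads])
```

Atoms or residues with a large spread would be the ones to treat with caution when interpreting the ensemble.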
    28. 28. Structural data is stored in the Protein Data Bank (PDB) http://www.pdb.org Protein Data Bank (PDB)
    29. 29. ©CMBI 2009 Protein Data Bank (PDB) <ul><li>Databank for 3-dimensional structures of biomolecules: </li></ul><ul><ul><li>Protein </li></ul></ul><ul><ul><li>DNA </li></ul></ul><ul><ul><li>RNA </li></ul></ul><ul><ul><li>Ligands </li></ul></ul><ul><li>Obligatory deposit of coordinates in the PDB before publication </li></ul><ul><li>~65,000 entries (April 2010) (~27,000 “unique” structures) </li></ul><ul><li>PDB file is a keyword-organised flat-file (80 columns) </li></ul><ul><li>1) human readable 2) every line starts with a keyword (3-6 letters) 3) platform independent </li></ul>
    30. 30. PDB important records (1) <ul><li>PDB nomenclature: filename = accession number = PDB code. The code has 4 characters (often 1 digit and 3 letters, e.g. 1CRN.pdb) </li></ul><ul><li>HEADER describes the molecule and gives the deposition date: HEADER PLANT SEED PROTEIN 30-APR-81 1CRN </li></ul><ul><li>COMPND gives the name of the molecule: COMPND CRAMBIN </li></ul><ul><li>SOURCE gives the organism: SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED </li></ul>
    31. 31. PDB important records (2) <ul><li>SEQRES: sequence of the protein. Be aware: 3D coordinates are not always present for all the amino acids in SEQRES!
        SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51
        SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52
        SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53
        SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54 </li></ul><ul><li>SSBOND: disulfide bridges
        SSBOND 1 CYS 3 CYS 40
        SSBOND 2 CYS 4 CYS 32 </li></ul>
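The SEQRES records above can be collapsed into a one-letter sequence with a few lines of code. This is a minimal Python sketch that works on the whitespace-separated lines as they appear on the slide; picking out tokens by membership in a three-letter table is a shortcut assumed here for brevity, whereas a robust parser would slice the fixed columns of the PDB format. The names `THREE_TO_ONE`, `SEQRES_LINES` and `seqres_to_sequence` are hypothetical.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# SEQRES records of 1CRN as shown on the slide.
SEQRES_LINES = [
    "SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51",
    "SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52",
    "SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53",
    "SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54",
]


def seqres_to_sequence(lines):
    """Concatenate the residues of all SEQRES lines into a one-letter string."""
    letters = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SEQRES":
            continue
        # Keep only tokens that are residue names; counters and the entry code are skipped.
        letters.extend(THREE_TO_ONE[t] for t in tokens[1:] if t in THREE_TO_ONE)
    return "".join(letters)


sequence = seqres_to_sequence(SEQRES_LINES)

# The two SSBOND records on the slide: residue numbers of the bridged cysteines.
ssbonds = [(3, 40), (4, 32)]
all_cys = all(sequence[i - 1] == "C" and sequence[j - 1] == "C" for i, j in ssbonds)

print(sequence, len(sequence), all_cys)
```

The cross-check against SSBOND works here because the residue numbering of this entry starts at 1 and has no gaps; in general, SSBOND refers to residue numbers in the ATOM records, which need not equal positions in SEQRES.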
    32. 32. PDB important records (3): at the end of the PDB file comes the “real” data. ATOM: one line for each atom, with its unique name and its x, y, z coordinates
        ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
        ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
        ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19      1CRN  72
        ATOM      4  O   THR     1      15.268  13.825   5.594  1.00  9.85      1CRN  73
        ATOM      5  CB  THR     1      18.170  12.703   5.337  1.00 13.02      1CRN  74
        ATOM      6  OG1 THR     1      19.334  12.829   4.463  1.00 15.06      1CRN  75
        ATOM      7  CG2 THR     1      18.150  11.546   6.304  1.00 14.23      1CRN  76
        ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81      1CRN  77
        ATOM      9  CA  THR     2      13.856  11.469   6.066  1.00  8.31      1CRN  78
        ATOM     10  C   THR     2      14.164  10.785   7.379  1.00  5.80      1CRN  79
        ATOM     11  O   THR     2      14.993   9.862   7.443  1.00  6.94      1CRN  80
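Once the coordinates are read, simple geometry follows directly. The Python sketch below assumes the fixed-column ATOM layout (x, y, z in columns 31-38, 39-46 and 47-54), re-types the first two records of the slide in that layout without the trailing entry code, and measures the N-CA bond length of THR 1. The names `ATOM_LINES` and `parse_coordinates` are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math

# First two ATOM records of 1CRN from the slide, in fixed-column layout.
ATOM_LINES = [
    "ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79",
    "ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80",
]


def parse_coordinates(line):
    """Return (atom name, residue number, (x, y, z)) from one ATOM record."""
    name = line[12:16].strip()
    resseq = int(line[22:26])
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return name, resseq, xyz


atoms = {}
for record in ATOM_LINES:
    name, resseq, xyz = parse_coordinates(record)
    atoms[(resseq, name)] = xyz

# Distance between the backbone nitrogen and the alpha carbon of residue 1, in Å.
n_ca_distance = math.dist(atoms[(1, "N")], atoms[(1, "CA")])
print(f"N-CA distance: {n_ca_distance:.3f} A")
```

Slicing by column rather than splitting on whitespace matters in real files, because adjacent fields can touch each other (as in the `CG1AILE` records on the occupancy slide).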
    33. 33. PDB entry
    34. 34. PDB entry
    35. 35. PDB entry
    36. 36. Structure Visualization <ul><li>Structures from PDB can be visualized with: </li></ul><ul><li>YASARA (http://www.yasara.org) </li></ul><ul><li>SwissPDBViewer (http://spdbv.vital-it.ch/) </li></ul><ul><li>PyMOL (http://www.pymol.org) </li></ul><ul><li>Chimera (http://www.cgl.ucsf.edu/chimera) </li></ul>
    37. 37. YASARA View nomenclature <ul><li>Atom </li></ul><ul><li>Residue = any continuous stretch of atoms sharing the same residue name, residue number and molecule name </li></ul><ul><li>Molecule = any continuous stretch of residues sharing the same molecule name (PDB calls this a CHAIN) </li></ul><ul><li>Object = a collection of molecules and additional items </li></ul>
    38. 38. Standard atom colors <ul><li>C = cyan </li></ul><ul><li>O = red </li></ul><ul><li>N = blue </li></ul><ul><li>H = white </li></ul><ul><li>S = green </li></ul>
    39. 39. Atom nomenclature (figure: peptide drawn from N-term to C-term with atom labels N, Cα, C, O, Cβ, Cγ, Oγ, Cδ1, Cδ2, and terminal oxygens OT1, OT2)
    40. 40. FoldX: a molecular design toolkit <ul><li>Predict the effect of point mutation on the protein stability </li></ul><ul><li>Predict the 3D structure of a sequence: homology modeling </li></ul>
    41. 41. Predict effect of point mutation <ul><li>FoldX is an empirical force field </li></ul><ul><ul><li>It is validated with calorimetric experiments </li></ul></ul><ul><ul><li>E.g. If such an experiment concludes that breaking a hydrogen bond costs 1.5 kcal/mol, FoldX uses this knowledge rather than using theoretical physics equations </li></ul></ul><ul><li>FoldX compares WT and mutant for: </li></ul><ul><ul><li>Hydrogen bonds, electrostatics, Van der Waals clashes and contacts, entropy, desolvation, ... </li></ul></ul><ul><li>FoldX energies </li></ul><ul><ul><li>Energy of a single molecule is meaningless </li></ul></ul><ul><ul><li>The difference in energy of two molecules (such as WT and a point mutant) approaches realistic values </li></ul></ul>
    42. 42. Predict effect of point mutation <ul><li>FoldX calculates the stability of WT and MT and takes the difference (net effect of mutation): </li></ul><ul><ul><li>ΔG<sub>MT</sub> - ΔG<sub>WT</sub> = ΔΔG<sub>mutation</sub> </li></ul></ul><ul><li>If ΔΔG<sub>mutation</sub> </li></ul><ul><ul><li>> 0 : mutation is bad for stability </li></ul></ul><ul><ul><li>< 0 : mutation is good for stability </li></ul></ul><ul><li>FoldX error margin is 0.5 kcal/mol, so changes within this margin are meaningless </li></ul>
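The interpretation rule on this slide amounts to a subtraction and a threshold. The following Python sketch only illustrates that arithmetic: it does not call FoldX, the function names are hypothetical, and the ΔG values are made-up numbers in kcal/mol.

```python
ERROR_MARGIN = 0.5  # kcal/mol, the FoldX error margin quoted on the slide


def ddg(dg_wt, dg_mt):
    """Net effect of the mutation: ddG = dG(MT) - dG(WT)."""
    return dg_mt - dg_wt


def interpret_ddg(value, margin=ERROR_MARGIN):
    """Classify a ddG value using the sign convention and error margin of the slide."""
    if abs(value) <= margin:
        return "within error margin"
    return "destabilizing" if value > 0 else "stabilizing"


# Made-up stabilities for one wild type and three hypothetical mutants.
dg_wild_type = -10.0
for label, dg_mutant in (("mutant 1", -8.0), ("mutant 2", -10.3), ("mutant 3", -12.0)):
    change = ddg(dg_wild_type, dg_mutant)
    print(f"{label}: ddG = {change:+.1f} kcal/mol -> {interpret_ddg(change)}")
```

Treating a value of exactly 0.5 kcal/mol as still within the margin is a choice made for this sketch; the slide does not say how the boundary should be handled.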
    43. 43. Introduction to homology modeling <ul><li>Goal: predict a structure from its sequence with an accuracy that is comparable to the best results achieved experimentally (X-Ray) </li></ul><ul><li>Protein modeling is the only way to obtain structural information when experimental techniques (x-ray, NMR, EM) fail </li></ul>
    44. 44. Homology Modeling
    45. 45. Principles of Homology Modeling <ul><li>Search for a sequence with a known structure that is very similar to the sequence with the unknown structure. Build the model using the known structure as template </li></ul><ul><li>The structure of a protein is uniquely determined by its amino acid sequence </li></ul><ul><li>Structure is more conserved than sequence </li></ul><ul><ul><li>Similar sequences adopt nearly exactly the same structure </li></ul></ul><ul><ul><li>Distantly related sequences can still fold into a similar structure </li></ul></ul>
    46. 46. Sequence similarity rule <ul><li>Rost (1999) modeled lots of structures and compared them to the real ones in the PDB </li></ul><ul><li>Derived precise limits for homology modeling </li></ul><ul><li>This rule tells you whether a model will be reliable or unreliable </li></ul>
    47. 47. FoldX plugin for YASARA
    48. 48. Acknowledgements <ul><li>Gert Vriend, Radboud Universiteit Nijmegen, NL ( www.cmbi.ru.nl ) </li></ul><ul><li>Sander Nabuurs, Lead Pharma, Nijmegen, NL </li></ul><ul><li>Greet De Baets, VIB Switch Laboratory </li></ul>
