BITS Training: Protein Structure
Joost Van Durme, VIB Switch Laboratory, Vrije Universiteit Brussel
http://www.bits.vib.be/training
Topics for today
Exploring the protein structure databank (PDB)
Viewing and analyzing protein structures with YASARA
Comparing similar protein structures
In silico mutagenesis with FoldX
Homology modeling with FoldX
Sequences and structures
The PDB contains 65,000 structures.
EMBL-Bank contains 114,475,051 sequences, or 215,540,553,360 nucleotides!
The sequence-structure gap
Structures can be solved by
X-ray crystallography (in crystals)
Nuclear Magnetic Resonance (NMR) (in solution)
Electron microscopy (in native tissue)
Solving structures is a lot of work (6 months to years).
It needs lots of material and reagents, and solubility is a problem.
Some protein structures are really difficult to solve: membrane proteins, extremely large proteins, protein complexes.
The field evolves fast: techniques improve, software becomes more user-friendly, and more steps are automated (X-ray infrastructure, crystal growth).
Despite this progress, the sequence-structure gap is not expected ever to close. But ...
What can we learn from models/structures?
Active site structure: structure-based drug design
Protein-protein interactions
Function
Antigenic behavior / vaccine development
Stabilising proteins using structural knowledge
Human vs parasite
(figure: parasite protein and its active site)
1918 influenza epidemic
(figure: influenza virus)
Neuraminidase pocket with sialic acid (figure)
Relenza and sialic acid (figure)
Relenza and Tamiflu (figure): 5,000,000+ doses in NL
Trouw, 3 March 2009
Relative inhibition constants (K_i, wild type set to 1.0) against WT and H274Y mutant neuraminidase:
             WT K_i    H274Y K_i
Relenza      1.0       1.9
Tamiflu      1.0       265
PDB structures come from ...
X-ray crystallography experiments
NMR structure determination
(and, in much smaller numbers, electron microscopy, usually at low resolution)
The PDB no longer accepts: theoretical models (too unreliable)
Principle of X-ray crystallography
(figure: from electron densities to an initial model)
X-Ray structure
X-ray model components
x, y, z coordinates: define the mean atom position
Disorder about this mean (variations in time and space): B-factor and occupancy
B-factor: models the 'smearing out' of disorder around the mean atom position (ellipsoids); a higher B-factor means more uncertainty about the position
Occupancy: accounts for alternative conformations of the same side chain, i.e. how often this side chain is found in one conformation and how often in the other
Occupancy (PDB entry 2VWC: conformation A of ILE A 77 has occupancy 0.70, conformation B 0.30)
    ATOM    625  C   ILE A  77     -11.322  28.374  -1.179  1.00 28.77           C
    ATOM    626  O   ILE A  77     -11.946  29.453  -1.112  1.00 28.84           O
    ATOM    627  CA AILE A  77     -11.432  27.329  -0.087  0.70 28.15           C
    ATOM    628  CB AILE A  77     -12.918  26.874   0.087  0.70 28.64           C
    ATOM    629  CG1AILE A  77     -13.042  25.758   1.141  0.70 26.75           C
    ATOM    630  CG2AILE A  77     -13.516  26.421  -1.241  0.70 28.13           C
    ATOM    631  CD1AILE A  77     -13.378  26.302   2.501  0.70 26.47           C
    ATOM    632  CA BILE A  77     -11.423  27.327  -0.082  0.30 28.50           C
    ATOM    633  CB BILE A  77     -12.874  26.775   0.117  0.30 28.79           C
    ATOM    634  CG1BILE A  77     -13.519  26.423  -1.227  0.30 28.62           C
    ATOM    635  CG2BILE A  77     -13.748  27.739   0.916  0.30 28.40           C
    ATOM    636  CD1BILE A  77     -14.720  25.518  -1.100  0.30 28.69           C
    ATOM    637  N   ARG A  78     -10.521  28.048  -2.183  1.00 28.70           N
    ATOM    638  CA  ARG A  78     -10.258  28.952  -3.268  1.00 28.47           C
    ATOM    639  C   ARG A  78     -10.857  28.469  -4.584  1.00 28.22           C
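The alternate-conformation flag (altLoc, column 17) and the occupancy (columns 55-60) in the 2VWC records above can be read programmatically. The sketch below assumes standard fixed-column PDB ATOM records and uses a few of the ILE A 77 lines as input; the function name `occupancy_by_altloc` is hypothetical.

```python
from collections import defaultdict

# A subset of the ILE A 77 records from 2VWC, in fixed-column PDB layout.
RECORDS = """\
ATOM    627  CA AILE A  77     -11.432  27.329  -0.087  0.70 28.15           C
ATOM    628  CB AILE A  77     -12.918  26.874   0.087  0.70 28.64           C
ATOM    632  CA BILE A  77     -11.423  27.327  -0.082  0.30 28.50           C
ATOM    633  CB BILE A  77     -12.874  26.775   0.117  0.30 28.79           C
"""

def occupancy_by_altloc(pdb_text):
    """Return {altLoc: occupancy} for atoms that carry an altLoc flag."""
    occ = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        altloc = line[16]                         # column 17
        if altloc == " ":
            continue                              # atom has a single conformation
        occ[altloc].append(float(line[54:60]))    # columns 55-60
    # Atoms of one conformer normally share one occupancy; keep the largest seen.
    return {alt: max(values) for alt, values in sorted(occ.items())}

conformers = occupancy_by_altloc(RECORDS)
print(conformers)                               # {'A': 0.7, 'B': 0.3}
print(max(conformers, key=conformers.get))      # A
```

The occupancies of the alternative conformations add up to 1.0. Keeping only the highest-occupancy conformer is one common simplification (an assumption here, not a rule from the slide).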
Atomic B-factors (B in Å²)
A value that reflects the precision of an atom's given position: atoms with the largest B-factors have the largest positional uncertainty.
Also an indication of the mobility of an atom.
0 < B < 20: atom is most likely OK
20 < B < 40: atom is probably OK, but positional errors up to 0.5 Å are normal
40 < B < 60: atom is probably reasonably OK, but be careful, because positional errors up to 1.0 Å can be observed
B > 60: atom is not likely to be within 1.0 Å of where you see it
B around 100: atom is guaranteed not to be within 1.0 Å of where you see it
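To connect B-factors to distances, the sketch below uses the standard Debye-Waller relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction (this relation is textbook crystallography, not stated on the slide). It also applies the rule-of-thumb bins above; the function names are hypothetical.

```python
import math

def b_to_displacement(b):
    """Convert an isotropic B-factor (Å^2) to RMS displacements (Å).

    Uses B = 8*pi^2*<u^2>, with <u^2> the mean-square displacement along
    one direction; the 3D RMS displacement is sqrt(3*<u^2>).
    """
    u2 = b / (8 * math.pi ** 2)
    return math.sqrt(u2), math.sqrt(3 * u2)

def b_category(b):
    """Rule-of-thumb bins from the B-factor slide."""
    if b < 20:
        return "most likely OK"
    if b < 40:
        return "probably OK (errors up to 0.5 Å)"
    if b < 60:
        return "be careful (errors up to 1.0 Å)"
    return "likely more than 1.0 Å off"

for b in (13.79, 28.77, 60.0, 100.0):
    rms_1d, rms_3d = b_to_displacement(b)
    print(f"B={b:6.2f}  1D RMS={rms_1d:.2f} Å  3D RMS={rms_3d:.2f} Å  {b_category(b)}")
```

For B = 20 Å² this gives about 0.50 Å along one axis; for B = 100 Å² the 3D RMS displacement is close to 2 Å, in line with the slide's warning.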
B-factor (figure from www.YASARA.org: structure colored from low to high B-factor)
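Resolution (Å)
The level of detail that can be observed in the electron density map; a smaller value means finer detail.
The greater the disorder in the crystal, the lower the resolution; as a rule, the larger the molecule, the lower the resolution.
(figure: electron density at 3.0 Å, 2.0 Å and 1.2 Å resolution)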
R-factor
The difference between the observed and the computed diffraction pattern: a measure of how well the refined structure predicts the observed data.
Higher values mean less agreement.
0.40-0.60: very unreliable
0.20 seems to be the standard threshold
(figure: electron density map)
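The R-factor is conventionally computed as R = Σ| |F_obs| - k|F_calc| | / Σ|F_obs|, summed over all reflections, with k a scale factor. The sketch below implements that formula on hypothetical amplitudes for five reflections; real refinement programs work on many thousands of reflections.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R-factor: sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("need one calculated amplitude per observed reflection")
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

# Hypothetical structure-factor amplitudes (illustration only).
f_obs = [120.0, 85.0, 60.0, 42.0, 30.0]
f_calc = [110.0, 90.0, 55.0, 45.0, 28.0]
print(round(r_factor(f_obs, f_calc), 3))   # 0.074
```

A perfect model gives R = 0; by the thresholds above, values well under 0.20 indicate good agreement.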
NMR Structure determination
NMR model components
In solution: study protein dynamics, and solve protein structures that are difficult to crystallize
Nuclear Overhauser Effect (NOE): intensities of signal peaks correspond to short inter-atomic distances between spatially close protons (NOE distances)
NOE constraints are known with low precision, e.g. NOEs are binned into 2.5-4.0, 4.0-5.5 and 5.5-7.0 Å
Multiple models consistent with the distance and angle constraints are generated, e.g. using molecular dynamics: the NMR ensemble
For PDB deposition, take the average or best model from the ensemble, or deposit a selected ensemble of superposed structures
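The link between NOE intensity and distance is commonly approximated by I ∝ r⁻⁶ (the isolated spin-pair approximation), calibrated against a proton pair of known distance. The sketch below assumes that approximation, uses a hypothetical reference pair at 2.5 Å, and then assigns the slide's distance bins; the strong/medium/weak labels are an assumption.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a proton-proton distance (Å) from an NOE peak intensity.

    Assumes intensity proportional to r**-6, calibrated against a
    reference proton pair of known distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

# Distance bins (Å) from the slide; the labels are an assumption.
NOE_BINS = [(2.5, 4.0, "strong"), (4.0, 5.5, "medium"), (5.5, 7.0, "weak")]

def noe_bin(distance):
    """Return the (lower, upper, label) bin; distances below 2.5 Å go to the
    first bin, and None is returned beyond 7 Å."""
    for lower, upper, label in NOE_BINS:
        if distance <= upper:
            return lower, upper, label
    return None

# Hypothetical peaks: a reference pair at 2.5 Å and a peak 64 times weaker.
d = noe_distance(intensity=1.0, ref_intensity=64.0, ref_distance=2.5)
print(round(d, 2), noe_bin(d))
```

Because of the sixth root, a 64-fold drop in intensity only doubles the distance, which is one reason NOE restraints are kept as wide bins rather than exact values.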
Structure superposition: RMSD (root mean square distance)
$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^2}$
n = number of atoms
d_i = distance between the 2 corresponding atoms i in the 2 structures
The more closely the atoms superpose on each other, the lower the RMSD. Unit of RMSD: Å
identical structures: RMSD = 0
similar structures: RMSD is small (1-3 Å)
distant structures: RMSD > 3 Å
However, care has to be taken, as RMSD is length dependent and dominated by outliers:
comparison of two short peptide structures can result in a small RMSD even if their structures are visibly different
very similar structures can have a bad RMSD due to a short part of the structures that is very different (loops)
Insertions and deletions are not taken into account in the RMSD calculation, since only equivalent atoms/residues are compared (see figure)
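A minimal sketch of the RMSD definition on this slide, assuming the two structures are already superposed and that atom i in one list is equivalent to atom i in the other (no fitting step is performed). The toy example shows how a single loop atom dominates the value.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes atom i in A corresponds to atom i in B and that the
    structures are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of equivalent atoms")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Ten equivalent atoms: all shifted by 0.5 Å, then one "loop" atom moved 8 Å.
ref = [(float(i), 0.0, 0.0) for i in range(10)]
close = [(x, y + 0.5, z) for x, y, z in ref]
with_outlier = close[:9] + [(9.0, 8.0, 0.0)]

print(round(rmsd(ref, close), 2))          # 0.5
print(round(rmsd(ref, with_outlier), 2))   # 2.57
```

Nine of the ten atoms are still within 0.5 Å, yet the single outlier pushes the RMSD about five-fold, which is exactly the caveat on the slide.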
NMR ensemble RMSD (root mean square distance)
Superpose the NMR models
Calculate the RMSD of local regions and of whole models
Regions with high RMSD are less well defined by the data
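One way to measure local spread in an ensemble is the RMS deviation of each residue from its ensemble-mean position. The sketch below assumes the models are already superposed and uses one hypothetical atom per residue (e.g. C-alpha); the data are invented for illustration.

```python
import math

def per_residue_spread(models):
    """RMS deviation of each residue from its ensemble-mean position.

    models: list of models; each model is a list of (x, y, z) tuples,
    one per residue (e.g. its C-alpha atom), already superposed.
    """
    n_models = len(models)
    spread = []
    for r in range(len(models[0])):
        points = [model[r] for model in models]
        mean = tuple(sum(p[k] for p in points) / n_models for k in range(3))
        msd = sum(math.dist(p, mean) ** 2 for p in points) / n_models
        spread.append(math.sqrt(msd))
    return spread

# Hypothetical 3-model ensemble of a 4-residue segment; residue 4 is a floppy loop.
models = [
    [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)],
    [(0, 0.1, 0), (3.8, 0.1, 0), (7.6, 0.2, 0), (11.4, 3.0, 0)],
    [(0, -0.1, 0), (3.8, -0.1, 0), (7.6, -0.2, 0), (11.4, -3.0, 0)],
]
print([round(s, 2) for s in per_residue_spread(models)])  # [0.08, 0.08, 0.16, 2.45]
```

The last residue stands out, marking a region that is poorly defined by the NMR data.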
Protein Data Bank (PDB)
Structural data is stored in the Protein Data Bank (PDB): http://www.pdb.org
©CMBI 2009
Protein Data Bank (PDB)
Databank for 3-dimensional structures of biomolecules: protein, DNA, RNA, ligands
Deposition of coordinates in the PDB is obligatory before publication
~65,000 entries (April 2010) (~27,000 “unique” structures)
A PDB file is a keyword-organised flat file (80 columns):
1) human readable
2) every line starts with a keyword (3-6 letters)
3) platform independent
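Because every line of a PDB file starts with its keyword in columns 1-6, a file can be summarised by record type with a few lines of code. The sketch below uses a hypothetical, abbreviated PDB text (not a complete file) and a hypothetical helper name.

```python
from collections import Counter

# Hypothetical, abbreviated PDB text (not a complete file).
SAMPLE = """\
HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN
COMPND    CRAMBIN
SEQRES   1 A   46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE
ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N
ATOM      2  CA  THR A   1      16.967  12.784   4.338  1.00 10.80           C
END
"""

def record_counts(pdb_text):
    """Count records by their keyword, stored in columns 1-6 of each line."""
    return Counter(line[:6].strip() for line in pdb_text.splitlines() if line.strip())

print(record_counts(SAMPLE))
```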
PDB important records (1)
PDB nomenclature: filename = accession number = PDB code
The PDB code is 4 characters (often 1 digit and 3 letters, e.g. 1CRN.pdb)
HEADER: describes the molecule and gives the deposition date
    HEADER  PLANT SEED PROTEIN  30-APR-81  1CRN
COMPND: name of the molecule
    COMPND  CRAMBIN
SOURCE: organism
    SOURCE  ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
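A quick sanity check on PDB codes can be written as a regular expression. The pattern below is an assumption based on the classic 4-character codes (a digit 1-9 followed by three alphanumeric characters, matching the slide's "often 1 digit & 3 letters"); the helper names are hypothetical.

```python
import re

# Assumption: a classic PDB code is a digit 1-9 followed by three alphanumerics.
PDB_CODE = re.compile(r"[1-9][A-Za-z0-9]{3}")

def is_pdb_code(code):
    """True if the whole string looks like a classic 4-character PDB code."""
    return PDB_CODE.fullmatch(code) is not None

def pdb_filename(code):
    """Build a file name in the slide's style, e.g. 1CRN.pdb (hypothetical helper)."""
    if not is_pdb_code(code):
        raise ValueError(f"not a 4-character PDB code: {code!r}")
    return f"{code.upper()}.pdb"

for code in ("1crn", "2VWC", "CRN1"):
    print(code, is_pdb_code(code))
```

Using `fullmatch` rather than `match` rejects strings with extra trailing characters such as "1CRN5".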
PDB important records (2)
SEQRES: sequence of the protein. Be aware: 3D coordinates are not always present for all the amino acids listed in SEQRES!
    SEQRES  1  46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE  1CRN  51
    SEQRES  2  46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS  1CRN  52
    SEQRES  3  46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR  1CRN  53
    SEQRES  4  46  CYS PRO GLY ASP TYR ALA ASN  1CRN  54
SSBOND: disulfide bridges
    SSBOND  1 CYS  3  CYS  40
    SSBOND  2 CYS  4  CYS  32
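The SEQRES records of 1CRN can be turned into a one-letter sequence. This sketch assumes the current fixed-column SEQRES layout (residue names in columns 20-70, chain A), which differs slightly from the older layout shown above; unknown residue names map to "X".

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Crambin SEQRES records, written in the current column layout (an assumption).
SEQRES = """\
SEQRES   1 A   46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE
SEQRES   2 A   46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS
SEQRES   3 A   46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR
SEQRES   4 A   46  CYS PRO GLY ASP TYR ALA ASN
"""

def seqres_sequence(pdb_text):
    """Concatenate SEQRES residue names (columns 20-70) into one-letter code."""
    residues = []
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            residues.extend(line[19:70].split())
    return "".join(THREE_TO_ONE.get(res, "X") for res in residues)

seq = seqres_sequence(SEQRES)
print(len(seq), seq)
```

Comparing this sequence with the residue numbers that actually occur in the ATOM records is a simple way to find residues without coordinates.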
PDB important records (3)
And at the end of the PDB file, the “real” data:
ATOM: one line for each atom, with its unique name and its x, y, z coordinates
    ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
    ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
    ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19      1CRN  72
    ATOM      4  O   THR     1      15.268  13.825   5.594  1.00  9.85      1CRN  73
    ATOM      5  CB  THR     1      18.170  12.703   5.337  1.00 13.02      1CRN  74
    ATOM      6  OG1 THR     1      19.334  12.829   4.463  1.00 15.06      1CRN  75
    ATOM      7  CG2 THR     1      18.150  11.546   6.304  1.00 14.23      1CRN  76
    ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81      1CRN  77
    ATOM      9  CA  THR     2      13.856  11.469   6.066  1.00  8.31      1CRN  78
    ATOM     10  C   THR     2      14.164  10.785   7.379  1.00  5.80      1CRN  79
    ATOM     11  O   THR     2      14.993   9.862   7.443  1.00  6.94      1CRN  80
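ATOM records must be read by column position, not by splitting on spaces, because fields can touch (see "CG1AILE" in the 2VWC example). The sketch below assumes the standard PDB column layout; the `Atom` class and `parse_atom` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom(line):
    """Parse one ATOM/HETATM record using the fixed PDB column layout."""
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        alt_loc=line[16].strip(),        # column 17
        res_name=line[17:20],            # columns 18-20
        chain=line[21].strip(),          # column 22 (blank in old files like 1CRN)
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        b_factor=float(line[60:66]),     # columns 61-66
    )

line = "ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70"
atom = parse_atom(line)
print(atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z), atom.b_factor)
```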
PDB entry
PDB entry
PDB entry
Structure visualization
Structures from the PDB can be visualized with:
YASARA (http://www.yasara.org)
SwissPDBViewer (http://spdbv.vital-it.ch/)
PyMOL (http://www.pymol.org)
Chimera (http://www.cgl.ucsf.edu/chimera)
YASARA View nomenclature
Atom
Residue = any continuous stretch of atoms sharing the same residue name, residue number and molecule name
Molecule = any continuous stretch of residues sharing the same molecule name (PDB calls this a CHAIN)
Object = a collection of molecules and additional items
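The "continuous stretch" definitions above map naturally onto grouping consecutive items. This sketch illustrates the idea with `itertools.groupby` on a hypothetical atom list; it is not YASARA's own API or data model.

```python
from itertools import groupby

# Hypothetical atom list: (molecule name, residue name, residue number, atom name).
ATOMS = [
    ("A", "THR", 1, "N"), ("A", "THR", 1, "CA"),
    ("A", "THR", 2, "N"), ("A", "THR", 2, "CA"),
    ("B", "GLY", 1, "N"), ("B", "GLY", 1, "CA"),
]

def residues(atoms):
    """Group continuous stretches of atoms sharing molecule, residue name and number."""
    return [
        (key, [atom[3] for atom in group])
        for key, group in groupby(atoms, key=lambda atom: atom[:3])
    ]

def molecules(residue_list):
    """Group continuous stretches of residues sharing the same molecule name."""
    return [
        (mol, len(list(group)))
        for mol, group in groupby(residue_list, key=lambda res: res[0][0])
    ]

res = residues(ATOMS)
print(res)
print(molecules(res))   # [('A', 2), ('B', 1)]
```

`groupby` only merges adjacent items, so a residue that reappears later in the list is counted as a new residue, which matches the "continuous stretch" wording.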
Standard atom colors: C = cyan, O = red, N = blue, H = white, S = green
Atom nomenclature
(figure: a short peptide from N-terminus to C-terminus with atoms labelled N, Cα, Cβ, Cγ, Oγ, C, O, Cδ1, Cδ2 and the terminal oxygens OT1 and OT2)
FoldX: a molecular design toolkit
Predict the effect of point mutations on protein stability
Predict the 3D structure of a sequence: homology modeling
Predict effect of point mutation
FoldX is an empirical force field, validated with calorimetric experiments.
E.g. if such an experiment concludes that breaking a hydrogen bond costs 1.5 kcal/mol, FoldX uses this knowledge rather than theoretical physics equations.
FoldX compares WT and mutant for: hydrogen bonds, electrostatics, Van der Waals clashes and contacts, entropy, desolvation, ...
FoldX energies: the energy of a single molecule is meaningless, but the difference in energy between two molecules (such as WT and a point mutant) approaches realistic values.
Predict effect of point mutation
FoldX calculates the stability of the wild type (WT) and the mutant (MT) and takes the difference (the net effect of the mutation):
$\Delta\Delta G_{\mathrm{mutation}} = \Delta G_{\mathrm{MT}} - \Delta G_{\mathrm{WT}}$
If $\Delta\Delta G_{\mathrm{mutation}} > 0$: the mutation is bad for stability
If $\Delta\Delta G_{\mathrm{mutation}} < 0$: the mutation is good for stability
The FoldX error margin is 0.5 kcal/mol, so changes within this margin are meaningless.
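The decision rule on this slide (sign of ΔΔG plus a 0.5 kcal/mol error margin) can be written down directly. This sketch does not call FoldX; the ΔG values are hypothetical numbers chosen to show the three outcomes, and the function names are illustrative.

```python
FOLDX_ERROR_MARGIN = 0.5  # kcal/mol, the error margin quoted on the slide

def ddg_mutation(dg_mutant, dg_wild_type):
    """ΔΔG_mutation = ΔG_MT - ΔG_WT (kcal/mol)."""
    return dg_mutant - dg_wild_type

def interpret_ddg(ddg, margin=FOLDX_ERROR_MARGIN):
    """Classify a ΔΔG value using the slide's sign convention."""
    if abs(ddg) <= margin:
        return "within error margin"
    return "destabilising" if ddg > 0 else "stabilising"

# Hypothetical ΔG values (kcal/mol), not real FoldX output.
for dg_wt, dg_mt in [(-10.0, -7.8), (-10.0, -10.3), (-10.0, -11.2)]:
    ddg = ddg_mutation(dg_mt, dg_wt)
    print(f"{ddg:+.2f} kcal/mol -> {interpret_ddg(ddg)}")
```

Only the difference between the two ΔG values is interpreted, in line with the previous slide's point that a single FoldX energy is meaningless.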
Introduction to homology modeling
Goal: predict a structure from its sequence with an accuracy comparable to the best results achieved experimentally (X-ray)
Protein modeling is the only way to obtain structural information when experimental techniques (X-ray, NMR, EM) fail
Homology Modeling
Principles of homology modeling
Search for a sequence with a known structure that is very similar to the sequence with the unknown structure.
Build the model using the known structure as a template.
The structure of a protein is uniquely determined by its amino acid sequence.
Structure is more conserved than sequence:
similar sequences adopt nearly the same structure;
distantly related sequences can still fold into a similar structure.
Sequence similarity rule
Rost (1999) modeled many structures and compared them to the real ones in the PDB, deriving precise limits for homology modeling.
This rule tells you whether a model will be reliable or unreliable.
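The rule is a length-dependent identity cut-off. The sketch below encodes the curve as we recall it from Rost (1999), pI(L) = n + 480·L^(-0.32(1 + e^(-L/1000))) for 11 < L ≤ 450, 100% for L ≤ 11 and 19.5% for L > 450, with n an optional safety margin; please check it against the paper before relying on it. Computing identity only over gap-free aligned positions is also an assumption, and all function names are hypothetical.

```python
import math

def percent_identity(aligned_a, aligned_b):
    """Identity (%) and length over aligned positions where neither has a gap '-'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), len(pairs)

def rost_threshold(length, n=0.0):
    """Length-dependent identity cut-off (%), as recalled from Rost (1999)."""
    if length <= 11:
        return 100.0 + n
    if length > 450:
        return 19.5 + n
    return n + 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))

def likely_reliable(aligned_a, aligned_b, n=0.0):
    """True if the alignment's identity lies on or above the curve."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= rost_threshold(length, n)

for length in (50, 100, 200, 450):
    print(length, round(rost_threshold(length), 1))
```

The curve drops from about 29% identity at 100 aligned residues to about 19.5% at 450, which is why short alignments need much higher identity before a model can be trusted.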
FoldX plugin for YASARA
Acknowledgements
Gert Vriend, Radboud Universiteit Nijmegen, NL (www.cmbi.ru.nl)
Sander Nabuurs, Lead Pharma, Nijmegen, NL
Greet De Baets, VIB Switch Laboratory

Editor's Notes

  • #9 We also need drugs against invaders: bacteria, viruses, ... Trypanosoma (sleeping sickness) is a eukaryote, so many of its proteins resemble ours. If we make a medicine that, e.g., knocks out the glycolysis of trypanosomes, we must be very careful not to knock out the glycolysis of the host as well. So we have to study the differences in proteins between the trypanosome and its host. The picture shows the active site of the trypanosome protein. We can then design a ligand that blocks the active site of the trypanosome protein but not that of the human protein.
  • #11 Neuraminidase inhibitors. Neuraminidase helps release new viral particles from the infected cell so they can infect other cells: it cleaves terminal sialic acid residues from cell-surface glycoproteins, so that newly budded virions no longer stick to the cell they came from. So neuraminidase binds sialic acid.
  • #12 Smart as we are, we made molecules that look like sialic acid (Relenza and Tamiflu) to inhibit neuraminidase and prevent the virus from spreading.
  • #14 But at some point, Tamiflu lost its activity against neuraminidase.
  • #15 On binding Tamiflu, the conformation of the Glu 277 side chain of the WT enzyme is altered such that it exposes a hydrophobic site with which the pentyloxy group of oseltamivir interacts. In the mutant enzyme, the bulkier Tyr residue at 275 displaces the carboxyl group of Glu 277 into the binding site, disrupting the hydrophobic pocket and changing the conformation of the pentyloxy substituent of oseltamivir (Fig. 1a), with a consequent reduction in binding affinity of some 300-fold or greater. Luckily for us, Relenza's side group isn't that big and still fits in the pocket of the mutant neuraminidase. The question is: what will we do if the virus also beats Relenza?
  • #19 Variations in time: an X-ray experiment takes seconds, whereas atomic motions happen in femtoseconds, so these motions can be a source of uncertainty. Variations in space: different molecules in the crystal can have different conformations of a side chain.
  • #23 The resolution for a given crystal depends on the ordering of the molecules in the crystal, that is, how close the unit cells throughout the crystal are to being identical copies of one another. Rule: the larger the molecule, the lower the resolution of the data. At low resolution (4 Å): overall shape of the molecule, no interactions! 3 Å: the path of the chain can be traced. 2 Å: side chains become visible. 1.2-0.9 Å: hydrogen atoms become visible and occupancies are easily detectable. These high-resolution structures require fewer geometric constraints during refinement and give a better indication of the true geometry of protein structures.
  • #24 X-rays are scattered by the crystal, generating a diffraction pattern from which an initial model of the structure is built, based on the electron density map. This initial model can also be a homology model based on a close homolog that is expected to adopt more or less the same structure. Using scattering theory, the expected diffraction pattern can be calculated from the model. Usually this will differ from the observed one. Refinement involves iteratively modifying the model until the computed diffraction pattern best fits the observed diffraction pattern. Ferredoxin was incorrectly solved due to a wrong space group assignment, but all the water molecules made up for it: 344 waters on 117 residues. Refinable parameters: the structural model describes a collection of scattering centres (atoms), each located at a fixed position in the crystal lattice, with some degree of mobility or extension around that locus. In adjusting the structural model to improve the fit between calculated and observed diffraction patterns, the crystallographer may vary these and other parameters. Refinable parameters are those that may be varied in order to improve the fit. Usually they comprise atomic coordinates, atomic displacement parameters, and a scale factor to bring the observed and calculated amplitudes or intensities to the same scale. They may also include extinction parameters, occupancy factors, twin component fractions, and even the assigned space group. Relations between the refinable parameters may be expressed as constraints or restraints that modify the function to be minimized.
  • #28 Deposition procedure: an experimentalist choosing the best 20 structures from a much larger ensemble can produce very misleading statistics! E.g. the best 20 models may be the best solution with only small variations, so the RMSD will be small, but further down the original list there may be alternative solutions that are less consistent with the data but radically different!