Protein Structure Prediction


  1. Lecture 14: Protein Structure Prediction • CECS 694-02 Introduction to Bioinformatics • University of Louisville • Spring 2004 • Dr. Eric Rouchka
  2. Review of Proteins • Proteins: polypeptides with a three-dimensional structure • Primary structure – sequence of amino acids constituting the polypeptide chain • Secondary structure – local organization of the polypeptide chain into secondary structures such as α helices and β sheets
  3. Review of Proteins • Tertiary structure – three-dimensional arrangement of amino acids as they react to one another due to polarity and interactions between side chains • Quaternary structure – interaction of several protein subunits
  4. Protein Structure • Proteins: chains of amino acids joined by peptide bonds • Amino Acids: – Polar (separate positively and negatively charged regions) – free C=O group (CARBOXYL), can act as hydrogen bond acceptor – free NH group (AMINO), can act as hydrogen bond donor
  5. Protein Structure
  6. Protein Structure • Many conformations possible due to rotation around the alpha carbon (Cα) atom • Conformational changes lead to differences in the three-dimensional structure of the protein
  7. Protein Structure • Polypeptide chain has pattern of N-Cα-C repeated • Rotation angle about the bond between the amino nitrogen and Cα is the PHI (φ) angle; rotation angle about the bond between Cα and the carboxyl carbon is the PSI (ψ) angle
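The φ and ψ angles can be computed as dihedral angles over four consecutive backbone atoms. The following is a minimal sketch, not part of the lecture: the function name `dihedral` is hypothetical, and it assumes the usual convention that φ uses C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points (x, y, z)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # bond pointing back from the second atom
    b1 = sub(p2, p1)          # central bond (the rotation axis)
    b2 = sub(p3, p2)          # bond pointing forward from the third atom
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
# psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
```

A planar cis arrangement gives 0 degrees and a planar trans arrangement gives 180 degrees, which is a quick sanity check on any coordinates fed in.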
  8. Protein Structure
  9. Differences between A.A.’s • Difference between the 20 amino acids is the R side chain • Amino acids can be separated based on the chemical properties of the side chains: – Hydrophobic – Charged – Polar
  10. Differences between A.A.’s • Hydrophobic: Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), and Leucine (L) • Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R) • Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
  11. Secondary Structure • Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
  12. Secondary Structures • Core of each protein made up of regular secondary structures • Regular patterns of hydrogen bonds are formed between neighboring amino acids • Amino acids in secondary structures have similar φ and ψ angles
  13. Secondary Structures • Structures act to neutralize the polar groups on each amino acid • Secondary structures tightly packed in protein core and a hydrophobic environment • Each amino acid side group has a limited space to occupy -- therefore a limited number of possible interactions
  14. Types of Secondary Structures • α Helices • β Sheets • Loops • Coils
  15. α Helix • Most abundant secondary structure • 3.6 amino acids per turn • Hydrogen bond formed between every fourth residue • Average length: 10 amino acids, or 3 turns • Varies from 5 to 40 amino acids • Image source: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm; http://www4.ocn.ne.jp/~bio/biology/protein.htm
  16. α Helix • Normally found on the surface of protein cores • Interact with aqueous environment – Inner-facing side has hydrophobic amino acids – Outer-facing side has hydrophilic amino acids
  17. α Helix • Every third amino acid tends to be hydrophobic • Pattern can be detected computationally • Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M) • Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
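One way the periodic hydrophobic pattern might be detected computationally is a hydrophobic-moment style score. The sketch below is an illustration and not the lecture's method: it assumes 100 degrees of rotation per residue (360 / 3.6), uses the hydrophobic group from slide 10 as a simple +1/-1 scale, and the names `HYDROPHOBIC` and `helical_moment` are hypothetical.

```python
import math

# Hydrophobic group as listed on slide 10; everything else counts as -1.
HYDROPHOBIC = set("AVFPMIL")

def helical_moment(window, angle_deg=100.0):
    """Length-normalized magnitude of the hydrophobic moment of a window.

    Each residue contributes a vector of size +1 (hydrophobic) or -1
    (otherwise), pointing in the direction it would face on an ideal helix.
    """
    sx = sy = 0.0
    for i, aa in enumerate(window):
        h = 1.0 if aa in HYDROPHOBIC else -1.0
        theta = math.radians(i * angle_deg)
        sx += h * math.cos(theta)
        sy += h * math.sin(theta)
    return math.hypot(sx, sy) / len(window)
```

A window whose hydrophobic residues cluster on one face of the helix scores high, while a uniformly hydrophobic or uniformly polar window scores near zero.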
  18. β Sheet • Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html; http://www4.ocn.ne.jp/~bio/biology/protein.htm
  19. β Sheet • Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain with another 5-10 farther down the chain • Interacting regions may be adjacent with a short loop, or far apart with other structures in between
  20. β Sheet • Directions: – Same: Parallel Sheet – Opposite: Anti-parallel Sheet – Mixed: Mixed Sheet • Pattern of hydrogen bond formation in parallel and anti-parallel sheets is different
  21. β Sheet • Slight counterclockwise rotation • Alpha carbons (as well as R side groups) alternate above and below the sheet • Prediction difficult, due to wide range of φ and ψ angles
  22. Interactions in Helices and Sheets
  23. Loop • Regions between α helices and β sheets • Various lengths and three-dimensional configurations • Located on surface of the structure
  24. Loop • Hairpin loops: complete turn in the polypeptide chain, (anti-parallel β sheets) • More variable sequence structure • Tend to have charged and polar amino acids • Frequently a component of active sites
  25. Coil • Region of secondary structure that is not a helix, sheet, or loop
  26. Secondary Structure • Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
  27. 6 Classes of Protein Structure 1) Class α: bundles of α helices connected by loops on surface of proteins 2) Class β: antiparallel β sheets, usually two sheets in close contact forming a sandwich 3) Class α/β: mainly parallel β sheets with intervening α helices; may also have mixed β sheets (metabolic enzymes)
  28. 6 Classes of Protein Structure 4) Class α+β: mainly segregated α helices and antiparallel β sheets 5) Multidomain (α and β) proteins: more than one of the above four domains 6) Membrane and cell-surface proteins and peptides, excluding proteins of the immune system
  29. α Class Protein (hemoglobin) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250
  30. β Class Protein (T-Cell CD8) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500
  31. α/β Class Protein (tryptophan synthase) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500
  32. α+β Class Protein (1RNB) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500
  33. Membrane Protein (1OPF) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500
  34. Protein Structure Databases • Databases of three dimensional structures of proteins, where structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques • Protein Databases: – PDB – SCOP – Swiss-Prot – PIR
  35. Protein Structure Databases • Most extensive for 3-D structure is the Protein Data Bank (PDB) • Current release of PDB (April 8, 2003) has 20,622 structures
  36. Partial PDB File
      ATOM   1  N    VAL  A  1   6.452  16.459  4.843  7.00  47.38  3HHB  162
      ATOM   2  CA   VAL  A  1   7.060  17.792  4.760  6.00  48.47  3HHB  163
      ATOM   3  C    VAL  A  1   8.561  17.703  5.038  6.00  37.13  3HHB  164
      ATOM   4  O    VAL  A  1   8.992  17.182  6.072  8.00  36.25  3HHB  165
      ATOM   5  CB   VAL  A  1   6.342  18.738  5.727  6.00  55.13  3HHB  166
      ATOM   6  CG1  VAL  A  1   7.114  20.033  5.993  6.00  54.30  3HHB  167
      ATOM   7  CG2  VAL  A  1   4.924  19.032  5.232  6.00  64.75  3HHB  168
      ATOM   8  N    LEU  A  2   9.333  18.209  4.095  7.00  30.18  3HHB  169
      ATOM   9  CA   LEU  A  2  10.785  18.159  4.237  6.00  35.60  3HHB  170
      ATOM  10  C    LEU  A  2  11.247  19.305  5.133  6.00  35.47  3HHB  171
      ATOM  11  O    LEU  A  2  11.017  20.477  4.819  8.00  37.64  3HHB  172
      ATOM  12  CB   LEU  A  2  11.451  18.286  2.866  6.00  35.22  3HHB  173
      ATOM  13  CG   LEU  A  2  11.081  17.137  1.927  6.00  31.04  3HHB  174
      ATOM  14  CD1  LEU  A  2  11.766  17.306   .570  6.00  39.08  3HHB  175
      ATOM  15  CD2  LEU  A  2  11.427  15.778  2.539  6.00  38.96  3HHB  176
  37. Description of PDB File • Second column: atom serial number • Third column: atom name • Fourth column: current amino acid • Sixth column: amino acid position in the polypeptide chain • Columns 7, 8, and 9: x, y, and z coordinates (in angstroms) • The 11th column: temperature factor, which can be used as a measurement of uncertainty
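As a rough illustration of reading these columns, the sketch below splits one ATOM record from the excerpt on whitespace. This only mirrors the column-by-column description for this excerpt; real PDB files are fixed-width, so a production parser should slice by character position instead. The function name `parse_atom_line` and the dictionary keys are hypothetical.

```python
def parse_atom_line(line):
    """Parse one whitespace-separated ATOM record from the excerpt above."""
    fields = line.split()
    if fields[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return {
        "serial": int(fields[1]),       # column 2: atom serial number
        "atom": fields[2],              # column 3: atom name (N, CA, C, O, ...)
        "residue": fields[3],           # column 4: amino acid
        "chain": fields[4],             # column 5: chain identifier
        "res_seq": int(fields[5]),      # column 6: position in the chain
        "xyz": (float(fields[6]), float(fields[7]), float(fields[8])),
        "temp_factor": float(fields[10]),  # column 11: temperature factor
    }

record = parse_atom_line("ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163")
```

Collecting the `xyz` values of the `CA` records gives the Cα trace that structure comparison methods typically start from.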
  38. Protein Structure Classification Databases • Structural Classification of Proteins (SCOP) • Based on expert definition of structural similarities • SCOP classifies by class, fold, superfamily, and family • http://scop.mrc-lmb.cam.ac.uk/scop/
  39. Protein Structure Classification Databases • Classification by class, architecture, topology, and homology (CATH) • Classifies proteins into hierarchical levels, beginning with class • α/β and α+β are considered to be a single class • http://www.biochem.ucl.ac.uk/bsm/cath/
  40. Protein Structure Classification Databases • Molecular Modeling Database (MMDB) • Structures from PDB categorized into structurally related groups using the Vector Alignment Search Tool (VAST) • VAST looks for similar arrangements of secondary structural elements • http://www.ncbi.nlm.nih.gov/Entrez
  41. Protein Structure Classification Databases • Spatial Arrangement of Backbone Fragments (SARF) • Categorized on structural similarities, similar to the MMDB • http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html
  42. Visualization of Proteins • A number of programs convert atomic coordinates of 3-D structures into views of the molecule • Allow the user to manipulate the molecule by rotation, zooming, etc. • Critical in drug design -- yields insight into how the protein might interact with ligands at active sites
  43. Visualization of Proteins • Most popular program for viewing 3-dimensional structures is Rasmol • Rasmol: http://www.umass.edu/microbio/rasmol/ • Chime: http://www.umass.edu/microbio/chime/ • Cn3D: http://www.ncbi.nlm.nih.gov/Structure/ • Mage: http://kinemage.biochem.duke.edu/website/kinhome.html • Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html
  44. Alignment of Protein Structure • Three-dimensional structure of one protein compared against three-dimensional structure of a second protein • Atoms fit together as closely as possible to minimize the average deviation • Structural similarity between proteins does not necessarily mean evolutionary relationship
  45. Alignment of Protein Structure • Positions of atoms in three-dimensional structures compared • Look for positions of secondary structural elements (helices and strands) within a protein domain
  46. Alignment of Protein Structure • Distances between carbon atoms examined to determine the degree to which structures may be superimposed • Side chain information can be incorporated – Buried; visible
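The "average deviation" between two sets of matched atoms is commonly reported as a root-mean-square deviation (RMSD). The sketch below is a simplification under stated assumptions: it removes only the translation by centering both coordinate sets, and it leaves out the optimal rotation step (for example the Kabsch algorithm) that a real superposition would also perform. The function names are hypothetical.

```python
import math

def centered(coords):
    """Shift a list of (x, y, z) points so their centroid is at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(c[k] for c in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]

def rmsd_after_centering(a, b):
    """RMSD between matched atoms after removing translation only."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    ca, cb = centered(a), centered(b)
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(ca, cb) for k in range(3))
    return math.sqrt(total / len(a))
```

Two copies of the same structure that differ only by a shift give an RMSD of zero; any remaining value reflects differences in shape or orientation.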
  47. SSAP • Sequential Structure Alignment Program • Incorporates double dynamic programming to produce a structural alignment between two proteins
  48. Steps in SSAP • 1) Calculate vectors from Cβ of one amino acid to set of nearby amino acids – Vectors from two separate proteins compared – Difference (expressed as an angle) calculated, and converted to score • 2) Matrix for scores of vector differences from one protein to the next is computed
  49. Steps in SSAP • 3) Optimal alignment found using global dynamic programming, with a constant gap penalty • 4) Next amino acid residue considered; optimal path to align this amino acid to the second sequence computed
  50. Steps in SSAP • 5) Alignments transferred to summary matrix – If paths cross same matrix position, scores are summed – If part of alignment path found in both matrices, evidence of similarity
  51. Steps in SSAP • 6) Dynamic programming alignment is performed for the summary matrix – Final alignment represents optimal alignment between the protein structures – Resulting score converted so it can be compared to see how closely related two structures are
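The dynamic programming pass used in steps 3 and 6 can be sketched as a global alignment over a precomputed score matrix. This is a minimal illustration and not SSAP itself: the score matrix is taken as given, the gap penalty is charged once per skipped position, and the name `global_align` and the default `gap` value are assumptions.

```python
def global_align(score, gap=-1.0):
    """Global alignment over score[i][j] (residue i of A vs residue j of B).

    Returns the best total score and the list of aligned (i, j) pairs.
    """
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                           dp[i - 1][j] + gap,                      # skip residue i
                           dp[i][j - 1] + gap)                      # skip residue j
    # Trace back from the bottom-right corner to recover the aligned pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[i - 1][j - 1]:
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return dp[n][m], path
```

In a double dynamic programming scheme, a routine like this would be run once per residue-pair matrix, the resulting paths accumulated into the summary matrix, and then run one final time on that summary matrix.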
  52. Distance Matrix Approach • Uses graphical procedure similar to dot plots • Identifies atoms that lie most closely together in three-dimensional structure • Two sequences with similar structure can have dot plots superimposed
  53. Distance Matrix Approach • Values in distance matrix represent distance between the Cα atoms in the three dimensional structure • positions of closest packing atoms marked with a dot to highlight regions of interest • Similar groups superimposed as closely as possible by minimizing sum of atomic distances
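A distance matrix of this kind can be built directly from Cα coordinates. The sketch below is illustrative only: the 6.0 angstrom cutoff used to mark "dots" is an assumed value, and the function names are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math

def distance_matrix(ca_coords):
    """Pairwise distances between Cα atoms given as (x, y, z) tuples."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def close_pairs(matrix, cutoff=6.0, min_separation=1):
    """Index pairs (i, j), i < j, whose distance is within the cutoff.

    These are the positions that would be marked with a dot in the plot.
    """
    n = len(matrix)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if matrix[i][j] <= cutoff]
```

Raising `min_separation` filters out residues that are close only because they are neighbors in sequence, which makes helix and sheet contact patterns easier to see.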
  54. DALI • Distance-matrix ALIgnment (DALI) • Uses distance matrix method to align protein structures • Assembly step uses Monte Carlo simulation to find submatrices that can be aligned • Existing structures that have been compared are organized into the FSSP database
  55. Fast Structural Similarity Search • Compare types and arrangements of secondary structures within two proteins • If elements are similarly arranged, three-dimensional structures are similar • VAST and SARF are programs that use these fast methods
  56. Structural Motifs Based on Sequence Analysis • Some structural elements can be determined by looking at sequence composition – zinc finger motifs – leucine zippers – coiled-coil structures
  57. Zinc Finger Motifs • Found by looking at order and spacing of cysteine and histidine residues • Typical zinc finger motifs are composed of two cysteines followed by two histidines • Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
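Searching for the order and spacing of cysteines and histidines can be sketched with a regular expression. The spacing used below (2 to 4 residues between the cysteines, 12 between the second cysteine and first histidine, 3 to 5 between the histidines) is a simplified C2H2-style consensus chosen for illustration; it is an assumption, not a pattern given in the lecture.

```python
import re

# Simplified C2H2-style spacing: C x(2,4) C x(12) H x(3,5) H (assumed values).
ZINC_FINGER = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(sequence):
    """Return (start, matched_text) for each non-overlapping candidate motif."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(sequence)]
```

A hit only indicates that the spacing is compatible with a zinc finger; it does not by itself show that the region binds zinc.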
  58. Leucine Zippers • Found by looking for two antiparallel alpha helices held together • Interactions between hydrophobic leucine residues found every seventh position in helix • Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
  59. Transmembrane Proteins • Traverse the membrane back and forth through alpha helices • Typical length: 20-30 residues • Transmembrane alpha helices have hydrophobic residues on the inside-facing portions, and hydrophilic residues on the outside • Image source: http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
  60. Membrane Prediction Programs • PHDhtm: employs neural network approach; neural network trained to recognize sequence patterns and variations of helices in transmembrane proteins of known structures • TMpred: functions by searching a protein against a sequence scoring matrix obtained by aligning the sequences of all known transmembrane alpha helix regions
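Neither PHDhtm nor TMpred is reproduced here. As a much simpler stand-in for the same idea (long hydrophobic stretches suggest membrane-spanning helices), the sketch below uses a sliding-window average of the Kyte-Doolittle hydropathy scale. The window of 19 residues and the threshold of 1.6 are conventional choices for this kind of plot and should be treated as assumptions.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    starts = []
    for s in range(len(sequence) - window + 1):
        segment = sequence[s:s + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            starts.append(s)
    return starts
```

Consecutive start indices can be merged into one candidate segment, which is then compared against the typical 20-30 residue helix length.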
  70. Chou-Fasman Method • based on analyzing frequency of amino acids in different secondary structures – A, E, L, and M strong predictors of alpha helices – P and G are predictors in the break of a helix • Table of predictive values created for alpha helices, beta sheets, and loops • Structure with greatest overall prediction value greater than 1 used to determine the structure
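The idea of picking the structure with the greatest average predictive value above 1 can be sketched as follows. This is a heavy simplification: the full Chou-Fasman procedure also has nucleation and extension rules and a turn table, which are omitted. The propensity values are the commonly reproduced Chou-Fasman helix and sheet parameters (scaled so that 1.0 is neutral); treat them as illustrative and check them against a primary source before relying on them. The function name `predict_window` is hypothetical.

```python
# Commonly quoted Chou-Fasman propensities (illustrative; verify before use).
P_HELIX = {
    "A": 1.42, "R": 0.98, "N": 0.67, "D": 1.01, "C": 0.70,
    "Q": 1.11, "E": 1.51, "G": 0.57, "H": 1.00, "I": 1.08,
    "L": 1.21, "K": 1.14, "M": 1.45, "F": 1.13, "P": 0.57,
    "S": 0.77, "T": 0.83, "W": 1.08, "Y": 0.69, "V": 1.06,
}
P_SHEET = {
    "A": 0.83, "R": 0.93, "N": 0.89, "D": 0.54, "C": 1.19,
    "Q": 1.10, "E": 0.37, "G": 0.75, "H": 0.87, "I": 1.60,
    "L": 1.30, "K": 0.74, "M": 1.05, "F": 1.38, "P": 0.55,
    "S": 0.75, "T": 1.19, "W": 1.37, "Y": 1.47, "V": 1.70,
}

def predict_window(window):
    """Label a short window by its larger mean propensity, if it exceeds 1."""
    helix = sum(P_HELIX[aa] for aa in window) / len(window)
    sheet = sum(P_SHEET[aa] for aa in window) / len(window)
    if helix > 1.0 and helix >= sheet:
        return "helix"
    if sheet > 1.0:
        return "sheet"
    return "other"
```

Consistent with the slide, a window rich in A, E, L, and M comes out as helix, while one dominated by P and G falls below 1 for both tables.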
  71. GOR Method • Improves upon the Chou-Fasman method • Assumes amino acids surrounding the central amino acid influence the secondary structure the central amino acid is likely to adopt • Scoring matrices used in the GOR method incorporate information theory and Bayesian statistics • Mount, pp. 450-451
  72. Neural Network Models • Programs trained to recognize amino acid patterns located in known secondary structures • Distinguish these patterns from patterns not located in structures • PHD and NNPREDICT use neural networks
  73. Nearest-neighbor • Machine learning method • Secondary structure conformation of an amino acid calculated by identifying sequences of known structures similar to the query by looking at the surrounding amino acids • Nearest-neighbor programs include PSSP, Simpa96, SOPM, and SOPMA
  74. Prediction of 3D Structures • Threading is the most robust technique • Time consuming • Requires knowledge of protein structure
  75. Threading • Searches for structures with similar folds without sequence similarity • Threading takes a sequence with unknown structure and threads it through the coordinates of a target protein whose structure has been solved – X-ray crystallography – NMR spectroscopy
  76. Threading • Considered position by position subject to predetermined constraints • Thermodynamic calculations made to determine most energetically favorable and conformationally stable alignment
  77. Environmental Template • Environment of each amino acid in each known structural core is determined – secondary structure – area of side chain buried by closeness to other atoms – types of nearby side chains
  78. Environmental Template • Each position classified into one of 18 types – 6 representing increasing levels of residue burial – three classes of secondary structure (alpha helices, beta sheets, and loops).
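The 18 environment types arise as 6 burial levels times 3 secondary structure classes. The sketch below only shows that bookkeeping: how a burial level would be derived from buried side-chain area is not given in the lecture, so the level (0 to 5) is taken as an input, and the names used are hypothetical.

```python
SS_CLASSES = ("helix", "sheet", "loop")
N_BURIAL_LEVELS = 6  # increasing levels of residue burial, 0 (exposed) to 5

def environment_class(burial_level, secondary_structure):
    """Map (burial level, secondary structure) to one of 18 class indices."""
    if not 0 <= burial_level < N_BURIAL_LEVELS:
        raise ValueError("burial_level must be between 0 and 5")
    ss_index = SS_CLASSES.index(secondary_structure)  # ValueError if unknown
    return ss_index * N_BURIAL_LEVELS + burial_level
```

A threading score table would then hold, for each of the 18 indices, how favorable each of the 20 amino acids is in that environment.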
  79. Upcoming Seminars • Topic TBA – Rafael Irizarry, Johns Hopkins University • Friday, 4/23/2004 • 8:30 AM – 9:30 AM • LOCATION: K-Building Room 2036 (HSC Campus)
  80. Presentations • 4:45 – 5:00 Richard Jones • 5:00 – 5:15 Steven Xu • 5:15 – 5:30 Olutola Iyun • 5:30 – 5:45 Frank Baker • 5:45 – 6:00 Guanghui Lan • 6:00 – 6:15 Tim Hardin • 6:15 – 6:30 Satish Bollimpalli & Ravi Gundlapalli
