



Lecture 14: Protein Structure Prediction



CECS 694-02 Introduction to Bioinformatics University of Louisville   Spring 2004 Dr. Eric Rouchka
Review of Proteins
• Proteins: polypeptides with a three-dimensional
  structure
• Primary structure – sequence of amino
  acids constituting polypeptide chain

• Secondary structure – local organization of
  polypeptide chain into secondary structures
  such as α helices and β sheets

Review of Proteins
• Tertiary structure – three-dimensional
  arrangement of amino acids as they interact with
  one another due to polarity and interactions
  between side chains

• Quaternary structure – Interaction of several
  protein subunits



Protein Structure
• Proteins: chains of amino acids joined by
  peptide bonds

• Amino Acids:
  – Polar (separate positively and negatively
    charged regions)
  – free C=O group (carboxyl), can act as a
    hydrogen bond acceptor
  – free NH group (amino), can act as a hydrogen
    bond donor


Protein Structure
• Many conformations possible due to
  rotation about the bonds to the Alpha-Carbon
  (Cα) atom

• Conformational changes lead to
  differences in three-dimensional
  structure of protein


Protein Structure
• Polypeptide chain has pattern of N-Cα-C
  repeated

• Rotation about the N-Cα bond (amino side) is
  the PHI (φ) angle; rotation about the Cα-C
  bond (carboxyl side) is the PSI (ψ) angle
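Both φ and ψ are torsion (dihedral) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute such an angle from coordinates. The function name `dihedral` and the helper names are illustrative assumptions, not something from the lecture.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For φ of residue i, the four points would be the coordinates of C(i-1), N(i), Cα(i), and C(i), in that order.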



Differences between A.A.’s
• Difference between 20 amino acids is the R
  side chains

• Amino acids can be separated based on the
  chemical properties of the side chains:
  – Hydrophobic
  – Charged
  – Polar



Differences between A.A.’s
• Hydrophobic: Alanine (A), Valine (V),
  Phenylalanine (F), Proline (P), Methionine
  (M), Isoleucine (I), and Leucine (L)

• Charged: Aspartic acid (D), Glutamic Acid
  (E), Lysine (K), Arginine (R)

• Polar: Serine (S), Threonine (T), Tyrosine (Y),
  Histidine (H), Cysteine (C), Asparagine (N),
  Glutamine (Q), Tryptophan (W)
Secondary Structure




•   Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
Secondary Structures
• Core of each protein made up of regular
  secondary structures

• Regular patterns of hydrogen bonds are
  formed between neighboring amino acids

• Amino acids in secondary structures have
  similar φ and ψ angles


Secondary Structures
• Structures act to neutralize the polar groups
  on each amino acid

• Secondary structures tightly packed in protein
  core and a hydrophobic environment

• Each amino acid side group has a limited
  space to occupy -- therefore a limited number
  of possible interactions

Types of Secondary Structures
•   α Helices
•   β Sheets
•   Loops
•   Coils




α Helix
• Most abundant secondary structure

• 3.6 amino acids per turn

• Hydrogen bond formed between every fourth
  residue

• Average length: 10 amino acids, or 3 turns

• Varies from 5 to 40 amino acids

Image source: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm; http://www4.ocn.ne.jp/~bio/biology/protein.htm
α Helix
• Normally found on the surface of protein
  cores

• Interact with aqueous environment
  – Inner facing side has hydrophobic amino
    acids
  – Outer-facing side has hydrophilic amino
    acids

α Helix
• Roughly every third to fourth amino acid tends
  to be hydrophobic (3.6 residues per turn)

• Pattern can be detected computationally

• Rich in alanine (A), glutamic acid (E), leucine
  (L), and methionine (M)

• Poor in proline (P), glycine (G), tyrosine (Y),
  and serine (S)
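One way the hydrophobic periodicity of a helix might be detected computationally is a hydrophobic moment: each hydrophobic residue contributes a unit vector rotated by 100 degrees per position (360 / 3.6 residues per turn), and a long summed vector suggests one face of the helix is hydrophobic. This is a swapped-in, simplified technique, not necessarily the one the lecture has in mind. The binary hydrophobic set and the name `helical_moment` are assumptions.

```python
import math

# Hydrophobic residues as grouped earlier in these slides (F = phenylalanine).
HYDROPHOBIC = set("AVFPMIL")

def helical_moment(seq, degrees_per_residue=100.0):
    """Length of the summed hydrophobic vector for a helical projection."""
    sx = sy = 0.0
    for n, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            angle = math.radians(n * degrees_per_residue)
            sx += math.cos(angle)
            sy += math.sin(angle)
    return math.hypot(sx, sy)
```

A stretch that is hydrophobic all the way around the helix gives a moment near zero, while an amphipathic stretch gives a large one.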
β Sheet




     Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html;
                    http://www4.ocn.ne.jp/~bio/biology/protein.htm

β Sheet
• Hydrogen bonds between 5-10
  consecutive amino acids in one portion
  of the chain with another 5-10 farther
  down the chain

• Interacting regions may be adjacent
  with a short loop, or far apart with other
  structures in between

β Sheet
• Directions:
  – Same: Parallel Sheet
  – Opposite: Anti-parallel Sheet
  – Mixed: Mixed Sheet

• Pattern of hydrogen bond formation in
  parallel and anti-parallel sheets is
  different

β Sheet
• Slight counterclockwise rotation

• Alpha carbons (as well as R side
  groups) alternate above and below the
  sheet

• Prediction difficult, due to wide range of
  φ and ψ angles

Interactions in Helices and Sheets
[Figure not reproduced]
Loop
• Regions between α helices and β
  sheets

• Various lengths and three-dimensional
  configurations

• Located on surface of the structure

Loop
• Hairpin loops: complete turn in the
  polypeptide chain (as in anti-parallel β sheets)

• More variable sequence structure

• Tend to have charged and polar amino acids

• Frequently a component of active sites

Coil
• Region of secondary structure that is
  not a helix, sheet, or loop




6 Classes of Protein Structure
1) Class α: bundles of α helices connected by
  loops on surface of proteins

2) Class β: antiparallel β sheets, usually two
  sheets in close contact forming sandwich

3) Class α/β: mainly parallel β sheets with
  intervening α helices; may also have mixed β
  sheets (metabolic enzymes)

6 Classes of Protein Structure
4) Class α+ β: mainly segregated α helices and
   antiparallel β sheets

5) Multidomain (α and β): proteins containing
   more than one of the above four domain classes

6) Membrane and cell-surface proteins and
   peptides excluding proteins of the immune
   system

α Class Protein (hemoglobin)




•   http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250


β Class Protein (T-Cell CD8)




•   http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500


α/β Class Protein (tryptophan synthase)




•   http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500


α+β Class Protein (1RNB)




•   http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500


Membrane Protein (1OPF)




•   http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500


Protein Structure Databases
• Databases of three dimensional structures of
  proteins, where structure has been solved
  using X-ray crystallography or nuclear
  magnetic resonance (NMR) techniques

• Protein Databases:
  –    PDB
  –    SCOP
  –    Swiss-Prot
  –    PIR

Protein Structure Databases
• Most extensive for 3-D structure is the
  Protein Data Bank (PDB)

• Current release of PDB (April 8, 2003)
  has 20,622 structures




Partial PDB File
ATOM    1   N     VAL    A     1            6.452        16.459         4.843       7.00     47.38           3HHB   162
ATOM    2   CA    VAL    A     1            7.060        17.792         4.760       6.00     48.47           3HHB   163
ATOM    3   C     VAL    A     1            8.561        17.703         5.038       6.00     37.13           3HHB   164
ATOM    4   O     VAL    A     1            8.992        17.182         6.072       8.00     36.25           3HHB   165
ATOM    5   CB    VAL    A     1            6.342        18.738         5.727       6.00     55.13           3HHB   166
ATOM    6   CG1   VAL    A     1            7.114        20.033         5.993       6.00     54.30           3HHB   167
ATOM    7   CG2   VAL    A     1            4.924        19.032         5.232       6.00     64.75           3HHB   168
ATOM    8   N     LEU    A     2            9.333        18.209         4.095       7.00     30.18           3HHB   169
ATOM    9   CA    LEU    A     2           10.785        18.159         4.237       6.00     35.60           3HHB   170
ATOM   10   C     LEU    A     2           11.247        19.305         5.133       6.00     35.47           3HHB   171
ATOM   11   O     LEU    A     2           11.017        20.477         4.819       8.00     37.64           3HHB   172
ATOM   12   CB    LEU    A     2           11.451        18.286         2.866       6.00     35.22           3HHB   173
ATOM   13   CG    LEU    A     2           11.081        17.137         1.927       6.00     31.04           3HHB   174
ATOM   14   CD1   LEU    A     2           11.766        17.306          .570       6.00     39.08           3HHB   175
ATOM   15   CD2   LEU    A     2           11.427        15.778         2.539       6.00     38.96           3HHB   176




Description of PDB File
• second column: atom serial number; sixth
  column: amino acid position in the
  polypeptide chain

• fourth column: current amino acid

• Columns 7, 8, and 9: x, y, and z coordinates
  (in angstroms)

• The 11th column: temperature factor -- can be
  used as a measurement of uncertainty
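As a rough illustration, one ATOM line from the excerpt can be split into named fields as sketched below. Real PDB files use a fixed-width column layout, so splitting on whitespace is an assumption that happens to work for this excerpt only. The function name `parse_atom_line` and the dictionary keys are hypothetical.

```python
def parse_atom_line(line):
    """Split one whitespace-separated ATOM line from the excerpt into fields."""
    f = line.split()
    return {
        "atom_serial": int(f[1]),          # 2nd field: running atom number
        "atom_name": f[2],                 # 3rd field: N, CA, C, O, CB, ...
        "residue": f[3],                   # 4th field: current amino acid
        "chain": f[4],
        "residue_number": int(f[5]),       # position in the polypeptide chain
        "xyz": (float(f[6]), float(f[7]), float(f[8])),  # angstroms
        "temperature_factor": float(f[10]),              # 11th field
    }

line = "ATOM    2   CA    VAL    A     1    7.060   17.792   4.760   6.00   48.47   3HHB   163"
atom = parse_atom_line(line)
```

Here `atom["residue"]` is the three-letter amino acid code and `atom["xyz"]` holds the coordinates of that single atom.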
Protein Structure Classification Databases
• Structural Classification of Proteins
  (SCOP)

• Based on expert definition of structural
  similarities

• SCOP classifies hierarchically by class, fold,
  superfamily, and family

• http://scop.mrc-lmb.cam.ac.uk/scop/
Protein Structure Classification Databases
• Classification by class, architecture,
  topology, and homology (CATH)

• Classifies proteins into hierarchical levels by
  class

• α/β and α+β are considered to be a single
  class

• http://www.biochem.ucl.ac.uk/bsm/cath/
Protein Structure Classification Databases
• Molecular Modeling Database (MMDB)

• Structures from PDB categorized into
  structurally related groups using VAST
  (Vector Alignment Search Tool)

• looks for similar arrangements of secondary
  structural elements

• http://www.ncbi.nlm.nih.gov/Entrez

Protein Structure Classification Databases
• Spatial Arrangement of Backbone
  Fragments (SARF)

• categorized on structural similarities,
  similar to the MMDB

• http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html


Visualization of Proteins
• A number of programs convert atomic
  coordinates of 3-d structures into views of the
  molecule

• allow the user to manipulate the molecule by
  rotation, zooming, etc.

• Critical in drug design -- yields insight into
  how the protein might interact with ligands at
  active sites
Visualization of Proteins
• Most popular program for viewing
  three-dimensional structures is Rasmol

Rasmol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html




Alignment of Protein Structure
• Three-dimensional structure of one protein
  compared against three-dimensional
  structure of second protein

• Atoms fit together as closely as possible to
  minimize the average deviation

• Structural similarity between proteins does
  not necessarily mean evolutionary
  relationship
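The "average deviation" between two sets of paired atoms is commonly reported as a root-mean-square deviation (RMSD). The sketch below is deliberately partial: it only removes the translation between the two structures by moving both to a common centroid. A full superposition would also search for the best rotation, which is omitted here. The function names are illustrative assumptions.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rmsd_after_centering(a, b):
    """RMSD between paired atoms after moving both sets to a common centroid."""
    ca, cb = centroid(a), centroid(b)
    total = 0.0
    for p, q in zip(a, b):
        total += sum(((p[i] - ca[i]) - (q[i] - cb[i])) ** 2 for i in range(3))
    return math.sqrt(total / len(a))
```

Two copies of the same structure that differ only by a shift in space give an RMSD of zero under this measure.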
Alignment of Protein Structure
• Positions of atoms in three-dimensional
  structures compared

• Look for positions of secondary
  structural elements (helices and
  strands) within a protein domain



Alignment of Protein Structure
• Distances between carbon atoms
  examined to determine the degree to which
  structures may be superimposed

• Side chain information can be
  incorporated
  – Buried; visible


SSAP
• Sequential Structure Alignment
  Program

• Incorporates double dynamic
  programming to produce a structural
  alignment between two proteins



Steps in SSAP
• 1)    Calculate vectors from Cβ of one amino
  acid to set of nearby amino acids
  – Vectors from two separate proteins compared
  – Difference (expressed as an angle) calculated,
    and converted to score


• 2)   Matrix for scores of vector differences
  from one protein to the next is computed.


Steps in SSAP
• 3) Optimal alignment found using
  global dynamic programming, with a
  constant gap penalty

• 4) Next amino acid residue
  considered, optimal path to align this
  amino acid to the second sequence
  computed
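The global dynamic programming in step 3 can be sketched as below. This is a plain single-level alignment over a precomputed score matrix, not the full double dynamic programming of SSAP, and it treats the constant gap penalty as a fixed cost per unaligned residue, which is an assumption. The name `global_alignment_score` is hypothetical.

```python
def global_alignment_score(score, gap):
    """Best global alignment score for a residue-pair score matrix.

    score[i][j] is the similarity of residue i (protein A) and residue j
    (protein B); gap is a constant penalty for each unaligned residue.
    """
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for j in range(1, m + 1):
        best[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                             best[i - 1][j] - gap,                      # skip residue in A
                             best[i][j - 1] - gap)                      # skip residue in B
    return best[n][m]
```

In SSAP terms, `score` would hold the vector-difference scores from step 2; any numbers passed in while experimenting are made up.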

Steps in SSAP
• 5) Alignments transferred to
  summary matrix
  – If paths cross same matrix position, scores
    are summed
  – If part of alignment path found in both
    matrices, evidence of similarity




Steps in SSAP
• 6) Dynamic programming alignment
  is performed for the summary matrix
  – Final alignment represents optimal
    alignment between the protein structures
  – Resulting score converted so it can be
    compared to see how closely related two
    structures are



Distance Matrix Approach
• Uses graphical procedure similar to dot
  plots

• Identifies atoms that lie most closely
  together in three-dimensional structure

• Two sequences with similar structure
  can have dot plots superimposed

Distance Matrix Approach
• Values in distance matrix represent distance
  between the Cα atoms in the three
  dimensional structure

• positions of closest packing atoms marked
  with a dot to highlight regions of interest

• Similar groups superimposed as closely as
  possible by minimizing sum of atomic
  distances
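A distance matrix of this kind can be built directly from Cα coordinates, as in the following sketch. The two sample coordinates are the Cα atoms of VAL 1 and LEU 2 from the PDB excerpt in these slides. The cutoff used to mark a "dot" is a free parameter chosen by the user, and the function names are illustrative assumptions.

```python
import math

def distance_matrix(ca_coords):
    """Pairwise distances between C-alpha coordinates (in angstroms)."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def contact_dots(matrix, cutoff):
    """Index pairs (i, j), i < j, whose distance is below the cutoff."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] < cutoff]

ca = [(7.060, 17.792, 4.760), (10.785, 18.159, 4.237)]
d = distance_matrix(ca)   # d[0][1] is the distance between neighboring C-alphas
```

Plotting the pairs returned by `contact_dots` for two proteins gives the dot-plot-like pictures that can then be superimposed.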
DALI
• Distance-matrix ALIgnment (DALI)

• Uses distance matrix method to align protein
  structures

• Assembly step uses Monte Carlo simulation
  to find submatrices that can be aligned

• Existing structures that have been compared
  are organized into the FSSP database
Fast Structural Similarity Search
• Compare types and arrangements of
  secondary structures within two proteins

• If elements similarly arranged, three-
  dimensional structures are similar

• VAST and SARF are programs that use
  these fast methods

Structural Motifs Based on Sequence Analysis
• Some structural elements can be
  determined by looking at sequence
  composition
  – zinc finger motifs
  – leucine zippers
  – coiled-coil structures




Zinc Finger Motifs
• Found by looking at order and spacing of
  cysteine and histidine residues

• Typical zinc finger motifs are composed of two
  cysteines followed by two histidines

Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
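Searching for the order and spacing of cysteines and histidines can be approximated with a regular expression. The spacing below (C, 2 to 4 residues, C, 12 residues, H, 3 to 5 residues, H) is a simplified assumption loosely based on the commonly cited two-cysteine, two-histidine pattern, not a value from the lecture. The name `find_zinc_fingers` is hypothetical.

```python
import re

# Simplified, assumed spacing: C, 2-4 residues, C, 12 residues, H, 3-5 residues, H.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(seq):
    """Return (start, matched_text) for each non-overlapping pattern hit."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(seq)]
```

A hit only says the spacing is compatible with a zinc finger; it does not confirm that the region folds into one.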




Leucine Zippers
• Found by looking for two antiparallel alpha
  helices held together

• Interactions between hydrophobic leucine
  residues found every seventh position in helix

Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
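The "leucine at every seventh position" pattern can be screened for with a small regular expression. Requiring four leucines in a row of heptads is an assumed minimum chosen for illustration, and the name `find_leucine_repeats` is hypothetical.

```python
import re

# Lookahead so that overlapping candidates are all reported.
# Four leucines spaced seven apart is an assumed minimum.
HEPTAD_LEUCINES = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

def find_leucine_repeats(seq):
    """Start positions where leucine recurs at every seventh residue, four times."""
    return [m.start() for m in HEPTAD_LEUCINES.finditer(seq)]
```

As with any sequence-only screen, a match is a candidate region, not proof of a helix pair.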




Transmembrane Proteins
• Traverse the membrane back and forth via
  alpha helices

• Typical length: 20-30 residues

• Transmembrane alpha helices have hydrophobic
  residues on the inside facing portions, and
  hydrophilic residues on the outside

Image source: http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
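A very crude screen for candidate transmembrane segments is to slide a window along the sequence and flag windows that are mostly hydrophobic. The window of 20 follows the typical length given above. The 0.8 threshold, the binary hydrophobic set, and the name `hydrophobic_windows` are assumptions. This is not the method used by dedicated membrane prediction programs.

```python
# Hydrophobic residues as grouped earlier in these slides (F = phenylalanine).
HYDROPHOBIC = set("AVFPMIL")

def hydrophobic_windows(seq, window=20, min_fraction=0.8):
    """Start indices of windows whose hydrophobic fraction meets the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        count = sum(1 for aa in segment if aa in HYDROPHOBIC)
        if count / window >= min_fraction:
            hits.append(start)
    return hits
```

Runs of consecutive start indices point at one stretch of the sequence rather than many separate segments.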

Membrane Prediction Programs
• PHDhtm: employs neural network approach;
  neural network trained to recognize sequence
  patterns and variations of helices in
  transmembrane proteins of known structures

• Tmpred: functions by searching a protein
  against a sequence scoring matrix obtained
  by aligning the sequences of all known
  transmembrane alpha helix regions

Chou-Fasman Method
• based on analyzing frequency of amino acids in
  different secondary structures
   – A, E, L, and M strong predictors of alpha helices
   – P and G are predictors of a break in a helix


• Table of predictive values created for alpha helices,
  beta sheets, and loops

• The structure type with the greatest overall
  predictive value (greater than 1) is assigned
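The frequency analysis behind such a table can be sketched as a propensity: how often a residue is found in a given structure, relative to how often any residue is found in it. Values above 1 then favor that structure. The training data in the usage note is made up purely for illustration, and the function names are hypothetical; this is not the published Chou-Fasman table.

```python
from collections import Counter

def propensities(sequence, states, target):
    """Propensity of each amino acid for the `target` state:
    (fraction of that residue in target) / (fraction of all residues in target)."""
    total = Counter(sequence)
    in_target = Counter(aa for aa, s in zip(sequence, states) if s == target)
    overall = sum(in_target.values()) / len(sequence)
    return {aa: (in_target[aa] / total[aa]) / overall for aa in total}

def mean_propensity(window, table):
    """Average propensity over a window; values above 1 favor the structure."""
    return sum(table[aa] for aa in window) / len(window)
```

With the toy input `propensities("AEAEGPGP", "HHHHCCCC", "H")`, A and E come out as helix formers and G and P as helix breakers, mirroring the trend listed above.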



GOR Method
• Improves upon the Chou-Fasman method

• Assumes the amino acids surrounding the central amino
  acid influence the secondary structure the central
  amino acid is likely to adopt

• Scoring matrices used in the GOR method incorporate
  information theory and Bayesian statistics

• Mount, p450-451


Neural Network Models
• Programs trained to recognize amino acid
  patterns located in known secondary
  structures

• distinguish these patterns from patterns not
  located in structures

• PHD and NNPREDICT use neural networks


Nearest-neighbor
• machine learning method

• Secondary structure conformation of an amino
  acid calculated by identifying sequences of
  known structures similar to the query by
  looking at the surrounding amino acids

• Nearest-neighbor programs include
  PSSP, Simpa96, SOPM, and SOPMA
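The nearest-neighbor idea can be sketched as a vote among the most similar windows from proteins of known structure. Scoring similarity by counting identical positions is an assumed simplification, the small database in the usage note is made up, and the function names are hypothetical; this does not reproduce any of the programs named above.

```python
from collections import Counter

def window_similarity(a, b):
    """Number of identical positions between two equal-length windows."""
    return sum(1 for x, y in zip(a, b) if x == y)

def predict_center_state(query_window, database, k=3):
    """Majority vote over the central states of the k most similar windows.

    database: list of (window, central_state) pairs from known structures.
    """
    ranked = sorted(database,
                    key=lambda item: window_similarity(query_window, item[0]),
                    reverse=True)
    votes = Counter(state for _, state in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with a toy database such as `[("AELLK", "H"), ("AELMK", "H"), ("VTVYV", "E"), ("GPGSG", "C"), ("AEALK", "H")]`, the query window `"AELAK"` is closest to the three helix windows, so its central residue is predicted as helix.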

Prediction of 3d Structures
• Threading is the most robust technique
• Time consuming
• Requires knowledge of protein structure




Threading
• Searches for structures with similar folds
  without sequence similarity

• Threading takes a sequence with unknown
  structure and threads it through the
  coordinates of a target protein whose
  structure has been solved
  – X-ray crystallography
  – NMR imaging


Threading
• Considered position by position subject
  to predetermined constraints

• Thermodynamic calculations made to
  determine most energetically favorable
  and conformationally stable alignment



Environmental Template
• Environment of each amino acid in each
  known structural core is determined
  – secondary structure
  – area of side chain buried by closeness to
    other atoms
  – types of nearby side chains




Environmental Template
• Each position classified into one of 18
  types
  – 6 representing increasing levels of residue
    burial
  – three classes of secondary structure (alpha
    helices, beta sheets, and loops).




Upcoming Seminars
• Topic TBA
  – Rafael Irizarry, Johns Hopkins University
       • Friday, 4/23/2004
       • 8:30 AM – 9:30 AM
       • LOCATION: K-Building Room 2036 (HSC
         Campus)




Presentations
•   4:45 – 5:00 Richard Jones
•   5:00 – 5:15 Steven Xu
•   5:15 – 5:30 Olutola Iyun
•   5:30 – 5:45 Frank Baker
•   5:45 – 6:00 Guanghui Lan
•   6:00 – 6:15 Tim Hardin
•   6:15 – 6:30 Satish Bollimpalli & Ravi
    Gundlapalli

     CECS 694-02 Introduction to Bioinformatics University of Louisville   Spring 2004 Dr. Eric Rouchka

Protein Structure Prediction

  • 1.
    Rohit Digitally signed by Rohit Jhawer DN: cn=Rohit Jhawer, o, ou, email=rohit_jhawer@hotmail. Jhawer com, c=IN Date: 2007.03.09 14:10:44 +05'30' Lecture 14: Protein Structure Prediction CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
  • 2.
    Review of Proteins •Proteins: polypeptides with a three dimensional structure • • Primary structure – sequence of amino acids constituting polypeptide chain • Secondary structure – local organization of polypeptide chain into secondary structures such as α helices and β sheets CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
  • 3.
    Review of Proteins •Tertiary structure –three dimensional arrangements of amino acids as they react to one another due to polarity and interactions between side chains • Quaternary structure – Interaction of several protein subunits CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
  • 4.
    Protein Structure • Proteins:chains of amino acids joined by peptide bonds • Amino Acids: – Polar (separate positive and negatively charged regions) – free C=O group (CARBOXYL), can act as hydrogen bond acceptor – free NH group (AMINYL), can act as hydrogen bond donor CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
  • 5.
    Protein Structure
  • 6.
    Protein Structure • Many conformations possible due to rotation around the alpha-carbon (Cα) atom • Conformational changes lead to differences in the three-dimensional structure of the protein
  • 7.
    Protein Structure • Polypeptide chain has a repeated N-Cα-C pattern • Rotation about the N-Cα bond (amino side) is the phi (φ) angle; rotation about the Cα-C bond (carboxyl side) is the psi (ψ) angle
  • 8.
    Protein Structure
  • 9.
    Differences between A.A.’s • Difference between the 20 amino acids is the R side chain • Amino acids can be separated based on the chemical properties of the side chains: – Hydrophobic – Charged – Polar
  • 10.
    Differences between A.A.’s • Hydrophobic: Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), and Leucine (L) • Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R) • Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
  • 11.
    Secondary Structure • Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
  • 12.
    Secondary Structures • Core of each protein made up of regular secondary structures • Regular patterns of hydrogen bonds are formed between neighboring amino acids • Amino acids in secondary structures have similar φ and ψ angles
  • 13.
    Secondary Structures • Structures act to neutralize the polar groups on each amino acid • Secondary structures are tightly packed in the protein core, in a hydrophobic environment • Each amino acid side group has limited space to occupy, and therefore a limited number of possible interactions
  • 14.
    Types of Secondary Structures • α Helices • β Sheets • Loops • Coils
  • 15.
    α Helix • Most abundant secondary structure • 3.6 amino acids per turn • Hydrogen bond formed between every fourth residue • Average length: 10 amino acids, or 3 turns • Varies from 5 to 40 amino acids Image source: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm; http://www4.ocn.ne.jp/~bio/biology/protein.htm
  • 16.
    α Helix • Normally found on the surface of protein cores • Interact with aqueous environment – Inner-facing side has hydrophobic amino acids – Outer-facing side has hydrophilic amino acids
  • 17.
    α Helix • Every third amino acid tends to be hydrophobic • Pattern can be detected computationally • Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M) • Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
  • 18.
    β Sheet Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html; http://www4.ocn.ne.jp/~bio/biology/protein.htm
  • 19.
    β Sheet • Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain and another 5-10 farther down the chain • Interacting regions may be adjacent with a short loop, or far apart with other structures in between
  • 20.
    β Sheet • Directions: – Same: Parallel Sheet – Opposite: Anti-parallel Sheet – Mixed: Mixed Sheet • Pattern of hydrogen bond formation in parallel and anti-parallel sheets is different
  • 21.
    β Sheet • Slight counterclockwise rotation • Alpha carbons (as well as R side groups) alternate above and below the sheet • Prediction difficult, due to wide range of φ and ψ angles
  • 22.
    Interactions in Helices and Sheets
  • 23.
    Loop • Regions between α helices and β sheets • Various lengths and three-dimensional configurations • Located on surface of the structure
  • 24.
    Loop • Hairpin loops: complete turn in the polypeptide chain (anti-parallel β sheets) • More variable sequence structure • Tend to have charged and polar amino acids • Frequently a component of active sites
  • 25.
    Coil • Region of secondary structure that is not a helix, sheet, or loop
  • 26.
    Secondary Structure • Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
  • 27.
    6 Classes of Protein Structure 1) Class α: bundles of α helices connected by loops on surface of proteins 2) Class β: antiparallel β sheets, usually two sheets in close contact forming a sandwich 3) Class α/β: mainly parallel β sheets with intervening α helices; may also have mixed β sheets (metabolic enzymes)
  • 28.
    6 Classes of Protein Structure 4) Class α+β: mainly segregated α helices and antiparallel β sheets 5) Multidomain (α and β) proteins: more than one of the above four domain types 6) Membrane and cell-surface proteins and peptides, excluding proteins of the immune system
  • 29.
    α Class Protein (hemoglobin) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250
  • 30.
    β Class Protein (T-Cell CD8) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500
  • 31.
    α/β Class Protein (tryptophan synthase) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500
  • 32.
    α+β Class Protein (1RNB) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500
  • 33.
    Membrane Protein (1OPF) • http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500
  • 34.
    Protein Structure Databases • Databases of three-dimensional structures of proteins, where structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques • Protein Databases: – PDB – SCOP – Swiss-Prot – PIR
  • 35.
    Protein Structure Databases • Most extensive for 3-D structure is the Protein Data Bank (PDB) • Current release of PDB (April 8, 2003) has 20,622 structures
  • 36.
    Partial PDB File
    ATOM   1  N    VAL  A  1   6.452  16.459  4.843  7.00  47.38  3HHB  162
    ATOM   2  CA   VAL  A  1   7.060  17.792  4.760  6.00  48.47  3HHB  163
    ATOM   3  C    VAL  A  1   8.561  17.703  5.038  6.00  37.13  3HHB  164
    ATOM   4  O    VAL  A  1   8.992  17.182  6.072  8.00  36.25  3HHB  165
    ATOM   5  CB   VAL  A  1   6.342  18.738  5.727  6.00  55.13  3HHB  166
    ATOM   6  CG1  VAL  A  1   7.114  20.033  5.993  6.00  54.30  3HHB  167
    ATOM   7  CG2  VAL  A  1   4.924  19.032  5.232  6.00  64.75  3HHB  168
    ATOM   8  N    LEU  A  2   9.333  18.209  4.095  7.00  30.18  3HHB  169
    ATOM   9  CA   LEU  A  2  10.785  18.159  4.237  6.00  35.60  3HHB  170
    ATOM  10  C    LEU  A  2  11.247  19.305  5.133  6.00  35.47  3HHB  171
    ATOM  11  O    LEU  A  2  11.017  20.477  4.819  8.00  37.64  3HHB  172
    ATOM  12  CB   LEU  A  2  11.451  18.286  2.866  6.00  35.22  3HHB  173
    ATOM  13  CG   LEU  A  2  11.081  17.137  1.927  6.00  31.04  3HHB  174
    ATOM  14  CD1  LEU  A  2  11.766  17.306   .570  6.00  39.08  3HHB  175
    ATOM  15  CD2  LEU  A  2  11.427  15.778  2.539  6.00  38.96  3HHB  176
  • 37.
    Description of PDB File • Second column: atom serial number • Fourth column: current amino acid • Sixth column: amino acid position in the polypeptide chain • Columns 7, 8, and 9: x, y, and z coordinates (in angstroms) • The 11th column: temperature factor, which can be used as a measurement of uncertainty
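The column description can be tried out on one of the ATOM lines from the partial file. The following is a minimal sketch, assuming whitespace-separated fields exactly as they appear in the excerpt; official PDB files are fixed-width, so a production parser should slice by character position instead. The names `AtomRecord` and `parse_atom_line` are hypothetical.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    serial: int         # column 2: atom serial number
    atom_name: str      # column 3: atom name (N, CA, C, ...)
    residue: str        # column 4: amino acid
    chain: str          # column 5: chain identifier
    residue_pos: int    # column 6: position in the polypeptide chain
    x: float            # columns 7-9: coordinates in angstroms
    y: float
    z: float
    temp_factor: float  # column 11: temperature factor


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one whitespace-separated ATOM line (assumed layout, see above)."""
    fields = line.split()
    if len(fields) < 11 or fields[0] != "ATOM":
        raise ValueError("not an ATOM line in the assumed layout")
    return AtomRecord(
        serial=int(fields[1]),
        atom_name=fields[2],
        residue=fields[3],
        chain=fields[4],
        residue_pos=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        temp_factor=float(fields[10]),
    )


# Line 14 of the excerpt; note the z coordinate written as ".570".
record = parse_atom_line("ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175")
print(record.residue, record.residue_pos, record.z, record.temp_factor)
```

Splitting on whitespace is only safe here because every field in the excerpt is present and separated; in fixed-width files adjacent fields can touch.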
  • 38.
    Protein Structure Classification Databases • Structural Classification of Proteins (SCOP) • Based on expert definition of structural similarities • SCOP classifies by class, fold, superfamily, and family • http://scop.mrc-lmb.cam.ac.uk/scop/
  • 39.
    Protein Structure Classification Databases • Classification by class, architecture, topology, and homology (CATH) • Classifies proteins into hierarchical levels by class • α/β and α+β are considered to be a single class • http://www.biochem.ucl.ac.uk/bsm/cath/
  • 40.
    Protein Structure Classification Databases • Molecular Modeling Database (MMDB) • Structures from PDB categorized into structurally related groups using VAST (Vector Alignment Search Tool) • Looks for similar arrangements of secondary structural elements • http://www.ncbi.nlm.nih.gov/Entrez
  • 41.
    Protein Structure Classification Databases • Spatial Arrangement of Backbone Fragments (SARF) • Categorized on structural similarities, similar to the MMDB • http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html
  • 42.
    Visualization of Proteins • A number of programs convert atomic coordinates of 3-D structures into views of the molecule • Allow the user to manipulate the molecule by rotation, zooming, etc. • Critical in drug design: yields insight into how the protein might interact with ligands at active sites
  • 43.
    Visualization of Proteins • Most popular program for viewing 3-dimensional structures is Rasmol • Rasmol: http://www.umass.edu/microbio/rasmol/ • Chime: http://www.umass.edu/microbio/chime/ • Cn3D: http://www.ncbi.nlm.nih.gov/Structure/ • Mage: http://kinemage.biochem.duke.edu/website/kinhome.html • Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html
  • 44.
    Alignment of Protein Structure • Three-dimensional structure of one protein compared against three-dimensional structure of a second protein • Atoms fit together as closely as possible to minimize the average deviation • Structural similarity between proteins does not necessarily mean evolutionary relationship
  • 45.
    Alignment of Protein Structure • Positions of atoms in three-dimensional structures compared • Look for positions of secondary structural elements (helices and strands) within a protein domain
  • 46.
    Alignment of Protein Structure • Distances between carbon atoms examined to determine the degree to which structures may be superimposed • Side chain information can be incorporated – Buried; visible
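One common way to quantify the "average deviation" between two fitted structures is the root-mean-square deviation (RMSD) over paired atoms. The sketch below assumes the two coordinate lists are already superimposed and already paired atom by atom; finding the optimal rotation and translation is a separate step that is not shown. The function name `rmsd` and the shifted second structure are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Cα atoms of VAL 1 and LEU 2 from the 3HHB excerpt.
structure_a = [(7.060, 17.792, 4.760), (10.785, 18.159, 4.237)]
# Hypothetical second structure: the same atoms shifted 1 angstrom along x.
structure_b = [(x + 1.0, y, z) for x, y, z in structure_a]
print(round(rmsd(structure_a, structure_b), 3))
```

A lower RMSD means the paired atoms sit closer together after superposition; it says nothing by itself about evolutionary relationship.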
  • 47.
    SSAP • Secondary Structure Alignment Program • Incorporates double dynamic programming to produce a structural alignment between two proteins
  • 48.
    Steps in SSAP • 1) Calculate vectors from the Cβ of one amino acid to a set of nearby amino acids – Vectors from two separate proteins compared – Difference (expressed as an angle) calculated, and converted to a score • 2) Matrix of scores for the vector differences from one protein to the next is computed
  • 49.
    Steps in SSAP • 3) Optimal alignment found using global dynamic programming, with a constant gap penalty • 4) Next amino acid residue considered; optimal path to align this amino acid to the second sequence computed
  • 50.
    Steps in SSAP • 5) Alignments transferred to summary matrix – If paths cross the same matrix position, scores are summed – If part of an alignment path is found in both matrices, this is evidence of similarity
  • 51.
    Steps in SSAP • 6) Dynamic programming alignment is performed for the summary matrix – Final alignment represents the optimal alignment between the protein structures – Resulting score is converted so that it can be compared to see how closely related two structures are
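Steps 3 and 6 both rely on global dynamic programming over a score matrix with a constant gap penalty. The sketch below shows only that single building block, not the full double dynamic programming of SSAP; the score values and the gap penalty are made-up numbers, and `global_alignment_score` is a hypothetical name.

```python
def global_alignment_score(score, gap_penalty):
    """Best global alignment score through a residue-by-residue score matrix.

    score[i][j] is the (assumed precomputed) score for pairing residue i of
    protein A with residue j of protein B; every gap costs gap_penalty.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # pair i with j
                best[i - 1][j] - gap_penalty,              # gap in protein B
                best[i][j - 1] - gap_penalty,              # gap in protein A
            )
    return best[n][m]


# Made-up scores for a 2-residue protein against a 3-residue protein.
toy_scores = [[5, 1, 0],
              [1, 5, 1]]
print(global_alignment_score(toy_scores, gap_penalty=2))
```

In the toy example the best path pairs the first two residues of each protein (5 + 5) and pays one gap penalty for the unmatched third residue.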
  • 52.
    Distance Matrix Approach • Uses graphical procedure similar to dot plots • Identifies atoms that lie most closely together in three-dimensional structure • Two sequences with similar structure can have dot plots superimposed
  • 53.
    Distance Matrix Approach • Values in distance matrix represent distance between the Cα atoms in the three-dimensional structure • Positions of closest packing atoms marked with a dot to highlight regions of interest • Similar groups superimposed as closely as possible by minimizing sum of atomic distances
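A distance matrix and its "dots" can be computed directly from Cα coordinates. This is a minimal sketch: the cutoff used to mark closely packed atoms is an assumed parameter (the slides do not give one), the three coordinates are toy values, and `distance_matrix` and `contact_dots` are hypothetical names. It requires Python 3.8 or later for `math.dist`.

```python
import math


def distance_matrix(ca_coords):
    """Pairwise Euclidean distances between Cα atoms, as a list of rows."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]


def contact_dots(matrix, cutoff):
    """True where two atoms lie within the cutoff (a 'dot' in the plot)."""
    return [[d <= cutoff for d in row] for row in matrix]


# Toy coordinates, not taken from a real structure.
toy_ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
matrix = distance_matrix(toy_ca)
dots = contact_dots(matrix, cutoff=6.0)  # cutoff in angstroms, assumed
for row in dots:
    print("".join("*" if dot else "." for dot in row))
```

Two structures with similar folds give similar dot patterns, which is what makes superimposing the plots informative.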
  • 54.
    DALI • Distance Alignment Tool (DALI) • Uses distance matrix method to align protein structures • Assembly step uses Monte Carlo simulation to find submatrices that can be aligned • Existing structures that have been compared are organized into the FSSP database
  • 55.
    Fast Structural Similarity Search • Compare types and arrangements of secondary structures within two proteins • If elements are similarly arranged, three-dimensional structures are similar • VAST and SARF are programs that use these fast methods
  • 56.
    Structural Motifs Based on Sequence Analysis • Some structural elements can be determined by looking at sequence composition – zinc finger motifs – leucine zippers – coiled-coil structures
  • 57.
    Zinc Finger Motifs • Found by looking at order and spacing of cysteine and histidine residues • Typical zinc finger motifs are composed of two cysteines followed by two histidines Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
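Searching for the order and spacing of cysteines and histidines maps naturally onto a regular expression. The sketch below assumes the commonly quoted C2H2 spacing C-X(2-4)-C-X(12)-H-X(3-5)-H; the slide itself gives no spacing, so treat the pattern as an illustration rather than a curated motif definition. `find_zinc_fingers` and the test sequence are hypothetical.

```python
import re

# Assumed C2H2 consensus spacing: C-X(2,4)-C-X(12)-H-X(3,5)-H.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_zinc_fingers(sequence):
    """Return (start, matched_text) for each non-overlapping candidate motif."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(sequence)]


# Synthetic sequence built to contain exactly one candidate at index 2.
toy_sequence = "MK" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "GS"
hits = find_zinc_fingers(toy_sequence)
print(hits)
```

A match only says the residue spacing is compatible with a zinc finger; it does not establish that the region folds into one.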
  • 58.
    Leucine Zippers • Found by looking for two antiparallel alpha helices held together • Interactions between hydrophobic leucine residues found every seventh position in helix Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
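The "leucine at every seventh position" rule can be sketched as a simple pattern scan. This example assumes four leucines in a row spaced seven apart (L-x(6)-L-x(6)-L-x(6)-L) is the signal of interest; the required number of repeats is an assumption, and `find_leucine_heptads` is a hypothetical name.

```python
import re

# Lookahead so that overlapping candidate windows are all reported.
HEPTAD_PATTERN = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_heptads(sequence):
    """Start indices where four leucines occur at seven-residue spacing."""
    return [m.start() for m in HEPTAD_PATTERN.finditer(sequence)]


# Synthetic sequence with leucines at positions 2, 9, 16, and 23.
toy_zipper = "MK" + "LAAAAAA" * 3 + "L" + "GS"
print(find_leucine_heptads(toy_zipper))
```

A sequence-only scan like this produces false positives, so hits are usually checked against helix predictions before being called leucine zippers.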
  • 59.
    Transmembrane Proteins • Traverse the membrane back and forth through alpha helices • Typical length: 20-30 residues • Transmembrane alpha helices have hydrophobic residues on the inside facing portions, and hydrophilic residues on the outside Image source: http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
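Because membrane-spanning helices are long hydrophobic stretches, a sliding-window hydropathy average is a simple way to flag candidates. The sketch below uses the Kyte-Doolittle hydropathy scale, which is a different and much simpler technique than the prediction programs named in these slides; the window of 19 residues and the threshold of 1.6 are commonly quoted defaults and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(sequence, window=19, threshold=1.6):
    """Start indices of windows hydrophobic enough to be helix candidates."""
    averages = hydropathy_windows(sequence, window)
    return [i for i, avg in enumerate(averages) if avg >= threshold]


# Synthetic sequence: a 20-leucine stretch flanked by charged residues.
toy_membrane = "KKDD" + "L" * 20 + "DDKK"
print(candidate_tm_starts(toy_membrane))
print(candidate_tm_starts("DEKR" * 10))  # fully charged: no candidates expected
```

Overlapping windows that all pass the threshold are usually merged into a single predicted segment.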
  • 60.
    Membrane Prediction Programs • PHDhtm: employs a neural network approach; the neural network is trained to recognize sequence patterns and variations of helices in transmembrane proteins of known structure • TMpred: functions by searching a protein against a sequence scoring matrix obtained by aligning the sequences of all known transmembrane alpha helix regions
  • 70.
    Chou-Fasman Method • Based on analyzing the frequency of amino acids in different secondary structures – A, E, L, and M are strong predictors of alpha helices – P and G are predictors of a break in a helix • Table of predictive values created for alpha helices, beta sheets, and loops • Structure with the greatest overall prediction value greater than 1 is used to determine the structure
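The decision rule in the last bullet can be sketched as an average of per-residue predictive values over a window. The table below holds placeholder numbers chosen only to mirror the slide's qualitative statements (A, E, L, M favor helices; P and G break them); it is not the published Chou-Fasman table, and the full method also includes nucleation and extension rules that are omitted. `predict_structure` and `PLACEHOLDER_TABLE` are hypothetical names.

```python
# Placeholder predictive values, NOT the published table.
# Residues missing from a table are treated as neutral (1.0).
PLACEHOLDER_TABLE = {
    "helix": {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "P": 0.6, "G": 0.6},
    "sheet": {"V": 1.7, "I": 1.6, "E": 0.4},
    "loop": {"P": 1.5, "G": 1.5, "A": 0.7},
}


def predict_structure(window, table):
    """Pick the structure with the highest mean value, if it exceeds 1."""
    if not window:
        raise ValueError("window must contain at least one residue")
    averages = {
        structure: sum(values.get(aa, 1.0) for aa in window) / len(window)
        for structure, values in table.items()
    }
    best = max(averages, key=averages.get)
    return best if averages[best] > 1.0 else None


print(predict_structure("AELMAE", PLACEHOLDER_TABLE))
print(predict_structure("PGPG", PLACEHOLDER_TABLE))
print(predict_structure("QQQQ", PLACEHOLDER_TABLE))
```

Passing the table as an argument makes it easy to swap in the published values when they are available.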
  • 71.
    GOR Method • Improves upon the Chou-Fasman method • Assumes amino acids surrounding the central amino acid influence the secondary structure the central amino acid is likely to adopt • Scoring matrices used in the GOR method incorporate information theory and Bayesian statistics • Mount, pp. 450-451
  • 72.
    Neural Network Models • Programs trained to recognize amino acid patterns located in known secondary structures • Distinguish these patterns from patterns not located in structures • PHD and NNPREDICT use neural networks
  • 73.
    Nearest-neighbor • Machine learning method • Secondary structure conformation of an amino acid calculated by identifying sequences of known structure similar to the query, by looking at the surrounding amino acids • Nearest-neighbor programs include PSSP, Simpa96, SOPM, and SOPMA
  • 74.
    Prediction of 3D Structures • Threading is the most robust technique • Time consuming • Requires knowledge of protein structure
  • 75.
    Threading • Searches for structures with similar folds without sequence similarity • Threading takes a sequence with unknown structure and threads it through the coordinates of a target protein whose structure has been solved – X-ray crystallography – NMR imaging
  • 76.
    Threading • Considered position by position subject to predetermined constraints • Thermodynamic calculations made to determine the most energetically favorable and conformationally stable alignment
  • 77.
    Environmental Template • Environment of each amino acid in each known structural core is determined – secondary structure – area of side chain buried by closeness to other atoms – types of nearby side chains
  • 78.
    Environmental Template • Each position classified into one of 18 types – 6 representing increasing levels of residue burial – three classes of secondary structure (alpha helices, beta sheets, and loops)
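The 18 environment types are the product of 6 burial levels and 3 secondary structure classes. The sketch below shows one way to index them; the numbering scheme (0 to 17) and the name `environment_type` are assumptions for illustration, since the slide gives only the counts.

```python
SECONDARY_CLASSES = ("helix", "sheet", "loop")
BURIAL_LEVELS = 6  # level 0 = most exposed, level 5 = most buried (assumed order)


def environment_type(burial_level, secondary):
    """Map a (burial level, secondary structure) pair to an index from 0 to 17."""
    if not 0 <= burial_level < BURIAL_LEVELS:
        raise ValueError("burial_level must be between 0 and 5")
    if secondary not in SECONDARY_CLASSES:
        raise ValueError("secondary must be 'helix', 'sheet', or 'loop'")
    return burial_level * len(SECONDARY_CLASSES) + SECONDARY_CLASSES.index(secondary)


all_types = {
    environment_type(level, ss)
    for level in range(BURIAL_LEVELS)
    for ss in SECONDARY_CLASSES
}
print(len(all_types))
```

Each position of a known structure can then be summarized by a single environment index when scoring how well a query residue fits there.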
  • 79.
    Upcoming Seminars • Topic TBA – Rafael Irizarry, Johns Hopkins University • Friday, 4/23/2004 • 8:30 AM – 9:30 AM • LOCATION: K-Building Room 2036 (HSC Campus)
  • 80.
    Presentations • 4:45 – 5:00 Richard Jones • 5:00 – 5:15 Steven Xu • 5:15 – 5:30 Olutola Iyun • 5:30 – 5:45 Frank Baker • 5:45 – 6:00 Guanghui Lan • 6:00 – 6:15 Tim Hardin • 6:15 – 6:30 Satish Bollimpalli & Ravi Gundlapalli