Protein Structure Databases
• Databases of three-dimensional structures of proteins, where the structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques
• Protein databases:
  • PDB (Protein Data Bank)
  • Swiss-Prot
  • PIR (Protein Information Resource)
  • SCOP (Structural Classification of Proteins)
Fibrous proteins have a structural role
Source:http://www.prideofindia.net/images/nails.jpg
http://opbs.okstate.edu/~petracek/2002%20protein%20structure%20function/CH06/Fig%2006-12.GIF
http://my.webmd.com/hw/health_guide_atoz/zm2662.asp?printing=true
• Collagen is the most abundant protein in vertebrates. Collagen fibers are a major portion of tendons, bone and skin. Three helical polypeptide chains of collagen (called alpha chains) wind around each other to make a triple helix, which gives collagen its tough and flexible properties.
• Fibroin fibers make the silk spun by spiders and silkworms stronger, weight for weight, than steel! The soft and flexible properties come from the beta structure.
• Keratin is a tough insoluble protein that makes up the quills of echidna, your hair and nails and the rattle of a rattlesnake. The structure comes from alpha helices that are cross-linked by disulfide bonds.
The globular proteins
The globular proteins have a number of biologically important roles. They
include:
Cell motility – proteins link together to form filaments which make movement
possible.
Organic catalysts in biochemical reactions – enzymes
Regulatory proteins – hormones, transcription factors
Membrane proteins – MHC markers, protein channels, gap junctions
Defense against pathogens – poisons/toxins, antibodies, complement
Transport and storage – hemoglobin and myoglobin
Proteins for cell motility
Source: http://www.ebsa.org/npbsn41/maf_home.html
http://sun0.mpimf-
Above: Myosin (red) and actin filaments
(green) in coordinated muscle contraction.
Right: Actin bound to the myosin binding
site (groove in red part of myosin protein).
Add energy (ATP) and myosin moves,
moving actin with it.
Proteins in the Cell Cytoskeleton
Eukaryote cells have a cytoskeleton that includes straight hollow cylinders called microtubules (bottom left).
They help cells maintain their shape, act like conveyor belts moving organelles around in the cytoplasm, and participate in forming spindle fibres in cell division.
Microtubules are composed of filaments of the protein tubulin (top left). These filaments are compressed like springs, allowing microtubules to ‘stretch and contract’.
Thirteen of these filaments attach side to side, a little like the slats in a barrel, to form a microtubule. This barrel-shaped structure gives strength to the microtubule.
[Figure label: Tubulin forms helical filaments]
Source:
heidelberg.mpg.de/shared/docs/staff/user/0001/24.php3?department=01&LANG=en
http://www.fz-juelich.de/ibi/ibi-1/Cellular_signaling/
http://cpmcnet.columbia.edu/dept/gsas/anatomy/Faculty/Gundersen/main.html
Proteins speed up reactions - Enzymes
Catalase speeds up the breakdown of hydrogen peroxide (H2O2), a toxic by-product of metabolic reactions, into the harmless substances water and oxygen:
2 H2O2 → 2 H2O + O2
The reaction is extremely rapid because the enzyme lowers the energy needed to kick-start the reaction (the activation energy).
[Figure: energy plotted against progress of reaction, from substrate to product. No catalyst: input of 71 kJ of energy required. With catalase: input of 8 kJ of energy required.]
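To get a feel for what lowering the activation energy from 71 kJ to 8 kJ means, the Arrhenius relation k = A * exp(-Ea / RT) gives a rough estimate. The sketch below assumes that the energies on the slide are per mole, that the temperature is 310 K (body temperature), and that the pre-exponential factor A is the same with and without the enzyme. None of these assumptions come from the slide, so the result is only an order-of-magnitude illustration.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def rate_enhancement(ea_uncatalysed_kj, ea_catalysed_kj, temperature_k=310.0):
    """Ratio k_catalysed / k_uncatalysed from the Arrhenius equation.

    Assumes both reactions share the same pre-exponential factor,
    so only the difference in activation energy matters.
    """
    delta_j = (ea_uncatalysed_kj - ea_catalysed_kj) * 1000.0  # kJ/mol -> J/mol
    return math.exp(delta_j / (R * temperature_k))


# Values from the slide, read as kJ/mol (an assumption).
factor = rate_enhancement(71.0, 8.0)
print(f"Estimated speed-up: {factor:.1e}")
```

Under these assumptions the estimated speed-up is on the order of 10^10, which is why even a small amount of catalase clears hydrogen peroxide so quickly.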
Proteins can regulate metabolism – hormones
When your body detects an increase in the sugar content of blood after a meal, the hormone insulin is released from cells in the pancreas.
Insulin binds to receptors on cell membranes and this triggers the cells to absorb glucose for use or for storage as glycogen in the liver.
Proteins span membranes – protein channels
Source: http://www.biology.arizona.edu/biochemistry/tutorials/chemistry/page2.html
http://www.cbp.pitt.edu/bradbury/projects.htm
The CFTR membrane protein is an ion channel that regulates the flow of chloride ions.
Not enough of this protein gets inserted into the membranes of people suffering from cystic fibrosis. As a result, secretions are not hydrated and become thick, and the lungs and secretory ducts become blocked.
Proteins defend us against pathogens – antibodies
Source: http://www.biology.arizona.edu/immunology/tutorials/antibody/FR.html
http://tutor.lscf.ucsb.edu/instdev/sears/immunology/info/sears-ab.htm
http://www.spilya.com/research/
http://www.umass.edu/microbio/chime/
Left: Antibodies such as IgG, found in humans, recognise and bind to groups of molecules, or epitopes, found on foreign invaders.
Right: The binding site of an antibody protein (left) interacting with the epitope of a foreign antigen (green).
Making Proteins
How is such a diverse range of proteins possible? The code for making a protein is found in your genes (on your DNA). This genetic code is copied onto a messenger RNA (mRNA) molecule. The mRNA code is read three bases at a time (each group of three is a codon) by ribosomes, which join amino acids together to form a polypeptide. This is known as gene expression.
Source: http://genetics.nbii.gov/Basic1.html
Gene Expression
[Figure: gene expression, from the cell nucleus to the folded protein]
Chromosome, DNA, gene: G T A C T A
The order of bases in DNA is a code for making proteins. The code is read in groups of three.
Cell nucleus: cell machinery copies the code, making an mRNA molecule. This moves into the cytoplasm.
Ribosomes read the code and accurately join amino acids together to make a protein.
mRNA: AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA
Protein: M S K G E E L F T G
The protein folds to form its working shape.
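The codon-by-codon reading performed by the ribosome can be sketched in a few lines. The example below builds the standard genetic code table and translates the mRNA sequence shown on the slide. The function name `translate` and the compact table layout are illustrative choices, not part of the original material.

```python
# Standard genetic code, laid out with bases in the order U, C, A, G
# for the first, second and third codon position ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at a stop codon.

    Any incomplete codon left over at the end of the sequence is ignored.
    """
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGAGUAAAGGAGAAGAACUUUUCACUGGAUA"))  # MSKGEELFTG
```

The ten letters printed are the same one-letter amino acid codes that appear in the slide's diagram; the final two bases (UA) do not form a complete codon and are skipped.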
The building blocks
The amino acids for making new proteins come from the proteins that you eat and digest. Every time you eat a burger (vege or beef), you break the proteins down into single amino acids ready for use in building new proteins. And yes, proteins have the job of digesting proteins; they are known as proteases.
There are only 20 different amino acids, but they can be joined together in many different combinations to form the diverse range of proteins that exist on this planet.
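How many combinations is "many"? With 20 choices at every position, a chain of n amino acids can be written in 20^n ways. The short sketch below prints that count for a few chain lengths; the lengths are arbitrary illustrative values.

```python
def possible_sequences(length, alphabet_size=20):
    """Number of distinct chains of the given length built from the alphabet."""
    return alphabet_size ** length


for length in (2, 10, 100):
    count = possible_sequences(length)
    print(f"{length:>3} residues: about {count:.2e} possible sequences")
```

Even a short chain of 100 residues has more possible sequences than could ever be made, which is why so few amino acids can give rise to so many different proteins.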
Amino Acids
An amino acid is a relatively small molecule with characteristic groups of atoms that determine its chemical behaviour.
The general structural formula of an amino acid is shown below. The R group is the only part that differs between the 20 amino acids.
[Figure: general amino acid structure, with an amino group (H2N), a central carbon carrying H and the R group, and a carboxyl group (COOH). R groups are shown for glycine, alanine, valine, cysteine and phenylalanine.]
The 20 Amino Acids
The amino acids each have their own shape and charge due to their specific R group.
View the molecular shapes of the amino acids at the URL below:
http://sosnick.uchicago.edu/amino_acids.html
Would the shape of a protein be affected if the wrong
amino acid were added to a growing protein chain?
Making a Polypeptide
[Figure: polypeptide growth. Amino acids (H2N-CHR-COOH) are joined one at a time. In each step the carboxyl group of one amino acid reacts with the amino group of the next, releasing a molecule of water and forming a peptide bond.]
Polypeptide production = condensation reaction
Protein structure
Why Investigate Protein Structure?
Proteins are complex molecules whose
structure can be discussed in terms
of:
primary structure
secondary structure
tertiary structure
quaternary structure
The structure of proteins is important as
the shape of a protein allows it to
perform its particular role or function
Four levels of protein
structure
Protein Primary Structure
The primary structure is the sequence of amino acids linked together by peptide bonds. The resulting linear chain is called a polypeptide.
http://www.mywiseowl.com/articles/Image:Protein-primary-structure.png
Protein Secondary Structure
The secondary structure of proteins consists of:
alpha helices
beta sheets
random coils and loops – these often form the binding and active sites of proteins
Source: http://www.rothamsted.bbsrc.ac.uk/notebook/courses/guide/prot.htm#I
Protein Tertiary Structure
Involves the way the random coils, alpha helices and beta sheets fold with respect to each other.
This shape is held in place by bonds such as
• weak hydrogen bonds between amino acids that lie close to each other,
• strong ionic bonds between R groups with positive and negative charges, and
• disulfide bridges (strong covalent S-S bonds).
Amino acids that were distant in the primary structure may now become very close to each other after the folding has taken place.
The subunit of a more complex protein has now been formed. It may be globular or fibrous. It now has its functional shape or conformation.
Source: io.uwinnipeg.ca/~simmons/cm1503/proteins.htm
Protein Quaternary Structure
This is the packing of the protein subunits to form the final protein complex. For example, the human hemoglobin molecule is a tetramer made up of two alpha and two beta polypeptide chains (right).
Source: www.cem.msu.edu/~parrill/movies/neuram.GIF
This is also when the protein associates with non-protein groups. For example, carbohydrates can be added to form a glycoprotein.
Source: www.ibri.org/Books/Pun_Evolution/Chapter2/2.6.htm
Protein Structure Prediction
• Why?
• Types of protein structure prediction
  • Secondary structure prediction
  • Homology modelling
  • Fold recognition
  • Ab initio
• Secondary structure prediction
  • Why
  • History
  • Performance
  • Usefulness
Why do we need structure prediction?
• 3D structure gives clues to function:
  • active sites, binding sites, conformational changes...
  • structure and function are conserved more than sequence
• 3D structure determination is difficult, slow and expensive
• Intellectual challenge, Nobel prizes etc...
• Engineering new proteins
The Use of Structure
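It's not that simple...
• Amino acid sequence contains all the information for 3D structure (Anfinsen's refolding experiments; Nobel Prize in Chemistry, 1972)
• But there are thousands of atoms, rotatable bonds, solvent and other molecules to deal with...
• Levinthal's paradox: a chain cannot find its fold by randomly trying every conformation, because there are far too many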
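Levinthal's paradox is easiest to appreciate with a back-of-the-envelope calculation. The sketch below estimates how long a protein would need to try every conformation one by one. The chain length (100 residues), the number of conformations per residue (3) and the sampling rate (10^13 conformations per second) are illustrative assumptions commonly used in textbook versions of the argument, not values from these slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues,
                            conformations_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to visit every conformation of the chain exactly once."""
    total_conformations = conformations_per_residue ** n_residues
    return total_conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(100)
print(f"Exhaustive search for 100 residues: about {years:.1e} years")
```

Under these assumptions the search would take vastly longer than the age of the universe, yet real proteins fold in milliseconds to seconds, so folding cannot be a random search.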
Structure prediction
Summary of the four main approaches to structure prediction. Note that there are overlaps between nearly all categories.

Method: Comparative modelling (homology modelling)
  Knowledge: Proteins of known structure
  Approach: Identify related structure with sequence methods, copy 3D coords and modify where necessary
  Difficulty: Relatively easy
  Usefulness: Very, if sequence identity ... drug design

Method: Fold recognition
  Knowledge: Proteins of known structure
  Approach: Same as above, but use more sophisticated methods to find related structure
  Difficulty: Medium
  Usefulness: Limited due to poor models

Method: Secondary structure prediction
  Knowledge: Sequence-structure statistics
  Approach: Forget 3D arrangement and predict where the helices/strands are
  Difficulty: Medium
  Usefulness: Can improve alignments, fold recognition, ab initio

Method: Ab initio tertiary structure prediction
  Knowledge: Energy functions, statistics
  Approach: Simulate folding, or generate lots of structures and try to pick the correct one
  Difficulty: Very hard
  Usefulness: Not really
Secondary structure - Helix
Secondary structure - Sheet
Secondary structure - Turns
Secondary Structure Predictions
Some highlights in performance
• 1974 Chou and Fasman 50%
• 1978 Garnier 62%
• 1993 PHD 72%
• 2000 PSIPRED 76%
Secondary structure prediction: 1st generation methods
• Chou and Fasman
1) Assign all residues the appropriate set of parameters.
2) Scan through the peptide and identify helical regions.
3) Repeat this procedure to locate all of the helical regions in the sequence.
4) Scan through the peptide and identify sheet regions.
5) Solve conflicts between helical and sheet assignments.
6) Identify turns.
• Claims of around 70-80%; actual accuracy about 50-60%

                  Helix          Strand
Strong former     E A L          M V I
Former            H M Q W V F    C Y F Q L T W
Weak former       K I            A
Indifferent       D T S R C      R G D
Breaker           N Y            K S H N P
Strong breaker    P G            E
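Step 2 of the procedure above (scanning for helical regions) can be sketched with a sliding window. A commonly described form of the Chou-Fasman helix nucleation rule looks for windows of six residues that contain at least four helix formers. The sketch below uses only the "strong former" and "former" helix classes from the table; the window size, the threshold and the function name `helix_nuclei` are assumptions for illustration, and the full method also weighs weak formers and breakers and then extends the nuclei.

```python
# Helix "strong formers" and "formers" taken from the table above.
HELIX_FORMERS = set("EALHMQWVF")


def helix_nuclei(sequence, window=6, min_formers=4):
    """Return start positions (0-based) of windows that could nucleate a helix.

    A window qualifies when it contains at least `min_formers` residues
    from the helix former classes. This is a simplified nucleation test only.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        formers = sum(residue in HELIX_FORMERS for residue in segment)
        if formers >= min_formers:
            hits.append(start)
    return hits


print(helix_nuclei("GGGEALMQWGGG"))  # [1, 2, 3, 4, 5]
print(helix_nuclei("PGPGPGPG"))      # []
```

Overlapping hits such as positions 1 to 5 in the first example would normally be merged into a single candidate helix before the conflict-resolution step.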
GOR III
Garnier, Osguthorpe, Robson, 1990
• Secondary structure depends on amino acid propensities
  • As in Chou-Fasman
• Also influenced by neighbouring residues
  • Helix capping
  • Turns etc.
• How to include distant information?
• Performance approximately 67%
Each residue's state is predicted from a window of 17 positions (the residue itself plus 8 neighbours on each side), so the helix propensity tables have 20 x 17 entries: one for each amino acid type at each window position.
Assign the state with the highest propensity.
Status of predictions in 1990
• Too short secondary structure segments
• About 65% accuracy
• Worse for beta-strands
• Example:
Secondary structure prediction: 2nd generation methods
• Sequence-to-structure relationship modelled using more complex statistics, e.g. artificial neural networks (NNs) or hidden Markov models (HMMs)
• Evolutionary information included (profiles)
• Prediction accuracy >70% (PHD, Rost 1993)
PHD predictions
• Secondary structure "prediction" by homology
  • If a sequence of unknown secondary structure has a homologue of known structure, it is more accurate to make an alignment and copy the known secondary structure over to the unknown sequence than to do "ab initio" secondary structure prediction.
3rd generation methods
• Enhanced evolutionary sequence information (PSI-BLAST profiles) and larger sequence databases take Q3 to > 75%
• PHD and PSIPRED are the best-known methods
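Q3, the accuracy figure quoted in these slides, is the percentage of residues whose predicted three-state class (helix H, strand E, coil C) matches the observed one. A minimal sketch of the calculation follows; the function name `q3` and the example strings are made up for illustration.

```python
def q3(predicted, observed):
    """Percentage of positions where predicted and observed states agree.

    Both arguments are strings over the three states H, E and C,
    one character per residue.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical prediction compared with a hypothetical observed assignment.
print(q3("HHHHEEEECC", "HHHCEEECCC"))  # 80.0
```

Because Q3 is a per-residue average, a method can score well while still producing the fragmented, too-short segments that troubled the early predictors.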
PSIPRED
• Similar to PHD
• PSI-BLAST to detect more remote homologs
  • only two layers
• SVM or NN gives similar performance
Alignment of Protein Structure
• Compare 3D structure of one protein against 3D structure of second protein
  • Compare positions of atoms in three-dimensional structures
  • Look for positions of secondary structural elements (helices and strands) within a protein domain
  • Examine distances between carbon atoms (typically the alpha carbons) to determine the degree to which the structures can be superimposed
  • Side chain information can be incorporated
    • Buried; visible
• Structural similarity between proteins does not necessarily mean evolutionary relationship
Structure alignment
Simple case – two closely related proteins with the same number of amino acids.
Find a transformation T to achieve the best superposition.
Types of Structure Comparison
• Sequence-dependent vs. sequence-independent structural alignment
• Global vs. local structural alignment
• Pairwise vs. multiple structural alignment
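Sequence-dependent Structure Comparison
Residue:      1234567
Structure x:  ASCRKLE
              ¦¦¦¦¦¦¦
Structure y:  ASCRKLE
[Figure: two backbone traces with residues numbered 1 to 7; residue i of one structure is paired with residue i of the other.]
Minimize rmsd of distances 1-1,...,7-7:
rmsd = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x(i) - y(i) \right)^{2}}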
Sequence-dependent Structure Comparison
• Can be solved in O(n) time.
• Useful in comparing structures of the same protein solved by different methods, in different conformations, or sampled through dynamics.
• Evaluating protein structure predictions.
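Once residue i of one structure is paired with residue i of the other, the rmsd itself is a single pass over the paired coordinates, which is where the O(n) cost comes from. The sketch below computes it for two coordinate lists. It assumes the structures have already been superimposed; finding the best rotation and translation is a separate step that is not shown. The coordinates in the example are made up.

```python
import math


def rmsd(coords_x, coords_y):
    """Root-mean-square deviation between paired 3D points.

    coords_x[i] is compared with coords_y[i]; no superposition is performed.
    """
    if len(coords_x) != len(coords_y) or not coords_x:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for point_x, point_y in zip(coords_x, coords_y):
        total += sum((a - b) ** 2 for a, b in zip(point_x, point_y))
    return math.sqrt(total / len(coords_x))


# Two hypothetical two-residue structures that differ in one coordinate.
structure_x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_y = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(structure_x, structure_y), 3))  # 1.414
```

A value of 0 means the paired atoms coincide exactly; larger values mean a poorer fit for the chosen pairing and orientation.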
Sequence-independent Structure Comparison
Given two configurations of points in three-dimensional space, find a transformation T which produces the “largest” superimposition of corresponding 3-D points.
Evaluating Structural Alignments
1. Number of amino acid correspondences created.
2. RMSD of corresponding amino acids
3. Percent identity in aligned residues
4. Number of gaps introduced
5. Size of the two proteins
6. Conservation of known active site environments
7. …
No universally agreed upon criteria. It depends on what you are
using the alignment for.
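Some of these criteria are simple counts over the final alignment. As a rough illustration, the sketch below takes two aligned sequences written with "-" for gaps and reports the number of residue correspondences, the number of gapped positions and the percent identity among aligned residues. The function name `alignment_summary` and the example alignment are made up, and counting gaps per position rather than per gap run is one possible convention.

```python
def alignment_summary(aligned_a, aligned_b, gap="-"):
    """Summarise a pairwise alignment given as two equal-length gapped strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    pairs = [(a, b) for a, b in columns if a != gap and b != gap]
    gap_positions = sum(a == gap or b == gap for a, b in columns)
    identical = sum(a == b for a, b in pairs)
    percent_identity = 100.0 * identical / len(pairs) if pairs else 0.0
    return {
        "aligned_pairs": len(pairs),
        "gap_positions": gap_positions,
        "percent_identity": percent_identity,
    }


summary = alignment_summary("ASCRKLE", "ASC-KLD")
print(summary)
```

For this made-up alignment there are six residue correspondences, one gapped position, and five of the six aligned pairs are identical.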

Bioinformatik tools-protein-structure.pptx

Editor's Notes

  • #48 This is DIFFERENT from the problem of aligning two structures in an RMS sense GIVEN the correspondences between atoms (see previous lecture on RMS). In this case, we don’t know which atoms match with which, and so (in principle) we must perform a combinatorial search of all correspondences. RMS is used as a tool to evaluate the correspondences.