Primary structure of proteins
The primary structure is the linear order of amino acid residues along the polypeptide chain.
Every protein is defined by a unique sequence of residues, and all subsequent levels of
organization (secondary, supersecondary, tertiary and quaternary) rely on this primary level.
Some proteins are related to one another, leading to varying degrees of similarity in primary
structure.
How do you determine the primary structure of a protein?
1. Determining amino acid composition of a protein
I. Hydrolysis (heat at 100-110 °C in 6 M HCl for 24 h or longer)
II. Separation (chromatography techniques)
III. Quantitative analysis (color producing reagents e.g. ninhydrin)
2. N-terminal amino acid analysis
I. React the peptide with a reagent which selectively labels the terminal amino acid (e.g.
II. Hydrolyze the protein
III. Determine the amino acid by chromatography and compare with standards.
3. C-terminal amino acid analysis (carboxypeptidases)
C-Terminal Group Analysis
Enzymatic cleavage of the C-terminal amino acid by one of several carboxypeptidase enzymes is a
fast and convenient method of analysis. For example, a peptide with the C-terminal sequence
~Gly-Ser-Leu is subjected to carboxypeptidase cleavage, and the free amino acids released in this
reaction are analyzed at increasing time intervals. Leu appears first, followed by Ser and then
Gly, so the order of release gives the C-terminal sequence.
Selective Peptide Cleavage
Name              Type       Specificity
Cyanogen bromide  Chemical   Carboxyl side of methionine
Trypsin           Enzymatic  Carboxyl side of basic amino acids, e.g. Lys and Arg
Chymotrypsin      Enzymatic  Carboxyl side of aromatic amino acids, e.g. Phe, Tyr and Trp
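The cleavage rules in the table can be tried out on a sequence in silico. The following is a minimal sketch, assuming one-letter residue codes and the simplest reading of the table (cut after every listed residue). The names CLEAVAGE_RULES and cleave, and the test peptide, are hypothetical. Real enzymes have exceptions (for instance, trypsin generally does not cut when the next residue is Pro) which are ignored here.

```python
# Cleavage rules from the table: each reagent cuts on the carboxyl side
# of the listed residues (one-letter codes). Simplified; no exceptions modelled.
CLEAVAGE_RULES = {
    "cyanogen bromide": "M",
    "trypsin": "KR",
    "chymotrypsin": "FYW",
}


def cleave(sequence, reagent):
    """Split a one-letter sequence after every residue the reagent recognizes."""
    targets = CLEAVAGE_RULES[reagent]
    fragments, current = [], ""
    for residue in sequence:
        current += residue
        if residue in targets:      # cut on the carboxyl side of this residue
            fragments.append(current)
            current = ""
    if current:                     # C-terminal fragment left after the last cut
        fragments.append(current)
    return fragments


peptide = "AGKMFRWLE"  # hypothetical example peptide
for reagent in CLEAVAGE_RULES:
    print(reagent, cleave(peptide, reagent))
```

Comparing the fragment sets from two reagents with different specificities is what allows overlapping fragments to be placed in order.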
Protein sequence determination methods
1. DNA or mRNA
2. Edman degradation
3. Mass spectrometry
A free amine function, usually in equilibrium with zwitterion species, is necessary for the
initial bonding to the phenyl isothiocyanate reagent. The products of the Edman degradation
are a thiohydantoin heterocycle incorporating the N-terminal amino acid together with a
shortened peptide chain.
Amine functions on a side chain, as in lysine, may also react with the isothiocyanate reagent,
but they do not give thiohydantoin products.
A major advantage of the Edman procedure is that the remaining peptide chain is not further
degraded by the reaction. This means that the N-terminal analysis may be repeated several
times, thus providing the sequence of the first three to five amino acids in the chain.
A disadvantage of the procedure is that peptides larger than 30 to 40 units do not give
reliable results.
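The repeated nature of the Edman procedure can be illustrated with a toy model. This is only a sketch of the bookkeeping, not of the chemistry: each cycle removes and reports the N-terminal residue and passes the shortened chain to the next cycle. The function name edman_cycles and the example peptide are hypothetical.

```python
def edman_cycles(peptide, n_cycles):
    """Return (residues identified in order, remaining peptide) after n_cycles.

    peptide is a one-letter sequence written from the N- to the C-terminus.
    """
    identified = []
    remaining = peptide
    for _ in range(n_cycles):
        if not remaining:           # nothing left to degrade
            break
        identified.append(remaining[0])  # released as the thiohydantoin derivative
        remaining = remaining[1:]        # shortened chain enters the next cycle
    return "".join(identified), remaining


print(edman_cycles("GSLAK", 3))  # ('GSL', 'AK')
```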
Secondary structure of proteins
Secondary structure is the local conformation of the polypeptide chain, that is, the spatial
relationship of amino acid residues that are close together in the primary sequence.
Why do proteins form secondary structures?
We know that proteins have hydrophobic cores. To bring the side chains into the core, the
main chain must also fold into the interior. The main chain is however highly polar and
therefore hydrophilic, with one hydrogen bond donor, N-H, and one hydrogen bond acceptor
C=O, for each peptide unit. In a hydrophobic environment, these main chain polar groups
must be neutralized by the formation of hydrogen bonds. This problem is solved by the
formation of secondary structures.
Proteins adopt secondary structures to stabilize themselves and to resist digestion by
other proteins (proteases).
In proteins, the formation of secondary structures appears to result from the combination of
both the entropic effect of compaction and local energetic effects.
In globular proteins, the three basic units of secondary structure are the α helix, β strand and
turns.
Why do proteins form only these kinds of secondary structures?
This is because of local interactions. It is the specific hydrogen bonding patterns in proteins
that favor the formation of α helices and β strands.
In secondary structure the main chain amides and carbonyls participate in H-bonds to each
other. This neutralizes the polar nature of the peptide bond and enables the main chain to fold
into the hydrophobic core.
The α helix
The hydrogen bonds occur between the backbone
carbonyl oxygen (C=O acceptor) of one residue (i) and the
amide hydrogen (N-H donor) of the residue four positions ahead (i+4) in the
polypeptide chain (i+4 → i).
In 1954 Pauling was awarded his first Nobel Prize "for his
research into the nature of the chemical bond and its
application to the elucidation of the structure of complex
substances".
As the first four N-H groups and the last four C=O groups are normally not involved in the
hydrogen bonds, the ends of α helices are polar and are almost always at the surface of protein
molecules.
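The i → i+4 pattern and the unpaired groups at the helix ends can be enumerated directly. The sketch below assumes an idealized, isolated helix with residues numbered from 1 at the N-terminus. The function name helix_hbonds and the ten-residue example are hypothetical. The offset parameter is included because helices with other hydrogen-bond spacings follow the same bookkeeping.

```python
def helix_hbonds(n_residues, offset=4):
    """Enumerate backbone H-bonds C=O(i) ... H-N(i + offset) in an ideal helix.

    Returns (bonds, free_nh, free_co):
      bonds   - list of (acceptor residue i, donor residue i + offset)
      free_nh - residues whose N-H has no C=O partner (N-terminal end)
      free_co - residues whose C=O has no N-H partner (C-terminal end)
    """
    bonds = [(i, i + offset) for i in range(1, n_residues - offset + 1)]
    free_nh = list(range(1, min(offset, n_residues) + 1))
    free_co = list(range(max(1, n_residues - offset + 1), n_residues + 1))
    return bonds, free_nh, free_co


bonds, free_nh, free_co = helix_hbonds(10)
print(bonds)    # [(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10)]
print(free_nh)  # [1, 2, 3, 4]
print(free_co)  # [7, 8, 9, 10]
```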
The regular α helix has 3.6 residues per turn with each
residue offset from the preceding residue by 0.15 nm
(translation per residue distance) along the helix axis.
Thus, the pitch (the vertical distance between
consecutive turns of the helix) of the α helix is 3.6 × 0.15
= 0.54 nm.
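The same arithmetic, together with the rotation per residue about the helix axis, can be written as a small helper. This is a minimal sketch; the function name helix_geometry is hypothetical, and the rotation per residue is simply 360° divided by the number of residues per turn.

```python
def helix_geometry(residues_per_turn, rise_nm):
    """Return (pitch in nm, rotation per residue in degrees) for a regular helix."""
    pitch_nm = residues_per_turn * rise_nm      # axial advance per full turn
    turn_angle_deg = 360.0 / residues_per_turn  # rotation contributed by one residue
    return pitch_nm, turn_angle_deg


pitch, angle = helix_geometry(3.6, 0.15)
print(f"pitch = {pitch:.2f} nm, rotation per residue = {angle:.0f} degrees")
# pitch = 0.54 nm, rotation per residue = 100 degrees
```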
The width of the helix is 5 Å.
The hydrogen bonds are 0.286 nm long from oxygen to
nitrogen atoms, linear and lie parallel to the helical axis.
What about the turn angle of each amino acid along the
helix?
Each amino acid participates in 2 H-bonds. Thus all the
main chain C=O and N-H participate in H-bonds.
In globular proteins, α helices vary in length, ranging
from four to over forty amino acids. The average length
is around ten residues.
What will be the average length of an α helix?
Proline does not form helical structure for the obvious
reason that the absence of an amide proton (NH)
precludes hydrogen bonding, whilst the side chain
covalently bonded to the N atom restricts backbone
rotation.
Helical conformations of peptide chains may also be
described by a two-number term, nₘ, where n is the
number of amino acid units per turn and m is the number
of atoms in the smallest ring defined by the hydrogen
bond. Thus, the α helix is a 3.6₁₃ helix, denoting a hydrogen
bond between every carbonyl oxygen and the α-amino
nitrogen of the fourth residue toward the C-terminus,
with 13 atoms involved in the ring formed
by the hydrogen bond.
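The ring size m follows from counting atoms around the closed loop. A minimal sketch, assuming a C=O(i) ... H-N(i+k) hydrogen bond: the ring contains O and C' of residue i, the three backbone atoms (N, Cα, C') of each residue in between, and N and H of residue i+k. The function name hbond_ring_atoms is hypothetical.

```python
def hbond_ring_atoms(k):
    """Number of atoms in the ring closed by a C=O(i) ... H-N(i+k) hydrogen bond."""
    acceptor_atoms = 2             # O and C' of residue i
    spanned_atoms = 3 * (k - 1)    # N, C-alpha, C' of residues i+1 .. i+k-1
    donor_atoms = 2                # N and H of residue i+k
    return acceptor_atoms + spanned_atoms + donor_atoms


for k in (3, 4, 5):
    print(k, hbond_ring_atoms(k))  # 3 -> 10, 4 -> 13, 5 -> 16
```

For k = 4 this reproduces the 13 of the 3.6₁₃ designation.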
Assignment No. 2:
Show that left-handed helices are not permissible?
The 3₁₀ helix
The designation 3₁₀ refers to the number of backbone
atoms located between the donor and acceptor atoms (10)
and the fact that there are three residues per turn.
The hydrogen bonds in the 3₁₀ helix are formed between
residues (i, i+3), in contrast to the (i, i+4) bonds in the regular α
helix.
The angles are: Φ = -49° and Ψ = -26°. The rise for one
residue is 2.0 Å.
What about turn angle of each amino acid and the pitch
along the helix?
The 3₁₀ helices do occur, but are not very long; they are
sometimes found at the end of an α helix.
The π helix
Whilst the 3₁₀ helix is a narrower structure than the α helix, a third possibility is a more loosely
coiled helix with hydrogen bonds formed between the C=O and N-H groups separated by five
residues (i, i+5). There are 4.4 residues per turn and 16 atoms in the H-bonded ring.
The angles are: Φ = -57° and Ψ = -70°. The π helix is
more compact, more compressed than the α helix. The
H-bonds in the π helix are not straight and side chains
The larger radius of the π helix means that backbone
atoms do not make van der Waals contact across the
helix axis, leading to the formation of a hole down the
middle of the helix that is too small for solvent
occupancy.
If the pitch of the helix is 5.06 Å per turn, what will be the rise for one residue?
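The rise per residue is the pitch divided by the number of residues per turn. The sketch below applies this to the π helix (4.4 residues per turn) and, for comparison, to the α helix (pitch 5.4 Å, 3.6 residues per turn). The function name rise_per_residue is hypothetical.

```python
def rise_per_residue(pitch, residues_per_turn):
    """Axial translation per residue, in the same length unit as the pitch."""
    return pitch / residues_per_turn


pi_rise = rise_per_residue(5.06, 4.4)    # pi helix, in angstroms
alpha_rise = rise_per_residue(5.4, 3.6)  # alpha helix, in angstroms
print(f"pi helix: {pi_rise:.2f} A per residue")        # 1.15
print(f"alpha helix: {alpha_rise:.2f} A per residue")  # 1.50
```

The smaller rise of the π helix is consistent with its description as more compressed along the axis than the α helix.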
Dipole moment has directionality
The magnitude of the dipole moment is about 0.5-0.7 unit charge at each end of the helix.
These charges attract ligands of opposite charge such as phosphate ions.
Why does C-terminus generally not attract positively charged ligands?
Class           Number of folds  Number of superfamilies  Number of families
All α proteins  284              507                      871
How do we find the segments of a given protein structure that belong to the α helix?
1. Combined pattern of pitch and hydrogen bonding.
2. In terms of repeating φ and ψ torsion angles.
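The torsion-angle criterion (repeating φ and ψ values) can be sketched as a scan for runs of consecutive residues whose angles fall near the ideal α helix values of φ = -57° and ψ = -47°. The tolerance of 30° and the minimum run length of four residues are assumptions chosen for illustration, not established cut-offs, and the function name helical_segments and the input angles are hypothetical.

```python
def helical_segments(phi_psi, center=(-57.0, -47.0), tol=30.0, min_len=4):
    """Find runs of residues with alpha-helical torsion angles.

    phi_psi is a list of (phi, psi) pairs in degrees, one per residue.
    Returns (start, end) index pairs, 0-based and inclusive, for every run of
    at least min_len consecutive residues within tol degrees of center.
    """
    segments, start = [], None
    for i, (phi, psi) in enumerate(phi_psi):
        helical = abs(phi - center[0]) <= tol and abs(psi - center[1]) <= tol
        if helical and start is None:
            start = i                          # a new run begins
        elif not helical and start is not None:
            if i - start >= min_len:           # keep only runs that are long enough
                segments.append((start, i - 1))
            start = None
    if start is not None and len(phi_psi) - start >= min_len:
        segments.append((start, len(phi_psi) - 1))  # run reaching the chain end
    return segments


angles = [(-139, 135)] * 3 + [(-60, -45)] * 6 + [(-90, 150)] * 2 + [(-55, -50)] * 2
print(helical_segments(angles))  # [(3, 8)]
```

The last two residues have helical angles but are not reported, because a run of two is shorter than the assumed minimum length.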
The β strand
A β strand can be viewed as a helical arrangement, although an extremely elongated one, with two residues per
turn. The side chains point alternately up and down.
This leads to a pitch or repeat distance of ~0.7 nm in a regular β strand.
If the pitches of anti-parallel and parallel β strands are 6.84 Å per turn and
6.4 Å per turn respectively, what will be the distance between two adjacent Cα atoms?
β strands are stable in the sheet form where adjacent strands can align in parallel or anti-
parallel arrangements with the orientation established by determining the direction of the
polypeptide chain from the N- to the C-terminal.
β strands are quite extended but normally do not fully reach 180° for the angles, and thus
are not flat but pleated. Average values for the angles are: Φ = -139° and Ψ = +135° in
anti-parallel β sheets and Φ = -119° and Ψ = +113° in parallel β sheets.
Class           Number of folds  Number of superfamilies  Number of families
All β proteins  174              354                      742
More commonly found in protein structures are four-residue turns (β turns). Loops which
connect two adjacent anti-parallel β strands are called hairpin loops.
A γ turn contains three residues and frequently links adjacent strands of anti-parallel β sheet.
Analysis of the amino acid composition of turns reveals that bulky or branched side chains
occur rarely. Instead, residues with small side chains such as Gly, Asp, Asn, Ser, Cys and Pro
are frequently found.
In some proteins, the proportion of residues found in turns can exceed 30 percent and in view
of this high value it is unlikely that turns represent random structures (Intrinsically
Loop regions are found at the surface of the protein molecules mostly because the main chain
groups of these loops do not form hydrogen bonds to each other and hence are exposed to the
solvent to form hydrogen bonds to water molecules.
Some amino acids prefer to be in α helices.
However, the amino acids found in an α helix also depend on its position in the protein or,
put the other way round, the position of an α helix depends on the amino acids it contains.
• An α helix buried in the hydrophobic core (of citrate synthase) contains only uncharged
and mostly nonpolar amino acids.
• A partially exposed α helix (of alcohol dehydrogenase) contains polar or charged residues
at the exposed side and non-polar ones at the other side.
• A fully exposed α helix (of troponin C) contains a lot of charged residues.
Conformational Preferences of Amino Acids
α helix: Glu, Ala, Leu, Met, Gln, Lys, Arg, His
β strand: Val, Ile, Tyr, Cys, Trp, Phe, Thr
Reverse turn: Gly, Asn, Pro, Ser, Asp
How do you calculate the propensity of an a.a. being in a secondary structure?
Propensity = (# of a particular a.a. in a particular secondary structure / # of that a.a. in
the whole protein) / (# of all a.a. in the particular structure / # of all a.a. in the whole
protein)
For example, if 30% of all amino acids in a protein are in α helices and 50% of its Glu residues
are in α helices, then the propensity of Glu for the α helix in this protein is 50/30 = 1.67.
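The propensity formula can be applied to a sequence and a per-residue secondary-structure string of the same length. This is a minimal sketch; the labelling convention ("H" for helix, "C" for everything else), the function name propensity and the example sequence are all hypothetical. The example is built so that 30% of all residues and 50% of the Glu residues are helical, matching the worked numbers.

```python
def propensity(sequence, structure, residue, state):
    """Propensity of `residue` for secondary-structure `state` in one protein.

    sequence  - one-letter amino acid codes
    structure - per-residue state labels, same length as sequence
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    n_residue = sequence.count(residue)
    n_state = structure.count(state)
    if n_residue == 0 or n_state == 0:
        raise ValueError("propensity is undefined for this residue/state")
    n_residue_in_state = sum(
        1 for aa, ss in zip(sequence, structure) if aa == residue and ss == state
    )
    fraction_of_residue = n_residue_in_state / n_residue  # e.g. 50% of Glu in helix
    fraction_of_all = n_state / len(sequence)             # e.g. 30% of all residues
    return fraction_of_residue / fraction_of_all


seq = "AELEKL" + "GSGEPNGTVEVSGT"  # 20 residues, 4 Glu (2 of them in the helix)
sst = "H" * 6 + "C" * 14           # 6 of 20 residues helical
print(round(propensity(seq, sst, "E", "H"), 2))  # 1.67
```

A value above 1 means the residue is found in that structure more often than an average residue; a value below 1 means less often.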
The helical propensity of amino acid residues substituted into alanine polymers
Residue  Helix propensity,    Residue  Helix propensity,
         ΔG (kJ mol⁻¹)                 ΔG (kJ mol⁻¹)
Ala      0                    Ile      0.41
Arg      0.21                 Leu      0.21
Asn      0.65                 Lys      0.26
Asp⁰     0.43                 Met      0.24
Asp⁻     0.69                 Phe      0.54
Cys      0.68                 Pro      3.16
Gln      0.39                 Ser      0.50
Glu⁰     0.16                 Thr      0.66
Glu⁻     0.40                 Tyr      0.53
Gly      1.00                 Trp      0.49
His⁰     0.56                 Val      0.61
All residues form helices with less propensity than poly-Ala hence the positive values for ΔG.
In globular proteins, over 30% of all residues are found in helices.
The role of a particular amino acid in the conformational change of a protein is studied by
taking homopolymers (e.g. poly-Ala), mutating single residues to another amino acid and
measuring the stability, solubility or secondary structure properties of the mutant compared to
the wild type.
How do you find the preference of an amino acid to be in a particular secondary
structure using ab-initio methods?
The Ramachandran Plot
Dihedral angles, translation distances and number of
residues per turn for regular secondary structure
conformations. In poly(Pro) I ω is 0° whilst in poly(Pro) II ω
is 180°.
Conformation            Φ (°)   Ψ (°)   Residues/turn  Translation/residue (nm)
α helix                 -57     -47     3.6            0.150
3₁₀ helix               -49     -26     3.0            0.200
π helix                 -57     -70     4.4            0.115
Parallel β strand       -119    +113    2.0            0.320
Anti-parallel β strand  -139    +135    2.0            0.340
Poly(Pro) I             -83     +158    3.3            0.190
Poly(Pro) II            -78     +149    3.0            0.312
Poly(Gly) II            -80     +150    -              -
Assignment No. 3:
Submit the following details about the protein on which you are working or might work.
1. Protein name
2. Protein’s primary structure
3. Protein’s primary structure a.a. composition
4. Percentage of the residues found in the secondary structure
5. Secondary structure composition
6. Secondary structure propensity of all 20 amino acids for your protein.