Protein Sequencing
Introduction to Protein
Sequencing
What is Protein
• Any of a class of nitrogenous organic
compounds which have large molecules
composed of one or more long chains of
amino acids and are an essential part of all
living organisms, especially as structural
components of body tissues such as muscle,
hair, etc., and as enzymes and antibodies.
What is sequence
• a particular order in which related things
follow each other.
• a set of related events, movements, or items
that follow each other in a particular order.
Protein Sequencing
• Protein sequencing is the practical process of
determining the amino acid sequence of all or
part of a protein or peptide. This may serve to
identify the protein or characterize its post-
translational modifications.
• Typically, partial sequencing of a protein
provides sufficient information (one or more
sequence tags) to identify it with reference to
databases of protein sequences derived from
the conceptual translation of genes.
• The two major direct methods of protein
sequencing are mass spectrometry and Edman
degradation using a protein
sequenator (sequencer). Mass spectrometry
methods are now the most widely used for
protein sequencing and identification but
Edman degradation remains a valuable tool
for characterizing a protein's N-terminus.
Why do we sequence proteins?
• Determining amino acid composition.
• It is often desirable to know the unordered
amino acid composition of a protein prior to
attempting to find the ordered sequence, as
this knowledge can be used to facilitate the
discovery of errors in the sequencing process
or to distinguish between ambiguous results.
• Knowledge of the frequency of certain amino
acids may also be used to choose
which protease to use for digestion of the
protein. The misincorporation of low levels of
non-standard amino acids (e.g. norleucine) into
proteins may also be determined.
• A generalized method often referred to as amino
acid analysis for determining amino acid
frequency is as follows:
• Hydrolyse a known quantity of protein into its
constituent amino acids.
• Separate and quantify the amino acids in some
way.
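As a minimal sketch of how an unordered composition can catch sequencing errors, the snippet below compares the composition of a candidate sequence with a measured one. The function names and the measured counts are hypothetical, chosen only for illustration.

```python
from collections import Counter

def composition(sequence: str) -> Counter:
    """Unordered amino acid composition (one-letter codes) of a sequence."""
    return Counter(sequence.upper())

def consistent_with(candidate: str, measured: dict) -> bool:
    """True if the candidate sequence has exactly the measured composition."""
    return composition(candidate) == Counter(measured)

# Hypothetical amino acid analysis result: 2 Gly, 1 Ala, 1 Lys
measured = {"G": 2, "A": 1, "K": 1}
print(consistent_with("GAKG", measured))  # True
print(consistent_with("GAKA", measured))  # False: one Gly too few, one Ala too many
```

A candidate sequence that fails this check must contain an error, although passing it does not prove that the order is right.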
• Hydrolysis
• Hydrolysis is done by heating a sample of the
protein in 6 M hydrochloric acid to 100–110 °C
for 24 hours or longer. Proteins with many
bulky hydrophobic groups may require longer
heating periods. However, these conditions
are so vigorous that some amino acids
(serine, threonine, tyrosine, tryptophan,
glutamine, and cysteine) are degraded.
• To circumvent this problem, Biochemistry Online
suggests heating separate samples for different
times, analysing each resulting solution, and
extrapolating back to zero hydrolysis time.
• Rastall suggests a variety of reagents to prevent
or reduce degradation, such as thiol reagents or
phenol to protect tryptophan and tyrosine from
attack by chlorine, and pre-oxidising cysteine.
He also suggests measuring the quantity of
ammonia evolved to determine the extent of amide
hydrolysis.
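The extrapolation back to zero hydrolysis time can be sketched as a straight-line fit. This assumes the loss is roughly linear over the sampled times, which is a simplification; the recoveries below are made-up numbers, and the function name is hypothetical.

```python
def extrapolate_to_zero(times_h, amounts):
    """Fit amounts = a + b * t by least squares and return a, the value at t = 0."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(amounts) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, amounts))
    slope = sxy / sxx
    return mean_y - slope * mean_t

# Hypothetical serine recoveries (nmol) after 24, 48 and 72 h of hydrolysis
times = [24, 48, 72]
serine = [9.0, 8.0, 7.0]
print(extrapolate_to_zero(times, serine))  # about 10 nmol before any degradation
```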
• Separation and quantitation
• The amino acids can be separated by ion-
exchange chromatography then derivatized to
facilitate their detection. More commonly, the
amino acids are derivatized then resolved
by reversed phase HPLC.
• An example of the ion-exchange chromatography
is given by the NTRC using sulfonated polystyrene
as a matrix, adding the amino acids in acid
solution and passing a buffer of steadily
increasing pH through the column. Amino acids
are eluted when the pH reaches their
respective isoelectric points. Once the amino
acids have been separated, their respective
quantities are determined by adding a reagent
that will form a coloured derivative.
History Of Protein Sequencing
• The advent of protein sequencing can be
traced to two almost parallel discoveries by
Frederick Sanger and Pehr Edman.
• In 1950, Pehr Edman published a paper
demonstrating a label-cleavage method for
protein sequencing which was later termed
“Edman degradation”.
• Pehr Edman began his work in the Northrop-
Kunitz laboratory at the Princeton branch of the
Rockefeller Institute of Medical Research in 1947,
where he attempted to find a method to decode
the amino acid sequence of a protein using
chemicals; specifically, he had early success with
fluorodinitrobenzene (FDNB) and
phenylisothiocyanate (PITC).
• Throughout his year at Princeton, Edman was able to
conduct enough experiments to understand that it was
feasible to use reagents like FDNB and PITC to determine
amino acid sequence.
• Edman returned to Sweden in 1947 and after two more
years of work he was able to publish his paper that would
describe the first successful method to sequence proteins
[1]
• This groundbreaking paper described a method to
determine the amino acid sequence of a protein and would
come to be known as the Edman Degradation.
F.SANGER
• Around the same time Fred Sanger was
developing his own labeling and separation
method which led to the sequencing of
insulin.
• For this work, Sanger was awarded the 1958
Nobel Prize for Chemistry.
Plus and minus in the 1970s
• Fast-forward to the 1970s and we find Fred
Sanger at the forefront of nucleic acid sequencing.
• In 1975, whilst at the Laboratory of Molecular Biology in
Cambridge, Fred Sanger developed the “plus and minus”
method for DNA sequencing (Sanger and Coulson, 1975).
• Again there was competition in the field, with Maxam and
Gilbert working on degradation sequencing (Maxam and
Gilbert, 1977); however, their method was ultimately to
falter due to the ease and quality of the Sanger method.
Plus and minus method
• A primer is extended by a polymerase to generate a population of
newly synthesized polydeoxyribonucleotides of assorted lengths; the
unused dNTPs are removed, and polymerization continues in four
pairs of plus and minus reaction mixtures; the minus mixtures have
three dNTPs and the plus mixtures have only one.
• After a second polymerization, the mixtures are fractionated by gel
electrophoresis, and each plus and minus pair is compared to
indicate the length of the new polydeoxyribonucleotide (by the
mobilities of the bands) and the position at which polymerization
had terminated as a result of the absence of the missing dNTP.
• In 1945, five years before Edman's paper, Frederick Sanger had
demonstrated a method to determine the amino acid residue located
on the N-terminal end of a polypeptide chain by using
the reagent fluorodinitrobenzene.
• While it was thought that, at most, this method could
only provide the residues found at the N-terminus,
Sanger was able to take the method one step further.
• By using several proteolytic enzymes, partial
hydrolysis and an early version of chromatography,
Sanger was able to cleave the protein into
fragments and piece together the residues like a
jigsaw puzzle.
• It wasn’t until 1955 that Sanger was able to
present the complete sequence of insulin which
led to him being awarded a Nobel Prize in
Chemistry in 1958.
Other scientists
• Emile Zuckerkandl and Linus Pauling, whose
work in the mid-1960s advanced the use of
nucleotide and protein sequences to explore
evolution.
• Carl Woese, who in the 1970s used ribosomal RNA
sequences to define archaebacteria as a group
of living organisms distinct from other
bacteria and eukaryotes.
Methods Of Protein
Sequencing
Protein sequencing
• Technique to find out the sequence of amino
acids in a protein
Sequencing methods
1. N-terminal sequencing (Edman degradation)
2. C-terminal sequencing
3. Prediction from DNA sequence
Edman degradation
N-terminal sequencing
STEPS
• Protein purification
• Protein denaturation
• Protein digestion
• N-terminal labeling
• Separation of labeled amino acid by
chromatography
• Detection through mass spectrometry
• Data analysis
Protein isolation (purification)
• 1. SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel
electrophoresis)
• 2. Two-dimensional gels
• The protein of interest is immobilized by being adsorbed onto
chemically modified glass or by electroblotting onto a porous
polyvinylidene fluoride (PVDF) membrane.
Protein hydrolysis (denaturation)
• Done by heating a sample of the protein in 6 M HCl at
100–110 °C for 24 hours or longer
• This may degrade some amino acids
• To avoid this, thiol reagents or phenol are used
• Performic acid is used for intrachain or interchain S-S bonds
Protein digestion
• Use Endoproteinase Lys-C, CNBr, Pepsin or
trypsin to digest proteins into a population of
peptides
• Other enzymes include Glu-C and
chymotrypsin
• Add enzyme at a 1:20 enzyme:protein ratio
• Incubate at room temperature for 6-9 hrs
• For better results, use a mixture of enzymes
N-terminal labeling
• The Edman reagent, phenylisothiocyanate (PITC), is
added to the adsorbed peptide, together with a mildly
basic buffer solution of 12% trimethylamine
• This reacts with the amine group of the N-terminal
amino acid
• The terminal amino acid can then be selectively
detached by the addition of anhydrous acid
• The derivative then isomerises to give a substituted
phenylthiohydantoin which can be washed off and
identified by chromatography, and the cycle can be
repeated
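The repeated label-and-cleave cycle can be pictured with a toy simulation. This models only the bookkeeping (one N-terminal residue released per cycle), not the chemistry, and the function name and example peptide are illustrative choices.

```python
def edman_cycles(peptide: str, n_cycles: int):
    """Yield (cycle number, released N-terminal residue, remaining peptide)."""
    remaining = peptide
    for cycle in range(1, n_cycles + 1):
        if not remaining:
            break  # nothing left to label
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining

for cycle, residue, rest in edman_cycles("GIVEQ", 3):
    print(cycle, residue, rest)
# 1 G IVEQ
# 2 I VEQ
# 3 V EQ
```

Each cycle shortens the peptide by one residue from the N-terminus, which is why a blocked N-terminus stops the method at the first step.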
CHROMATOGRAPHY
• Chromatography is a
technique in which
molecules are separated
based on volatility and bond
characteristics when
subjected to a carrier
• Derivatives of amino acid
can be separated by
• 1-HPLC
• 2-Gas chromatography
• In gas chromatography (GC),
the mobile phase is an inert
gas such as helium
MASS SPECTROMETRY
• Mass spectrometry (MS) is an analytical
technique that measures the mass-to-charge
ratio of charged particles
• The MS principle consists of ionizing chemical
compounds to generate charged molecules or
molecule fragments and measuring their
mass-to-charge ratios
• Separated amino acid derivatives are analyzed
by a mass spectrometer
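As a small worked example of the mass-to-charge ratio, the sketch below assumes positive-mode ionization by protonation, so an ion of charge z carries z extra protons. The function name and the neutral mass used are illustrative.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton

def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge

# A hypothetical analyte of neutral mass 1000.0 Da
print(round(mz(1000.0, 1), 4))  # 1001.0073
print(round(mz(1000.0, 2), 4))  # 501.0073
```

The same molecule therefore appears at different positions in the spectrum depending on how many charges it carries.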
MS procedure
• A sample is loaded onto the MS instrument, and
undergoes vaporization
• The components of the sample are ionized by one of a
variety of methods (e.g., by impacting them with an
electron beam), which results in the formation of
charged particles (ions)
• The ions are separated according to their mass-to-
charge ratio in an analyzer by electromagnetic fields
• The ions are detected, usually by a quantitative
method
• The ion signal is processed into mass spectra
Mass spectrometer
• A first strategy for
identifying an unknown
compound is to compare
its experimental mass
spectrum against a library
of mass spectra
• Standard solutions of
amino acids are also used
and the resulting pattern
is compared with
standard spectrum.
MS data analysis
Limitations of Edman degradation
• Need Pure Samples of Peptides
• Requires 40-60 min / Amino Acid
• Can’t Analyze N-Terminally Modified Peptides
• Advantages
• Most Reliable Sequencing Technique
C terminal sequence
Definition:
The C-terminus (also known as
the carboxyl-terminus, carboxy-terminus, C-terminal
tail, C-terminal end, or COOH-terminus)
is the end of an amino acid chain
(protein or polypeptide), terminated by a
free carboxyl group (-COOH).
C terminal
C-terminal retention signals
• Proteins are naturally synthesized starting from
the N-terminus and ending at the C-terminus.
• While the N-terminus of a protein often
contains targeting signals, the C-terminus can
contain retention signals for protein sorting.
• The most common ER retention signal is the
amino acid sequence -KDEL (Lys-Asp-Glu-Leu)
or -HDEL (His-Asp-Glu-Leu) at the C-terminus.
This keeps the protein in the endoplasmic
reticulum and prevents it from entering
the secretory pathway.
C-terminal modifications
• The C-terminus of proteins can be modified
posttranslationally, most commonly by the addition of
a lipid anchor to the C-terminus that allows the
protein to be inserted into a membrane without
having a transmembrane domain.
• Another form of C-terminal modification is the
addition of a
phosphoglycan, glycosylphosphatidylinositol (GPI),
as a membrane anchor. The GPI anchor is attached
to the C-terminus after proteolytic cleavage of a
C-terminal propeptide. The most prominent example
of this type of modification is the prion protein.
C-terminal domain:
• The C-terminal domain of some proteins has
specialized functions. In humans, the CTD
of RNA polymerase II typically consists of up to
52 repeats of the sequence Tyr-Ser-Pro-Thr-
Ser-Pro-Ser.[1] This allows other proteins to
bind to the C-terminal domain of RNA
polymerase in order to activate polymerase
activity. These domains are then involved in
the initiation of DNA transcription.
C terminal sequencing technique
• Top-down sequencing by MALDI ISD is used to
sequence the C-terminus of an amino acid chain.
• MALDI MS is “matrix-assisted laser
desorption/ionization mass spectrometry”,
through which the C-terminus can be analyzed.
• This method is used when the N-terminal is
blocked and there is only C-terminal available.
• The technique can fragment and sequence
both the N- and C-terminal in the same mass
spectrum.
• Edman degradation is only used for N-
terminal sequencing.
• The most common method is to add
carboxypeptidases to a solution of the
protein.
• Take a sample at regular intervals
and determine the terminal amino acid by
analyzing a plot of amino acid concentration
against time.
• A peptide mixture is generated by cleavage of the
protein with cyanogen bromide and is incubated
with carboxypeptidase Y.
• The enzyme is only able to act on the C-terminal
fragment, because this is the only peptide without a
homoserine lactone residue at its C terminus.
• The resulting fragments, forming a peptide ladder,
are analyzed by matrix-assisted laser
desorption/ionization mass spectrometry (MALDI-
MS).
• The entire protocol, including the CNBr cleavage,
takes 21 h and can be applied to proteins purified
either by SDS-PAGE or by 2D PAGE or in solution.
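A peptide ladder can be read by matching the mass difference between neighbouring peaks to a residue mass. The sketch below assumes clean singly charged peaks and uses standard monoisotopic residue masses; the function name, tolerance, and ladder masses are hypothetical.

```python
# Monoisotopic residue masses in Da (one-letter codes)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def read_ladder(masses, tol=0.01):
    """Return the residues removed, C-terminal residue first.

    `masses` are the ladder peaks; each step down the ladder is one residue
    removed by the carboxypeptidase. Ambiguous steps are joined with '/'.
    """
    ordered = sorted(masses, reverse=True)
    residues = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return residues

# Hypothetical ladder from a peptide ending in ...-Ala-Gly-Phe
print(read_ladder([1000.0, 852.93159, 795.91013, 724.87302]))  # ['F', 'G', 'A']
```

Leucine and isoleucine have the same mass, so a step of 113.084 Da is reported as `L/I` and cannot be resolved by mass alone.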
Use of peptidase:
Top down sequencing:
• Top-down proteomics is a method
of protein identification that uses an ion trapping mass
spectrometer to store an isolated protein ion for mass
measurement and tandem mass spectrometry analysis.
• Top-down proteomics is capable of identifying and
quantitating unique proteoforms through the analysis
of intact protein.
• Top-down proteomics interrogates protein structure
through measurement of an intact mass followed by
direct ion dissociation in the gas phase.
• Fragmentation for tandem mass spectrometry
is accomplished by electron-capture
dissociation or electron-transfer dissociation.
Effective fractionation is critical for sample
handling before mass-spectrometry-based
proteomics. Proteome analysis routinely
involves digesting intact proteins followed by
inferred protein identification using mass
spectrometry.
• The main advantages of the top-down
approach include the ability to detect
degradation products, sequence variants, and
combinations of post-translational
modifications.
MALDI MS top down sequencing:
• 0.5-1 µl of salt-free protein solution, placed on
a MALDI plate and covered with the MALDI matrix
solution, is analyzed in the in-source decay mode
on an UltrafleXtreme mass spectrometer. The
generated mass spectrum (a complex mass
spectrum, exhibiting mainly c- and y-ions) is
further analyzed with the software BioTools or is
processed via a Mascot search.
• A .pdf result file with sequence coverage of the
target sequence would be the result output.
UltrafleXtreme mass spectrometer
• The UTX is used for a
variety of MALDI
applications, including
mass spectrometry imaging
(MSI), protein
identification, peptide
fingerprinting, and
structure identification for
a wide spectrum of
biomolecules (including lipids,
polymers, glycans).
MALDI ISD
Peptide Sequencing by Mass
Spectrometry
Introduction
• MS/MS plays an important role in protein identification (fast
and sensitive)
• Derivation of the peptide sequence is an important task in
proteomics
• Derivation without help from a protein database (“de novo
sequencing”) is especially important in the identification of
unknown proteins
Basic lab experimental steps
1. Proteins digested with an enzyme to produce peptides
2. Peptides charged (ionized) and separated according
to their different m/z ratios
3. Each peptide fragmented into ions and m/z values of
fragment ions are measured
• Steps 2 and 3 performed within a tandem mass
spectrometer.
Mass spectrum
• Proteins consist of 20 different types of amino acids with
different masses (except for one pair, Leu and Ile)
• Different peptides produce different spectra
• Use the spectrum of a peptide to determine its
sequence
Objectives
• Describe the steps of a typical peptide analysis
by MS (proteomic experiment)
• Explain peptide ionization, fragmentation,
identification
Why are peptides, and not proteins,
sequenced?
• Solubility under the same conditions
• Sensitivity of MS much higher for peptides
• MS efficiency
MS Peptide Experiment
Choice of Enzyme
Cleaving agent/Protease     Specificity
A. HIGHLY SPECIFIC
Trypsin                     Arg-X, Lys-X
Endoproteinase Glu-C        Glu-X
Endoproteinase Lys-C        Lys-X
Endoproteinase Arg-C        Arg-X
Endoproteinase Asp-N        X-Asp
B. NONSPECIFIC
Chymotrypsin                Phe-X, Tyr-X, Trp-X, Leu-X
Thermolysin                 X-Phe, X-Leu, X-Ile, X-Met, X-Val, X-Ala
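The specificities in the table can be turned into a simple in-silico digestion. This is a sketch under simplifying assumptions: complete cleavage, no missed cleavages, and no exceptions such as a following proline. The names `CLEAVE_AFTER`, `CLEAVE_BEFORE`, and `digest` are hypothetical, and thermolysin is left out for brevity.

```python
# "Arg-X" means cleavage after Arg; "X-Asp" means cleavage before Asp
CLEAVE_AFTER = {
    "trypsin": "KR", "glu-c": "E", "lys-c": "K", "arg-c": "R",
    "chymotrypsin": "FYWL",
}
CLEAVE_BEFORE = {"asp-n": "D"}

def digest(protein: str, enzyme: str) -> list:
    """Split a protein (one-letter codes) into peptides for the given enzyme."""
    enzyme = enzyme.lower()
    if enzyme not in CLEAVE_AFTER and enzyme not in CLEAVE_BEFORE:
        raise ValueError(f"unknown enzyme: {enzyme}")
    after = CLEAVE_AFTER.get(enzyme, "")
    before = CLEAVE_BEFORE.get(enzyme, "")
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in after:
            peptides.append(protein[start:i + 1])
            start = i + 1
        elif aa in before and i > start:
            peptides.append(protein[start:i])
            start = i
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

print(digest("AKGRDEF", "trypsin"))  # ['AK', 'GR', 'DEF']
print(digest("AKGRDEF", "asp-n"))    # ['AKGR', 'DEF']
```

Digesting the same protein with two enzymes of different specificity gives overlapping peptides, which helps when the pieces are later put back in order.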
ESI
[Figure: electrospray ionization. Liquid flow enters the source and the
ions pass into a quadrupole (Q) or ion trap analyzer.]
ESI is a solution technique that gives a continuous stream of ions,
best for quadrupoles, ion traps, etc.
MALDI
[Figure: MALDI source. A 3 ns laser pulse strikes the solid sample on a
target held at high voltage under high vacuum; the ions pass into a TOF
analyzer. Pressure labels in the figures: atmosphere, low vacuum, high
vacuum.]
MALDI is a solid-state technique that gives ions in pulses,
best suited to time-of-flight MS.
MALDI or electrospray?
• MALDI is limited to the solid state, ESI to liquids
• ESI is better for the analysis of complex mixtures, as it is directly
interfaced to a separation technique (e.g. HPLC or CE)
• MALDI is more “flexible” (MW from 200 to 400,000 Da)
Protein Identification Strategy
[Figure: a protein mixture is digested into peptides, which undergo 1D,
2D, or 3D peptide separation (time axis in minutes). The peptides enter a
tandem mass spectrometer with Q1, Q2 (collision cell), and Q3 (panels I,
II, III; m/z axes from 200 to 1200), giving a tandem mass spectrum.
Correlative sequence database searching compares theoretical and acquired
spectra to give the protein identification.]
[Figure: example TOF MS/MS spectrum (ES+) of the precursor at m/z 785.60,
acquired 10-Mar-2005; m/z axis from 100 to 1600, with labeled peaks
between m/z 169.06 and 1296.10.]
Breaking Protein into Peptides and Peptides into
Fragment Ions
• Proteases, e.g. trypsin, break protein into
peptides
• MS/MS breaks the peptides down into fragment
ions and measures the mass of each piece
• MS measures the m/z ratio of an ion
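To make the fragment ions concrete, the sketch below computes singly charged b- and y-ion m/z values for a short peptide. It assumes the common convention that b ions hold the N-terminal part and y ions the C-terminal part (plus water), and uses monoisotopic masses; the function name is hypothetical.

```python
PROTON = 1.007276   # Da
WATER = 18.010565   # Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def fragment_ions(peptide: str):
    """Return (b_ions, y_ions): singly charged m/z lists, b1.. and y1.. in order."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions

b, y = fragment_ions("GAK")
print([round(m, 4) for m in b])  # [58.0287, 129.0658]
print([round(m, 4) for m in y])  # [147.1128, 218.1499]
```

The spacing between consecutive ions of one series equals a residue mass, which is what lets a spectrum be read back into a sequence.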
Peptide fragmentation
• Amino acids differ in their side chains
• Predominant fragmentation: weakest bonds
• Tendency of peptides to fragment at Asp (D), on the C-terminal side
of Asp
Source: Ruedi Aebersold and David R. Goodlett, “Mass Spectrometry in
Proteomics”, Chem. Rev. 2001, 101, 269-295
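Protein Identification by MS
• Experimental side: spot removed from gel → fragmented using
trypsin → spectrum of fragments generated
• Database side: database of sequences (e.g. SwissProt) → artificially
trypsinated → artificial spectra built (library)
• MATCH: the experimental spectrum is compared against the library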
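The matching step can be sketched as a simple peptide mass comparison: each database sequence is digested in silico with trypsin, and the entry whose theoretical peptide masses explain the most measured masses wins. The two-entry database, the tolerance, and the function names are all hypothetical, and real search engines use far more elaborate scoring.

```python
WATER = 18.010565  # Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def tryptic_peptides(seq: str) -> list:
    """Cleave after every Lys or Arg (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def best_match(measured, database, tol=0.02):
    """Name of the database entry explaining the most measured masses."""
    def score(seq):
        theoretical = [peptide_mass(p) for p in tryptic_peptides(seq)]
        return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)
    return max(database, key=lambda name: score(database[name]))

database = {"protA": "GAKVLR", "protB": "SSKTTR"}  # hypothetical entries
measured = [274.164, 386.264]                       # hypothetical peptide masses
print(best_match(measured, database))  # protA
```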
Conclusions
• MS of peptides enables high throughput
identification and characterization of proteins in
biological systems
• “de novo sequencing” can be used to identify
unknown proteins not found in protein databases
Prediction From DNA Sequence
• The rapid increase of publicly available sequences and protein
structures means that an increasing amount of information can be
obtained for any protein sequence through its relatedness to
others.
• If a set of homologous proteins can be found and aligned, the
information content at each position in the alignment profile is far
greater than in any single member of the family, and any structural
or functional prediction algorithm should utilize this collective
information. Profile information of this type is extremely sensitive
to the quality of the multiple alignment, and distant homologues
should only be included in the alignment if they can be aligned with
confidence.
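One common way to quantify the information content at each alignment position is the Shannon measure: log2(20) minus the entropy of the residues observed in the column. The sketch below uses that measure as an illustration, ignoring gaps and small-sample corrections; the toy alignment and the function name are hypothetical.

```python
import math
from collections import Counter

def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy

alignment = ["MKVL", "MRVL", "MKIL"]      # toy alignment of three homologues
for position, column in enumerate(zip(*alignment)):
    print(position, "".join(column), round(column_information("".join(column)), 2))
```

Fully conserved columns score the maximum (about 4.32 bits), while variable columns score lower, so a misaligned distant homologue shows up as a loss of information across many positions.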
[Figure 17.3b-3: Inside the nuclear envelope, DNA is transcribed into
pre-mRNA, and RNA processing converts it to mRNA; in translation, a
ribosome reads the mRNA to make a polypeptide. A second panel shows a DNA
molecule carrying Gene 1, Gene 2, and Gene 3: the DNA template strand is
transcribed into mRNA, whose codons are translated into the amino acids
Trp, Phe, Gly, and Ser of the protein.]
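Predicting a protein sequence from DNA comes down to transcribing the template strand and reading the mRNA three bases at a time. The sketch below uses the standard genetic code; the template sequence is an assumption, chosen so that it encodes Trp-Phe-Gly-Ser, and the function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe_template(template_3_to_5: str) -> str:
    """mRNA (5'->3') made from a DNA template strand written 3'->5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3_to_5.upper())

def translate(mrna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

mrna = transcribe_template("ACCAAACCGAGT")  # assumed template strand, 3'->5'
print(mrna)             # UGGUUUGGCUCA
print(translate(mrna))  # WFGS, i.e. Trp-Phe-Gly-Ser
```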
• Protein structure prediction is the inference
of the three-dimensional structure of
a protein from its amino acid sequence—that
is, the prediction of its folding and
its secondary and tertiary structure from
its primary structure.
• Structure prediction is fundamentally
different from the inverse problem of protein
design.
• Protein structure prediction is one of the most
important goals pursued
by bioinformatics and theoretical chemistry; it is
highly important in medicine (for example, in drug
design) and biotechnology (for example, in the
design of novel enzymes).
• Protein structure and terminology
• Proteins are chains of amino acids joined together by peptide
bonds. Many conformations of this chain are possible due to the
rotation of the chain about each Cα atom. It is these
conformational changes that are responsible for differences in the
three dimensional structure of proteins.
• Each amino acid in the chain is polar, i.e. it has separated positive
and negative charged regions with a free C=O group, which can act
as hydrogen bond acceptor and an NH group, which can act as
hydrogen bond donor. These groups can therefore interact in the
protein structure.
• The 20 amino acids can be classified according to the chemistry of
the side chain which also plays an important structural
role. Glycine takes on a special position, as it has the smallest side
chain, only one Hydrogen atom, and therefore can increase the
local flexibility in the protein structure. Cysteine on the other hand
can react with another cysteine residue and thereby form a cross
link stabilizing the whole structure.
• The protein structure can be considered as a sequence of secondary
structure elements, such as α helices and β sheets, which together
constitute the overall three-dimensional configuration of the
protein chain. In these secondary structures regular patterns of H
bonds are formed between neighboring amino acids, and the amino
acids have similar Φ and Ψ angles.
• Bond angles for ψ and ω
• The formation of these structures neutralizes the polar groups on
each amino acid. The secondary structures are tightly packed in the
protein core in a hydrophobic environment. Each amino acid side
group has a limited volume to occupy and a limited number of
possible interactions with other nearby side chains, a situation that
must be taken into account in molecular modeling and alignments.
• α Helix
• The α helix is the most abundant type of secondary
structure in proteins. The α helix has 3.6 amino acids per
turn with an H bond formed between every fourth residue;
the average length is 10 amino acids (3 turns) or about 15 Å but
varies from 5 to 40 (1.5 to 11 turns).
• The alignment of the H bonds creates a dipole moment for
the helix with a resulting partial positive charge at the
amino end of the helix. Because this region has free
NH2 groups, it will interact with negatively charged groups
such as phosphates.
• The most common location of α helices is at the surface of
protein cores, where they provide an interface with the
aqueous environment. The inner-facing side of the helix
tends to have hydrophobic amino acids and the outer-facing
side hydrophilic amino acids.
• Thus, every third of four amino acids along the
chain will tend to be hydrophobic, a pattern
that can be quite readily detected. In the
leucine zipper motif, a repeating pattern of
leucines on the facing sides of two adjacent
helices is highly predictive of the motif.
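One way to detect this periodic pattern is a hydrophobic moment: each hydrophobic residue contributes a unit vector rotated by 100° per position (360° divided by 3.6 residues per turn), and a long resultant vector means the hydrophobic residues line up on one face. The sketch below uses this swapped-in technique with an assumed binary hydrophobic set, not a published hydrophobicity scale.

```python
import math

HYDROPHOBIC = set("AILMFVWC")  # assumed binary classification, for illustration

def hydrophobic_moment(sequence: str, angle_deg: float = 100.0) -> float:
    """Length of the summed hydrophobic vectors around the helix axis."""
    x = y = 0.0
    for n, aa in enumerate(sequence.upper()):
        if aa in HYDROPHOBIC:
            theta = math.radians(n * angle_deg)
            x += math.cos(theta)
            y += math.sin(theta)
    return math.hypot(x, y)

amphipathic = "LKKLLKKLLKKL"  # hydrophobic residues recur every 3 to 4 positions
blocky = "LLLLLLKKKKKK"       # same composition, no helical periodicity
print(round(hydrophobic_moment(amphipathic), 2))
print(round(hydrophobic_moment(blocky), 2))
```

The first sequence gives a much larger moment than the second, even though both contain six leucines and six lysines.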
• β sheet
• β sheets are formed by H bonds between an
average of 5–10 consecutive amino acids in one
portion of the chain with another 5–10 farther
down the chain.
• The interacting regions may be adjacent, with a
short loop in between, or far apart, with other
structures in between. Every chain may run in the
same direction to form a parallel sheet, every
other chain may run in the reverse chemical
direction to form an antiparallel sheet, or the
chains may be parallel and antiparallel to form a
mixed sheet.
• The pattern of H bonding is different in the parallel and
antiparallel configurations. Each amino acid in the
interior strands of the sheet forms two H bonds with
neighboring amino acids, whereas each amino acid on
the outside strands forms only one bond with an
interior strand.
• Looking across the sheet at right angles to the
strands, more distant strands are rotated slightly
counterclockwise to form a left-handed twist. The Cα
atoms alternate above and below the sheet in a
pleated structure, and the R side groups of the amino
acids alternate above and below the pleats.
• Loop
• Loops are regions of a protein chain that are
• (1) between α helices and β sheets,
• (2) of various lengths and three-dimensional
configurations, and
• (3) on the surface of the structure.
• Hairpin loops that represent a complete turn
in the polypeptide chain joining two
antiparallel β strands may be as short as two
amino acids in length.
• Loops interact with the surrounding aqueous
environment and other proteins. Because amino
acids in loops are not constrained by space and
environment as are amino acids in the core region,
and do not have an effect on the arrangement of
secondary structures in the core, more substitutions,
insertions, and deletions may occur. Thus, in a
sequence alignment, the presence of these features
may be an indication of a loop.
• The positions of introns in genomic DNA sometimes
correspond to the locations of loops in the encoded protein.
Loops also tend to have charged and polar amino acids and
are frequently a component of active sites. A detailed
examination of loop structures has shown that they fall into
distinct families.
• Coils
• A region of secondary structure that is not an α
helix, a β sheet, or a recognizable turn is
commonly referred to as a coil.
Applications of
Protein Sequencing
• In functional genomics:
Functional genomics is a field of molecular
biology that attempts to make use of the vast
wealth of data produced
by genomic and transcriptomic projects (such
as genome sequencing projects and RNA
sequencing) to describe gene (and protein)
functions and interactions. Unlike genomics,
functional genomics focuses on the dynamic
aspects such as gene transcription, translation,
regulation of gene expression and protein–protein
interactions, as opposed to the static aspects
of the genomic information such as DNA
sequence or structures.
• The goal of functional genomics is to
understand the relationship between an
organism's genome and its phenotype. The
term functional genomics is often used
broadly to refer to the many possible
approaches to understanding the properties
and function of the entirety of an organism's
genes and gene products.
• The promise of functional genomics is to expand and
synthesize genomic and proteomic knowledge into an
understanding of the dynamic properties of an
organism at cellular and/or organismal levels. This
would provide a more complete picture of how
biological function arises from the information
encoded in an organism's genome.
• The possibility of understanding how a particular
mutation leads to a given phenotype has important
implications for human genetic diseases, as answering
these questions could point scientists in the direction
of a treatment or cure.
Prediction of protein function from protein
sequence and structure
The sequence of a genome contains the plans of the possible life
of an organism, but implementation of genetic information
depends on the functions of the proteins and nucleic acids that it
encodes. Many individual proteins of known sequence and
structure present challenges to the understanding of their
function.
• In particular, a number of genes responsible for
diseases have been identified but their specific
functions are unknown. Whole-genome sequencing
projects are a major source of proteins of unknown
function. Annotation of a genome involves assignment
of functions to gene products, in most cases on the
basis of amino-acid sequence alone.
3D structure can aid the assignment of function, motivating the
challenge of structural genomics projects to make structural
information available for novel uncharacterized proteins.
Structure-based identification of homologues often succeeds
where sequence-alone-based methods fail, because in many
cases evolution retains the folding pattern long after sequence
similarity becomes undetectable.
• Nevertheless, prediction of protein function from sequence and
structure is a difficult problem, because homologous proteins
often have different functions. Many methods of function
prediction rely on identifying similarity in sequence and/or
structure between a protein of unknown function and one or
more well-understood proteins. Alternative methods include
inferring conservation patterns in members of a functionally
uncharacterized family for which many sequences and
structures are known.
In Proteomics
Proteomics is the large-scale study of proteomes. A proteome is
a set of proteins produced in an organism, system, or biological
context.
The proteome is not constant; it differs from cell to cell and
changes over time. To some degree, the proteome reflects the
underlying transcriptome. However, protein activity (often
assessed by the reaction rate of the processes in which the
protein is involved) is also modulated by many factors in
addition to the expression level of the relevant gene.
Protein sequencing denotes the process of finding the amino acid
sequence, or primary structure, of a protein. Sequencing plays a
vital role in proteomics, as the information obtained can be
used to deduce function, structure, and location which in turn
aids in identifying new or novel proteins as well as understanding
of cellular processes. Better understanding of these processes
allows for creation of drugs that target specific metabolic
pathways among other things.
In Bioinformatics
What is bioinformatics?
In recent years, molecular biology has witnessed an
information revolution as a result of the development of
rapid DNA sequencing techniques and the
corresponding progress in computer-based
technologies, which are allowing us to cope with this
information deluge in increasingly efficient ways. The
term that was coined to encompass computer
applications in biological sciences is bioinformatics.
The term bioinformatics is now used to mean
rather different things, from artificial
intelligence and robotics to genome analysis.
The term was originally applied to the
computational manipulation and analysis of
biological sequence data (DNA and/or protein),
but now tends also to be used to embrace the
manipulation and analysis of 3D structural data.
Identifying protein-coding genes in
genomic sequences
The vast majority of the biology of a newly sequenced genome is
inferred from the set of encoded proteins. Predicting this set is
therefore invariably the first step after the completion of the
genome DNA sequence.
The genome sequence is an organism's blueprint: the set of
instructions dictating its biological traits. The unfolding of these
instructions is initiated by the transcription of the DNA into RNA
sequences. According to the standard model, the majority of RNA
sequences originate from protein-coding genes; that is, they are
processed into messenger RNAs (mRNAs) which, after their export
to the cytosol, are translated into proteins.
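The conceptual translation of a coding sequence into protein can be sketched as follows. This is a simplified illustration under stated assumptions: it reads only the given strand, uses the standard genetic code, starts at the first ATG, and stops at the first in-frame stop codon. The function name `translate_first_orf` is hypothetical, and real gene prediction must also handle introns, both strands, and all reading frames.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Returns an empty string if no start codon is found. If the sequence
    ends before a stop codon, the residues translated so far are returned.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' marks an unknown codon
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


print(translate_first_orf("CCATGAAATGGTAAGG"))  # ATG AAA TGG TAA
```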
• To Determine Protein Folding
• Protein folding is the process by which a protein structure assumes
its functional shape or conformation.
• Protein folding is the physical process by which a protein chain
acquires its native three-dimensional structure,
a conformation that is usually biologically functional, in an
expeditious and reproducible manner.
• It is the physical process by which a polypeptide folds into its
characteristic and functional three-dimensional
structure from a random coil. Each protein exists as an unfolded
polypeptide or random coil when translated from a sequence
of mRNA into a linear chain of amino acids.
• All protein molecules are heterogeneous unbranched chains
of amino acids.
• By coiling and folding into a specific three-dimensional shape,
they are able to perform their biological function.
• Proteins are formed from long chains of amino acids; they
exist in an array of different structures, which often dictate
their functions. Proteins follow energetically favorable
pathways to form stable, orderly structures; this is known as
the protein's native structure.
• Most proteins can only perform their various functions when
they are folded. Scientists believe that the instructions for
folding a protein are encoded in its sequence. Researchers
can readily determine the sequence of a protein,
but have not cracked the code that governs folding.
In Drug Production
What is a Protein Drug
A protein drug is a type of drug made of protein. These drugs usually have a large molecular weight
and protein characteristics.
structure of an unusual class of proteins called beta-peptides. Eventually, these peptides
could become the basis for drugs that are cheaper to manufacture than existing protein-
based pharmaceuticals and last longer in the body. A drug's efficiency may be affected
by the degree to which it binds to the proteins within blood plasma. The
less bound a drug is, the more efficiently it can traverse cell membranes or diffuse.
Common blood proteins that drugs bind to are human serum albumin, lipoprotein,
glycoprotein, and α, β, and γ globulins.
Protein sequencing

  • 1.
  • 4. What is Protein • Any of a class of nitrogenous organic compounds which have large molecules composed of one or more long chains of amino acids and are an essential part of all living organisms, especially as structural components of body tissues such as muscle, hair, etc., and as enzymes and antibodies."a protein found in wheat"
  • 5. What is sequence • a particular order in which related things follow each other. • a set of related events, movements, or items that follow each other in a particular order.
  • 6. Protein Sequencing • Protein sequencing is the practical process of determining the amino acid sequence of all or part of a protein or peptide. This may serve to identify the protein or characterize its post- translational modifications.
  • 7. • Typically, partial sequencing of a protein provides sufficient information (one or more sequence tags) to identify it with reference to databases of protein sequences derived from the conceptual translation of genes.
  • 8. • The two major direct methods of protein sequencing are mass spectrometry and Edman degradation using a protein sequenator (sequencer). Mass spectrometry methods are now the most widely used for protein sequencing and identification but Edman degradation remains a valuable tool for characterizing a protein's N-terminus.
  • 9. Why we do Protein sequencing?? • Determining amino acid composition. • It is often desirable to know the unordered amino acid composition of a protein prior to attempting to find the ordered sequence, as this knowledge can be used to facilitate the discovery of errors in the sequencing process or to distinguish between ambiguous results
  • 10. • . Knowledge of the frequency of certain amino acids may also be used to choose which protease to use for digestion of the protein. The disincorporation of low levels of non-standard amino acids (e.g. norleucine) into proteins may also be determined. • A generalized method often referred to as amino acid analysis for determining amino acid frequency is as follows: • Hydrolyse a known quantity of protein into its constituent amino acids. • Separate and quantify the amino acids in some way.
  • 11. • Hydrolysis • Hydrolysis is done by heating a sample of the protein in 6 M hydrochloric acid to 100–110 °C for 24 hours or longer. Proteins with many bulky hydrophobic groups may require longer heating periods. However, these conditions are so vigorous that some amino acids (serine, threonine, tyrosine, tryptophan, gluta mine, and cysteine) are degraded. To circumvent this problem,
  • 12. • Biochemistry Online suggests heating separate samples for different times, analysing each resulting solution, and extrapolating back to zero hydrolysis time. Rastall suggests a variety of reagents to prevent or reduce degradation, such as thiol reagents or phenol to protect tryptophan and tyrosine from attack by chlorine, and pre- oxidising cysteine. He also suggests measuring the quantity of ammonia evolved to determine the extent of amide hydrolysis.
  • 13. • Separation and quantitation • The amino acids can be separated by ion- exchange chromatography then derivatized to facilitate their detection. More commonly, the amino acids are derivatized then resolved by reversed phase HPLC.
  • 14. • An example of the ion-exchange chromatography is given by the NTRC using sulfonated polystyrene as a matrix, adding the amino acids in acid solution and passing a buffer of steadily increasing pH through the column. Amino acids are eluted when the pH reaches their respective isoelectric points. Once the amino acids have been separated, their respective quantities are determined by adding a reagent that will form a coloured derivative.
  • 15. History Of Protein Sequencing
  • 16. • The advent of protein sequencing can be traced to two almost parallel discoveries by Frederick Sanger and Pehr Edman. • In 1950, Pehr Edman published a paper demonstrating a label-cleavage method for protein sequencing which was later termed “Edman degradation”.
  • 17. • Pehr Edman began his work in the Northrop- Kunitz laboratory at the Princeton branch of the Rockefeller Institute of Medical Research in 1947 • where he attempted to find a method to decode the amino acid sequence of a protein using chemicals; specifically he had early success with • fluorodinitrobenzene (FDNB) and phenylisothiocyanate (PITC).
  • 18.
  • 19. • Throughout his year at Princeton, Edman was able to conduct enough experiments to understand that it was feasible to use reagents like FDNB and PITC to determine amino acid sequence. • Edman returned to Sweden in 1947 and after two more years of work he was able to publish his paper that would describe the first successful method to sequence proteins [1] • This ground breaking paper described a method to determine the amino acid sequence of a protein and would come to be known as the Edman Degradation.
  • 20. F.SANGER • Around the same time Fred Sanger was developing his own labeling and separation method which led to the sequencing of insulin. • For this work, Sanger was awarded the 1958 Nobel Prize for Chemistry.
  • 21.
  • 22. Plus and minus in the 1970s • Fast-forward once again to the 1970s and we find Fred Sanger still at the forefront of nucleic acid sequencing. • In 1975, whilst at the Laboratory of Molecular Biology in Cambridge, Fred Sanger developed the “plus and minus” method for DNA sequencing (Sanger and Coulson, 1975). • Again there was competition in the field, with Maxam and Gilbert working on degradation sequencing (Maxam and Gilbert, 1977); however, their method ultimately faltered against the ease and quality of the Sanger method.
  • 23. Plus and minus method • A primer is extended by a polymerase to generate a population of newly synthesized polydeoxyribonucleotides of assorted lengths; the unused dNTPs are removed, and polymerization continues in four pairs of plus and minus reaction mixtures; the minus mixtures have three dNTPs and the plus mixtures have only one. • After the second polymerization, the mixtures are fractionated by gel electrophoresis, and each plus and minus pair is compared to indicate the length of the new polydeoxyribonucleotide (by the mobilities of the bands) and the position at which polymerization terminated as a result of the missing dNTP.
  • 24. • Five years earlier, Frederick Sanger had demonstrated a method to determine the amino acid residue located at the N-terminal end of a polypeptide chain by using the reagent fluorodinitrobenzene. • While it was thought that, at most, this method could only identify the residues found at the N-terminus, Sanger was able to take the method one step further.
  • 25. • By using several proteolytic enzymes, partial hydrolysis, and an early version of chromatography, Sanger was able to cleave the protein into fragments and piece together the residues like a jigsaw puzzle. • It wasn’t until 1955 that Sanger was able to present the complete sequence of insulin, which led to him being awarded the Nobel Prize in Chemistry in 1958.
  • 26. Other scientists • Emile Zuckerkandl and Linus Pauling, whose work in the mid-1960s advanced the use of nucleotide and protein sequences to explore evolution • In the 1970s, Carl Woese used ribosomal RNA sequences to define archaebacteria as a group of living organisms distinct from other bacteria and eukaryotes
  • 28. Protein sequencing • Technique to find out the sequence of amino acids in a protein • Sequencing methods: 1. N-terminal sequencing (Edman degradation) 2. C-terminal sequencing 3. Prediction from DNA sequence
  • 30. STEPS • Protein purification • Protein denaturation • Protein digestion • N-terminal labeling • Separation of labeled amino acid by chromatography • Detection through mass spectrometry • Data analysis
  • 31. Protein isolation (purification) • 1. SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis) 2. Two-dimensional gels • The protein of interest is immobilized by being adsorbed onto chemically modified glass or by electroblotting onto a porous polyvinylidene fluoride (PVDF) membrane.
  • 32. Protein hydrolysis (denaturation) • Carried out by heating a sample of the protein in 6 M HCl at 100-110 degrees Celsius for 24 hours or longer • This may degrade some amino acids; to avoid this, thiol reagents or phenol are used • Performic acid is used for intrachain or interchain S-S bonds
  • 33. Protein digestion • Use endoproteinase Lys-C, CNBr, pepsin or trypsin to cleave proteins into a population of peptides • Other enzymes include Glu-C and chymotrypsin • Add enzyme at a 1:20 enzyme:protein ratio • Incubate at room temperature for 6-9 hrs • For better results use a mixture of enzymes
  • 34. N-terminal labeling • The Edman reagent, phenylisothiocyanate (PITC), is added to the adsorbed peptide, together with a mildly basic buffer solution of 12% trimethylamine • This reacts with the amine group of the N-terminal amino acid • The terminal amino acid can then be selectively detached by the addition of anhydrous acid • The derivative then isomerises to give a substituted phenylthiohydantoin, which can be washed off and identified by chromatography, and the cycle can be repeated
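The repeated label, cleave and identify cycle described above can be sketched in a few lines of Python. This is only a toy model: the function name `edman_degradation` and the example peptide are illustrative assumptions, and the chemistry (coupling, acid cleavage, chromatographic identification) is reduced to removing one residue per cycle.

```python
def edman_degradation(peptide: str, cycles: int) -> list[str]:
    """Return the residues released in order, one per Edman cycle (toy model)."""
    released = []
    remaining = peptide
    for _ in range(cycles):
        if not remaining:
            break  # nothing left to label
        # One cycle: the N-terminal residue is labeled, cleaved off and
        # identified; the shortened peptide enters the next cycle.
        released.append(remaining[0])
        remaining = remaining[1:]
    return released


# Hypothetical five-residue peptide, three cycles.
print(edman_degradation("GIVEQ", 3))
```

Each pass reads exactly one residue from the N-terminus, which is why the real method needs a free (unmodified) N-terminus and scales with peptide length.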
  • 36. CHROMATOGRAPHY • Chromatography is a technique in which molecules are separated based on volatility and bond characteristics when subjected to a carrier • Derivatives of amino acids can be separated by • 1. HPLC • 2. Gas chromatography • In gas chromatography (GC), the mobile phase is an inert gas such as helium
  • 37. MASS SPECTROMETRY • Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge ratio of charged particles • The MS principle consists of ionizing chemical compounds to generate charged molecules or molecule fragments and measuring their mass-to-charge ratios • Separated amino acid derivatives are analyzed by a mass spectrometer
  • 38. MS procedure • A sample is loaded onto the MS instrument and undergoes vaporization • The components of the sample are ionized by one of a variety of methods (e.g., by impacting them with an electron beam), which results in the formation of charged particles (ions) • The ions are separated according to their mass-to-charge ratio in an analyzer by electromagnetic fields • The ions are detected, usually by a quantitative method • The ion signal is processed into mass spectra
  • 40. MS data analysis • The first strategy for identifying an unknown compound is to compare its experimental mass spectrum against a library of mass spectra • Standard solutions of amino acids are also used, and the resulting pattern is compared with the standard spectrum.
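To make the mass-to-charge ratio concrete, here is a minimal sketch. It assumes ionization by protonation (as in positive-mode ESI or MALDI), which is an assumption on top of the slide, since the slide's example is electron-beam ionization; the function name `mz` is illustrative.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule of the given neutral mass carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each added proton contributes both mass and one unit of charge.
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same 1000 Da molecule appears at different m/z depending on its charge.
for z in (1, 2, 3):
    print(z, round(mz(1000.0, z), 4))
```

The analyzer separates ions by this ratio, not by mass alone, so a multiply charged ion shows up at a lower m/z than the singly charged form of the same molecule.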
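Library comparison can be illustrated with a simple similarity score. The sketch below uses a binned cosine similarity, which is one common choice and not necessarily what any particular instrument software does; the library entries, peak lists and function names are all hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz_value, intensity in peaks:
        key = int(mz_value // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spectrum_a, spectrum_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (1.0 = identical pattern)."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Name of the library spectrum most similar to the query."""
    return max(library, key=lambda name: cosine_score(query, library[name]))


# Hypothetical library of reference spectra.
LIBRARY = {
    "compound_A": [(100.1, 50.0), (150.2, 100.0), (200.3, 20.0)],
    "compound_B": [(110.1, 80.0), (175.2, 100.0), (260.3, 40.0)],
}
query = [(100.1, 45.0), (150.2, 100.0), (200.3, 25.0)]
print(best_match(query, LIBRARY))
```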
  • 41. Limitations of Edman degradation • Need Pure Samples of Peptides • Requires 40-60 min / Amino Acid • Can’t Analyze N-Terminally Modified Peptides • Advantages • Most Reliable Sequencing Technique
  • 43. C terminal • Definition: The C-terminus (also known as the carboxyl-terminus, carboxy-terminus, C-terminal tail, C-terminal end, or COOH-terminus) is the end of an amino acid chain (protein or polypeptide), terminated by a free carboxyl group (-COOH).
  • 45. C-terminal retention signals • Proteins are naturally synthesized starting from the N-terminus and ending at the C-terminus. • While the N-terminus of a protein often contains targeting signals, the C-terminus can contain retention signals for protein sorting. • The most common ER retention signal is the amino acid sequence -KDEL (Lys-Asp-Glu-Leu) or -HDEL (His-Asp-Glu-Leu) at the C-terminus. This keeps the protein in the endoplasmic reticulum and prevents it from entering the secretory pathway.
  • 46. C-terminal modifications • The C-terminus of proteins can be modified post-translationally, most commonly by the addition of a lipid anchor to the C-terminus that allows the protein to be inserted into a membrane without having a transmembrane domain. • Another form of C-terminal modification is the addition of a phosphoglycan, glycosylphosphatidylinositol (GPI), as a membrane anchor. The GPI anchor is attached to the C-terminus after proteolytic cleavage of a C-terminal propeptide. The most prominent example of this type of modification is the prion protein.
  • 47. C-terminal domain: • The C-terminal domain of some proteins has specialized functions. In humans, the CTD of RNA polymerase II typically consists of up to 52 repeats of the sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser [1]. This allows other proteins to bind to the C-terminal domain of RNA polymerase in order to activate polymerase activity. These domains are then involved in the initiation of DNA transcription.
  • 48. C-terminal sequencing technique • Top-down sequencing by MALDI ISD is used to sequence the C-terminus of an amino acid chain. • MALDI MS: “matrix-assisted laser desorption/ionization mass spectrometry”, through which the C-terminus can be analyzed. • This method is used when the N-terminus is blocked and only the C-terminus is available. • The technique can fragment and sequence both the N- and C-terminus in the same mass spectrum.
  • 49. • Edman degradation is only used for N-terminal sequencing. • The most common method for the C-terminus is to add carboxypeptidases to a solution of the protein. • Take samples at regular intervals and determine the terminal amino acid by analyzing a plot of amino acid concentration against time.
  • 50. Use of peptidase: • A peptide mixture is generated by cleavage of the protein with cyanogen bromide and is incubated with carboxypeptidase Y. • The enzyme is only able to act on the C-terminal fragment, because this is the only peptide without a homoserine lactone residue at its C-terminus. • The resulting fragments, forming a peptide ladder, are analyzed by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). • The entire protocol, including the CNBr cleavage, takes 21 h and can be applied to proteins purified by SDS-PAGE, by 2D PAGE, or in solution.
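Reading a peptide ladder amounts to taking mass differences between neighboring peaks and looking each one up in a table of residue masses. The following is a rough sketch under simplifying assumptions: monoisotopic residue masses, a clean ladder with no missing rungs, and a simulated ladder rather than real MALDI-MS data. The helper names (`simulate_ladder`, `read_ladder`) and the example peptide are hypothetical.

```python
# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # Da, added once per intact peptide


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def simulate_ladder(peptide, steps):
    """Masses after a carboxypeptidase has removed 0..steps C-terminal residues."""
    return [peptide_mass(peptide[:len(peptide) - k]) for k in range(steps + 1)]


def read_ladder(masses, tol=0.02):
    """Residues in order of release (C-terminal residue first)."""
    ordered = sorted(masses, reverse=True)
    released = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        # Leu and Ile have identical masses, so they are reported together.
        released.append("/".join(hits) if hits else "?")
    return released


ladder = simulate_ladder("AGLK", 3)
print(read_ladder(ladder))
```

The first residue reported is the C-terminal one; reversing the list gives the C-terminal sequence in the usual N-to-C direction.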
  • 51. Top down sequencing: • Top-down proteomics is a method of protein identification that uses an ion trapping mass spectrometer to store an isolated protein ion for mass measurement and tandem mass spectrometry analysis. • Top-down proteomics is capable of identifying and quantitating unique proteoforms through the analysis of intact protein. • Top-down proteomics interrogates protein structure through measurement of an intact mass followed by direct ion dissociation in the gas phase.
  • 52. • Fragmentation for tandem mass spectrometry is accomplished by electron-capture dissociation or electron-transfer dissociation. Effective fractionation is critical for sample handling before mass-spectrometry-based proteomics. Proteome analysis routinely involves digesting intact proteins followed by inferred protein identification using mass spectrometry. • The main advantages of the top-down approach include the ability to detect degradation products, sequence variants, and combinations of post-translational modifications.
  • 53. MALDI MS top-down sequencing: • 0.5-1 ml salt-free protein solution is placed on a MALDI plate, covered with the MALDI matrix solution, and analyzed in the in-source decay mode on an UltrafleXtreme mass spectrometer. The generated mass spectrum (a complex mass spectrum, exhibiting mainly c- and y-ions) is further analyzed with the software BioTools or is processed via a Mascot search. • The output is a .pdf result file showing the sequence coverage of the target sequence.
  • 54. UltrafleXtreme mass spectrometer • The UTX is used for a variety of MALDI applications, including mass spectrometry imaging (MSI), protein identification, peptide fingerprinting, and structure identification for a wide spectrum of biomolecules (including lipids, polymers, glycans).
  • 57. Peptide Sequencing by Mass Spectrometry
  • 58. Introduction • MS/MS plays an important role in protein identification (fast and sensitive) • Derivation of the peptide sequence is an important task in proteomics • Derivation without help from a protein database (“de novo sequencing”) is especially important in the identification of unknown proteins
  • 59. Basic lab experimental steps 1. Proteins are digested with an enzyme to produce peptides 2. Peptides are charged (ionized) and separated according to their different m/z ratios 3. Each peptide is fragmented into ions and the m/z values of the fragment ions are measured • Steps 2 and 3 are performed within a tandem mass spectrometer.
  • 60. Mass spectrum • Proteins consist of 20 different types of amino acids with different masses (except for one pair, Leu and Ile) • Different peptides produce different spectra • Use the spectrum of a peptide to determine its sequence
  • 61. Objectives • Describe the steps of a typical peptide analysis by MS (proteomic experiment) • Explain peptide ionization, fragmentation, identification
  • 62. Why are peptides, and not proteins, sequenced? • Solubility under the same conditions • Sensitivity of MS much higher for peptides • MS efficiency
  • 64. Choice of Enzyme
      Cleaving agent/Protease    Specificity
      A. HIGHLY SPECIFIC
      Trypsin                    Arg-X, Lys-X
      Endoproteinase Glu-C       Glu-X
      Endoproteinase Lys-C       Lys-X
      Endoproteinase Arg-C       Arg-X
      Endoproteinase Asp-N       X-Asp
      B. NONSPECIFIC
      Chymotrypsin               Phe-X, Tyr-X, Trp-X, Leu-X
      Thermolysin                X-Phe, X-Leu, X-Ile, X-Met, X-Val, X-Ala
  • 65. ESI: liquid flow into a Q or ion trap analyzer. ESI is a solution technique that gives a continuous stream of ions, best for quadrupoles, ion traps, etc. • MALDI: 3 ns laser pulse; sample (solid) on target at high voltage/high vacuum; TOF analyzer. MALDI is a solid-state technique that gives ions in pulses, best suited to time-of-flight MS. [Figure: ion paths for ESI (atmosphere, low vacuum, high vacuum) and MALDI (high vacuum)]
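The specificities listed above can be turned into a small in-silico digestion. This is a simplified sketch: it covers only the proteases that cut on the C-terminal side of a residue (the "Arg-X" style entries), it ignores missed cleavages and the usual exception for a following proline, and the function and dictionary names are illustrative assumptions.

```python
# Residues after which each protease cuts, taken from the table above.
# Asp-N and thermolysin cut before a residue (X-Asp etc.) and are left out
# of this sketch.
CLEAVE_AFTER = {
    "trypsin": "KR",
    "glu-c": "E",
    "lys-c": "K",
    "arg-c": "R",
    "chymotrypsin": "FYWL",
}


def digest(sequence: str, protease: str) -> list[str]:
    """Split a protein sequence into peptides after each recognized residue."""
    residues = CLEAVE_AFTER[protease.lower()]
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


# Hypothetical sequence digested with two different enzymes.
print(digest("MKWVTFISLLR", "trypsin"))
print(digest("MKWVTFISLLR", "chymotrypsin"))
```

Comparing the peptide lists from different enzymes is one way to judge which protease gives fragments of a useful length for a given protein.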
  • 67. MALDI or Electrospray? • MALDI is limited to the solid state, ESI to liquid • ESI is better for the analysis of complex mixtures, as it is directly interfaced to a separation technique (i.e. HPLC or CE) • MALDI is more “flexible” (MW from 200 to 400,000 Da)
  • 68. Protein Identification Strategy [Figure: protein mixture → peptides → 1D, 2D, 3D peptide separation → tandem MS (Q1, Q2 collision cell, Q3) → tandem mass spectrum (relative intensity vs. m/z) → correlative sequence database searching (theoretical vs. acquired spectra) → protein identification]
  • 69. Breaking Protein into Peptides and Peptides into Fragment Ions • Proteases, e.g. trypsin, break the protein into peptides • MS/MS breaks the peptides down into fragment ions and measures the mass of each piece • MS measures the m/z ratio of an ion
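The fragment-ion idea can be sketched by computing the two most commonly discussed ion series, b ions (N-terminal pieces) and y ions (C-terminal pieces). The slide does not name an ion series, so treating fragmentation as backbone cleavage into singly protonated b and y ions is an assumption of this sketch, as are the function name and example peptide.

```python
# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # Da
PROTON = 1.007276  # Da


def fragment_ions(peptide: str):
    """Singly charged b- and y-ion m/z values for every backbone cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus one proton.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water plus one proton.
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")  # hypothetical peptide
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Differences between consecutive ions in one series equal residue masses, which is what makes it possible to read a sequence from the spectrum.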
  • 70. Peptide fragmentation Amino acids differ in their side chains Predominant fragmentation Weakest bonds
  • 71. Tendency of peptides to fragment at Asp (D) • Cleavage occurs on the C-terminal side of Asp • Source: Ruedi Aebersold and David R. Goodlett, “Mass Spectrometry in Proteomics”, Chem. Rev. 2001, 101, 269-295
  • 72. Protein Identification by MS • Experimental: spot removed from gel → fragmented using trypsin → spectrum of fragments generated • Library: database of sequences (i.e. SwissProt) → artificially trypsinated → artificial spectra built • The experimental spectrum is MATCHED against the library
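The matching scheme on this slide can be sketched as follows: every database sequence is digested in silico with trypsin, theoretical peptide masses are computed, and the entry sharing the most masses with the observed list is reported. The two database entries are made-up sequences, the mass tolerance is arbitrary, and the scoring (a plain match count) is far simpler than what real search engines use.

```python
# Monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # Da


def tryptic_peptides(sequence):
    """Artificial trypsin digest: cut after every Lys (K) or Arg (R)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.05):
    """Number of observed masses explained by the sequence's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def identify(observed, database):
    """Database entry whose artificial digest best matches the observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name]))


# Hypothetical two-entry database and a simulated measurement.
DATABASE = {"protein_A": "MKWVTFISLLRGAK", "protein_B": "GSHMAEELKDDYR"}
observed = [peptide_mass(p) for p in tryptic_peptides(DATABASE["protein_B"])]
print(identify(observed, DATABASE))
```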
  • 73. Conclusions • MS of peptides enables high throughput identification and characterization of proteins in biological systems • “de novo sequencing” can be used to identify unknown proteins not found in protein databases
  • 75. • The rapid increase of publicly available sequences and protein structures means that an increasing amount of information can be obtained for any protein sequence through its relatedness to others. • If a set of homologous proteins can be found and aligned, the information content at each position in the alignment profile is far greater than in any single member of the family, and any structural or functional prediction algorithm should utilize this collective information. Profile information of this type is extremely sensitive to the quality of the multiple alignment, and distant homologues should only be included in the alignment if they can be aligned with confidence.
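One way to put a number on the information at each alignment position is the per-column information content, log2(20) minus the Shannon entropy of the residues observed in that column. This is a minimal sketch of that one measure (no pseudocounts, no sequence weighting, gaps simply ignored), and the three-sequence alignment is invented for illustration.

```python
import math
from collections import Counter


def column_information(alignment):
    """Information content in bits for each column of an aligned sequence set."""
    n_columns = len(alignment[0])
    info = []
    for j in range(n_columns):
        column = [seq[j] for seq in alignment if seq[j] != "-"]
        if not column:
            info.append(0.0)  # all-gap column carries no information here
            continue
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # Maximum entropy for 20 amino acids is log2(20) bits.
        info.append(math.log2(20) - entropy)
    return info


# Hypothetical alignment: column 0 is fully conserved, column 2 is variable.
alignment = ["ACD", "ACE", "A-F"]
print([round(bits, 3) for bits in column_information(alignment)])
```

Conserved columns score close to the maximum and variable columns score lower, which is the per-position signal a single sequence cannot provide.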
  • 78. [Figure: a DNA molecule carrying Gene 1, Gene 2 and Gene 3; the DNA template strand is copied into mRNA (TRANSCRIPTION) and the mRNA codons are read into a protein (TRANSLATION), with the amino acids Trp, Phe, Gly and Ser shown as examples]
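Predicting a protein sequence from DNA follows the two steps in the figure, transcription and then translation by codons. The sketch below uses the standard genetic code; for simplicity it takes the coding (non-template) strand as input, which is an assumption that differs from the figure's template strand, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA with the same sequence as the coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Codons for Trp, Phe, Gly, Ser, the amino acids shown in the figure.
mrna = transcribe("TGGTTTGGCTCA")
print(mrna, translate(mrna))
```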
  • 79. • Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its folding and its secondary and tertiary structure from its primary structure. • Structure prediction is fundamentally different from the inverse problem of protein design.
  • 80. • Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
  • 81. • Protein structure and terminology • Proteins are chains of amino acids joined together by peptide bonds. Many conformations of this chain are possible due to the rotation of the chain about each Cα atom. It is these conformational changes that are responsible for differences in the three dimensional structure of proteins. • Each amino acid in the chain is polar, i.e. it has separated positive and negative charged regions with a free C=O group, which can act as hydrogen bond acceptor and an NH group, which can act as hydrogen bond donor. These groups can therefore interact in the protein structure. • The 20 amino acids can be classified according to the chemistry of the side chain which also plays an important structural role. Glycine takes on a special position, as it has the smallest side chain, only one Hydrogen atom, and therefore can increase the local flexibility in the protein structure. Cysteine on the other hand can react with another cysteine residue and thereby form a cross link stabilizing the whole structure.
  • 82. • The protein structure can be considered as a sequence of secondary structure elements, such as α helices and β sheets, which together constitute the overall three-dimensional configuration of the protein chain. In these secondary structures regular patterns of H bonds are formed between neighboring amino acids, and the amino acids have similar Φ and Ψ angles. • [Figure: bond angles for ψ and ω] • The formation of these structures neutralizes the polar groups on each amino acid. The secondary structures are tightly packed in the protein core in a hydrophobic environment. Each amino acid side group has a limited volume to occupy and a limited number of possible interactions with other nearby side chains, a situation that must be taken into account in molecular modeling and alignments.
  • 83. • α Helix • The α helix is the most abundant type of secondary structure in proteins. The α helix has 3.6 amino acids per turn with an H bond formed between every fourth residue; the average length is 10 amino acids (3 turns) or 10 Å but varies from 5 to 40 (1.5 to 11 turns). • The alignment of the H bonds creates a dipole moment for the helix with a resulting partial positive charge at the amino end of the helix. Because this region has free NH2 groups, it will interact with negatively charged groups such as phosphates. • The most common location of α helices is at the surface of protein cores, where they provide an interface with the aqueous environment. The inner-facing side of the helix tends to have hydrophobic amino acids and the outer-facing side hydrophilic amino acids.
  • 84. • Thus, every third or fourth amino acid along the chain will tend to be hydrophobic, a pattern that can be quite readily detected. In the leucine zipper motif, a repeating pattern of leucines on the facing sides of two adjacent helices is highly predictive of the motif.
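The hydrophobic periodicity of a surface helix can be detected numerically. One option, sketched here, is a hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing outward from the helix axis, with successive residues rotated by 100° (360° / 3.6 residues per turn). The Kyte-Doolittle scale and the moment calculation are choices made for this sketch, not something the slides specify, and the example sequences are invented.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydrophobic_moment(sequence: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment per residue for a given rotation per residue."""
    angle = math.radians(angle_deg)
    x = sum(KD[aa] * math.cos(i * angle) for i, aa in enumerate(sequence))
    y = sum(KD[aa] * math.sin(i * angle) for i, aa in enumerate(sequence))
    return math.hypot(x, y) / len(sequence)


# Hypothetical sequences: one with hydrophobic residues recurring every
# three to four positions, one uniformly hydrophobic.
for name, seq in [("patterned", "LKKLLKKLLKKLLKKLLK"), ("uniform", "L" * 18)]:
    print(name, round(hydrophobic_moment(seq), 3))
```

A large moment at 100° means the hydrophobic residues cluster on one face of the helix; a uniformly hydrophobic stretch has no preferred face, so its moment is near zero.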
  • 86. • β sheet • β sheets are formed by H bonds between an average of 5–10 consecutive amino acids in one portion of the chain with another 5–10 farther down the chain. • The interacting regions may be adjacent, with a short loop in between, or far apart, with other structures in between. Every chain may run in the same direction to form a parallel sheet, every other chain may run in the reverse chemical direction to form an antiparallel sheet, or the chains may be parallel and antiparallel to form a mixed sheet.
  • 87. • The pattern of H bonding is different in the parallel and antiparallel configurations. Each amino acid in the interior strands of the sheet forms two H bonds with neighboring amino acids, whereas each amino acid on the outside strands forms only one bond with an interior strand. • Looking across the sheet at right angles to the strands, more distant strands are rotated slightly counterclockwise to form a left-handed twist. The Cα atoms alternate above and below the sheet in a pleated structure, and the R side groups of the amino acids alternate above and below the pleats.
  • 88. • Loop • Loops are regions of a protein chain that are • (1) between α helices and β sheets, • (2) of various lengths and three-dimensional configurations, and • (3) on the surface of the structure. • Hairpin loops that represent a complete turn in the polypeptide chain joining two antiparallel β strands may be as short as two amino acids in length.
  • 89. • Loops interact with the surrounding aqueous environment and other proteins. Because amino acids in loops are not constrained by space and environment as are amino acids in the core region, and do not have an effect on the arrangement of secondary structures in the core, more substitutions, insertions, and deletions may occur. Thus, in a sequence alignment, the presence of these features may be an indication of a loop.
  • 90. • The positions of introns in genomic DNA sometimes correspond to the locations of loops in the encoded protein. Loops also tend to have charged and polar amino acids and are frequently a component of active sites. A detailed examination of loop structures has shown that they fall into distinct families.
  • 91. • Coils • A region of secondary structure that is not an α helix, a β sheet, or a recognizable turn is commonly referred to as a coil.
  • 93. • In functional genomics • Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic and transcriptomic projects (such as genome sequencing projects and RNA sequencing) to describe gene (and protein) functions and interactions. Unlike genomics, functional genomics focuses on the dynamic aspects such as
  • 94. • gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures.
  • 95. • The goal of functional genomics is to understand the relationship between an organism's genome and its phenotype. The term functional genomics is often used broadly to refer to the many possible approaches to understanding the properties and function of the entirety of an organism's genes and gene products.
  • 96. • The promise of functional genomics is to expand and synthesize genomic and proteomic knowledge into an understanding of the dynamic properties of an organism at cellular and/or organismal levels. This would provide a more complete picture of how biological function arises from the information encoded in an organism's genome. • The possibility of understanding how a particular mutation leads to a given phenotype has important implications for human genetic diseases, as answering these questions could point scientists in the direction of a treatment or cure.
  • 97. Prediction of protein function from protein sequence and structure • The sequence of a genome contains the plans of the possible life of an organism, but implementation of genetic information depends on the functions of the proteins and nucleic acids that it encodes. Many individual proteins of known sequence and structure present challenges to the understanding of their function.
  • 98. • In particular, a number of genes responsible for diseases have been identified but their specific functions are unknown. Whole-genome sequencing projects are a major source of proteins of unknown function. Annotation of a genome involves assignment of functions to gene products, in most cases on the basis of amino-acid sequence alone.
  • 99. 3D structure can aid the assignment of function, motivating the challenge of structural genomics projects to make structural information available for novel uncharacterized proteins. Structure-based identification of homologues often succeeds where sequence-alone-based methods fail, because in many cases evolution retains the folding pattern long after sequence similarity becomes undetectable.
  • 100. • Nevertheless, prediction of protein function from sequence and structure is a difficult problem, because homologous proteins often have different functions. Many methods of function prediction rely on identifying similarity in sequence and/or structure between a protein of unknown function and one or more well-understood proteins. Alternative methods include inferring conservation patterns in members of a functionally uncharacterized family for which many sequences and structures are known.
  • 101. In Proteomics Proteomics is the large-scale study of proteomes. A proteome is a set of proteins produced in an organism, system, or biological context. The proteome is not constant; it differs from cell to cell and changes over time. To some degree, the proteome reflects the underlying transcriptome. However, protein activity (often assessed by the reaction rate of the processes in which the protein is involved) is also modulated by many factors in addition to the expression level of the relevant gene.
  • 102. Protein sequencing denotes the process of finding the amino acid sequence, or primary structure, of a protein. Sequencing plays a vital role in proteomics, as the information obtained can be used to deduce function, structure, and location, which in turn aids in identifying new or novel proteins as well as in understanding cellular processes. Better understanding of these processes allows for the creation of drugs that target specific metabolic pathways, among other things.
  • 103. In Bioinformatics What is bioinformatics? In recent years, molecular biology has witnessed an information revolution as a result of the development of rapid DNA sequencing techniques and the corresponding progress in computer-based technologies, which are allowing us to cope with this information deluge in increasingly efficient ways. The term that was coined to encompass computer applications in biological sciences is bioinformatics.
  • 104. The term bioinformatics is now used to mean rather different things, from artificial intelligence and robotics to genome analysis. The term was originally applied to the computational manipulation and analysis of biological sequence data (DNA and/or protein), but now tends also to be used to embrace the manipulation and analysis of 3D structural data.
  • 105. Identifying protein-coding genes in genomic sequences The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. The genome sequence is an organism's blueprint: the set of instructions dictating its biological traits. The unfolding of these instructions is initiated by the transcription of the DNA into RNA sequences. According to the standard model, the majority of RNA sequences originate from protein-coding genes; that is, they are processed into messenger RNAs (mRNAs) which, after their export to the cytosol, are translated into proteins.
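A very rough first pass at finding protein-coding regions is to scan for open reading frames (ORFs): a start codon followed, in the same frame, by a stop codon. The sketch below is deliberately simplified. It scans only the forward strand, ignores introns and all statistical signals used by real gene finders, and its function name, parameters and example sequence are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """(start, end) index pairs of ATG...stop stretches in the 3 forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons before the stop (including the ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and look for the next ATG
    return orfs


# Hypothetical sequence with one short ORF starting at position 2.
sequence = "CCATGGCTAAATGACC"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])
```

Each reported stretch can then be translated to give a candidate member of the encoded protein set.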
  • 106. • To determine protein folding • Protein folding is the process by which a protein structure assumes its functional shape or conformation. • Protein folding is the physical process by which a protein chain acquires its native 3-dimensional structure, a conformation that is usually biologically functional, in an expeditious and reproducible manner. • It is the physical process by which a polypeptide folds into its characteristic and functional three-dimensional structure from random coil. Each protein exists as an unfolded polypeptide or random coil when translated from a sequence of mRNA to a linear chain of amino acids.
  • 107. • All protein molecules are heterogeneous unbranched chains of amino acids. • By coiling and folding into a specific three-dimensional shape they are able to perform their biological function. • Proteins are formed from long chains of amino acids; they exist in an array of different structures which often dictate their functions. Proteins follow energetically favorable pathways to form stable, orderly structures; this is known as the protein's native structure. • Most proteins can only perform their various functions when they are folded. Scientists believe that the instructions for folding a protein are encoded in the sequence. Researchers and scientists can easily determine the sequence of a protein, but have not cracked the code that governs folding.
  • 108. In drug production • What is a protein drug? A type of drug made of protein. These drugs usually have a large molecular weight and protein characteristics. • structure of an unusual class of proteins called beta-peptides. Eventually, these peptides could become the basis for drugs that are cheaper to manufacture than existing protein-based pharmaceuticals and last longer in the body. • A drug's efficiency may be affected by the degree to which it binds to the proteins within blood plasma. The less bound a drug is, the more efficiently it can traverse cell membranes or diffuse. Common blood proteins that drugs bind to are human serum albumin, lipoprotein, glycoprotein, and α, β, and γ globulins.