KVA DAV COLLEGE FOR WOMEN KARNAL
DEPARTMENT OF BIOTECHNOLOGY
Presented by-
Ramika
MSC Biotechnology 1st year
2399620006
PROTEIN SEQUENCE ANALYSIS
CONTENTS
• Introduction
• History
• Prepare the proteins for sequencing
• Sequencing methods
• N-terminal sequencing
• C-terminal sequencing
• DNA sequencing
• Protein mass spectrometry
• Bioinformatics tools
INTRODUCTION
Protein:
• Polymer of amino acids.
• Protein structure and function depend on the amino acid sequence.
Protein Sequencing:
• Technique for determining the amino acid sequence of a protein.
• Important for understanding cellular functions.
• Important for targeting drugs to specific metabolic pathways.
HISTORY
1951: Fred Sanger characterized the first protein sequence, that of insulin.
Sanger later developed the chain-termination ("Sanger") method of DNA sequencing (1977), a milestone in sequencing long molecules such as DNA. This method was eventually used in the Human Genome Project.
1969: Analysis of tRNA sequences was used to infer residue interactions from correlated changes in nucleotide sequence, giving rise to models of tRNA secondary structure.
1970: Saul B. Needleman and Christian D. Wunsch published the first computer algorithm for aligning two sequences.
1977: Publication of the first complete DNA genome, that of bacteriophage φX174.
Prepare the proteins for sequencing
1. If the protein contains more than one polypeptide chain, the chains are separated and purified.
2. Intrachain S-S (disulfide) cross-bridges between cysteine residues in the polypeptide chain are cleaved. If these disulfides are interchain linkages, step 2 precedes step 1.
3. The amino acid composition of each polypeptide chain is determined.
4. The N-terminal and C-terminal residues are identified.
5. Each polypeptide chain is cleaved into smaller fragments.
6. The sequence of each peptide fragment is determined.
7. The overall amino acid sequence of the protein is reconstructed from the sequences of overlapping fragments.
8. The positions of the S-S cross-bridges formed between cysteine residues are located.
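Reconstructing the overall sequence from overlapping fragments can be pictured as repeatedly joining the two fragments whose ends overlap the most. The sketch below is a minimal greedy version, assuming error-free fragments of one chain from two different cleavage reactions; the fragment lists are hypothetical, and the helper names `overlap` and `assemble` are illustrative only.

```python
def overlap(a: str, b: str, min_len: int = 2) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments: list[str], min_len: int = 2) -> str:
    """Greedily merge the two fragments with the largest overlap until one sequence remains."""
    # Fragments wholly contained in a longer fragment add no ordering information.
    frags = [f for f in fragments if not any(f != g and f in g for g in fragments)]
    frags = list(dict.fromkeys(frags))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            raise ValueError("fragments do not overlap enough to be ordered")
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags[0]


# Hypothetical fragments of the same chain from two different cleavage reactions.
digest_1 = ["ALGK", "DVMR", "STWP"]
digest_2 = ["ALGKDV", "MRSTWP"]
print(assemble(digest_1 + digest_2))  # ALGKDVMRSTWP
```

A greedy merge is enough for this toy case; with repeated segments or very short overlaps it can order fragments wrongly, which is why real sequencing relies on fragments that overlap unambiguously.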
Separation of Polypeptide Chains:
Subunit associations in multimeric proteins are typically maintained solely by
noncovalent forces, and therefore most multimeric proteins can usually be
dissociated by exposure to pH extremes, 8 M urea, 6 M guanidinium hydrochloride,
or high salt concentrations.
Cleavage of Disulfide Bridges:
Oxidation of a disulfide by performic acid results in the formation of
two equivalents of cysteic acid.
SEQUENCING METHODS
N-terminal sequencing
C-terminal sequencing
Prediction from DNA sequence
N-TERMINAL SEQUENCING
The N-terminal sequencing is done through:
Sanger’s method
Dansyl chloride method
Edman’s degradation method
Sanger’s method
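• Treat the peptide with 1-fluoro-2,4-dinitrobenzene (DNFB) to form a 2,4-dinitrophenyl (DNP) derivative of the N-terminal amino acid.
• Acid hydrolysis cleaves all peptide bonds; the DNP group stays attached to the N-terminal residue.
• Extract the DNP-amino acid with an organic solvent.
• Identify the DNP-amino acid by chromatography and comparison with standards.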
Dansyl chloride method
• Reagent: 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride).
• Dansyl chloride reacts with the free N-terminal amino group to give a dansylated polypeptide chain.
• Acid hydrolysis liberates all the amino acids, including the N-terminal dansyl amino acid.
• The amino acids are separated.
• The fluorescence of the dansyl amino acid is detected.
• The identity of the N-terminal amino acid is obtained by comparison with standard dansylated amino acids.
Edman’s degradation method
Principle: It sequentially removes one residue at a time from the amino end of a peptide.
Mechanism: Phenyl isothiocyanate reacts with the uncharged N-terminal amino group to form a phenylthiocarbamoyl derivative.
• Under acidic conditions, the N-terminal residue is cleaved off as a thiazolinone derivative, leaving the rest of the peptide intact.
• The thiazolinone derivative is extracted into an organic solvent and treated with acid to form the more stable phenylthiohydantoin (PTH) amino acid, which can be identified by chromatography.
• The shortened peptide then enters the next cycle.
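The cycle described above can be sketched as a loop that removes and reports only the N-terminal residue each time. This is a toy model, not an instrument protocol: the function name is hypothetical, and `yield_per_cycle` is an assumed per-cycle efficiency used only to show that the fraction of chains still "in step" shrinks with every cycle.

```python
def edman_degradation(peptide: str, cycles: int, yield_per_cycle: float = 0.98):
    """Toy model: each cycle removes and identifies only the N-terminal residue.

    `yield_per_cycle` is a hypothetical coupling/cleavage efficiency; the product over
    cycles is the fraction of chains that are still in step with the cycle number.
    """
    remaining = peptide
    in_step = 1.0
    readout = []
    for cycle in range(1, min(cycles, len(peptide)) + 1):
        # Coupling: phenyl isothiocyanate labels the free N-terminal amino group (PTC-peptide).
        # Cleavage: acid releases the labelled residue as a thiazolinone derivative,
        # leaving the shortened peptide intact for the next cycle.
        residue, remaining = remaining[0], remaining[1:]
        # Conversion: thiazolinone -> stable PTH-amino acid, identified by chromatography.
        in_step *= yield_per_cycle
        readout.append((cycle, "PTH-" + residue, in_step))
    return readout, remaining


readout, rest = edman_degradation("GLYKA", cycles=3)
for cycle, pth, fraction in readout:
    print(cycle, pth, round(fraction, 3))
print("remaining peptide:", rest)
```

Unlike the Sanger and dansyl methods, the rest of the chain is not hydrolysed, so the same sample can be taken through cycle after cycle.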
C-TERMINAL SEQUENCING
Add carboxypeptidases to a solution of the protein.
Take samples at regular intervals.
Determine the C-terminal amino acid by analyzing a plot of released amino acid concentration against time; the amino acid that appears first is the C-terminal residue.
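Why the earliest-rising curve marks the C-terminal residue can be seen from a simple kinetic sketch. It assumes, purely for illustration, that every cleavage by the carboxypeptidase is first order with the same rate constant `k` (real rates differ from residue to residue), so the number of residues removed from one chain by time `t` follows a Poisson distribution with mean `k*t`. The peptide and the function name are hypothetical.

```python
import math


def released_fraction(position: int, k: float, t: float) -> float:
    """Fraction of the residue at `position` (1 = C-terminal) that has been released by time t.

    Toy kinetics: every cleavage is first order with the same rate constant k, so the
    number of residues removed from one chain by time t is Poisson-distributed with
    mean k*t; residue `position` is free once at least `position` cleavages have occurred.
    """
    x = k * t
    not_yet = sum(math.exp(-x) * x**m / math.factorial(m) for m in range(position))
    return 1.0 - not_yet


peptide = "GLYKAF"  # hypothetical sequence; F is the C-terminal residue
for t in (0.5, 1.0, 2.0, 4.0):
    row = [f"{peptide[-pos]}={released_fraction(pos, k=1.0, t=t):.2f}" for pos in (1, 2, 3)]
    print(f"t={t}: " + ", ".join(row))
```

At every time point the C-terminal residue is ahead of the second, which is ahead of the third; that ordering of the curves is what the plot is read for.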
DNA Sequencing
• The protein sequence can also be determined indirectly from the corresponding mRNA or gene.
• Design primers from the amino acid sequence and amplify the gene.
• Sequence the gene and deduce the amino acid sequence of the protein using the genetic code.
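The last step, reading the amino acid sequence off the gene, is a codon-by-codon lookup in the genetic code. A minimal sketch, assuming the input is an in-frame coding sequence that uses the standard genetic code (mRNA input is accepted by converting U to T); the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA (or mRNA) coding sequence into one-letter amino acids, stopping at a stop codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons containing ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # MAK
print(translate("AUGUUUUGA"))     # MF
```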
MASS SPECTROMETRY
It is an important method for accurate mass determination and characterization of proteins.
Basic Principle: This technique studies the effect of ionizing energy on molecules. It depends on chemical reactions in the gas phase in which sample molecules are consumed during the formation of ionic and neutral species.
Components: The instrument consists of three major components:
• Ion source: produces gaseous ions from the substance being studied.
• Analyzer: resolves the ions into their characteristic mass components according to their mass-to-charge ratio (m/z).
• Detector system: detects the ions and records the relative abundance of each resolved ionic species.
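Because the analyzer separates ions by mass-to-charge ratio, the value it reports for a peptide depends on both its mass and its charge. A minimal sketch, assuming an unmodified linear peptide observed as a protonated [M + zH]z+ ion; the residue masses are standard monoisotopic values and the function names are illustrative.

```python
# Monoisotopic residue masses in daltons (free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O added for the free termini
PROTON = 1.007276  # mass of one proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


for z in (1, 2, 3):
    print(z, round(mz("PEPTIDE", z), 4))
```

The same peptide therefore appears at several m/z values, one per charge state; note that leucine and isoleucine have identical masses and cannot be told apart by mass alone.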
BIOINFORMATICS TOOLS
Bioinformatics: The collection, classification, storage and analysis of biochemical and biological information using computers, especially as applied to molecular genetics and genomics.
• It is an interdisciplinary field that develops methods and software tools for understanding biological data.
• It combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data.
MASTER LAYOUT FOR PROTEIN SEQUENCING
On the basis of the number of sequences compared, sequence alignment is of two types:
• Pairwise alignment
• Multiple alignment
Pairwise Sequence Alignment
• Pairwise sequence alignment compares only two sequences at a time.
a b a c d
a b _ c d
• The optimal alignment is the one with the highest score.
• A pairwise alignment consists of a series of paired residues (bases or amino acids), one from each sequence.
• There are three types of pairs:
I. Matches: the same residue appears in both sequences.
II. Mismatches: different residues are found in the two sequences.
III. Gaps: a residue in one sequence is paired with a null (gap) in the other.
• The algorithms used are the Needleman-Wunsch (global alignment) and Smith-Waterman (local alignment) algorithms.
• BLAST (Basic Local Alignment Search Tool)
BLAST encompasses many different implementations and enhancements of a search algorithm that finds high-scoring segment pairs (HSPs) of sequence alignment in databases.
It is a fast way to find similar sequences.
It is not the most sensitive way to search.
It is by a wide margin the most commonly used tool in bioinformatics.
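The Needleman-Wunsch and Smith-Waterman algorithms named above both fill a dynamic-programming matrix of best partial scores. The sketch below assumes a simple linear scoring scheme (match +1, mismatch -1, gap -2; these values are illustrative, not standard substitution matrices) and uses `_` for gaps, as in the example alignment earlier. The function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment (Needleman-Wunsch) with a linear gap penalty.

    Returns the optimal score and one optimal alignment, with '_' marking gaps.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # match or mismatch
                          F[i - 1][j] + gap,                            # gap in b
                          F[i][j - 1] + gap)                            # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("_")
            i -= 1
        else:
            top.append("_")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment (Smith-Waterman) score: same recurrence, but no cell drops below 0
    and the answer is the best cell anywhere in the matrix."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def classify_pairs(top, bottom, gap="_"):
    """Count the three pair types in an alignment: matches, mismatches and gaps."""
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for x, y in zip(top, bottom):
        if gap in (x, y):
            counts["gaps"] += 1
        elif x == y:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


score, top, bottom = needleman_wunsch("abacd", "abcd")
print(score, top, bottom)             # 2 abacd ab_cd
print(classify_pairs(top, bottom))    # {'matches': 4, 'mismatches': 0, 'gaps': 1}
print(smith_waterman_score("xxabcdyy", "abcd"))  # 4
```

The global version must account for every residue of both sequences, so the unmatched flanks would be penalised; the local version simply ignores them, which is why local alignment suits database searching.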
BLAST Steps
• Seeding: Prepare a list of short, fixed-length words from the query.
• Searching: Find highly similar or exact matches for each word in the database.
• Extension: Extend each match into a longer alignment.
• Evaluation: Evaluate the results using E-values.
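The four steps can be illustrated with a toy seed-and-extend search. This is not BLAST itself: it uses exact word matches only, ungapped extension with a simple X-drop stopping rule, a +1/-1 scoring scheme, and the E-value form E = K·m·n·e^(-λS) with placeholder values for K and λ (real BLAST derives these from its scoring system). All names and sequences are hypothetical.

```python
import math


def find_hits(query: str, subject: str, w: int = 3):
    """Seeding + searching: exact matches of every length-w query word in the subject.

    (Real BLAST also accepts neighbourhood words that score above a threshold.)
    """
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1) for j in index.get(query[i:i + w], [])]


def extend(query, subject, i, j, w, match=1, mismatch=-1, x_drop=2):
    """Extension: ungapped extension of a seed hit in both directions (X-drop rule).

    Returns (score, query_start, query_end) of the best-scoring segment pair.
    """
    def pair(x, y):
        return match if x == y else mismatch

    best_right, s, q_end = 0, 0, i + w
    q, t = i + w, j + w
    while q < len(query) and t < len(subject):
        s += pair(query[q], subject[t])
        q, t = q + 1, t + 1
        if s > best_right:
            best_right, q_end = s, q
        elif best_right - s >= x_drop:
            break
    best_left, s, q_start = 0, 0, i
    q, t = i - 1, j - 1
    while q >= 0 and t >= 0:
        s += pair(query[q], subject[t])
        if s > best_left:
            best_left, q_start = s, q
        elif best_left - s >= x_drop:
            break
        q, t = q - 1, t - 1
    return w * match + best_right + best_left, q_start, q_end


def toy_blast(query, subject, w=3, K=0.1, lam=1.0):
    """Evaluation: keep distinct HSPs and rank them by E = K * m * n * exp(-lam * S).

    K and lam are placeholder constants, not BLAST's fitted statistical parameters.
    """
    hsps = {extend(query, subject, i, j, w) + (j - i,) for i, j in find_hits(query, subject, w)}
    m, n = len(query), len(subject)
    results = [(s, query[a:b], K * m * n * math.exp(-lam * s)) for s, a, b, _ in hsps]
    return sorted(results, key=lambda r: r[2])


for score, segment, e in toy_blast("MKVLAAGIW", "PPKVLAAGQQ"):
    print(score, segment, f"{e:.3g}")
```

Several seeds on the same diagonal extend to the same segment pair, so the set keeps it once; a higher score S gives an exponentially smaller E-value.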
Multiple Sequence Alignment
Multiple sequence alignment (MSA) can be seen as a generalization of pairwise sequence alignment. Instead of aligning just two sequences, three or more sequences are aligned simultaneously.
a b a c d
a b _ c d
x b a c e
MSA is used for:
a. Detection of conserved domains in a group of genes or proteins.
b. Construction of a phylogenetic tree.
c. Prediction of protein structure.
d. Determination of consensus sequences.
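Determining a consensus sequence from an MSA can be as simple as taking the most common residue in each column. A minimal majority-rule sketch on the three-sequence example above, assuming `_` marks gaps and that ties go to the residue seen first in the column; it also reports fully conserved columns (0-based), which is the starting point for detecting conserved domains.

```python
from collections import Counter


def column_consensus(alignment: list[str], gap: str = "_") -> tuple[str, list[int]]:
    """Majority-rule consensus of an MSA and the indices of fully conserved columns."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    consensus, conserved = [], []
    for col in range(length):
        residues = [row[col] for row in alignment]
        counts = Counter(r for r in residues if r != gap)
        # Most common non-gap residue; ties resolved by first occurrence in the column.
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
        if len(set(residues)) == 1 and residues[0] != gap:
            conserved.append(col)
    return "".join(consensus), conserved


print(column_consensus(["abacd", "ab_cd", "xbace"]))  # ('abacd', [1, 3])
```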
CLUSTAL
• A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp (1988).
• CLUSTAL makes a global multiple alignment using a “progressive alignment” approach.
• It first computes all pairwise alignments and calculates the sequence similarity between each pair.
• These similarities are used to build a rough guide tree.
• The sequences are then aligned progressively, following the branching order of the guide tree.
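The guide-tree step can be sketched as clustering on pairwise distances. Two simplifications are assumed here and are not CLUSTAL's exact procedure: the pairwise step is replaced by a cheap shared 2-mer distance instead of full pairwise alignments, and the tree is built by UPGMA (average-linkage) clustering. The progressive alignment along the tree is not shown. Sequence names and sequences are hypothetical, and each sequence must be at least `k` residues long.

```python
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Stand-in for the pairwise step: 1 - fraction of shared k-mers."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def guide_tree(seqs: dict[str, str]):
    """Build a rough guide tree (nested tuples) by UPGMA clustering of pairwise distances."""
    def key(x, y):
        return frozenset((x, y))

    clusters = {name: (name, 1) for name in seqs}  # cluster id -> (subtree, number of sequences)
    dist = {key(x, y): kmer_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[key(*p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        for c in clusters:
            # UPGMA: size-weighted average of the distances from the two merged clusters.
            dist[key(new, c)] = (dist[key(a, c)] * n_a + dist[key(b, c)] * n_b) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]


seqs = {"s1": "MKVLAAGIW", "s2": "MKVLAAGLW", "s3": "PQRSTEDCY"}
print(guide_tree(seqs))  # ('s3', ('s1', 's2'))
```

The two most similar sequences are joined first, so a progressive aligner would align them to each other before adding the more distant one.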
Basic Information Comes From Sequence
• One sequence: some information can be obtained, e.g. amino acid properties.
• More than one sequence: more information on conserved residues, fold and function.
• Multiple alignments of related sequences: consensus sequences of known families, domains, motifs or sites can be built up.
• Sequence alignments can give information on loops, families and function from conserved regions.
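Even a single sequence yields amino acid properties. As one illustration (the choice of hydropathy as the property is an assumption, not from the slides), the sketch computes the Kyte-Doolittle grand average of hydropathy (GRAVY) and a sliding-window profile; the window size of 9 is an arbitrary default, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy (GRAVY): mean Kyte-Doolittle value of the residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Sliding-window average hydropathy; positive stretches are hydrophobic."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


print(round(gravy("MKVLAAGIW"), 3))
print([round(v, 2) for v in hydropathy_profile("MKVLAAGIWDDKR", window=5)])
```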
APPLICATIONS OF PROTEIN SEQUENCING
Recombinant protein synthesis.
Drug production.
Antibiotic production.
Functional genomics.
Determination of protein folding patterns.
In bioinformatics.
Plays a vital role in proteomics.
Prediction of the final structure, function and location of a protein.
Locating the gene that codes for the protein.
Study of genetic diseases.
Identification of sequence differences and variations such as point mutations.
Revealing the evolution and genetic diversity of sequences and organisms.
THANK YOU
