Mass Spectrometry: Protein Identification Strategies

A talk on the basics of protein identification for mass spectrometry.

Transcript

  • 1. Mass Spectrometry: Protein Identification Strategies
    Michel Dumontier
    Carleton University
    2/4/2010
    OISB: The ABC of Mass Spectrometry for Biology Workshop
  • 2. Typical MS experiment
    Protein Identification
  • 3. Protein identification strategies
    Mass Spectrometry
    Peptide Mass Fingerprinting
    Tandem Mass Spectrometry
    Spectral alignment
    de novo sequencing
  • 4. Peptide Mass Fingerprinting (PMF)
  • 5. Matrix-Assisted Laser Desorption/Ionization (MALDI)
  • 6. Electrospray Ionization (ESI)
  • 7. [Figure slide without text]
  • 8. Peptide Mass Fingerprinting
    Goal: identify a protein from its peptide mass signature
    Instruments: MALDI-TOF, ESI-TOF
    Approach: compare observed peptide masses with theoretical masses
    Requirements: a protease with a known cleavage pattern; a database of known sequences
  • 9. Principles of Fingerprinting
    >Protein A: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
      Mass (M+H): 4842.05
      Tryptic fragments: acedfhsak, dfgeasdfpk, ivtmeeewendadnfek, gwfe
    >Protein B: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
      Mass (M+H): 4842.05
      Tryptic fragments: acek, dfhsadfgeasdfpk, ivtmeeewenk, dadnfeqwfe
    >Protein C: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
      Mass (M+H): 4842.05
      Tryptic fragments: acedfhsadfgek, asdfpk, ivtmeeewendak, dnfegwfe
  • 10. Principles of Fingerprinting
    >Protein A: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
      Mass (M+H): 4842.05
    >Protein B: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
      Mass (M+H): 4842.05
    >Protein C: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
      Mass (M+H): 4842.05
    [Figure: mass spectrum of each protein's tryptic digest]
  • 11. Mass Calculation (Glycine)
    Free amino acid: NH2-CH2-COOH
    Amino acid residue (within a chain): R1-NH-CH2-CO-R3
    Glycine free amino acid mass: 5 × H + 2 × C + 2 × O + 1 × N = 75.032015 amu
    Glycine residue mass: 3 × H + 2 × C + 1 × O + 1 × N = 57.021455 amu
    Monoisotopic masses: 1H = 1.007825, 12C = 12.00000, 14N = 14.00307, 16O = 15.99491
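The glycine arithmetic above can be checked in a few lines of Python. This is a minimal sketch that uses the rounded element masses listed on the slide; the helper name `composition_mass` is only illustrative. It also shows that the free amino acid and the residue differ by exactly one water molecule.

```python
# Monoisotopic element masses as listed on the slide (rounded)
ELEMENT_MASS = {"H": 1.007825, "C": 12.00000, "N": 14.00307, "O": 15.99491}


def composition_mass(counts):
    """Sum element masses weighted by atom counts, e.g. {"H": 5, "C": 2}."""
    return sum(ELEMENT_MASS[element] * n for element, n in counts.items())


free_gly = composition_mass({"H": 5, "C": 2, "O": 2, "N": 1})     # NH2-CH2-COOH
residue_gly = composition_mass({"H": 3, "C": 2, "O": 1, "N": 1})  # -NH-CH2-CO-
water = composition_mass({"H": 2, "O": 1})

print(f"free glycine    {free_gly:.6f}")     # 75.032015
print(f"glycine residue {residue_gly:.6f}")  # 57.021455
print(f"difference      {free_gly - residue_gly:.6f} (one H2O = {water:.6f})")
```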
  • 12. Monoisotopic vs. Average Mass
    Monoisotopic mass is calculated from the mass of the most abundant isotope of each element
    Average mass is the abundance-weighted mean mass over all isotopes of each element
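The two definitions can be contrasted for a single element. The isotope masses and natural abundances below are standard reference values for carbon and hydrogen that do not appear on the slide; they are included only to illustrate the calculation, and the function names are hypothetical.

```python
# (isotope mass, natural abundance as a fraction); reference values used for illustration
ISOTOPES = {
    "C": [(12.000000, 0.9893), (13.003355, 0.0107)],
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
}


def monoisotopic_mass(element):
    """Mass of the most abundant isotope."""
    mass, _ = max(ISOTOPES[element], key=lambda iso: iso[1])
    return mass


def average_mass(element):
    """Abundance-weighted mean over all listed isotopes."""
    return sum(mass * abundance for mass, abundance in ISOTOPES[element])


for element in ISOTOPES:
    print(element, f"monoisotopic={monoisotopic_mass(element):.6f}",
          f"average={average_mass(element):.5f}")
# C monoisotopic=12.000000 average=12.01074
# H monoisotopic=1.007825 average=1.00794
```

Because every carbon contributes about 0.011 Da more on average than on a monoisotopic basis, the gap between the two masses grows with peptide size.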
  • 13. Amino Acid Residues: Monoisotopic Masses
    Glycine 57.02147
    Alanine 71.03712
    Serine 87.03203
    Proline 97.05277
    Valine 99.06842
    Threonine 101.04768
    Cysteine 103.00919
    Isoleucine 113.08407
    Leucine 113.08407
    Asparagine 114.04293
    Aspartic acid 115.02695
    Glutamine 128.05858
    Lysine 128.09497
    Glutamic acid 129.04259
    Methionine 131.04049
    Histidine 137.05891
    Phenylalanine 147.06842
    Arginine 156.10112
    Tyrosine 163.06333
    Tryptophan 186.07932
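With the residue table, the singly protonated peptide mass (M+H) is the sum of the residue masses plus one water (the free termini) plus one proton. The sketch below assumes unmodified peptides; the water and proton constants are standard values not shown on the slide, and `peptide_mh` is a hypothetical helper name.

```python
# Monoisotopic residue masses from the table (one-letter codes)
RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.010565   # H2O: the free N-terminal H and C-terminal OH
PROTON = 1.007276   # one added proton gives the singly charged [M+H]+ ion


def peptide_mh(sequence):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in sequence.upper()) + WATER + PROTON


print(f"{peptide_mh('acedfhsak'):.4f}")  # 1007.4251
print(f"{peptide_mh('acek'):.4f}")       # 450.2017
```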
  • 14. Building a PMF Database
    Download a protein sequence database
    (SwissProt or GenBank’s NR, the non-redundant database)
    Pick a protease, determine its cleavage sites and identify the resulting peptides for each protein entry
    Calculate the mass (M+H) of each peptide
    Sort the mass list
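The database-building steps above can be sketched in Python for the three toy proteins used in these slides. This assumes the classic trypsin rule (cleave after K or R, but not before P) and unmodified peptides; `tryptic_digest` and `build_mass_list` are hypothetical names, not part of any PMF server.

```python
import re

RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
MH_EXTRA = 18.010565 + 1.007276  # water + proton for a singly charged [M+H]+


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]


def build_mass_list(proteins):
    """Digest every entry and return (M+H, label, peptide) tuples sorted by mass."""
    entries = []
    for name, sequence in proteins.items():
        for i, peptide in enumerate(tryptic_digest(sequence), start=1):
            mh = sum(RESIDUE[aa] for aa in peptide) + MH_EXTRA
            entries.append((round(mh, 4), f"{name}-{i}", peptide.lower()))
    return sorted(entries)


PROTEINS = {  # toy sequences from the slides, line breaks removed
    "A": "acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe",
    "B": "acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe",
    "C": "acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe",
}
mass_list = build_mass_list(PROTEINS)
for mh, label, peptide in mass_list:
    print(f"{mh:10.4f} ({label}) {peptide}")
```

Digesting protein A exactly as printed gives dfqeasdfpk (about 1183.527) and qwfe (about 609.267); the mass list on the next slide uses 1112.4894 and 538.2296, which are the masses of the glycine variants dfgeasdfpk and gwfe shown in its fragment column.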
  • 15. Building A PMF Database
    Sequence DB -> Calculated tryptic fragments -> Mass list
    >Protein A: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
      Tryptic fragments: acedfhsak, dfgeasdfpk, ivtmeeewendadnfek, gwfe
    >Protein B: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
      Tryptic fragments: acek, dfhsadfgeasdfpk, ivtmeeewenk, dadnfeqwfe
    >Protein C: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
      Tryptic fragments: acedfhsadfgek, asdfpk, ivtmeeewendak, dnfegwfe
    Sorted mass list, M+H (protein-fragment):
    450.2017 (B-1)
    538.2296 (A-4)
    664.3300 (C-2)
    1007.4251 (A-1)
    1112.4894 (A-2)
    1114.4416 (C-4)
    1300.5116 (B-4)
    1407.6462 (B-3)
    1526.6211 (C-1)
    1593.7101 (C-3)
    1740.7500 (B-2)
    2098.8909 (A-3)
  • 16. The Fingerprint (PMF) Approach
    Take a mass spectrum of a protease-digested protein (from a gel spot or HPLC peak)
    Identify as many peaks as possible in the spectrum
    Compare query peaks with database peaks and count the matches or compute a matching score (based on peptide length and mass difference)
    Rank the hits and return the top-scoring entry (the one with the most matching peptides) as the protein of interest
  • 17. Query (MALDI) Spectrum
    [Figure: query MALDI spectrum of a tryptic digest, m/z axis 500 to 2500, with labeled peaks at 450, 538, 698, 1007, 1199, 1940, 2098 and 2211]
  • 18. Query vs. Database
    Query masses: 450.2201, 538.2296, 698.3100, 1007.5391, 1199.4916, 2098.9909
    Database mass list: 450.2017 (B), 538.2296 (A), 664.3300 (C), 1007.4251 (A), 1112.4894 (A), 1114.4416 (C), 1300.5116 (B), 1407.6462 (B), 1526.6211 (C), 1593.7101 (C), 1740.7501 (B), 2098.8909 (A)
    Results: 2 unknown masses; 1 hit on B; 3 hits on A
    Conclusion: the query protein is A
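The matching and ranking steps of the fingerprint approach can be sketched with the numbers on this slide. The function name `pmf_search` and the absolute tolerance of 0.15 Da are assumptions: 0.15 Da is simply wide enough that the 1007.5391 vs. 1007.4251 pair (0.114 Da apart) counts as a hit, as it does on the slide. Real PMF servers use probability-based scores rather than raw hit counts.

```python
from collections import Counter

# Database mass list from the slide: ([M+H]+, protein)
DB = [(450.2017, "B"), (538.2296, "A"), (664.3300, "C"), (1007.4251, "A"),
      (1112.4894, "A"), (1114.4416, "C"), (1300.5116, "B"), (1407.6462, "B"),
      (1526.6211, "C"), (1593.7101, "C"), (1740.7501, "B"), (2098.8909, "A")]
QUERY = [450.2201, 538.2296, 698.3100, 1007.5391, 1199.4916, 2098.9909]


def pmf_search(query, db, tol_da):
    """Count query peaks matching each protein within +/- tol_da Da."""
    hits, unknown = Counter(), []
    for q in query:
        matched = {protein for mass, protein in db if abs(mass - q) <= tol_da}
        if matched:
            hits.update(matched)      # each protein counted at most once per peak
        else:
            unknown.append(q)
    return hits.most_common(), unknown


ranked, unknown = pmf_search(QUERY, DB, tol_da=0.15)
print(ranked)    # [('A', 3), ('B', 1)]
print(unknown)   # [698.31, 1199.4916]
```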
  • 19. What You Need To Do PMF
    A list of query masses (as many as possible)
    The protease(s) or cleavage reagents used
    Databases to search (SwissProt, NR)
    Estimated mass and pI of the protein spot (optional)
    Cysteine (or other) modifications
    Minimum number of hits for significance
    Mass tolerance (e.g., 100 ppm at 1000.0 Da = ±0.1 Da)
    A PMF website (PROWL, ProFound, Mascot, PeptIdent)
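A ppm tolerance is relative, so the absolute window widens with mass. This small sketch (hypothetical helper names) converts ppm to daltons and reproduces the 100 ppm = ±0.1 Da figure at 1000 Da.

```python
def ppm_window(mass, ppm):
    """Half-width (Da) of a +/- ppm tolerance window centred on mass."""
    return mass * ppm / 1e6


def within_ppm(observed, theoretical, ppm):
    """True if the observed mass lies inside the ppm window of the theoretical mass."""
    return abs(observed - theoretical) <= ppm_window(theoretical, ppm)


print(ppm_window(1000.0, 100))                # 0.1
print(ppm_window(2500.0, 100))                # 0.25
print(within_ppm(1007.5391, 1007.4251, 100))  # False: 0.114 Da exceeds about 0.1007 Da
```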
  • 20. Challenge 1: Overlap in Combined Masses
    Gly + Gly = 114.043 -> Asn = 114.043
    Ala + Gly = 128.059 -> Gln = 128.059, Lys = 128.095
    Gly + Val = 156.090 -> Arg = 156.101
    Ala + Asp = Glu + Gly = 186.064 -> Trp = 186.079
    Ser + Val = 186.100 -> Trp = 186.079 u
    Leu = Ile = 113.084
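Coincidences like these can be found systematically by comparing every residue pair with every single residue. The 0.02 Da tolerance and the function name are assumptions for illustration; at that tolerance Ser + Val vs. Trp (0.021 Da apart) is just excluded, and widening the tolerance to 0.025 Da brings it in.

```python
from itertools import combinations_with_replacement

RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}


def isobaric_pairs(tol=0.02):
    """Residue pairs whose combined mass lies within tol Da of a single residue."""
    hits = []
    for a, b in combinations_with_replacement(sorted(RESIDUE), 2):
        combined = RESIDUE[a] + RESIDUE[b]
        for single, mass in RESIDUE.items():
            if abs(combined - mass) <= tol:
                hits.append((a + b, single, combined - mass))
    return hits


for pair, single, delta in isobaric_pairs():
    print(f"{pair} vs {single}: {delta:+.4f} Da")
```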
  • 21. Challenge 2: Missed Cleavage
    Sequence: >Protein A acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
    Tryptic fragments (no missed cleavage):
    acedfhsak (1007.4251)
    dfqeasdfpk (1183.5266)
    ivtmeeewendadnfek (2098.8909)
    qwfe (609.2667)
    Tryptic fragments (1 missed cleavage):
    acedfhsak (1007.4251)
    dfqeasdfpk (1183.5266)
    ivtmeeewendadnfek (2098.8909)
    qwfe (609.2667)
    acedfhsakdfqeasdfpk (2171.9338)
    ivtmeeewendadnfekqwfe (2689.1398)
    dfqeasdfpkivtmeeewendadnfek (3263.3997)
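Allowing missed cleavages amounts to joining runs of adjacent fully cleaved fragments. This sketch assumes the classic trypsin rule (after K/R, not before P); `digest` and `max_missed` are hypothetical names. For protein A with one missed cleavage it produces the seven peptides listed on the slide.

```python
import re


def digest(sequence, max_missed=0):
    """Tryptic peptides with up to max_missed missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence, flags=re.IGNORECASE) if p]
    peptides = []
    for missed in range(max_missed + 1):
        # a peptide with `missed` missed cleavages spans missed + 1 adjacent pieces
        for start in range(len(pieces) - missed):
            peptides.append("".join(pieces[start:start + missed + 1]))
    return peptides


protein_a = "acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe"
print(digest(protein_a, max_missed=0))
print(digest(protein_a, max_missed=1))
```

Each extra allowed missed cleavage enlarges the candidate list, which increases the chance of random matches in a fingerprint search.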
  • 22. Advantages of PMF
    Uses a “robust” & inexpensive form of MS (MALDI)
    Requires little sample optimization
    Can be done by a moderately skilled operator (no need to be an MS expert)
    Widely supported by web servers
    Improves as databases grow and instrumentation gets better
    Very amenable to high-throughput robotics (up to 500 samples a day)
  • 23. Limitations With PMF
    Requires that the protein of interest already be in a sequence database
    Performs poorly on mixtures of three or more proteins
    Spurious or missing critical mass peaks always lead to problems
    Mass accuracy is critical; ideally better than 20 ppm
    Generally found to be only about 40% effective in positively identifying gel spots
  • 24. Protein identification strategies
    Mass Spectrometry
    Peptide Mass Fingerprinting
    Tandem Mass Spectrometry
    Spectral alignment
    de novo sequencing
  • 25. Tandem Mass Spectrometry
  • 26. MS-MS Peptide Fragmentation
  • 27. b-ions (prefix or N-terminal ions)
    [Figure: b-ion ladder for the example peptide SEQUENCE; x-axis mass/charge (m/z)]
  • 28. a-ions = b-ions - CO = b-ions - 28 Da
    [Figure: a-ion ladder for the example peptide SEQUENCE; x-axis mass/charge (m/z)]
  • 29. y-ions (suffix or C-terminal ions)
    [Figure: y-ion ladder read from the C-terminus (E C N E U Q E S); x-axis mass/charge (m/z)]
  • 30. [Figure: spectrum, intensity vs. mass/charge (m/z)]
  • 31. Noise
    [Figure: spectrum with noise peaks; x-axis mass/charge (m/z)]
  • 32. MS/MS Spectrum
    [Figure: intensity vs. mass/charge (m/z)]
  • 33. Some Mass Differences between Peaks Correspond to Amino Acids
    [Figure: MS/MS spectrum with the gaps between peaks labeled by amino acid (s, e, q, u, n, c)]
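The idea that peak spacings encode residues can be sketched by checking every pair of peaks against the residue table. The spectrum here is hypothetical: singly charged b-ions of the peptide GPFNA (from a later slide) plus two noise peaks at 200.0 and 250.0; the 0.02 Da tolerance is an assumption. Only the true ladder steps match a residue mass.

```python
RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
PROTON = 1.007276


def residue_gaps(peaks, tol=0.02):
    """List (lower, upper, residue) for every peak pair whose spacing matches a residue."""
    peaks = sorted(peaks)
    gaps = []
    for i, lower in enumerate(peaks):
        for upper in peaks[i + 1:]:
            for aa, mass in RESIDUE.items():
                if abs(upper - lower - mass) <= tol:
                    gaps.append((lower, upper, aa))
    return gaps


# Hypothetical peaks: cumulative b-ions of GPFNA plus two noise peaks
b_ions, running = [], PROTON
for aa in "GPFNA":
    running += RESIDUE[aa]
    b_ions.append(running)
peaks = b_ions + [200.0, 250.0]

for lower, upper, aa in residue_gaps(peaks):
    print(f"{lower:8.3f} -> {upper:8.3f}  {aa}")
```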
  • 34. Database Search vs. de novo
    Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
    [Figure: one spectrum interpreted two ways. Database search selects a peptide from the database; de novo sequencing chooses among candidate residues (W, R, V, A, L, T, G, E, P, L, K, C, W, D, T). Both routes lead to AVGELTK.]
  • 35. SEQUEST Algorithm
    SEQUEST correlates uninterpreted tandem mass spectra (MS/MS) of peptides with amino acid sequences from protein and nucleotide databases
  • 36. SEQUEST Algorithm
    Sequence DB -> Calculated tryptic fragments -> Calculated MS/MS spectra
    >A: acedfhsakdfqeasdfpkivtmeeewendadnfekgpfna
      Tryptic fragments: acedfhsak, dfgeasdfpk, ivtmeeewendadnfek, gpfna
    >B: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
      Tryptic fragments: acek, dfhsadfgeasdfpk, ivtmeeewenk, dadnfeqwfe
    >C: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
      Tryptic fragments: acedfhsadfgek, asdfpk, ivtmeeewendak, dnfegwfe
    [Figure: a calculated MS/MS spectrum for each fragment]
  • 37. Creating a Synthetic MS-MS Spectrum for GPFNA
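    b ions (from the N-terminus): G 57, P 97, F 147, N 114, A 71 -> 57, 154, 301, 415, 486
    y ions (from the C-terminus): A 71, N 114, F 147, P 97, G 57 -> 71, 185, 332, 429, 486
    Combine both ladders into one synthetic spectrum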
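The numbers on this slide are cumulative residue masses rounded to integers. Observed singly charged ions carry extra mass: a b-ion adds a proton, a y-ion adds water plus a proton, and an a-ion is a b-ion minus CO. The sketch below (hypothetical function names, standard constants) reproduces the slide's ladders and then computes the ion m/z values.

```python
from itertools import accumulate

RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
PROTON = 1.007276   # added to every singly charged fragment
WATER = 18.010565   # y-ions keep the C-terminal OH plus an extra H
CO = 27.994915      # a-ion = b-ion minus carbon monoxide


def ladders(seq):
    """Cumulative residue masses from each end (the numbers drawn on the slide)."""
    prefix = list(accumulate(RESIDUE[aa] for aa in seq))
    suffix = list(accumulate(RESIDUE[aa] for aa in reversed(seq)))
    return prefix, suffix


def fragment_ions(seq):
    """Singly charged b, a and y ions for the n-1 backbone cleavages."""
    prefix, suffix = ladders(seq)
    b = [m + PROTON for m in prefix[:-1]]
    a = [m - CO for m in b]
    y = [m + WATER + PROTON for m in suffix[:-1]]
    return b, a, y


prefix, suffix = ladders("GPFNA")
print([round(m) for m in prefix])   # [57, 154, 301, 415, 486]
print([round(m) for m in suffix])   # [71, 185, 332, 429, 486]
b, a, y = fragment_ions("GPFNA")
print([f"{m:.3f}" for m in b])
print([f"{m:.3f}" for m in y])
```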
  • 38. SEQUEST Algorithm
    [Figure: query spectrum]
    Spectral database: acedfhsak, mtlsyk, giqwemncyk, nmqtydr
    Result: giqwemncyk; Score = 128; Accession P12345; Protein = p53; Org. Homo sapiens
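The search loop on this slide can be sketched as: keep database peptides whose precursor mass fits, build each one's theoretical b/y ions, and rank by agreement with the query spectrum. This is a simplified stand-in that counts shared peaks; SEQUEST itself scores with a cross-correlation. The peptides are the four from the slide, but the query spectrum is synthetic (hypothetical), and the 0.5 Da fragment and 1.0 Da precursor tolerances are assumptions.

```python
RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
PROTON, WATER = 1.007276, 18.010565


def precursor_mh(pep):
    return sum(RESIDUE[aa] for aa in pep) + WATER + PROTON


def theoretical_ions(pep):
    """Singly charged b- and y-ions, interleaved b1, y(n-1), b2, y(n-2), ..."""
    ions = []
    for i in range(1, len(pep)):
        ions.append(sum(RESIDUE[aa] for aa in pep[:i]) + PROTON)          # b_i
        ions.append(sum(RESIDUE[aa] for aa in pep[i:]) + WATER + PROTON)  # y_(n-i)
    return ions


def shared_peaks(query_peaks, pep, tol=0.5):
    """Number of theoretical ions with an observed peak within tol Da."""
    return sum(any(abs(ion - p) <= tol for p in query_peaks) for ion in theoretical_ions(pep))


def search(query_peaks, query_mh, database, prec_tol=1.0):
    """Filter candidates by precursor mass, then rank by shared peak count."""
    candidates = [p for p in database if abs(precursor_mh(p) - query_mh) <= prec_tol]
    return sorted(((shared_peaks(query_peaks, p), p) for p in candidates), reverse=True)


DATABASE = ["ACEDFHSAK", "MTLSYK", "GIQWEMNCYK", "NMQTYDR"]
# Hypothetical query: the b-ions of GIQWEMNCYK plus two noise peaks
query = theoretical_ions("GIQWEMNCYK")[::2] + [333.3, 777.7]
print(search(query, precursor_mh("GIQWEMNCYK"), DATABASE))  # [(9, 'GIQWEMNCYK')]
```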
  • 39. SEQUEST XCorr: Higher Is Better
    [Figure: correlation score vs. offset (amu), showing the cross correlation (direct comparison) and the auto correlation (background) that enter XCorr]
    Gentzel M. et al., Proteomics 3 (2003) 1597-1610
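A commonly described form of SEQUEST's XCorr takes the correlation between the binned experimental and theoretical spectra at zero offset and subtracts the mean correlation over offsets of -75 to +75 bins, so that a score is high only when peaks line up exactly rather than by general proximity. The sketch below implements that form on hypothetical data; it omits SEQUEST's spectrum preprocessing (intensity normalization and related steps), and all names are illustrative.

```python
def binned(peaks, n_bins=1000, bin_width=1.0):
    """Turn (m/z, intensity) pairs into a fixed-width intensity vector."""
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            vector[index] = max(vector[index], intensity)
    return vector


def correlation(x, y, tau):
    """R_tau = sum over i of x[i] * y[i + tau], for equal-length vectors."""
    n = len(x)
    return sum(x[i] * y[i + tau] for i in range(max(0, -tau), min(n, n - tau)))


def xcorr(observed, theoretical, max_offset=75):
    """Correlation at zero offset minus the mean correlation over -max_offset..+max_offset."""
    offsets = range(-max_offset, max_offset + 1)
    background = sum(correlation(theoretical, observed, tau) for tau in offsets) / len(offsets)
    return correlation(theoretical, observed, 0) - background


# Hypothetical data: one observed spectrum and two candidate theoretical spectra
observed = binned([(58.0, 20), (155.1, 60), (302.2, 100), (416.2, 80), (487.2, 30), (250.0, 15)])
matching = binned([(mz, 50) for mz in (58.0, 155.1, 302.2, 416.2, 487.2)])
other = binned([(mz, 50) for mz in (72.0, 186.1, 333.2, 430.2, 487.2)])

print(f"matching candidate: {xcorr(observed, matching):.1f}")  # 14362.6
print(f"other candidate:    {xcorr(observed, other):.1f}")     # 1362.6
```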
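  • 40. Accuracy Score vs. Relative Score
    [Figure: search-engine scores placed on two axes, accuracy score and relative score, each ranging from weak to strong. SEQUEST uses XCorr (accuracy) and DeltaCn (relative); Mascot and X! Tandem use an alternate method.]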
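SEQUEST's relative score, DeltaCn, measures how far the best candidate's XCorr stands above the runner-up for the same spectrum: (best - second) / best. A value near zero means two candidates explain the spectrum almost equally well. The XCorr values below are hypothetical, and the function name is illustrative.

```python
def delta_cn(xcorr_scores):
    """(best - second best) / best over one spectrum's candidate XCorr values."""
    if len(xcorr_scores) < 2:
        raise ValueError("need at least two candidate scores")
    best, second = sorted(xcorr_scores, reverse=True)[:2]
    return (best - second) / best


print(round(delta_cn([3.0, 2.4, 1.5]), 4))  # 0.2: clear winner
print(round(delta_cn([3.0, 2.9, 1.5]), 4))  # 0.0333: ambiguous match
```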
  • 41. Mascot
    • Scoring based on the peptide frequency distribution of a non-redundant database (MOWSE: Molecular Weight SEarch)
  • 42. The significance of a result depends on the size of the database being searched. Mascot shades the insignificant hits in green, using a P = 0.05 cutoff.
    In this example, scores below 74 are insignificant.
    A Mascot score of 120 corresponds to P = 1 × 10^-12.
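Mascot reports probability-based scores as S = -10 log10(P), which is why a score of 120 corresponds to P = 1e-12. To see why the significance cutoff depends on database size, the sketch divides the 0.05 level by the number of candidate matches (a Bonferroni-style correction). Treat that threshold as an illustration under this assumption, not as Mascot's exact internal computation; the function names are hypothetical.

```python
import math


def mascot_style_score(p_value):
    """Probability-based score S = -10 * log10(P)."""
    return -10 * math.log10(p_value)


def significance_threshold(n_candidates, alpha=0.05):
    """Score needed for P < alpha after correcting for n_candidates comparisons."""
    return mascot_style_score(alpha / n_candidates)


print(round(mascot_style_score(1e-12), 6))    # 120.0
print(round(significance_threshold(1e6), 1))  # 73.0
print(round(significance_threshold(1e7), 1))  # 83.0
```

Under this model each tenfold increase in the number of candidates raises the threshold by 10 score units.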
  • 43. [Figure slide without text]
  • 44. [Figure slide without text]
  • 45. [Figure slide without text]
  • 46. [Figure slide without text]
  • 47. Each search engine scores differently
    Each search engine identifies about the same number of spectra, but the overlap is surprisingly small.
    Different search engines match different spectra.
    [Venn diagram: spectra identified by SEQUEST, Mascot and X!Tandem; region percentages 9%, 4%, 22%, 34%, 19%, 7%, 5%]
    Courtesy: Proteome Software Inc.
  • 48. Database Search vs. de novo
    Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
    [Figure: one spectrum interpreted two ways. Database search selects a peptide from the database; de novo sequencing chooses among candidate residues (W, R, V, A, L, T, G, E, P, L, K, C, W, D, T). Both routes lead to AVGELTK.]
  • 49. de novo vs. Database Search: A Paradox
    A database search scans all peptides to find the best one.
    de novo sequencing avoids scanning all peptides by modeling the problem as a graph search.
    de novo algorithms are much faster, even though their search space is much larger!
    Used when neither PMF nor an MS/MS database search gives a match
    Advantage:
    Can recover sequences that are not in the database.
    Disadvantage:
    Requires higher-quality spectra to be accurate.
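The graph-search view of de novo sequencing can be sketched as a spectrum graph: nodes are prefix masses (each b-ion minus a proton, plus 0 and the total residue mass), edges join nodes whose difference matches a residue, and the peptide is the longest path from 0 to the total. This minimal sketch assumes singly charged b-ions only, no missing peaks, and a 0.02 Da tolerance; real de novo tools also use y-ions, peak scoring and gap handling. It cannot tell I from L (it reports the first match), and the peaks are hypothetical.

```python
RESIDUE = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277, "V": 99.06842,
    "T": 101.04768, "C": 103.00919, "I": 113.08407, "L": 113.08407, "N": 114.04293,
    "D": 115.02695, "Q": 128.05858, "K": 128.09497, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
PROTON = 1.007276


def spectrum_graph_de_novo(b_ion_mzs, total_residue_mass, tol=0.02):
    """Longest residue path from mass 0 to the total residue mass, or None."""
    nodes = sorted({0.0, total_residue_mass, *(mz - PROTON for mz in b_ion_mzs)})
    best = {0: (0, "")}  # node index -> (residues used, sequence so far)
    for j in range(1, len(nodes)):
        for i in range(j):
            if i not in best:
                continue
            gap = nodes[j] - nodes[i]
            for aa, mass in RESIDUE.items():
                if abs(gap - mass) <= tol and (j not in best or best[i][0] + 1 > best[j][0]):
                    best[j] = (best[i][0] + 1, best[i][1] + aa)
    end = nodes.index(total_residue_mass)
    return best.get(end, (0, None))[1]


# Hypothetical spectrum: b1..b4 of GPFNA plus noise at 200.0 and 250.0
b_peaks = [58.0287, 155.0815, 302.1499, 416.1929, 200.0, 250.0]
total = 486.2227  # residue mass of GPFNA ([M+H]+ minus water and proton)
print(spectrum_graph_de_novo(b_peaks, total))  # GPFNA
```

If b3 and b4 are missing, no path reaches the end node and the sketch returns None, which illustrates why de novo sequencing needs higher-quality spectra.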
  • 50. de novo sequencing is not very accurate:
    Less than 30% of the peptides sequenced were completely correct!
  • 51. Protein identification strategies
    Mass Spectrometry
    Peptide Mass Fingerprinting
    Tandem Mass Spectrometry
    Spectral alignment
    de novo sequencing
  • 52. References
    SLIDES
    Proteomics. 2005 Canadian Bioinformatics Workshops. David Wishart, Gary Van Domselaar. http://bioinformatics.ca/workshop_pages/proteomics2005/index.html
    Protein Sequencing and Identification by Mass Spectrometry. http://bioalgorithms.info
    Interpreting MS/MS Proteomics Results. Brian C. Searle. Proteome Software Inc.
    Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003 Mar 13;422(6928):198-207. Review.
    Mueller LN, Brusniak MY, Mani DR, Aebersold R. An assessment of software solutions for the analysis of mass spectrometry based quantitative proteomics data. J Proteome Res. 2008 Jan;7(1):51-61.
    MOWSE: Pappin DJC, Hojrup P, Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327-332.
    MASCOT: Perkins DN, Pappin DJC, Creasy DM, Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567.