
Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS - Karl R. Clauser


Topics covered:
Basics of phosphosite identification and localization
Evolution of phosphoproteomic literature MS/MS reporting
Modification site localization algorithm development
2010 ABRF-iPRG study of phosphopeptide ID and site localization
Emerging false localization rate (FLR) metrics

  • Figure 1. Overview of the processing steps performed by phosphoRS. (A) The MS/MS spectrum is divided in windows of 100 m/z. (B) Dynamic peak extraction. For each m/z window the optimal peak depth is determined by calculating cumulative binomial probabilities for each isoform. (C) Cumulative binomial probabilities and peptide scores are calculated for each isoform, considering the optimal number of most intense peaks for each m/z window. (D) phosphoRS sequence and site probabilities are calculated, using the inverse probabilities for a random match obtained for each isoform.
    Figure 6. Analysis of a titania-enriched peptide mixture from a HeLa cell lysate. (A) The absolute numbers of phosphorylation sites at the PSM level localized with phosphoRS (red), Ascore (gold), and MDscore (black) from a MSA-generated data set are shown. (B) The numbers of nonredundant phosphorylation sites localized using the respective cutoff values for the different software tools are depicted. The overlap and the uniquely localized sites are visualized. (C) The percentage and absolute numbers of localized phosphorylation sites at the PSM level are shown for MSA (red), ETD (violet), and HCD (green) generated data sets applying phosphoRS site probability cutoffs of 0.99 and 0.75.
  • Counts of the number of Id=Y, unique peptides, and Loc=Y are given for all 3 fractions combined. Underneath is a summary of methods used to arrive at these values. Choosing more than 2 of these options appears to have improved performance.
    86010: The search engine is pFind. The phosphosite was localized by using the score difference between rank1 and rank2 provided by pFind. Then the relative difference (score difference divided by the top1 score) was used to determine whether the site was located confidently, with a cut-off value of 0.05.
    20109: Peptides present in all mzXML files were matched against the provided Swiss-Prot database using three independent search engines: MyriMatch(2), X! Tandem(3), and InsPecT(4). …we extracted the “consensus” sequence for each spectrum using a script written in AWK programming language. When search engines assigned the same sequence (including modification sites) to the spectrum, the consensus is considered unambiguous. Otherwise, the sequence assignments were concatenated with “/” and the consensus is considered ambiguous. We accepted all PSMs that satisfy the following criteria: 1. The PSM has either an unambiguous “consensus” sequence assignment or the ambiguity is due to differing modification locations. 2. The PSM has an FDR of < 1%, and mass error of < 50 PPM.
    20441: …using the UniProt FASTA file supplied for iPRG. The FASTA file was used to create an annotation file, based on version 15.11 of the UniProt Knowledgebase. …. All phosphorylation modification (+80 Da) testing was based on the UniProt annotation file, which was used for all rounds of identification.
  • Relative performance across fractions was not the same for all participants. CVs were lowest in Fr4 and highest in Fr12 before removing outliers.
  • Some participants reported low-scoring identifications, which were indicated using Id=N. Those lower-scoring hits were evaluated across participants on a spectrum-by-spectrum basis. Yellow indicates matches that were shared with the majority but marked as below threshold. Dark red indicates matches that were below threshold and differ from the majority, and pink indicates matches that were marked above threshold but differ from the majority of the participants.
  • FIG. 2. FLR as a function of SLIP score for an ion trap CID dataset. Histograms of FLR for a given SLIP score are plotted for the searches considering phosphoglutamate and phosphoproline modification.

    1. Confident Phosphopeptide Identification and Phosphosite Localization by LC-MS/MS
       Karl R. Clauser, Broad Institute of MIT and Harvard
       BioInfoSummer 2012, University of Adelaide, December 2012
    2. Topics Covered
       • Basics of phosphosite identification and localization
       • Evolution of phosphoproteomic literature MS/MS reporting
       • Modification site localization algorithm development
       • 2010 ABRF-iPRG study of phosphopeptide ID and site localization
       • Emerging false localization rate (FLR) metrics
    3. Localizing a Phosphorylation Site
       L/F|P/A/D|T/s/P/S T AT K
       L/F|P/A/D|t S/P/S T AT K
    4. PTM Site Localization
       Test all Locations, Examine Score Gaps

       Case                     Locations Tested       Conclusion
       No possible ambiguity    AVsEEQQPALK            AVS(1.0)EEQQPALK
       (# PO4 sites =
        # S, T, or Y)
       Single Site              APsLTDLVK         *    APS(0.99)LT(0.0)DLVK
                                APSLtDLVK         -
                                sSSAGPEGPQLDVPR   *    S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR
                                SsSAGPEGPQLDVPR   *
                                SSsAGPEGPQLDVPR   -
       Multiple Sites           VTNDIsPEsSPGVGR   *    VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
                                VTNDIsPESsPGVGR   *
                                VTNDISPEssPGVGR   -
                                VtNDIsPESSPGVGR   -
                                VtNDISPEsSPGVGR   -
                                VtNDISPESsPGVGR   -
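The "test all locations" step on this slide can be sketched in a few lines of Python. This is only an illustration of the enumeration, not Spectrum Mill's implementation; the function name `candidate_localizations` is hypothetical, and phosphosites are marked in lowercase as in the slide.

```python
from itertools import combinations


def candidate_localizations(peptide: str, n_phospho: int, residues: str = "STY") -> list[str]:
    """Return every placement of n_phospho phosphates on S/T/Y residues.

    Phosphorylated residues are written in lowercase (e.g. APsLTDLVK).
    """
    # Indices of residues that could carry a phosphate.
    sites = [i for i, aa in enumerate(peptide) if aa in residues]
    candidates = []
    for combo in combinations(sites, n_phospho):
        chars = list(peptide)
        for i in combo:
            chars[i] = chars[i].lower()
        candidates.append("".join(chars))
    return candidates


print(candidate_localizations("APSLTDLVK", 1))
print(len(candidate_localizations("VTNDISPESSPGVGR", 2)))
```

For the doubly phosphorylated peptide VTNDISPESSPGVGR there are four S/T residues, so the sketch yields the same six candidates listed on the slide. Each candidate would then be scored against the spectrum and the score gaps examined.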
    5. PTM Site Localization – Confident Localization
       (K)A/P|s|L/T D|LV K(S)
       APS(0.99)LT(0.0)DLVK
    6. PTM Site Localization – Ambiguous Localization
       (R)S s/S/A/G/P E/G/P Q L|D|V|P R(E)
       S(0.50)S(0.50)S(0.0)AGPEGPQLDVPR
    7. PTM Site Localization – Ambiguous Localization
       2 sites: 1 confident, 1 ambiguous
       (R)V T N D|I|s/P E|s S/P G VG R(R)
       VT(0.0)NDIS(0.99)PES(0.50)S(0.50)PGVGR
    8. Reliability of LC/MS/MS Phosphoproteomic Literature ~2005

       Citation                                                    Approach                               Instrument     # sites   # ambiguous sites   Scores shown   Site ambig. shown   Supplem. labeled spectra
       Ballif, BA, … Gygi, SP, 2004, MCP, 3, 1093-1101             1D gel, digest, SCX, LC/MS/MS          LCQ Deca XP    546       86                  yes            yes                 no
       Rush, J, … Comb, MJ, 2005, Nat Biotech, 23, 94-101          digest lysate, pTyr Ab, LC/MS/MS       LCQ Deca XP    628       0                   yes            no                  no
       Collins, MO, … Grant, SGN, 2005, J Biol Chem, 280, 5972-5982   protein IMAC, peptide IMAC, LC/MS/MS   Q-Tof Ultima   331       42                  no             yes                 no
       Gruhler, A, … Jensen, ON, 2005, MCP, 4, 310-327             digest lysate, SCX, IMAC, LC/MS/MS     LTQ-FT         729       0                   yes            no                  no

       “Resulting sequences were inspected manually …. When the exact site of phosphorylation could not be assigned for a given phosphopeptide, it was tabulated as ambiguous.”
       “All spectra supporting the final list of assigned peptides used to build the tables shown here were reviewed by at least three people to establish their credibility.”
       “Assignment of phosphorylation sites was verified manually with the aid of PEAK Studio (Bioinformatics Solutions) software.”
       “All identified phosphopeptides were manually validated, and localization of phosphorylated residues within the individual peptide sequences were manually assigned…”
    9. MCP Guideline for publishing PTM data ~2010
       http://www.mcponline.org/
       III. POST-TRANSLATIONAL MODIFICATIONS
       Studies focusing on posttranslational modifications (PTMs) require specialized methodology and documentation to assign the type(s) and site(s) of the modification(s). The guidelines in this section apply to PTMs that occur under physiological conditions and to which biological significance may be assigned, such as phosphorylation, glycosylation, etc., as well as purposefully induced chemical modifications of central importance to the results of the study, such as chemical cross-linking. These guidelines do not apply to common modifications arising from sample handling or preparation such as oxidation of Met or alkylation of Cys. In addition to the tabular presentation(s) of the data described in guideline II, the following information is required:
       • The site(s) of modification: Within each peptide sequence, all modifications must be clearly located (unless ambiguous; see below) and the manner in which this was accomplished (through computation or manual inspection) must be described.
       • A justification for any localization score threshold employed.
       • Ambiguous assignments: Peptides containing ambiguous PTM site localizations must be listed in a separate table from those with unambiguous site localizations. In cases where there are multiple modification sites and at least one is ambiguous, then these peptides should be listed with the ambiguous assignments. Ambiguous assignments must be clearly labeled as such. Examples of ambiguities include:
         • Modified peptides in which one or more modification sites are ambiguous.
         • Instances where the peptide sequence is repeated in the same protein so the specific modification site cannot be assigned.
         • Instances in which the same peptide is repeated in multiple proteins, e.g. paralogs and splice variants (See also Section IV).
         • Isobaric modifications (e.g., acetylation vs. trimethylation, phosphorylation vs. sulfonation etc), where the possibilities may not be distinguished. Examples of methods able to distinguish between these include mass spectrometric approaches such as accurate mass determination, observation of signature fragment ions (e.g. m/z 79 vs. m/z 80 in negative ion mode for assignment of phosphorylation over sulfonation), or biological or chemical strategies.
       • Annotated, mass labeled spectra: Spectra for ALL modified peptides must be either submitted to a public repository or accompany the manuscript as described in guideline II.
    10. Supplemental Table Links to Each Labeled Spectrum
    11. Spectrum Mill Scoring of MS/MS Interpretations
        Peak Selection: De-Isotoping, S/N thresholding, Parent - neutral removal, Charge assignment
        Match to Database Candidate Sequences
        Score = Assignment Bonus (Ion Type Weighted) + Marker Ion Bonus (Ion Type Weighted) - Non-assignment Penalty (Intensity Weighted)
        SPI (%) = Scored Peak Intensity
        Example shown: Score 12.68, SPI 92%
    12. Spectrum Mill Variable Modification Localization Score
        VML score = Difference in Score of same identified sequences with different variable modification localizations
        VML score > 1.1 indicates confident localization
        Why a threshold value of 1.1?
        • 1 implies that there is a distinguishing ion of b or y ion type
        • 0.1 means that when unassigned, the peak is 10% the intensity of the base peak
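The thresholding rule on this slide can be sketched as follows. This is a minimal illustration, not Spectrum Mill code: it assumes the VML score is the gap between the best and second-best localization scores for the same sequence, it treats a peptide with only one possible localization as confident, and the example scores are made up.

```python
VML_THRESHOLD = 1.1  # threshold value stated on the slide


def vml_score(candidate_scores: dict[str, float]) -> tuple[str, float]:
    """Return the top-scoring localization and its score gap to the runner-up.

    candidate_scores maps each localization variant of one sequence to its
    identification score. The gap definition is an assumption of this sketch.
    """
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_seq, best = ranked[0]
    if len(ranked) == 1:
        # Only one place the modification can go: no possible ambiguity.
        return best_seq, float("inf")
    return best_seq, best - ranked[1][1]


def is_confident(gap: float, threshold: float = VML_THRESHOLD) -> bool:
    """Confident localization when the score gap exceeds the threshold."""
    return gap > threshold


# Hypothetical scores for the two localizations of APSLTDLVK.
best, gap = vml_score({"APsLTDLVK": 12.68, "APSLtDLVK": 10.20})
print(best, round(gap, 2), is_confident(gap))
```

A gap of 1.09, as in the "Room for Improvement" example later in the deck, falls just below the threshold and would be reported as ambiguous under this rule.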
    13. **
    14. VML Scoring - Room for Improvement
        VML score: 1.09
        S(0.50)Q T(0.50)PPGVAT(0.0)PPIPK
        Ions labeled on the spectrum: b2, y12
    15. VML Scoring - Room for Improvement
        VML score: 0.49
        S(0.0)T(0.0)S(0.25)T(0.25)PT(0.25)S(0.25)PGPR
        S(0.0)T(0.0)[S(0.5)T(0.5)]P[T(0.5)S(0.5)]PGPR
    16. Phosphosite Localization Scoring - Ascore
        http://ascore.med.harvard.edu/
        Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP (2006) Nat Biotechnol 24:1285–1292.
        Supports Sequest results only, Linux only
    17. Phosphosite Localization Scoring - Andromeda
        P = [n! / (k! (n-k)!)] x p^k x (1-p)^(n-k)
          = [n! / (k! (n-k)!)] x 0.04^k x 0.96^(n-k)
        PTM score = -10 x log(P)
        p: 0.04 - use the 4 most intense fragment ions per 100 m/z units
        n: total number of possible b/y ions in the observed mass range for all possible combinations of PO4 sites in a peptide
        k: number of peaks matching n
        Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell (2006), 127 (3), 635–48.
        Olsen, J.V., and Mann, M. Proc. Natl. Acad. Sci. USA. (2004) 101, 13417–13422.
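A small numeric sketch of the PTM score may help make the symbols concrete. It uses the standard binomial coefficient n!/(k!(n-k)!) with p = 0.04 and assumes the logarithm is base 10; the slide only says "log". The function names are hypothetical, and this is not the published implementation.

```python
from math import comb, log10


def ptm_match_probability(n: int, k: int, p: float = 0.04) -> float:
    """Binomial probability of matching exactly k of n theoretical ions by chance."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ptm_score(n: int, k: int, p: float = 0.04) -> float:
    """PTM score = -10 * log10(P); base 10 is an assumption of this sketch."""
    return -10 * log10(ptm_match_probability(n, k, p))


# Hypothetical example: 20 possible b/y ions, 8 of them matched.
print(ptm_score(20, 8))
```

Because a less likely random match gives a smaller P, a localization that explains more of the selected peaks receives a higher score.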
    18. True Probability or Just Effective Scores?
        Peak selection assumptions
        • All regions of spectrum equally likely
          • multiply charged fragments below precursor
          • some 100-300 m/z values not possible, dipeptide AA combinations
          • tolerance in Da, not ppm
        • Tall and short peak intensities equally diagnostic
        Fragment ion type assumptions
        • All ion types equally probable
        • Neutral losses ignored, y-H3PO4, y-H2O
    19. Phosphosite Localization Scoring - PhosphoRS
        N: total # of extracted peaks
        d: fragment ion mass tolerance
        w: full mass range of spectrum
        Score all theoretical fragment ions, not just site determining ions.
        Taus, T., Kocher, T., Pichler, P., Paschke, C., Schmidt, A., Henrich, C., and Mechtler, K. (2011) J Proteome Res. 10(12): 5354-62.
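The slide lists only the symbols N, d and w. The sketch below assumes they combine into a random-match probability p = N x d / w, and it follows the figure caption in the slide notes for the remaining steps (cumulative binomial probability per isoform, then isoform probabilities from inverse random-match probabilities). All function names are hypothetical and this is not the phosphoRS source code.

```python
from math import comb


def random_match_probability(n_peaks: int, tol: float, mass_range: float) -> float:
    """Assumed form p = N * d / w: chance that a theoretical ion hits a peak at random."""
    return n_peaks * tol / mass_range


def cumulative_binomial(n: int, k: int, p: float) -> float:
    """Probability of matching at least k of n theoretical fragment ions by chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def isoform_probabilities(random_probs: list[float]) -> list[float]:
    """Normalize inverse random-match probabilities across phospho-isoforms."""
    inverse = [1.0 / prob for prob in random_probs]
    total = sum(inverse)
    return [value / total for value in inverse]


# Hypothetical window: 8 extracted peaks, 0.5 Da tolerance, 100 m/z wide.
p = random_match_probability(8, 0.5, 100.0)
# Two isoforms with 20 theoretical ions each, matching 9 and 6 peaks.
probs = [cumulative_binomial(20, 9, p), cumulative_binomial(20, 6, p)]
print(isoform_probabilities(probs))
```

The isoform that matches more peaks has the smaller chance probability and therefore the larger normalized share.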
    20. Key Aspects of Scoring Localizations
        • Select peaks in spectrum to be used for identification/localization
        • Test all sequence/location possibilities
        • Assign fragment ion types to peaks
          • Allow for peaks to have different ion type assignments for conflicting localization possibilities
        • Use score differences to make decision on localization certainty/ambiguity
          • Decide upon conservative/aggressive thresholds.
        • Provide a clear representation of the certainty/ambiguity in localization of each site
          • Allow for multiple sites with mix of certainty and ambiguity in localization
          • Distinguish between:
            • Ambiguity – no distinguishing evidence, i.e. either possibility
            • Ambiguity – conflicting evidence, multiple co-eluting isoforms present
        How can we calculate a false localization rate as a standard measure of certainty for phosphosite assignment across a dataset?
    21. iPRG: Informatic Evaluation of Phosphopeptide Identification and Phosphosite Localization
        ABRF Proteome Informatics Research Group
        ABRF 2010, Sacramento, CA, March 22, 2010
    22. Study Goals
        1. Evaluate the consistency of reporting phosphopeptide identifications and phosphosite localization across laboratories
        2. Characterize the underlying reasons why result sets differ
        3. Produce a benchmark phosphopeptide dataset, spectral library and analysis resource
    23. Study Design
        • Use a common dataset
        • Use a common sequence database
        • Allow participants to use the bioinformatic tools and methods of their choosing
        • Use a common reporting template
        • Fix the identification confidence (1% FDR)
        • Require an indication of phosphosite ambiguity per spectrum
        • Ignore protein inference – for now
    24. Study Materials and Instructions to Participants
        Materials
        • 1 Orbitrap XL dataset (3 files)
          – RAW, mzML, mzXML, MGF, pkl or dta
          – conversions by ProteoWizard
        • 1 FASTA file (SwissProt human seq’s. v57.1)
        • 1 template (Excel)
        • 1 on-line survey (Survey Monkey)
        Instructions
        1. Analyze the dataset
        2. Report the phosphopeptide spectrum matches in the provided template
        3. Complete an on-line survey
        4. Attach a 1-2 page description of your methodology
    25. Reporting Template
        ABRF iPRG 2010 Study Template: Phosphorylated Peptide Analysis
        Instructions: Please fill in all REQUIRED fields. After deleting the example rows, create a new row for each phosphopeptide spectrum match. Multiple rows MAY be used to report ambiguous phosphosite localizations. Phosphorylated residues MUST be indicated in the Peptide Sequence field, and results should be sorted by Peptide Identification Score from most to least confident. Additional instructions can be found above each field header. Results should be emailed to anonymous.iPRG2010@gmail.com no later than Jan. 10, 2010. Please make sure to fill out the REQUIRED survey.

        REQUIRED FIELDS
        • File: Name of data file (e.g., D20090930_PM_K562_SCX-IMAC_fxn03).
        • Spectrum Identifier: Identifiers should be unique scan numbers from data file but may also refer to a merged range of MS/MS scans (e.g., Scan:19, 2316.19.19.3.dta, 2316.19.19.3.pkl).
        • Precursor m/z: Precursor m/z as submitted to search engine.
        • Precursor Charge: Precursor charge reported by search engine.
        • Peptide Sequence: Use lowercase s, t or y (e.g. SLsGSsPCPK) OR a trailing symbol (e.g. SLS#GS#PCPK) OR a string in parentheses (e.g. SLS(ph)GS(ph)PCPK) immediately following each phosphorylated residue. Only phosphorylation of S, T and Y will be compared; all other modifications (e.g., oxidized M) will be ignored. It will be assumed that all modifications indicated on S, T or Y are phosphorylations.
        • Accession(s): Protein identifier(s) from Fasta file. Use multiple values if peptide is found in multiple proteins, e.g., Q9NZ18; Q9UQ35. Protein inference will not be scored.
        • Num. Phospho sites: Total number of phosphorylations as evidenced by the precursor m/z and MS2 spectrum.
        • Peptide Identification Certainty: Y indicates this match is BETTER than the confidence threshold. N indicates the match is WORSE. Please report BOTH types of identifications in your ranked list. Is this match above 1% FDR identification threshold (Y|N)?
        • Phosphosite Localization Certainty: Indicate Y if ALL phosphorylations have been confidently localized. N if one or more have not. Are ALL phosphosites unambiguously localized (Y|N)?
        • Peptide Identification Score: Peptide identification score reported by search engine (e.g., E-value, p-value, probability, Mascot score, etc.)

        Example rows
        File                               Spectrum Identifier   Precursor m/z   Precursor Charge   Peptide Sequence         Accession(s)   Num. Phospho sites   Id Certainty   Loc Certainty   Id Score
        D20090930_PM_K562_SCX-IMAC_fxn03   Scan:908              558.7576        2                  qGsPVAAGAPAK             Q9NZI8         1                    Y              Y               0.0002097
        D20090930_PM_K562_SCX-IMAC_fxn04   Scan:2017             710.82233       2                  TsPDPSPVSAAPSK           Q13469         1                    Y              N               45.41
        D20090930_PM_K562_SCX-IMAC_fxn03   Scan:683              692.28891       2                  _APQTS(ph)S(ph)SPPPVR_   Q8IYB3         2                    Y              N               30.09
        D20090930_PM_K562_SCX-IMAC_fxn03   Scan:4832             775.3548        2                  SQtPPGVAtPPIPK           Q15648         2                    Y              N               31.79
        D20090930_PM_K562_SCX-IMAC_fxn03   Scan:641              590.2127        2                  SLsGSsPcPK               Q9UQ35         2                    Y              N               0.0112023
        D20090930_PM_K562_SCX-IMAC_fxn03   Scan:641              590.2127        2                  sLSGSsPcPK               Q9UQ35         2                    Y              N               0.0915611
    26. Study Participation
        • 59 requests / 32 submissions (54% return)
          • 2 retractions
          • + 7 iPRG members and 1 guest
        [Pie and bar charts of survey responses: Resource Lab Status (conduct both core functions and non-core lab research; core only; non-core research lab); Membership, n=33 (ABRF member; non-member); Primary Job Function (bioinformatician/developer; director/manager; lab scientist; mass spectrometrist; other); Type of Lab (academic; biotech/pharma/industry; contract research org; government; other); Location (Asia; Australia/New Zealand; Europe; North America); Proteomics Experience (1-2 years; 3-4 years; 5-10 years; >10 years; unanswered)]
    27. Software Tools Used
        [Bar charts: number of participants using each software tool for Peptide Identification and for Phosphosite Localization]
    28. The SCX/IMAC Enrichment Approach for Phosphoproteomics
        Sample: 7.5 x 10^7 human K562 chronic myelogenous leukemia cells, 4 mg lysate
        Protocol: Villen, J, and Gygi, SP, Nat Prot, 2008, 3, 1630-1638.
        Lysis: 8M urea, 75 mM NaCl, 50 mM Tris pH 8.2, phosphatase inhibitors
        SCX: PolyLC - Polysulfoethyl A 9.4 mm x 200 mm, elute: 0-105 mM KCl, 30% Acn
        IMAC: Sigma - PhosSelect Fe IMAC beads, bind: 40% Acn, 0.1% formic acid, elute: 500 mM K2HPO4 pH 7
        MS/MS: Thermo Fisher Orbitrap XL, high-res MS1 scans in the Orbitrap (60k), Top-8 fragmented in LTQ, exclude +1 and precursors w/ unassigned charges, 20 s exclusion time, precursor mass error +/- 10 ppm
    29. Preliminary Analysis of SCX Fractions and Dataset Selection
        • Frxn 3: multi-phosphosites
        • Frxn 4: single phospho, single basic
        • Frxn 12: multi-basic residues (RHK)
        [Charts by SCX fraction # (2-12): # spectra by precursor charge (z2, z3, z4); % distinct peptides by # phosphosites (1P, 2P, 3P); % distinct peptides by solution charge (-1SC to 6SC)]
    30. From 30,000 Ft.
        [Bar chart per participant alias: # spectra Id Yes, # spectra Loc Yes, # unique peptides UC Id Yes]
        [Table per participant alias of software codes used for: spectral pre-processing, precursor m/z adjusted (Y), nterm acetyl (Y), peptide identification, phosphosite localization]
        Participant aliases: 14941, 87133, 22730, 86010, 13800, 20899i, 53706, 92536i, 870486i, 45682, 870484i, 85246, 13867, 40816i, 20109, 50308i, 56365, 66398, 91943i, 47587, 71263, 65211, 63103, 97219i, 20814, 18621, 74637, 15769, 77114, 66514, 77115, 84940v, 20441v, 29850v, 61963v
    31. Software Program Abbreviations
        The data analysis tools used by the participants were collected from the on-line survey as reported by the participants. Many participants used multiple search engines and most used a software tool to localize the phosphosites. Moreover, many in-house (Ih) or custom software tools were used in the study, only some of which are published. The key below can be used to decode the names of the software tools in the table above, and the table is sorted (by number of confident peptide identifications), exactly as in the histogram above.

        Software Program           Key
        Ascore                     As
        Bioworks                   Bw
        Distiller                  Di
        extract_msn                Em
        TheGPM                     Gp
        in-house                   Ih
        Inspect                    In
        IdPicker                   Ip
        iProphet                   Id
        Mascot                     Ma
        msconvert                  Mc
        msInspect                  Mi
        MyriMatch                  Mm
        MSPepSearch + Spec Lib.    Mp
        MaxQuant                   Mq
        msInspect                  Ms
        OMSSA                      Om
        OpenMS                     Op
        pFind                      Pf
        Phosphinator               Ph
        pepARML                    Pl
        PeptideProphet             Pp
        Peptizer                   Pz
        Prophossi                  Pr
        PhosphoScore               Ps
        Pview                      Pv
        ReAdW                      Rr
        Scaffold                   Sc
        SEQUEST                    Se
        Spectrum Mill              Sm
        SpectraST + Spec Lib.      Sp
        Xcalibur                   Xc
        X!Tandem                   Xt
        X!Tandem (k-score)         Xt*
    32. Relative Performance: Identification By Fraction
        [Bar chart per participant: # spectra Id Yes for Frxn 3, Frxn 4 and Frxn 12]
        Performance was not equivalent across the 3 fractions for all participants.
        [Bar chart per participant: # unique peptides UC Id Yes for Frxn 3, Frxn 4 and Frxn 12]
        Some participants saw more unique peptides than others.
    33. Room for Improvement in ID Certainty Thresholds
        [Stacked bar charts of # spectra per participant, one panel per fraction]
        • Frxn 3 – most multiple phos per peptide
        • Frxn 4 – most phosphopeptides
        • Frxn 12 – highest precursor charges
        Legend: #DN Diff Id No; #SN Same Id No; #DY Diff Id Yes; #SY Same Id Yes; #Y1P Id Yes single
        Gray means: number of spectra where < 2 people agreed on the Id
        Notes on individual participants:
        • 85246: 1205 spectra with 3-15 phosphosites, 624 spectra with 4-15
        • 20814: ?, Frxn 12 >> Frxn 3,4
        • 77114, 77115: merged multiple scans, so can’t be compared with other
    34. Resource for Inspecting Peptide Id Certainty Overlaps - Frxn 4
        • YY: Y – identification, Y – localization
        • YN: Y – identification, N – localization
        • NS: N – identification, but top sequence same as consensus
        • ND: N – identification, and top sequence different than consensus
    35. Subset of Participants Used for Localization Analysis
        [Bar chart per participant: # spectra Id Yes, # spectra Loc Yes, with excluded participants flagged]
        Reasons for exclusion:
        • 0: 0% localization
        • 1: 100% localization
        • F: FDR - very high?
        • R: Replicate submission
        • M: Merged spectra
        • C: Categorization Errors
        • A: Y Loc only when no possible ambiguity
        [Bar chart for the 22 retained participants: # spectra Id Yes, # spectra Loc Yes]
        Retained participants: 14941, 87133, 22730, 86010, 13800, 20899i, 53706, 92536i, 45682, 13867, 20109, 50308i, 56365, 91943i, 47587, 71263, 97219i, 18621, 870486i, 84940v, 20441v, 61963v
    36. If Participants Agree on the Identity, Do They Also Agree Site Localization Can be Certain?
        Frxn 4: subset of 472 spectra for which 20/22 participants all agree on Identity
        [Histogram: % of spectra (0-10%) vs. % participants indicating localization Yes (100% down to 10%), plus NPA = no possibility of ambiguity]
    37. What Fraction of the Time Do They Agree On Localization(s)?
        8050 spectra with > 2/22 Id Yes (Frxn 3, 4, 12)
        • # Y loc 2-22 partic: 5918
        • # Y loc 1 partic: 798
        • # N loc all partic: 498
        • Y loc no ambiguity: 836
        5918/8050 spectra with > 2/22 Loc Yes and Site Ambiguity Possible
        [Pie chart of the 5918 spectra by agreement category (100% partic agree; 67-99% partic agree; < 67% partic agree): 4685, 79% for 100% agreement; the remaining slices are 670, 11% and 563, 10%]
        For all of the participants that agree on identity when
        • site ambiguity is possible (#S,T,Y > # phos)
        • >2 participants mark Loc=Y
        For 79% (4,685 of 5,918) of the spectra, all participants who mark Loc=Y unanimously agree on the localization of the phosphosites
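The agreement categories on this slide can be reproduced with a short sketch. It assumes that agreement for a spectrum is the share of Loc=Y participants who report the most common localization, which is one plausible reading of the slide and not a documented iPRG definition. The function names and the example inputs are hypothetical.

```python
from collections import Counter


def agreement_fraction(localizations: list[str]) -> float:
    """Share of Loc=Y participants reporting the most common localization."""
    counts = Counter(localizations)
    return counts.most_common(1)[0][1] / len(localizations)


def agreement_bin(fraction: float) -> str:
    """Place a spectrum into one of the three categories used on the slide."""
    if fraction == 1.0:
        return "100% partic agree"
    if fraction >= 0.67:
        return "67-99% partic agree"
    return "< 67% partic agree"


# Hypothetical spectrum: three participants agree, one reports another site.
reports = ["SQtPPGVAtPPIPK", "SQtPPGVAtPPIPK", "SQtPPGVAtPPIPK", "sQTPPGVAtPPIPK"]
print(agreement_bin(agreement_fraction(reports)))

# The slide's headline figure: 4,685 of 5,918 spectra.
print(round(100 * 4685 / 5918))
```

The three slice counts on the slide (4685, 670 and 563) sum to 5918, and the rounded percentages 79%, 11% and 10% follow directly from those counts.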