
Pride quality control, attilacsordas, biocuration2012

The ppt version of a talk I gave at the Biocuration 2012 Conference in Washington DC at Georgetown University in front of ~300 people.

Published in: Technology

  1. PRIDE: Quality control in a proteomics data repository. Attila Csordas, Proteomics Services Team. Biocuration Conference, April 2nd, 2012
  2. Overview: who are we? what are we dealing with? manual curation and submission; quick detour: ProteomeXchange; automated curation & submission pipeline; conclusion
  3. PRIDE: http://www.ebi.ac.uk/pride The PRoteomics IDEntifications database is a centralised, primary, archival, public data repository for MS/MS proteomics data containing peptide IDs, protein IDs, mass spectra, protein expression values and metadata.
  4. Acknowledgements: colleagues at the PRIDE team; @pride_ebi; pride-ebi@ebi.ac.uk; pride-support@ebi.ac.uk; http://code.google.com/p/pride-toolsuite/; http://code.google.com/p/pride-converter-2/
  5. Mass spectrometry: analytical technique measuring the mass-to-charge (m/z) ratio of charged particles to determine masses of particles, composition of samples/molecules and chemical structures of molecules
  6. Shotgun/bottom-up proteomics [diagram labels: protocol; proteins; peptides; MS analysis; fragmentation; MS/MS analysis; sequence database]
  7. What is a PRIDE submission?
  8. Growth of core data types [chart values: 130 million; 23 million; 4.6 million]
  9. Manual curation and submission process: search engine output + spectra, e.g. Mascot (.dat), X!Tandem (.xml) + mgf; PRIDE Converter; PRIDE XML
  10. PRIDE Inspector: initial assessment on data quality; visualise/check data; summary charts; support for submitters & reviewers/editors; more flexible than web interface
  11. Frequent Data Quality Issues: (1) syntactic problems; (2a) core data missing: no protein/peptide identifications; (2b) or metadata missing: no species; (3) inconsistent/incorrect data: protein modifications. [XML snippets shown on slide: <SearchEngine>PeptideShaker</SearchEngine>, <PeptideItem>]
  12. Delta m/z of detected peptide precursors: experimental precursor ion m/z - theoretical precursor ion m/z. Source of delta m/z outliers: incorrect or missing protein modifications and charge state misassignments
  13. Fixing modifications based on delta m/z outliers
  14. Fixing modifications based on delta m/z outliers
  15. But the manual approach does not scale!
  16. 10 times as many & big submissions/day?
  17. Single point of submission of data to the main repositories to encourage data exchange [diagram labels: Published; Raw; Reprocessed; Individual submissions; Large-scale submissions; PeptideAtlas; EBI PRIDE; Raw files archive; Users; UniProt; Other DBs (GPMDB, …)]
  18. PX submission pipeline [diagram labels: PX Tool; Validation; Submission; Publication; ProteomeCentral; Raw Files; PRIDE XML Files; Summary]
  19. Automated regular submission pipeline: curation-submission time is ~1/6th of manual time; actionable curation summary. Number of files: 3. Project: Combined personal saliva proteome and microbioproteome. XML generator software: PRIDE Converter Toolsuite 2.0-SNAPSHOT. Summary row: Filename: 22143.xml; size: 3.3 GB; Species: Homo sapiens; #Proteins: 4128; #Peptides: 60544; #Spectra: 184209; #Unid-d spectra: 123665; PTMs: 3; % delta m/z outlier spectra: 0.0
  20. Conclusion: growing amount of data; increasingly complex data; scalability issues; overcoming them by automation and new, smarter curation strategies
  21. (no text on slide)
  22. Thanks for the attention!
  23. Q&A: acsordas@ebi.ac.uk, @attilacsordas
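
The delta m/z check described on slide 12 (experimental precursor ion m/z minus theoretical precursor ion m/z) could be sketched roughly as below. This is a minimal illustration, not the PRIDE implementation: the function names, the example precursors and the 0.5 m/z tolerance are all assumptions made for the example. Only the definition of delta m/z and the two named outlier sources (missing modifications, charge state misassignments) come from the slides.

```python
# Illustrative sketch only: names, example values and the 0.5 m/z tolerance
# are assumptions, not taken from the PRIDE code base.

PROTON_MASS = 1.007276466  # mass of a proton in Da
OXIDATION = 15.994915      # monoisotopic mass shift of an oxidation in Da


def theoretical_mz(neutral_mass, charge):
    """m/z expected for a peptide of the given neutral mass at this charge."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def delta_mz(experimental_mz, neutral_mass, charge):
    """Experimental precursor m/z minus theoretical precursor m/z."""
    return experimental_mz - theoretical_mz(neutral_mass, charge)


def outlier_percentage(precursors, tolerance=0.5):
    """Percentage of precursors whose |delta m/z| exceeds the tolerance."""
    if not precursors:
        return 0.0
    outliers = sum(
        1
        for exp_mz, mass, charge in precursors
        if abs(delta_mz(exp_mz, mass, charge)) > tolerance
    )
    return 100.0 * outliers / len(precursors)


# Hypothetical precursors: (experimental m/z, annotated neutral mass, annotated charge)
precursors = [
    (theoretical_mz(1000.0, 2), 1000.0, 2),              # consistent identification
    (theoretical_mz(1000.0 + OXIDATION, 2), 1000.0, 2),  # oxidation missing from the annotation
    (theoretical_mz(1500.0, 3), 1500.0, 2),              # charge annotated as 2+ instead of 3+
]

for exp_mz, mass, charge in precursors:
    print(f"charge {charge}+: delta m/z = {delta_mz(exp_mz, mass, charge):+.4f}")
print(f"outliers: {outlier_percentage(precursors):.1f}%")
```

In this sketch a missing modification of mass m on a precursor of charge z shifts delta m/z by m/z, so the unannotated oxidation at 2+ shows up as a shift of about 8.0, while the misassigned charge state produces a much larger deviation.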
