PRIDE quality control (Attila Csordas, Biocuration 2012)

The ppt version of a talk I gave at the Biocuration 2012 Conference in Washington DC at Georgetown University in front of ~300 people.

  1. PRIDE: Quality control in a proteomics data repository. Attila Csordas, Proteomics Services Team. Biocuration Conference, April 2nd, 2012.
  2. Overview: who are we? What are we dealing with? Manual curation and submission; quick detour: ProteomeXchange; automated curation & submission pipeline; conclusion.
  3. PRIDE: http://www.ebi.ac.uk/pride The PRoteomics IDEntifications database is a centralised, primary, archival, public data repository for MS/MS proteomics data containing peptide IDs, protein IDs, mass spectra, protein expression values, and metadata.
  4. Acknowledgements: colleagues at the PRIDE team. @pride_ebi; pride-ebi@ebi.ac.uk; pride-support@ebi.ac.uk; http://code.google.com/p/pride-toolsuite/ and http://code.google.com/p/pride-converter-2/
  5. Mass spectrometry: an analytical technique measuring the mass-to-charge (m/z) ratio of charged particles to determine masses of particles, composition of samples/molecules, and chemical structures of molecules.
  6. Shotgun/bottom-up proteomics protocol (diagram; labels: proteins, peptides, MS analysis, fragmentation, MS/MS analysis, sequence database).
  7. What is a PRIDE submission?
  8. Growth of core data types (chart; values shown: 130 million, 23 million, 4.6 million).
  9. Manual curation and submission process: search engine output (Mascot .dat, X!Tandem .xml) + spectra (mgf), converted by PRIDE Converter into PRIDE XML.
  10. PRIDE Inspector: initial assessment of data quality; visualise/check data; summary charts; support for submitters & reviewers/editors; more flexible than the web interface.
  11. Frequent data quality issues: (1) syntactic problems; (2a) core data missing: no protein/peptide identifications; (2b) or metadata missing: no species; (3) inconsistent/incorrect data: protein modifications. XML fragments shown on the slide: <SearchEngine>PeptideShaker</SearchEngine> and <PeptideItem>.
  12. Delta m/z of detected peptide precursors: experimental precursor ion m/z - theoretical precursor ion m/z. Sources of delta m/z outliers: incorrect or missing protein modifications and charge state misassignments.
  13. Fixing modifications based on delta m/z outliers
  14. Fixing modifications based on delta m/z outliers
  15. But the manual approach does not scale!
  16. 10 times as many & big submissions/day?
  17. Single point of submission of data to the main repositories, to encourage data exchange (diagram; labels: individual submissions, large-scale submissions, published, raw, reprocessed, PeptideAtlas, EBI, PRIDE, raw files, users, archive, UniProt, other DBs (GPMDB, …)).
  18. PX submission pipeline (diagram; labels: PX Tool, validation, submission, publication, ProteomeCentral, raw files, PRIDE XML files, summary).
  19. Automated regular submission pipeline: curation-submission time is ~1/6th of manual time; actionable curation summary. Example summary: number of files: 3; project: Combined personal saliva proteome and microbioproteome; XML generator software: PRIDE Converter Toolsuite 2.0-SNAPSHOT; filename: 22143.xml; size: 3.3 GB; species: Homo sapiens; #proteins: 4128; #peptides: 60544; #spectra: 184209; #unidentified spectra: 123665; PTMs: 3; % delta m/z outlier spectra: 0.0.
  20. Conclusion: growing amount of data; increasingly complex data; scalability issues; overcoming them by automation and new, smarter curation strategies.
  21. (no text on slide)
  22. Thanks for the attention!
  23. Q&A. acsordas@ebi.ac.uk; @attilacsordas
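
The delta m/z check from slide 12, and the "% delta m/z outlier spectra" figure in the curation summary on slide 19, can be illustrated with a minimal Python sketch. It is not the PRIDE Inspector or PRIDE pipeline implementation: the `PrecursorMatch` record, the function names, the 0.5 m/z tolerance, and all masses in the usage example are assumptions made up for illustration. The only chemistry it relies on is that a precursor with charge z carries z extra protons.

```python
from dataclasses import dataclass
from typing import List

PROTON_MASS = 1.007276  # Da; one proton is added per positive charge


@dataclass
class PrecursorMatch:
    """One identified spectrum (hypothetical record, not a PRIDE XML type)."""
    label: str
    charge: int             # charge state annotated in the submission
    experimental_mz: float  # precursor m/z measured by the instrument
    neutral_mass: float     # theoretical mass incl. annotated modifications (Da)


def theoretical_mz(neutral_mass: float, charge: int) -> float:
    """m/z expected for a peptide of the given neutral mass and charge."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def delta_mz(match: PrecursorMatch) -> float:
    """Experimental precursor m/z minus theoretical precursor m/z."""
    return match.experimental_mz - theoretical_mz(match.neutral_mass, match.charge)


def outlier_percentage(matches: List[PrecursorMatch], tolerance: float = 0.5) -> float:
    """Percentage of matches whose |delta m/z| exceeds the (assumed) tolerance."""
    if not matches:
        return 0.0
    outliers = sum(1 for m in matches if abs(delta_mz(m)) > tolerance)
    return 100.0 * outliers / len(matches)


# Made-up example: same annotated mass and charge, three different measurements.
matches = [
    # Consistent annotation: delta m/z close to zero.
    PrecursorMatch("consistent", 2, 501.0100, 1000.0),
    # Oxidation (+15.994915 Da) present in the sample but not annotated.
    PrecursorMatch("missing modification", 2, 509.004734, 1000.0),
    # Ion was really 3+, but annotated as 2+.
    PrecursorMatch("wrong charge", 2, 334.340609, 1000.0),
]

for m in matches:
    print(f"{m.label}: delta m/z = {delta_mz(m):+.4f}")
print(f"outlier spectra: {outlier_percentage(matches):.1f}%")
```

A missing modification shifts delta m/z by roughly the modification mass divided by the charge, while a charge misassignment produces a much larger shift, which is why both show up as outliers in such a check.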
