Data Integration, Mass Spectrometry Proteomics Software Development


  1. 2. Overview <ul><li>Quantitative proteomics </li></ul><ul><li>Data integration in kinetic modelling in systems biology </li></ul>
  2. 3. A typical proteomics experiment <ul><li>Various routes through this map </li></ul><ul><li>Separation by size or charge in most cases </li></ul><ul><li>Identify peptides as a proxy for proteins by comparing theoretical to experimental spectra </li></ul>
  3. 4. Quantitative proteomics <ul><li>Approach described is qualitative </li></ul><ul><ul><li>Peptides / proteins identified but not quantified </li></ul></ul><ul><li>Mass spectrometry is not quantitative per se </li></ul><ul><ul><li>Different compounds have different physicochemical properties </li></ul></ul><ul><ul><li>May ionise differently, more / less readily </li></ul></ul><ul><li>Therefore peak intensities cannot be compared between two different compounds </li></ul><ul><ul><li>Applies to peptides / proteins </li></ul></ul>
  4. 5. Quantitative proteomics <ul><li>BUT peak intensities can be compared between compounds sharing the same physicochemical properties </li></ul><ul><ul><li>Isotopes </li></ul></ul><ul><ul><li>Same physicochemical properties </li></ul></ul><ul><ul><li>Different molecular masses (ΔM = 6 Da) </li></ul></ul>
  5. 6. Quantitative proteomics
  6. 7. Quantitative proteomics <ul><li>Can apply the same principle to peptides: </li></ul><ul><ul><li>IDVAVDSTGVFK </li></ul></ul><ul><ul><li>IDVAVDSTGVFK* </li></ul></ul><ul><li>Lysine (K) residue is labelled with 13C </li></ul><ul><ul><li>Same physicochemical properties </li></ul></ul><ul><ul><li>Different molecular masses (ΔM = 6 Da) </li></ul></ul>
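
The 6 Da mass difference can be illustrated with a small calculation. This is a minimal sketch, assuming that all six carbons of the C-terminal lysine are 13C (which is consistent with ΔM = 6 Da but is not stated on the slide) and that the peptide is observed as a doubly charged ion. The function name `peptide_mz` is hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "I": 113.08406, "D": 115.02694, "V": 99.06841, "A": 71.03711,
    "S": 87.03203, "T": 101.04768, "G": 57.02146, "F": 147.06841,
    "K": 128.09496,
}
WATER = 18.01056      # mass of H2O added for the peptide termini
PROTON = 1.00728      # mass of one charge-carrying proton
C13_SHIFT = 1.00335   # mass difference between 13C and 12C


def peptide_mz(sequence, charge, heavy_lysines=0):
    """Return the m/z of a peptide ion.

    heavy_lysines is the number of lysines assumed to carry six 13C atoms.
    """
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    mass += heavy_lysines * 6 * C13_SHIFT
    return (mass + charge * PROTON) / charge


light = peptide_mz("IDVAVDSTGVFK", charge=2)
heavy = peptide_mz("IDVAVDSTGVFK", charge=2, heavy_lysines=1)
print(f"light m/z = {light:.3f}, heavy m/z = {heavy:.3f}, shift = {heavy - light:.3f}")
```

For a doubly charged ion the 6 Da mass difference appears as a shift of about 3 in m/z, because the shift is divided by the charge.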
  7. 8. Quantitative proteomics <ul><li>Absolute quantitative proteomics requires an isotopically-labelled peptide of known concentration spiked into the sample </li></ul><ul><li>Labelled and unlabelled forms of the same peptide behave consistently </li></ul><ul><ul><li>Comparable peak intensity, comparable retention time </li></ul></ul><ul><li>Ratio of labelled over non-labelled peptide can be used to determine the absolute concentration of the sample peptide </li></ul>
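
The arithmetic behind this step can be sketched as follows. The function name, the units (fmol) and the numbers are illustrative assumptions, not values from the slides; the only idea taken from the slide is that the labelled:unlabelled ratio and the known spike amount together give the sample amount.

```python
def absolute_amount(heavy_intensity, light_intensity, heavy_spike_amount):
    """Estimate the amount of unlabelled (sample) peptide.

    The heavy (labelled) peptide is spiked in at a known amount, so the
    sample amount is the spike amount divided by the heavy:light ratio.
    """
    if heavy_intensity <= 0 or light_intensity < 0:
        raise ValueError("intensities must be positive")
    heavy_to_light = heavy_intensity / light_intensity if light_intensity else float("inf")
    return heavy_spike_amount / heavy_to_light


# Hypothetical example: 100 fmol of labelled peptide spiked into the sample.
sample_fmol = absolute_amount(
    heavy_intensity=6.0e6, light_intensity=4.0e6, heavy_spike_amount=100.0
)
print(f"Estimated sample peptide: {sample_fmol:.1f} fmol")
```

The result is in the same units as the spike amount, so the spike must be quantified accurately for the estimate to be meaningful.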
  8. 9. Quantitative proteomics: QconCAT [Figure: 40:60 mixture. Data: Kathleen Carroll (Orbitrap MS)]
  9. 10. Quantitative proteomics: QconCAT <ul><li>Requirements: </li></ul><ul><ul><li>Determine absolute protein concentrations under a given cellular condition </li></ul></ul><ul><ul><li>Quantify a number (~50) of proteins simultaneously </li></ul></ul><ul><li>Apply QconCAT methodology </li></ul><ul><ul><li>Allows simultaneous introduction of many labelled peptides into sample </li></ul></ul><ul><ul><li>Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nat Protoc. 2006, 1:1029-43. </li></ul></ul>
  10. 11. Quantitative proteomics: QconCAT <ul><li>Construct an artificial protein containing many peptides </li></ul><ul><ul><li>At least one from each protein of interest </li></ul></ul><ul><ul><li>Ensure that the artificial protein is isotopically-labelled </li></ul></ul>
  11. 12. Quantitative proteomics: QconCAT <ul><li>Numerous absolute protein quantitations can be performed simultaneously </li></ul>
  12. 13. … from instrument to browser <ul><li>From a QconCAT informatics perspective, there are three steps … </li></ul><ul><ul><li>Selection of QconCAT peptides </li></ul></ul><ul><ul><li>Analysis and submission of data </li></ul></ul><ul><ul><li>Browsing / querying </li></ul></ul>
  13. 14. Selection of QconCAT peptides <ul><li>Q. For a given protein, which peptides are suitable candidates for QconCAT peptides? </li></ul><ul><li>Must… </li></ul><ul><li>Be unique across organism </li></ul><ul><li>Be detectable (digestible, flyable) </li></ul><ul><li>Preferably… </li></ul><ul><li>Be unmodified </li></ul>
  14. 15. QconCAT Selection Wizard <ul><li>Takes protein accession numbers as input (and other parameters) </li></ul><ul><li>Provides list of potential QconCAT peptides </li></ul><ul><ul><li>Downloads sequence </li></ul></ul><ul><ul><li>Performs BLAST against species-specific UniProt (tests uniqueness) </li></ul></ul><ul><ul><li>Filters peptides “appropriately” </li></ul></ul><ul><ul><li>Applies score to peptide, using PeptideSieve (predicts flyability) </li></ul></ul><ul><ul><li>Computational prediction of proteotypic peptides for quantitative proteomics. Mallick P, et al. Nat Biotechnol. 2007, 25:125-31. </li></ul></ul>
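
The selection idea can be illustrated with a toy example. This is only a sketch under stated assumptions: it swaps in an in-silico tryptic digest and exact-match counting for the BLAST uniqueness test, applies a simple length filter in place of the wizard's filtering, and does no PeptideSieve scoring. The protein sequences, identifiers and function names are made up for illustration.

```python
import re
from collections import Counter


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def unique_candidates(target_id, proteome, min_len=6, max_len=20):
    """Return peptides of the target protein that occur in no other protein."""
    counts = Counter()
    for seq in proteome.values():
        # Count each peptide once per protein it occurs in.
        counts.update(set(tryptic_peptides(seq)))
    return [
        p for p in tryptic_peptides(proteome[target_id])
        if counts[p] == 1 and min_len <= len(p) <= max_len
    ]


# Hypothetical two-protein "proteome"; ELLGAAEK is shared and so is rejected.
proteome = {
    "P1": "IDVAVDSTGVFKELLGAAEKR",
    "P2": "MTKELLGAAEKGGSAVNPWTR",
}
print(unique_candidates("P1", proteome))
```

Exact-match counting only detects identical peptides; a BLAST search, as used by the wizard, also reports near matches.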
  15. 19. QconCAT… Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Pratt JM, et al. Nature Protocols 1, 1029-1043 (2006)
  16. 20. QconCAT data analysis <ul><li>Identify and quantify peptides / proteins of interest </li></ul><ul><li>Generate results in standard data format </li></ul><ul><ul><li>Facilitates data sharing </li></ul></ul><ul><ul><li>Exploit existing software tools </li></ul></ul><ul><li>PRIDE XML </li></ul><ul><ul><li>PRoteomics IDEntifications </li></ul></ul><ul><ul><li>Community developed standard </li></ul></ul><ul><ul><li>http://www.ebi.ac.uk/pride/ </li></ul></ul>
  17. 21. QconCAT data analysis [Pipeline diagram labels: mzData; PRIDE Converter; PRIDE XML; QconCAT Pride Wizard (Identify, Quantify, Format, Upload); Mascot; PRIDE XML; eXist database; Web / web service; Browser]
  19. 23. Pride Converter <ul><li>Pride Converter (EBI) used to extract meta-data </li></ul><ul><ul><li>Who ran the sample, what was the sample, which instrument was used, etc. </li></ul></ul><ul><ul><li>http://code.google.com/p/pride-converter/ </li></ul></ul><ul><ul><li>PRIDE Converter: making proteomics data-sharing easy. Barsnes H, et al. Nat Biotechnol. 2009, 27:598-9. </li></ul></ul><ul><li>Simple wizard allowing experimental data to be marked up with meta-data </li></ul>
  20. 24. Pride Converter
  24. 28. QconCAT PrideWizard: Identify <ul><li>Goal: to identify heavily-labelled QconCAT peptides </li></ul><ul><ul><li>Uses Mascot </li></ul></ul><ul><ul><li>http://www.matrixscience.com/search_form_select.html </li></ul></ul><ul><ul><li>De facto standard database search engine for identifying peptides / proteins </li></ul></ul>
  25. 30. QconCAT PrideWizard: Identify <ul><li>Mascot results are parsed to find labelled QconCAT peptides: </li></ul>
  27. 32. QconCAT PrideWizard: Quantify <ul><li>Goal: to quantify heavily-labelled QconCAT peptides </li></ul><ul><li>We now know m/z and retention time of peak identified as a QconCAT peptide </li></ul><ul><li>First step: extract mass chromatogram for both heavy (labelled) and light (unlabelled) peptide </li></ul>
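
Extracting a mass chromatogram can be sketched as below. The in-memory scan structure, the m/z tolerance and the function name are assumptions made for illustration; the slides do not describe how the wizard reads or stores spectra. The m/z values are hypothetical values for a doubly charged light and heavy peptide pair.

```python
def extract_chromatogram(scans, target_mz, tolerance=0.01):
    """Return (retention_time, summed_intensity) pairs for peaks near target_mz.

    scans is a list of (retention_time, [(mz, intensity), ...]) tuples.
    """
    trace = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        trace.append((retention_time, intensity))
    return trace


# Hypothetical scans around the retention time of the identified peptide.
scans = [
    (30.0, [(625.835, 1.0e4), (628.845, 1.5e4), (700.100, 5.0e3)]),
    (30.1, [(625.836, 4.0e4), (628.846, 6.0e4)]),
    (30.2, [(625.834, 2.0e4), (628.844, 3.0e4)]),
]
light_trace = extract_chromatogram(scans, target_mz=625.835)
heavy_trace = extract_chromatogram(scans, target_mz=628.845)
print(light_trace)
print(heavy_trace)
```

Both traces are sampled at the same retention times, which is what allows the heavy and light signals to be overlaid and compared scan by scan.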
  28. 33. QconCAT PrideWizard: Quantify <ul><li>Extracted mass chromatograms </li></ul><ul><ul><li>Heavy and light peptide should overlay as they should have same retention time </li></ul></ul>
  29. 34. QconCAT PrideWizard: Quantify <ul><li>Could use peak areas to quantify heavy versus light </li></ul><ul><ul><li>BUT hard (and inaccurate) to determine start and end </li></ul></ul>
  30. 35. QconCAT PrideWizard: Quantify <ul><li>Alternative: extract individual scans showing isotopic clusters for both heavy and light </li></ul>
  31. 36. QconCAT PrideWizard: Quantify <ul><li>Apply sliding window and plot heavy versus light: </li></ul>
  32. 37. QconCAT PrideWizard: Quantify <ul><li>Final step: apply linear regression to determine heavy:light ratio (and an error): </li></ul>
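
The regression step can be sketched with ordinary least squares. This is a minimal illustration, assuming paired heavy and light intensities have already been collected (for example, one pair per scan in the sliding window) and that the slope of heavy against light is taken as the ratio, with the standard error of the slope as its error. The exact regression model used by the wizard is not given on the slides, and the data below are invented.

```python
import math


def fit_ratio(light, heavy):
    """Fit heavy = slope * light + intercept; return (slope, standard error of slope)."""
    n = len(light)
    if n < 3 or n != len(heavy):
        raise ValueError("need at least three paired points")
    mean_x = sum(light) / n
    mean_y = sum(heavy) / n
    sxx = sum((x - mean_x) ** 2 for x in light)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(light, heavy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(light, heavy))
    std_error = math.sqrt(residual / (n - 2) / sxx)
    return slope, std_error


# Hypothetical paired intensities from successive scans.
light = [1.0e4, 2.1e4, 3.9e4, 2.0e4, 0.9e4]
heavy = [1.5e4, 3.0e4, 6.0e4, 3.1e4, 1.4e4]
ratio, error = fit_ratio(light, heavy)
print(f"heavy:light = {ratio:.2f} +/- {error:.2f}")
```

Fitting a line through many paired points avoids having to decide where a chromatographic peak starts and ends, which is the difficulty with peak areas noted earlier.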
  35. 40. MCISB Proteome Database <ul><li>Searchable repository of quantitative proteomics data </li></ul><ul><li>Geeky bit… </li></ul><ul><ul><li>eXist native XML database holding PRIDE XML </li></ul></ul><ul><ul><li>JSP front end </li></ul></ul><ul><ul><li>Querying extensible through XQuery </li></ul></ul><ul><li>Web and web-service interface </li></ul><ul><ul><li>Both human and computer-queryable </li></ul></ul>
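
As a hedged illustration of computer-queryable access, the sketch below builds a query URL for eXist's REST interface, which accepts an XQuery or XPath expression in the `_query` parameter. The host, the collection path `/db/pride` and the element names in the query are assumptions for illustration; the actual MCISB web-service interface is not described on the slides.

```python
from urllib.parse import urlencode


def build_exist_query_url(base_url, collection, xquery, how_many=10):
    """Build a GET URL for an eXist REST query (not sent here)."""
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{base_url}/rest{collection}?{params}"


# Hypothetical server, collection and query.
url = build_exist_query_url(
    "http://localhost:8080/exist",
    "/db/pride",
    "//PeptideItem[Sequence = 'IDVAVDSTGVFK']",
)
print(url)
```

The URL could then be fetched with any HTTP client; the response is XML, which suits a database that stores PRIDE XML natively.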
  36. 42. QconCAT informatics pipeline <ul><li>Reference: </li></ul><ul><ul><li>A QconCAT informatics pipeline for the analysis, visualization and sharing of absolute quantitative proteomics data. Swainston N, et al. Proteomics. 2011, 11:329-33. </li></ul></ul>
  37. 43. Data Integration
  38. 44. Systems biology modelling [Diagram labels: Enzyme kinetics; Quantitative metabolomics; Quantitative proteomics; Systems Biology Model; Parameters (K_M, k_cat); Variables (metabolite, protein concentrations); PRIDE XML; MeMo; SABIO-RK; MeMo-RK; four Web service connections]
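
To show how the parameters and variables fit together, here is a minimal sketch assuming a Michaelis-Menten rate law. The slides name K_M and k_cat as parameters and metabolite and protein concentrations as variables, but do not state which rate law the models use; the function name and all numbers are hypothetical.

```python
def michaelis_menten_rate(k_cat, k_m, enzyme_conc, substrate_conc):
    """Reaction rate v = k_cat * [E] * [S] / (K_M + [S])."""
    if k_m + substrate_conc <= 0:
        raise ValueError("K_M + [S] must be positive")
    return k_cat * enzyme_conc * substrate_conc / (k_m + substrate_conc)


# Hypothetical values: k_cat in 1/s, concentrations and K_M in mM.
rate = michaelis_menten_rate(
    k_cat=100.0,          # from enzyme kinetics measurements
    k_m=0.5,              # from enzyme kinetics measurements
    enzyme_conc=0.002,    # from quantitative proteomics
    substrate_conc=0.5,   # from quantitative metabolomics
)
print(f"rate = {rate:.3f} mM/s")
```

The enzyme concentration is the term that absolute quantitative proteomics supplies, which is why absolute rather than relative quantification matters for kinetic modelling.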
  42. 48. Modelling life-cycle workflows
  43. 49. From experiment to simulation [Diagram labels: Kinetic models; Experimental data] Systematic integration of experimental data and models in systems biology. Li P, et al. BMC Bioinformatics. 2010, 11:582.
  44. 50. Conclusion <ul><li>An informatics pipeline has been developed for analysis of quantitative proteomics data </li></ul><ul><ul><li>Data is associated with metadata, identified, quantified, and uploaded to database </li></ul></ul><ul><ul><li>Community standards have been followed </li></ul></ul><ul><li>Experimental data can be incorporated in systems biology models </li></ul><ul><ul><li>Allows simulations of biological systems to be performed </li></ul></ul>
  45. 51. Thanks…
