SlideShare a Scribd company logo
1 of 25
Download to read offline
Peptide-level robust ridge regression
modeling improves both sensitivity
and specificity in quantitative
proteomics
Ludger Goeminne Promotors:
Lieven Clement
Kris Gevaert
Klaas Vandepoele
A story about relative protein quantification
in label-free shotgun proteomics
Outline
โ€ข Problem: reliable differential quantitation
โ€ข Solution: peptide-based models
โ€ข Improving solution via:
1. Shrinkage estimation
2. Borrowing information across proteins
3. Weighing down outliers
โ€ข Leads to:
1. Better fold change estimates
2. Better sensitivity and specificity
โ€ข Conclusions: all of the above
โ€ข Acknowledgements
For each protein:
Problem: how to do differential quantification?
22.6464 Treatment 1 Peptide A Rep 1
17.85773 Treatment 1 Peptide B Rep 1
15.4947 Treatment 1 Peptide C Rep 2
14.02125 Treatment 1 Peptide D Rep 2
18.0965 Treatment 2 Peptide A Rep 3
14.59100 Treatment 2 Peptide B Rep 3
14.2959 Treatment 2 Peptide C Rep 3
๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘กlog2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ
Solution: 2 main ways:
Problem: how to do differential quantification?
Normalisation
Normalisation
+summarization
(e.g. MaxLFQ)
T-tests
(e.g.
Perseus)
Peptide
intensities
Peptide-
based
models
1. Summarize
to protein
level
2. Peptide-
based models
Peptide-based models are superior
Daly, et al. (2008), Journal of Proteome Research, 7, (3), 1209-1217.
Clough et al. (2009, Journal of Proteome Research, 8, (11), 5275-5284.
Karpievitch et al. (2009), Bioinformatics, 25, (16), 2028-2034.
Goeminne et al. (2015), Journal of Proteome Research , 14, (6), 2457-2465.
Spike-in: 48 human proteins
in yeast proteome
A model for each protein:
Peptide-based models
22.6464 Intercept Treatment 1 Peptide A Rep 1 Error 1
17.85773 Intercept Treatment 1 Peptide B Rep 1 Error 2
15.4947 Intercept Treatment 1 Peptide C Rep 2 Error 3
14.02125 Intercept Treatment 1 Peptide D Rep 2 Error 4
18.0965 Intercept Treatment 2 Peptide A Rep 3 Error 5
14.59100 Intercept Treatment 2 Peptide B Rep 3 Error 6
14.2959 Intercept Treatment 2 Peptide C Rep 3 Error 7
log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
A model for each protein:
Peptide-based models
22.6464 16 1.5 4.5 0.5 0.1464
17.85773 16 1.5 -0.2 0.5 0.05773
15.4947 16 1.5 -1 -0.7 -0.3053
14.02125 16 1.5 -2 -0.7 -0.77875
18.0965 16 -1.5 4.5 -0.3 -0.6035
14.59100 16 -1.5 -0.2 -0.3 0.59100
14.2959 16 -1.5 -1 -0.3 0.0959
log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
A model for each protein:
Peptide-based models
22.6464 16 1.5 4.5 0.5 0.1464
17.85773 16 1.5 -0.2 0.5 0.05773
15.4947 16 1.5 -1 -0.7 -0.3053
14.02125 16 1.5 -2 -0.7 -0.77875
18.0965 16 -1.5 4.5 -0.3 -0.6035
14.59100 16 -1.5 -0.2 -0.3 0.59100
14.2959 16 -1.5 -1 -0.3 0.0959
log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
A model for each protein:
Peptide-based models
22.6464 1.5
17.85773 1.5
15.4947 1.5
14.02125 1.5
18.0965 -1.5
14.59100 -1.5
14.2959 -1.5
log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
1.5
-1.5
- 3
1. Unstable DA
estimates
2. Unstable
variance
estimates
3. Outliers
Still some issuesโ€ฆ
A yeast null protein
Structure of my presentation
โ€ข Problem: reliable differential quantitation
โ€ข Solution: peptide-based models
โ€ข Improving solution via:
1. Shrinkage estimation
2. Borrowing information across proteins
3. Weighing down outliers
โ€ข Leads to:
1. Better fold change estimates
2. Better sensitivity and specificity
โ€ข Conclusions: all of the above
โ€ข Acknowledgements
Problem
1. Unstable DA estimates
2. Unstable variance
estimates
3. Outliers
Solution
1. Shrinkage estimation
2. Borrow information
across proteins
3. Weigh down outlying
peptides
How can we improve upons existing peptide-
based models?
This will lead to:
1. Better fold change estimates
2. Better ranking
1. Shrinkage estimation
E.g. ridge regression: minimize the following loss
function:
๐‘ฆ โˆ’ ๐‘‹ ๐›ฝ
2
+ ๐œ† ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก ๐›ฝ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก
2
+ ๐œ† ๐‘๐‘’๐‘ ๐›ฝ ๐‘๐‘’๐‘
2 + ๐œ†๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ ๐›ฝ๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ
2
โ€ข Penalty on the effect sizes: shrinkage toward 0
โ€ข Biased but (much) more stable estimator
โ€ข Sparse data: shrinkage โ†—
โ€ข ๐œ†s: via cross-validation or link with mixed models
2. Borrow information across proteins:
Empirical Bayes variance estimation
๐ˆ๐’Š
๐Ÿ
๐ˆ ๐ŸŽ
๐Ÿ
๐ˆ๐’Š
๐Ÿ
๐ˆ๐’Š
๐Ÿ
๐ˆ๐’Š
๐Ÿ
๐ˆ๐’Š
๐Ÿ
๐ˆ๐’Š
๐Ÿ
Data decides!
โ€ข Stabilizes variance estimates
โ€ข Get rid of proteins with low fold changes and low
variance caused by data sparsity
More details (limma paper):
Smyth (2004), Linear models and empirical bayes methods for assessing differential expression in microarray
experiments. Stat Appl Genet Mol Biol, 3, Article3.
3. Weigh down outlying peptides
E.g. M estimation with Huber weights
Minimize the following loss function:
๐’˜ ๐‘ฆ โˆ’ ๐‘‹ ๐›ฝ
2
+ ๐œ† ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก ๐›ฝ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก
2
+ ๐œ† ๐‘๐‘’๐‘ ๐›ฝ ๐‘๐‘’๐‘
2 + ๐œ†๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ ๐›ฝ๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ
2
โ€ข Weigh down outlying observations
1. Better fold change
estimates
-> More accurate and
more precise
2. Better specificity and
sensitivity
-> Improved ranking
Results!
1. Better fold change estimates
For null proteins
1. Better fold change estimates
For differentially abundant proteins
2. Better sensitivity and specificity
Conclusions
1. Use peptide-based models
2. Our peptide-based model uses:
1. Shrinkage estimation
2. Empirical Bayes variance estimation
3. Downweighing of outliers
3. Advantages:
1. More stable fold change estimates
2. Better sensititivity and specificity
Papers
Goeminne et al. (2015), Summarization vs. Peptide-
Based Models in Label-free Quantitative Proteomics:
Performance, Pitfalls and Data Analysis Guidelines.
Journal of Proteome Research.
Goeminne et al. (2015), Peptide-level robust ridge
regression modeling with Empirical Bayes variance
estimation and M estimation improves both sensitivity
and specificity in quantitative label-free shotgun
proteomics, submitted.
Acknowledgements: people
Lieven Clement
+ lab members
Kris Gevaert
+ lab members
Lennart Martens
Andrea ArgentiniKlaas Vandepoele
Acknowledgements: organizations
Thank you for your attention!
Questions?

More Related Content

Similar to presentation_MQ_Summer_schools_2015_v4

Repeated events analyses
Repeated events analysesRepeated events analyses
Repeated events analysesMike LaValley
ย 
Statistics
StatisticsStatistics
Statisticsasia said
ย 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesMilliporeSigma
ย 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesMerck Life Sciences
ย 
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)Bandar AlGhamdi BSRS, MPH-HSQM
ย 
EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...ChemAxon
ย 
Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...Eduardo Castro
ย 
A brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methodsA brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methodsCreative Proteomics
ย 
Final submission โ€“Pay attention to APA formatting, spelling, and
Final submission โ€“Pay attention to APA formatting, spelling, andFinal submission โ€“Pay attention to APA formatting, spelling, and
Final submission โ€“Pay attention to APA formatting, spelling, andChereCheek752
ย 
NetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan SchusterNetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan SchusterAlexander Pico
ย 
ChiMaster Details
ChiMaster DetailsChiMaster Details
ChiMaster DetailsRoop Wadhwani
ย 
JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017Irina Silva
ย 
Panel presentation ZZZiyan
Panel presentation ZZZiyanPanel presentation ZZZiyan
Panel presentation ZZZiyanZiyan Zhao
ย 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesMilliporeSigma
ย 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesMerck Life Sciences
ย 
From sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysisFrom sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysisExpedeon
ย 
FOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific ProteinsFOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific ProteinsRandox
ย 
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...ssuser4b1f48
ย 
Using Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein ExpressionUsing Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein Expressionheathermerk
ย 
proteomics lecture as an aspect of multi omics
proteomics lecture as an aspect of multi omicsproteomics lecture as an aspect of multi omics
proteomics lecture as an aspect of multi omicsAnimikh Ray
ย 

Similar to presentation_MQ_Summer_schools_2015_v4 (20)

Repeated events analyses
Repeated events analysesRepeated events analyses
Repeated events analyses
ย 
Statistics
StatisticsStatistics
Statistics
ย 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell Lines
ย 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell Lines
ย 
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
ย 
EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - รdรกm Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
ย 
Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...
ย 
A brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methodsA brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methods
ย 
Final submission โ€“Pay attention to APA formatting, spelling, and
Final submission โ€“Pay attention to APA formatting, spelling, andFinal submission โ€“Pay attention to APA formatting, spelling, and
Final submission โ€“Pay attention to APA formatting, spelling, and
ย 
NetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan SchusterNetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan Schuster
ย 
ChiMaster Details
ChiMaster DetailsChiMaster Details
ChiMaster Details
ย 
JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017
ย 
Panel presentation ZZZiyan
Panel presentation ZZZiyanPanel presentation ZZZiyan
Panel presentation ZZZiyan
ย 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and Peptides
ย 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and Peptides
ย 
From sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysisFrom sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysis
ย 
FOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific ProteinsFOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific Proteins
ย 
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
ย 
Using Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein ExpressionUsing Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein Expression
ย 
proteomics lecture as an aspect of multi omics
proteomics lecture as an aspect of multi omicsproteomics lecture as an aspect of multi omics
proteomics lecture as an aspect of multi omics
ย 

presentation_MQ_Summer_schools_2015_v4

  • 1. Peptide-level robust ridge regression modeling improves both sensitivity and specificity in quantitative proteomics Ludger Goeminne Promotors: Lieven Clement Kris Gevaert Klaas Vandepoele
  • 2. A story about relative protein quantification in label-free shotgun proteomics
  • 3. Outline โ€ข Problem: reliable differential quantitation โ€ข Solution: peptide-based models โ€ข Improving solution via: 1. Shrinkage estimation 2. Borrowing information across proteins 3. Weighing down outliers โ€ข Leads to: 1. Better fold change estimates 2. Better sensitivity and specificity โ€ข Conclusions: all of the above โ€ข Acknowledgements
  • 4. For each protein: Problem: how to do differential quantification? 22.6464 Treatment 1 Peptide A Rep 1 17.85773 Treatment 1 Peptide B Rep 1 15.4947 Treatment 1 Peptide C Rep 2 14.02125 Treatment 1 Peptide D Rep 2 18.0965 Treatment 2 Peptide A Rep 3 14.59100 Treatment 2 Peptide B Rep 3 14.2959 Treatment 2 Peptide C Rep 3 ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘กlog2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ
  • 5. Solution: 2 main ways: Problem: how to do differential quantification? Normalisation Normalisation +summarization (e.g. MaxLFQ) T-tests (e.g. Perseus) Peptide intensities Peptide- based models 1. Summarize to protein level 2. Peptide- based models
  • 6. Peptide-based models are superior Daly, et al. (2008), Journal of Proteome Research, 7, (3), 1209-1217. Clough et al. (2009, Journal of Proteome Research, 8, (11), 5275-5284. Karpievitch et al. (2009), Bioinformatics, 25, (16), 2028-2034. Goeminne et al. (2015), Journal of Proteome Research , 14, (6), 2457-2465. Spike-in: 48 human proteins in yeast proteome
  • 7. A model for each protein: Peptide-based models 22.6464 Intercept Treatment 1 Peptide A Rep 1 Error 1 17.85773 Intercept Treatment 1 Peptide B Rep 1 Error 2 15.4947 Intercept Treatment 1 Peptide C Rep 2 Error 3 14.02125 Intercept Treatment 1 Peptide D Rep 2 Error 4 18.0965 Intercept Treatment 2 Peptide A Rep 3 Error 5 14.59100 Intercept Treatment 2 Peptide B Rep 3 Error 6 14.2959 Intercept Treatment 2 Peptide C Rep 3 Error 7 log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
  • 8. A model for each protein: Peptide-based models 22.6464 16 1.5 4.5 0.5 0.1464 17.85773 16 1.5 -0.2 0.5 0.05773 15.4947 16 1.5 -1 -0.7 -0.3053 14.02125 16 1.5 -2 -0.7 -0.77875 18.0965 16 -1.5 4.5 -0.3 -0.6035 14.59100 16 -1.5 -0.2 -0.3 0.59100 14.2959 16 -1.5 -1 -0.3 0.0959 log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
  • 9. A model for each protein: Peptide-based models 22.6464 16 1.5 4.5 0.5 0.1464 17.85773 16 1.5 -0.2 0.5 0.05773 15.4947 16 1.5 -1 -0.7 -0.3053 14.02125 16 1.5 -2 -0.7 -0.77875 18.0965 16 -1.5 4.5 -0.3 -0.6035 14.59100 16 -1.5 -0.2 -0.3 0.59100 14.2959 16 -1.5 -1 -0.3 0.0959 log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ
  • 10. A model for each protein: Peptide-based models 22.6464 1.5 17.85773 1.5 15.4947 1.5 14.02125 1.5 18.0965 -1.5 14.59100 -1.5 14.2959 -1.5 log2 ๐‘€๐‘† ๐‘–๐‘›๐‘ก๐‘’๐‘›๐‘ ๐‘–๐‘ก๐‘ฆ ~ ๐‘–๐‘›๐‘ก๐‘’๐‘Ÿ๐‘๐‘’๐‘๐‘ก + ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก๐‘š๐‘’๐‘›๐‘ก + ๐‘๐‘’๐‘๐‘ก๐‘–๐‘‘๐‘’ + ๐‘Ÿ๐‘’๐‘๐‘’๐‘Ž๐‘ก + ๐‘’๐‘Ÿ๐‘Ÿ๐‘œ๐‘Ÿ 1.5 -1.5 - 3
  • 11. 1. Unstable DA estimates 2. Unstable variance estimates 3. Outliers Still some issuesโ€ฆ A yeast null protein
  • 12. Structure of my presentation โ€ข Problem: reliable differential quantitation โ€ข Solution: peptide-based models โ€ข Improving solution via: 1. Shrinkage estimation 2. Borrowing information across proteins 3. Weighing down outliers โ€ข Leads to: 1. Better fold change estimates 2. Better sensitivity and specificity โ€ข Conclusions: all of the above โ€ข Acknowledgements
  • 13. Problem 1. Unstable DA estimates 2. Unstable variance estimates 3. Outliers Solution 1. Shrinkage estimation 2. Borrow information across proteins 3. Weigh down outlying peptides How can we improve upons existing peptide- based models? This will lead to: 1. Better fold change estimates 2. Better ranking
  • 14. 1. Shrinkage estimation E.g. ridge regression: minimize the following loss function: ๐‘ฆ โˆ’ ๐‘‹ ๐›ฝ 2 + ๐œ† ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก ๐›ฝ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก 2 + ๐œ† ๐‘๐‘’๐‘ ๐›ฝ ๐‘๐‘’๐‘ 2 + ๐œ†๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ ๐›ฝ๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ 2 โ€ข Penalty on the effect sizes: shrinkage toward 0 โ€ข Biased but (much) more stable estimator โ€ข Sparse data: shrinkage โ†— โ€ข ๐œ†s: via cross-validation or link with mixed models
  • 15. 2. Borrow information across proteins: Empirical Bayes variance estimation ๐ˆ๐’Š ๐Ÿ ๐ˆ ๐ŸŽ ๐Ÿ ๐ˆ๐’Š ๐Ÿ ๐ˆ๐’Š ๐Ÿ ๐ˆ๐’Š ๐Ÿ ๐ˆ๐’Š ๐Ÿ ๐ˆ๐’Š ๐Ÿ Data decides! โ€ข Stabilizes variance estimates โ€ข Get rid of proteins with low fold changes and low variance caused by data sparsity More details (limma paper): Smyth (2004), Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 3, Article3.
  • 16. 3. Weigh down outlying peptides E.g. M estimation with Huber weights Minimize the following loss function: ๐’˜ ๐‘ฆ โˆ’ ๐‘‹ ๐›ฝ 2 + ๐œ† ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก ๐›ฝ๐‘ก๐‘Ÿ๐‘’๐‘Ž๐‘ก 2 + ๐œ† ๐‘๐‘’๐‘ ๐›ฝ ๐‘๐‘’๐‘ 2 + ๐œ†๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ ๐›ฝ๐‘–๐‘›๐‘ ๐‘ก๐‘Ÿ 2 โ€ข Weigh down outlying observations
  • 17. 1. Better fold change estimates -> More accurate and more precise 2. Better specificity and sensitivity -> Improved ranking Results!
  • 18. 1. Better fold change estimates For null proteins
  • 19. 1. Better fold change estimates For differentially abundant proteins
  • 20. 2. Better sensitivity and specificity
  • 21. Conclusions 1. Use peptide-based models 2. Our peptide-based model uses: 1. Shrinkage estimation 2. Empirical Bayes variance estimation 3. Downweighing of outliers 3. Advantages: 1. More stable fold change estimates 2. Better sensititivity and specificity
  • 22. Papers Goeminne et al. (2015), Summarization vs. Peptide- Based Models in Label-free Quantitative Proteomics: Performance, Pitfalls and Data Analysis Guidelines. Journal of Proteome Research. Goeminne et al. (2015), Peptide-level robust ridge regression modeling with Empirical Bayes variance estimation and M estimation improves both sensitivity and specificity in quantitative label-free shotgun proteomics, submitted.
  • 23. Acknowledgements: people Lieven Clement + lab members Kris Gevaert + lab members Lennart Martens Andrea ArgentiniKlaas Vandepoele
  • 25. Thank you for your attention! Questions?