SlideShare a Scribd company logo
Peptide-level robust ridge regression
modeling improves both sensitivity
and specificity in quantitative
proteomics
Ludger Goeminne Promotors:
Lieven Clement
Kris Gevaert
Klaas Vandepoele
A story about relative protein quantification
in label-free shotgun proteomics
Outline
• Problem: reliable differential quantitation
• Solution: peptide-based models
• Improving solution via:
1. Shrinkage estimation
2. Borrowing information across proteins
3. Weighing down outliers
• Leads to:
1. Better fold change estimates
2. Better sensitivity and specificity
• Conclusions: all of the above
• Acknowledgements
For each protein:
Problem: how to do differential quantification?
22.6464 Treatment 1 Peptide A Rep 1
17.85773 Treatment 1 Peptide B Rep 1
15.4947 Treatment 1 Peptide C Rep 2
14.02125 Treatment 1 Peptide D Rep 2
18.0965 Treatment 2 Peptide A Rep 3
14.59100 Treatment 2 Peptide B Rep 3
14.2959 Treatment 2 Peptide C Rep 3
𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦
Solution: 2 main ways:
Problem: how to do differential quantification?
Normalisation
Normalisation
+summarization
(e.g. MaxLFQ)
T-tests
(e.g.
Perseus)
Peptide
intensities
Peptide-
based
models
1. Summarize
to protein
level
2. Peptide-
based models
Peptide-based models are superior
Daly, et al. (2008), Journal of Proteome Research, 7, (3), 1209-1217.
Clough et al. (2009, Journal of Proteome Research, 8, (11), 5275-5284.
Karpievitch et al. (2009), Bioinformatics, 25, (16), 2028-2034.
Goeminne et al. (2015), Journal of Proteome Research , 14, (6), 2457-2465.
Spike-in: 48 human proteins
in yeast proteome
A model for each protein:
Peptide-based models
22.6464 Intercept Treatment 1 Peptide A Rep 1 Error 1
17.85773 Intercept Treatment 1 Peptide B Rep 1 Error 2
15.4947 Intercept Treatment 1 Peptide C Rep 2 Error 3
14.02125 Intercept Treatment 1 Peptide D Rep 2 Error 4
18.0965 Intercept Treatment 2 Peptide A Rep 3 Error 5
14.59100 Intercept Treatment 2 Peptide B Rep 3 Error 6
14.2959 Intercept Treatment 2 Peptide C Rep 3 Error 7
log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
A model for each protein:
Peptide-based models
22.6464 16 1.5 4.5 0.5 0.1464
17.85773 16 1.5 -0.2 0.5 0.05773
15.4947 16 1.5 -1 -0.7 -0.3053
14.02125 16 1.5 -2 -0.7 -0.77875
18.0965 16 -1.5 4.5 -0.3 -0.6035
14.59100 16 -1.5 -0.2 -0.3 0.59100
14.2959 16 -1.5 -1 -0.3 0.0959
log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
A model for each protein:
Peptide-based models
22.6464 16 1.5 4.5 0.5 0.1464
17.85773 16 1.5 -0.2 0.5 0.05773
15.4947 16 1.5 -1 -0.7 -0.3053
14.02125 16 1.5 -2 -0.7 -0.77875
18.0965 16 -1.5 4.5 -0.3 -0.6035
14.59100 16 -1.5 -0.2 -0.3 0.59100
14.2959 16 -1.5 -1 -0.3 0.0959
log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
A model for each protein:
Peptide-based models
22.6464 1.5
17.85773 1.5
15.4947 1.5
14.02125 1.5
18.0965 -1.5
14.59100 -1.5
14.2959 -1.5
log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
1.5
-1.5
- 3
1. Unstable DA
estimates
2. Unstable
variance
estimates
3. Outliers
Still some issues…
A yeast null protein
Structure of my presentation
• Problem: reliable differential quantitation
• Solution: peptide-based models
• Improving solution via:
1. Shrinkage estimation
2. Borrowing information across proteins
3. Weighing down outliers
• Leads to:
1. Better fold change estimates
2. Better sensitivity and specificity
• Conclusions: all of the above
• Acknowledgements
Problem
1. Unstable DA estimates
2. Unstable variance
estimates
3. Outliers
Solution
1. Shrinkage estimation
2. Borrow information
across proteins
3. Weigh down outlying
peptides
How can we improve upons existing peptide-
based models?
This will lead to:
1. Better fold change estimates
2. Better ranking
1. Shrinkage estimation
E.g. ridge regression: minimize the following loss
function:
𝑦 − 𝑋 𝛽
2
+ 𝜆 𝑡𝑟𝑒𝑎𝑡 𝛽𝑡𝑟𝑒𝑎𝑡
2
+ 𝜆 𝑝𝑒𝑝 𝛽 𝑝𝑒𝑝
2 + 𝜆𝑖𝑛𝑠𝑡𝑟 𝛽𝑖𝑛𝑠𝑡𝑟
2
• Penalty on the effect sizes: shrinkage toward 0
• Biased but (much) more stable estimator
• Sparse data: shrinkage ↗
• 𝜆s: via cross-validation or link with mixed models
2. Borrow information across proteins:
Empirical Bayes variance estimation
𝝈𝒊
𝟐
𝝈 𝟎
𝟐
𝝈𝒊
𝟐
𝝈𝒊
𝟐
𝝈𝒊
𝟐
𝝈𝒊
𝟐
𝝈𝒊
𝟐
Data decides!
• Stabilizes variance estimates
• Get rid of proteins with low fold changes and low
variance caused by data sparsity
More details (limma paper):
Smyth (2004), Linear models and empirical bayes methods for assessing differential expression in microarray
experiments. Stat Appl Genet Mol Biol, 3, Article3.
3. Weigh down outlying peptides
E.g. M estimation with Huber weights
Minimize the following loss function:
𝒘 𝑦 − 𝑋 𝛽
2
+ 𝜆 𝑡𝑟𝑒𝑎𝑡 𝛽𝑡𝑟𝑒𝑎𝑡
2
+ 𝜆 𝑝𝑒𝑝 𝛽 𝑝𝑒𝑝
2 + 𝜆𝑖𝑛𝑠𝑡𝑟 𝛽𝑖𝑛𝑠𝑡𝑟
2
• Weigh down outlying observations
1. Better fold change
estimates
-> More accurate and
more precise
2. Better specificity and
sensitivity
-> Improved ranking
Results!
1. Better fold change estimates
For null proteins
1. Better fold change estimates
For differentially abundant proteins
2. Better sensitivity and specificity
Conclusions
1. Use peptide-based models
2. Our peptide-based model uses:
1. Shrinkage estimation
2. Empirical Bayes variance estimation
3. Downweighing of outliers
3. Advantages:
1. More stable fold change estimates
2. Better sensititivity and specificity
Papers
Goeminne et al. (2015), Summarization vs. Peptide-
Based Models in Label-free Quantitative Proteomics:
Performance, Pitfalls and Data Analysis Guidelines.
Journal of Proteome Research.
Goeminne et al. (2015), Peptide-level robust ridge
regression modeling with Empirical Bayes variance
estimation and M estimation improves both sensitivity
and specificity in quantitative label-free shotgun
proteomics, submitted.
Acknowledgements: people
Lieven Clement
+ lab members
Kris Gevaert
+ lab members
Lennart Martens
Andrea ArgentiniKlaas Vandepoele
Acknowledgements: organizations
Thank you for your attention!
Questions?

More Related Content

Similar to presentation_MQ_Summer_schools_2015_v4

Repeated events analyses
Repeated events analysesRepeated events analyses
Repeated events analyses
Mike LaValley
 
Statistics
StatisticsStatistics
Statistics
asia said
 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell Lines
MilliporeSigma
 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell Lines
Merck Life Sciences
 
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Bandar AlGhamdi BSRS, MPH-HSQM
 
EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
ChemAxon
 
Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...
Eduardo Castro
 
A brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methodsA brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methods
Creative Proteomics
 
Final submission –Pay attention to APA formatting, spelling, and
Final submission –Pay attention to APA formatting, spelling, andFinal submission –Pay attention to APA formatting, spelling, and
Final submission –Pay attention to APA formatting, spelling, and
ChereCheek752
 
NetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan SchusterNetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan Schuster
Alexander Pico
 
ChiMaster Details
ChiMaster DetailsChiMaster Details
ChiMaster Details
Roop Wadhwani
 
JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017
Irina Silva
 
Panel presentation ZZZiyan
Panel presentation ZZZiyanPanel presentation ZZZiyan
Panel presentation ZZZiyan
Ziyan Zhao
 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and Peptides
MilliporeSigma
 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and Peptides
Merck Life Sciences
 
From sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysisFrom sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysis
Expedeon
 
FOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific ProteinsFOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific Proteins
Randox
 
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
ssuser4b1f48
 
Using Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein ExpressionUsing Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein Expression
heathermerk
 
proteomics lecture 2b.ppt
proteomics lecture 2b.pptproteomics lecture 2b.ppt
proteomics lecture 2b.ppt
Dr.SUSHIL KUMAR BAROLIA
 

Similar to presentation_MQ_Summer_schools_2015_v4 (20)

Repeated events analyses
Repeated events analysesRepeated events analyses
Repeated events analyses
 
Statistics
StatisticsStatistics
Statistics
 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell Lines
 
Accelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell LinesAccelerate Delivery of High Producing Cell Lines
Accelerate Delivery of High Producing Cell Lines
 
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
Reducing Radiation Dose in CT Chest for Pulmonary Embolism (P.E) Studies - (QIP)
 
EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
EUGM 2014 - Ádám Andor Kelemen (Hungarian Academy of Sciences): Physicochemic...
 
Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...Advances in size exclusion chromatography for the analysis of small proteins ...
Advances in size exclusion chromatography for the analysis of small proteins ...
 
A brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methodsA brief introfuction of label-free protein quantification methods
A brief introfuction of label-free protein quantification methods
 
Final submission –Pay attention to APA formatting, spelling, and
Final submission –Pay attention to APA formatting, spelling, andFinal submission –Pay attention to APA formatting, spelling, and
Final submission –Pay attention to APA formatting, spelling, and
 
NetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan SchusterNetBioSIG2013-KEYNOTE Stefan Schuster
NetBioSIG2013-KEYNOTE Stefan Schuster
 
ChiMaster Details
ChiMaster DetailsChiMaster Details
ChiMaster Details
 
JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017JBEI Research Highlights - April 2017
JBEI Research Highlights - April 2017
 
Panel presentation ZZZiyan
Panel presentation ZZZiyanPanel presentation ZZZiyan
Panel presentation ZZZiyan
 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and Peptides
 
Long Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and PeptidesLong Acting Injectables - A New Dimension for Proteins and Peptides
Long Acting Injectables - A New Dimension for Proteins and Peptides
 
From sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysisFrom sample-to-spray: high performance workflow for top down protein analysis
From sample-to-spray: high performance workflow for top down protein analysis
 
FOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific ProteinsFOCUS 2016 Poster: Specific Proteins
FOCUS 2016 Poster: Specific Proteins
 
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
 
Using Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein ExpressionUsing Anova to Evaluate Differential Protein Expression
Using Anova to Evaluate Differential Protein Expression
 
proteomics lecture 2b.ppt
proteomics lecture 2b.pptproteomics lecture 2b.ppt
proteomics lecture 2b.ppt
 

presentation_MQ_Summer_schools_2015_v4

  • 1. Peptide-level robust ridge regression modeling improves both sensitivity and specificity in quantitative proteomics Ludger Goeminne Promotors: Lieven Clement Kris Gevaert Klaas Vandepoele
  • 2. A story about relative protein quantification in label-free shotgun proteomics
  • 3. Outline • Problem: reliable differential quantitation • Solution: peptide-based models • Improving solution via: 1. Shrinkage estimation 2. Borrowing information across proteins 3. Weighing down outliers • Leads to: 1. Better fold change estimates 2. Better sensitivity and specificity • Conclusions: all of the above • Acknowledgements
  • 4. For each protein: Problem: how to do differential quantification? 22.6464 Treatment 1 Peptide A Rep 1 17.85773 Treatment 1 Peptide B Rep 1 15.4947 Treatment 1 Peptide C Rep 2 14.02125 Treatment 1 Peptide D Rep 2 18.0965 Treatment 2 Peptide A Rep 3 14.59100 Treatment 2 Peptide B Rep 3 14.2959 Treatment 2 Peptide C Rep 3 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦
  • 5. Solution: 2 main ways: Problem: how to do differential quantification? Normalisation Normalisation +summarization (e.g. MaxLFQ) T-tests (e.g. Perseus) Peptide intensities Peptide- based models 1. Summarize to protein level 2. Peptide- based models
  • 6. Peptide-based models are superior Daly, et al. (2008), Journal of Proteome Research, 7, (3), 1209-1217. Clough et al. (2009, Journal of Proteome Research, 8, (11), 5275-5284. Karpievitch et al. (2009), Bioinformatics, 25, (16), 2028-2034. Goeminne et al. (2015), Journal of Proteome Research , 14, (6), 2457-2465. Spike-in: 48 human proteins in yeast proteome
  • 7. A model for each protein: Peptide-based models 22.6464 Intercept Treatment 1 Peptide A Rep 1 Error 1 17.85773 Intercept Treatment 1 Peptide B Rep 1 Error 2 15.4947 Intercept Treatment 1 Peptide C Rep 2 Error 3 14.02125 Intercept Treatment 1 Peptide D Rep 2 Error 4 18.0965 Intercept Treatment 2 Peptide A Rep 3 Error 5 14.59100 Intercept Treatment 2 Peptide B Rep 3 Error 6 14.2959 Intercept Treatment 2 Peptide C Rep 3 Error 7 log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
  • 8. A model for each protein: Peptide-based models 22.6464 16 1.5 4.5 0.5 0.1464 17.85773 16 1.5 -0.2 0.5 0.05773 15.4947 16 1.5 -1 -0.7 -0.3053 14.02125 16 1.5 -2 -0.7 -0.77875 18.0965 16 -1.5 4.5 -0.3 -0.6035 14.59100 16 -1.5 -0.2 -0.3 0.59100 14.2959 16 -1.5 -1 -0.3 0.0959 log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
  • 9. A model for each protein: Peptide-based models 22.6464 16 1.5 4.5 0.5 0.1464 17.85773 16 1.5 -0.2 0.5 0.05773 15.4947 16 1.5 -1 -0.7 -0.3053 14.02125 16 1.5 -2 -0.7 -0.77875 18.0965 16 -1.5 4.5 -0.3 -0.6035 14.59100 16 -1.5 -0.2 -0.3 0.59100 14.2959 16 -1.5 -1 -0.3 0.0959 log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟
  • 10. A model for each protein: Peptide-based models 22.6464 1.5 17.85773 1.5 15.4947 1.5 14.02125 1.5 18.0965 -1.5 14.59100 -1.5 14.2959 -1.5 log2 𝑀𝑆 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ~ 𝑖𝑛𝑡𝑒𝑟𝑐𝑒𝑝𝑡 + 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 + 𝑝𝑒𝑝𝑡𝑖𝑑𝑒 + 𝑟𝑒𝑝𝑒𝑎𝑡 + 𝑒𝑟𝑟𝑜𝑟 1.5 -1.5 - 3
  • 11. 1. Unstable DA estimates 2. Unstable variance estimates 3. Outliers Still some issues… A yeast null protein
  • 12. Structure of my presentation • Problem: reliable differential quantitation • Solution: peptide-based models • Improving solution via: 1. Shrinkage estimation 2. Borrowing information across proteins 3. Weighing down outliers • Leads to: 1. Better fold change estimates 2. Better sensitivity and specificity • Conclusions: all of the above • Acknowledgements
  • 13. Problem 1. Unstable DA estimates 2. Unstable variance estimates 3. Outliers Solution 1. Shrinkage estimation 2. Borrow information across proteins 3. Weigh down outlying peptides How can we improve upons existing peptide- based models? This will lead to: 1. Better fold change estimates 2. Better ranking
  • 14. 1. Shrinkage estimation E.g. ridge regression: minimize the following loss function: 𝑦 − 𝑋 𝛽 2 + 𝜆 𝑡𝑟𝑒𝑎𝑡 𝛽𝑡𝑟𝑒𝑎𝑡 2 + 𝜆 𝑝𝑒𝑝 𝛽 𝑝𝑒𝑝 2 + 𝜆𝑖𝑛𝑠𝑡𝑟 𝛽𝑖𝑛𝑠𝑡𝑟 2 • Penalty on the effect sizes: shrinkage toward 0 • Biased but (much) more stable estimator • Sparse data: shrinkage ↗ • 𝜆s: via cross-validation or link with mixed models
  • 15. 2. Borrow information across proteins: Empirical Bayes variance estimation 𝝈𝒊 𝟐 𝝈 𝟎 𝟐 𝝈𝒊 𝟐 𝝈𝒊 𝟐 𝝈𝒊 𝟐 𝝈𝒊 𝟐 𝝈𝒊 𝟐 Data decides! • Stabilizes variance estimates • Get rid of proteins with low fold changes and low variance caused by data sparsity More details (limma paper): Smyth (2004), Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 3, Article3.
  • 16. 3. Weigh down outlying peptides E.g. M estimation with Huber weights Minimize the following loss function: 𝒘 𝑦 − 𝑋 𝛽 2 + 𝜆 𝑡𝑟𝑒𝑎𝑡 𝛽𝑡𝑟𝑒𝑎𝑡 2 + 𝜆 𝑝𝑒𝑝 𝛽 𝑝𝑒𝑝 2 + 𝜆𝑖𝑛𝑠𝑡𝑟 𝛽𝑖𝑛𝑠𝑡𝑟 2 • Weigh down outlying observations
  • 17. 1. Better fold change estimates -> More accurate and more precise 2. Better specificity and sensitivity -> Improved ranking Results!
  • 18. 1. Better fold change estimates For null proteins
  • 19. 1. Better fold change estimates For differentially abundant proteins
  • 20. 2. Better sensitivity and specificity
  • 21. Conclusions 1. Use peptide-based models 2. Our peptide-based model uses: 1. Shrinkage estimation 2. Empirical Bayes variance estimation 3. Downweighing of outliers 3. Advantages: 1. More stable fold change estimates 2. Better sensititivity and specificity
  • 22. Papers Goeminne et al. (2015), Summarization vs. Peptide- Based Models in Label-free Quantitative Proteomics: Performance, Pitfalls and Data Analysis Guidelines. Journal of Proteome Research. Goeminne et al. (2015), Peptide-level robust ridge regression modeling with Empirical Bayes variance estimation and M estimation improves both sensitivity and specificity in quantitative label-free shotgun proteomics, submitted.
  • 23. Acknowledgements: people Lieven Clement + lab members Kris Gevaert + lab members Lennart Martens Andrea ArgentiniKlaas Vandepoele
  • 25. Thank you for your attention! Questions?