Biology

Chemistry
Informatics

Evaluation of sample processing
protocols for the analysis of
pumpkin leaf metabolites

Statistics

Goals: Compare different extraction and drying
protocols to identify the “optimal” sample processing
approach
Topics:
1. Data quality overview
2. Statistical comparisons
3. Power analysis
Data Quality Overview
Goal: Calculate and visualize the summary statistics for each
metabolite/treatment (Use DATA: Pumpkin data 1.csv)
Calculate:
1. Mean and standard deviation (sd)
2. The percent relative standard deviation, %RSD = (sd/mean)*100
Visualize:
1. The relationship between mean and sd, and between mean and %RSD
2. Mean metabolite values compared across all treatments
Exercises:
1. Describe the relationship between analyte mean and sd, and between mean and %RSD.
2. Describe what constitutes an “optimal” method.
3. Which extraction/treatment should be chosen to process further samples?
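The calculations listed above can be sketched in a few lines of Python. This is a minimal sketch, not the workflow used for the slides: the long-format layout with `treatment`, `metabolite` and `value` columns and the numbers themselves are assumptions, since the structure of "Pumpkin data 1.csv" is not shown in the deck.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format data; the real "Pumpkin data 1.csv" may be laid out differently.
EXAMPLE_CSV = """treatment,metabolite,value
method1,citrate,100
method1,citrate,110
method1,citrate,90
method2,citrate,200
method2,citrate,210
method2,citrate,190
"""


def summarize(rows):
    """Return {(treatment, metabolite): (mean, sd, %RSD)} for an iterable of row dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["treatment"], row["metabolite"])].append(float(row["value"]))

    summary = {}
    for key, values in groups.items():
        m = mean(values)
        sd = stdev(values)      # sample sd (n - 1 denominator)
        rsd = sd / m * 100      # percent relative standard deviation
        summary[key] = (m, sd, rsd)
    return summary


summary = summarize(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
for (treatment, metabolite), (m, sd, rsd) in sorted(summary.items()):
    print(f"{treatment} {metabolite}: mean={m:.1f} sd={sd:.1f} %RSD={rsd:.1f}")
```

To run this on a real file, replace `io.StringIO(EXAMPLE_CSV)` with an open file handle and adjust the column names to match.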
Summary statistics
Mean vs. SD
• Mean and sd are highly correlated
• Larger means have larger sd
• This effect is also called heteroscedasticity
[Figure: SD plotted against mean]
Mean vs. %RSD
• %RSD is minimally correlated with the mean
Can be used as a criterion for:
• Comparing method reproducibility
• Assessing data quality
[Figure: %RSD plotted against mean]
Qualities of %RSD
• %RSD (the coefficient of variation, or CV, expressed as a percentage) is the sd (variation)
scaled by the mean (magnitude).
• Largely removes the relationship between variation and magnitude
• Provides a single value which can be used to compare the variation of a
measurement among different treatments/samples
[Figure: mean and sd of the %RSD for all metabolites for a given treatment]
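A small worked example may help show why scaling by the mean is useful. The replicate values below are made up for illustration: the second analyte has the same relative spread as the first at 100 times the signal, so its sd is 100 times larger while its %RSD is the same.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Percent relative standard deviation: (sd / mean) * 100."""
    return stdev(values) / mean(values) * 100


# Made-up replicate measurements for two analytes.
low_signal = [90.0, 100.0, 110.0]
high_signal = [v * 100 for v in low_signal]  # same relative spread, 100x the magnitude

# The sd grows with the magnitude of the signal...
print("sd:  ", stdev(low_signal), stdev(high_signal))
# ...while the %RSD stays the same, so it can be compared across analytes.
print("%RSD:", percent_rsd(low_signal), percent_rsd(high_signal))
```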
Data quality
[Figure: %RSD plotted against mean, annotated with data quality regions (below LOQ (sensitivity), bad, moderate, good) and reference values of ~40% RSD and a mean of ~10,000]
Selecting the “optimal” method
Optimal can be:
1. Lowest average %RSD for all measurements
2. Lowest %RSD for measurements of interest
3. Largest number of metabolites passing %RSD cutoff
4. Lowest average %RSD for all measurements passing %RSD cutoff
Using strategy #4 for metabolites with %RSD ≤ 40:
Method #2 (ACN/IPA/water 3:3:2) looks optimal…
[Figure: count of metabolites and %RSD (mean ± sd) for each method]
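Strategy #4 can be sketched as follows. The method names and %RSD values are invented for illustration, and reporting the number of passing metabolites alongside the average is a choice made here so that a method passing very few metabolites is easy to spot.

```python
from statistics import mean


def rank_methods(rsd_by_method, cutoff=40.0):
    """Rank methods by the average %RSD of the metabolites that pass the cutoff.

    Returns (method, n_passing, mean_rsd_of_passing) tuples, best method first.
    Ties on the average are broken by the larger number of passing metabolites.
    """
    results = []
    for method, rsds in rsd_by_method.items():
        passing = [r for r in rsds if r <= cutoff]
        avg = mean(passing) if passing else float("inf")  # no passing metabolites ranks last
        results.append((method, len(passing), avg))
    return sorted(results, key=lambda item: (item[2], -item[1]))


# Made-up per-metabolite %RSD values for three hypothetical methods.
rsd_by_method = {
    "method1": [12.0, 35.0, 55.0, 80.0],
    "method2": [8.0, 15.0, 22.0, 61.0],
    "method3": [30.0, 38.0, 45.0, 90.0],
}

ranked = rank_methods(rsd_by_method)
for method, n_passing, avg in ranked:
    print(f"{method}: {n_passing} metabolites with %RSD <= 40, mean %RSD = {avg:.1f}")
```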
Based on Method #2
Analytes with high signal and high %RSD should be further
interrogated for explanations of low reproducibility
[Figure: mean and %RSD plotted against log mean, with the %RSD ≤ 40 cutoff marked]
Statistical comparison of the effects of sample drying
Goal: Identify the effect of treatment (fresh/lyophilized) on the performance of
Methods #3-4 (Use DATA: Pumpkin data 2.csv)
[Figure: count of metabolites and %RSD (mean ± sd)]
Steps:
1. Use a t-test to compare metabolite means between treatments
2. Adjust the p-values to control the false discovery rate (FDR)
3. Estimate the FDR (q-value)
Visualize:
1. Relationship between p-value and FDR adjusted p-value
2. Relationship between FDR adjusted p-value and q-value
3. Box plots for the highest and lowest p-value metabolites
Questions:
1. When should you use a one-sample, two-sample or paired t-test, or ANOVA?
Hypothesis Testing Strategies
• One-sample t-test is used to compare a sample mean to a single reference value (a hypothesized
population mean)
• Two-sample t-test is used to compare the means of 2 independent populations
• Paired t-test is used to compare paired measurements on the same population (intervention,
repeated measures)
• One-way ANOVA (analysis of variance) is used to compare n populations for
one factor
• Two-way ANOVA is used to compare n populations for 2 factors
• ANCOVA (analysis of covariance) is used to adjust n populations for a
covariate (typically continuous) prior to testing for n factors
• Mixed effects models are a versatile analogue to the linear model or
ANOVA/ANCOVA and are typically used to adjust for covariates or variance due
to repeated measures
*All of the above are parametric tests, some of which have non-parametric analogues
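The difference between the two-sample and paired t-tests can be illustrated with the test statistics themselves. This is a sketch on made-up numbers: it assumes the two lists hold measurements of the same four samples before and after drying (which is what would justify pairing), and it uses the Welch form of the two-sample statistic, which does not assume equal variances. Turning t and its degrees of freedom into a p-value needs the t distribution, which the Python standard library does not provide; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` are one option.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Two-sample t statistic and degrees of freedom, without assuming equal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def paired_t(a, b):
    """Paired t statistic on the per-sample differences; df = n - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up values for one metabolite; position i in both lists is the same sample.
fresh = [10.0, 12.0, 14.0, 16.0]
dried = [11.0, 14.0, 15.0, 18.0]

t_ind, df_ind = welch_t(fresh, dried)
t_pair, df_pair = paired_t(fresh, dried)
print(f"two-sample (Welch): t = {t_ind:.3f}, df = {df_ind:.1f}")
print(f"paired:             t = {t_pair:.3f}, df = {df_pair}")
```

In this example the paired statistic is much larger in magnitude, because pairing removes the sample-to-sample variation that the two-sample test treats as noise. If the fresh and dried samples were independent, the paired test would not be appropriate.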
p-value vs. FDR adjusted p-value
Benjamini & Hochberg (1995) (“BH”)
• Accepted standard
Bonferroni
• Very conservative (controls the family-wise error rate rather than the FDR)
• adjusted p-value = p-value * # of tests, capped at 1
(e.g. 0.005 * 148 = 0.74)
[Figure: FDR adjusted p-value plotted against p-value]
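Both adjustments are short enough to write out. The sketch below follows the usual definitions (Bonferroni multiplies each p-value by the number of tests; BH multiplies the p-value at rank k by m/k and then enforces monotonicity); the example p-values are made up, apart from the 0.005 with 148 tests taken from the slide.

```python
def bonferroni(pvalues):
    """Bonferroni adjusted p-values: p * number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (BH) adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is no larger than the next one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up p-values
print("Bonferroni:", bonferroni(pvals))
print("BH:        ", benjamini_hochberg(pvals))

# The slide's example: one p-value of 0.005 among 148 tests.
print("0.005 with 148 tests:", bonferroni([0.005] + [0.5] * 147)[0])
```

Established implementations exist (for example `p.adjust` in R), so a hand-written version like this is mainly useful for seeing what the adjustment does.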
p-value vs. q-value
• q-value can be used to select an appropriate p-value cutoff for an acceptable
FDR for multiple hypotheses tested
• q=0.05 nicely matches assumptions of p=0.05 for
multiple hypotheses tested
• q-value≤0.2 can be acceptable
[Figure: FDR adjusted p-value plotted against q-value]
Change in metabolites due to treatment
[Figure: change in metabolites due to treatment; legend shows effect size, from small to large]
Effect of drying is minimal
7 significantly different metabolites out of 148 (5%)
[Figure: -log p-value plotted against fold change (relative to fresh), with the FDR p-value = 0.05 threshold marked]
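The coordinates for a plot of this kind can be computed per metabolite as sketched below. Several details are assumptions rather than facts from the slide: the fold change is taken as mean(dried) / mean(fresh) and shown on a log2 scale, the logarithm of the p-value is base 10, and the replicate values and p-value are made up.

```python
from math import log2, log10
from statistics import mean


def volcano_point(fresh, dried, p_value, threshold=0.05):
    """Return (log2 fold change relative to fresh, -log10 p-value, passes threshold)."""
    fold_change = mean(dried) / mean(fresh)
    return log2(fold_change), -log10(p_value), p_value <= threshold


# Made-up replicates for one metabolite and a made-up FDR adjusted p-value.
fresh = [100.0, 110.0, 90.0]
dried = [200.0, 190.0, 210.0]

x, y, significant = volcano_point(fresh, dried, p_value=0.01)
print(f"log2 fold change = {x:.2f}, -log10 p = {y:.2f}, significant = {significant}")
```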
Power analysis
Goal: Use power analysis to plan a follow-up experiment to detect
differences in metabolites due to treatment
Steps:
1. Calculate effect size and power for three metabolites
2. Given the observed effect size, calculate the number of samples needed to
reach 80% power
Questions:
1. How would you take FDR into account?
Power analysis
• Effect size: scaled difference in means between treatments
• Power: ability to detect a difference when it exists
(controls the false negative rate)
• Significance level: probability of being wrong when spotting
a difference (controls the false positive rate)
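These three quantities and the sample size are linked, so fixing any three determines the fourth. The sketch below uses Cohen's d as the effect size and a normal approximation to the two-sided, two-sample test; both are assumptions made here for simplicity. An exact calculation based on the t distribution gives a somewhat larger sample size, especially for small groups.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev

_norm = NormalDist()  # standard normal distribution


def cohens_d(a, b):
    """Effect size: difference in means scaled by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison (normal approximation)."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    return _norm.cdf(abs(d) * sqrt(n_per_group / 2) - z_crit)


def n_for_power(d, power=0.8, alpha=0.05):
    """Approximate number of samples per group needed to reach the target power."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    z_power = _norm.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / abs(d)) ** 2)


# Made-up replicates for one metabolite under two treatments.
d_obs = cohens_d([10.0, 12.0, 14.0], [7.0, 9.0, 11.0])
print(f"observed effect size d = {d_obs:.2f}")

# Illustrative effect size of 1.2 at alpha = 0.05.
n = n_for_power(1.2, power=0.8)
print(f"samples per group for 80% power: {n}")
print(f"approximate power at that n: {power_two_sample(1.2, n):.2f}")
```

One simple way to account for multiple testing in this calculation is to pass a stricter `alpha`, which increases the required sample size.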
Power analysis
The minimum fold change (FC) in means observable by the study can be
calculated using RSD and estimated effect size to reach 0.8 (80%) power
given the sample size
RSD = 0.21 and effect size (EF) = 1.2
We can observe a minimum of a 38% change in means at 0.8 power (p = 0.05).
