Data Normalization Approaches for
Large-Scale Metabolomic Studies
Dmitry Grapov, PhD
Analytical Variance
Variation in sample measurements stemming from sample
handling, data acquisition, processing, etc.
• Can modify or mask true biological variability
• Calculated based on variance in replicated measurements
• Can be accounted for using data normalization approaches
Goal: minimize analytical variance using data normalization
Drift in >400 replicated measurements across >100 batches
Need for Normalization
To remove non-biological (e.g. analytical)
drift/variance/artifacts in measurements
Acquisition order Processing/acquisition batches
Samples
Quality Controls (QCs)
Quantifying Data Quality (precision)
Calculate median inter- and intra-batch %RSD
(for replicated measurements)
Analyte specific
performance across
whole study
Within batch
performance
Visualizing Performance
Intra-batch (within) precision for
normalization methods
Inter-batch (across) precision for
normalization methods
RSD = relative standard deviation = standard deviation/mean
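The two precision summaries above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names and the QC values are hypothetical, and the sample standard deviation is assumed.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def pct_rsd(values):
    """Percent relative standard deviation: 100 * sd / mean."""
    return 100.0 * stdev(values) / mean(values)


def median_within_batch_rsd(qc_values, batches):
    """Median over batches of the %RSD of QC replicates inside each batch."""
    by_batch = defaultdict(list)
    for value, batch in zip(qc_values, batches):
        by_batch[batch].append(value)
    # A batch needs at least two replicates for a standard deviation.
    rsds = [pct_rsd(v) for v in by_batch.values() if len(v) > 1]
    return median(rsds)


def study_wide_rsd(qc_values):
    """%RSD of one analyte over all QC replicates, ignoring batch."""
    return pct_rsd(qc_values)


# Hypothetical QC intensities for one analyte measured in two batches.
qc = [100.0, 110.0, 90.0, 200.0, 220.0, 180.0]
batch = ["b1", "b1", "b1", "b2", "b2", "b2"]
within = median_within_batch_rsd(qc, batch)
across = study_wide_rsd(qc)
```

In this made-up example each batch is precise on its own, but the shift between batches makes the study-wide %RSD much larger than the within-batch value.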
Visualizing Metabolite Performance
acquisition time
batch
Univariate Multivariate
PCA
Common Normalization Approaches
Sample-wise scalar corrections
• L2 norm, mean, median, sum, etc.
Internal standard (ISTD)
• Ratio response (metabolite/ISTD)
• NOMIS (Sysi-Aho et al., 2007; selection of optimal combination ISTDs)
• CCRMN (Redestig et al., 2009; removal of metabolite cross contribution to ISTDs)
Quality control (QC) or reference sample
• Batch ratio (mean, median)
• Loess (doi:10.1038/nprot.2011.335; locally estimated scatterplot smoothing)
• Hierarchical mixed effects (Jauhiainen et al. 2014)
• Quantile (Bolstad et al., 2003; minimize variance in metabolite distribution)
Variance Based
• RUV-2 (De Livera et al., 2012; variance removal for hypothesis testing)
• Variance stabilizing normalization (Huber et al. 2002)
Evaluation of Normalizations
Use QC to define:
• Median within batch %RSD
• Median analyte study wide %RSD
• All normalization specific parameters
• Split QCs into training and test set
• Optimize tuning parameters using leave-one-out
cross-validation
• Assess performance on test set
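The evaluation scheme above might look roughly like the sketch below. The smoother is a deliberately simple stand-in (a windowed mean, not LOESS), `width` is a hypothetical tuning parameter, and the split fraction and seed are arbitrary assumptions.

```python
import random
from statistics import mean


def kernel_smooth(train_x, train_y, x_new, width):
    """Stand-in smoother: mean of training points within `width` of x_new."""
    near = [y for x, y in zip(train_x, train_y) if abs(x - x_new) <= width]
    if not near:  # fall back to the global mean when the window is empty
        return mean(train_y)
    return mean(near)


def loo_error(xs, ys, width):
    """Leave-one-out mean absolute error for one tuning value."""
    errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        errors.append(abs(ys[i] - kernel_smooth(rest_x, rest_y, xs[i], width)))
    return mean(errors)


def tune_and_test(xs, ys, widths, test_fraction=0.3, seed=0):
    """Split QCs, choose width by LOO-CV on training QCs, score on test QCs."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = sorted(idx[:n_test]), sorted(idx[n_test:])
    tr_x, tr_y = [xs[i] for i in train], [ys[i] for i in train]
    best = min(widths, key=lambda w: loo_error(tr_x, tr_y, w))
    test_err = mean(abs(ys[i] - kernel_smooth(tr_x, tr_y, xs[i], best))
                    for i in test)
    return best, test_err


# Hypothetical QC series: linear drift over acquisition order.
order = list(range(20))
signal = [100.0 + 2.0 * t for t in order]
best_width, test_error = tune_and_test(order, signal, widths=[1, 5, 20])
```

The held-out QCs are never used while choosing the tuning value, so the test error gives a less optimistic view of performance than the cross-validation error.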
Image: http://pingax.com/regularization-implementation-r/?utm_source=rss&utm_medium=rss&utm_campaign=regularization-implementation-r
Scalar Normalization
Calculate sample-specific scalar to ensure each sample’s (sum, mean, median, etc.) signal is equivalent
• Using sum signal normalization (sum norm) assumes equivalent total metabolite signal per sample
• Can correct for batch effects when valid
BMC Bioinformatics 2007, 8:93 doi:10.1186/1471-2105-8-93
These normalizations may hide true biological trends or create false ones
After sum norm, phospholipids seem lower in ob/ob when in reality they are the same as in wt samples
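A toy illustration of this artifact, using invented intensities: the phospholipid signal is identical in both samples, but a second, hypothetical analyte (labelled triglyceride here purely for illustration) is higher in one of them.

```python
def sum_normalize(sample):
    """Scale one sample so its analyte signals sum to 1."""
    total = sum(sample.values())
    return {name: value / total for name, value in sample.items()}


# Hypothetical intensities: phospholipid is the same in both samples,
# but the second sample carries much more triglyceride signal.
wt = {"phospholipid": 50.0, "triglyceride": 50.0}
ob = {"phospholipid": 50.0, "triglyceride": 150.0}

wt_norm = sum_normalize(wt)
ob_norm = sum_normalize(ob)
# Raw phospholipid is equal; after sum norm it appears lower in `ob`.
```

Because sum normalization makes the data compositional, an increase in one analyte forces an apparent decrease in all the others.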
Batch Ratio (BR) Normalization
Use QCs to calculate:
1. batch/analyte specific correction factor = (batch median / global median)
2. Apply ratio to samples
• simple
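The two steps can be sketched for a single analyte as below. The function names and data are hypothetical, and dividing each sample by its batch factor is an assumed reading of "apply ratio to samples".

```python
from collections import defaultdict
from statistics import median


def batch_ratio_factors(qc_values, qc_batches):
    """Per-batch factor for one analyte: batch QC median / global QC median."""
    by_batch = defaultdict(list)
    for value, batch in zip(qc_values, qc_batches):
        by_batch[batch].append(value)
    global_median = median(qc_values)
    return {b: median(v) / global_median for b, v in by_batch.items()}


def apply_batch_ratio(values, batches, factors):
    """Divide each measurement by the factor of the batch it was run in."""
    return [v / factors[b] for v, b in zip(values, batches)]


# Hypothetical QCs: batch b2 reads twice as high as batch b1.
qc = [100.0, 100.0, 100.0, 200.0, 200.0, 200.0]
qc_batch = ["b1", "b1", "b1", "b2", "b2", "b2"]
factors = batch_ratio_factors(qc, qc_batch)

# Two samples with the same true level, one acquired in each batch.
corrected = apply_batch_ratio([50.0, 100.0], ["b1", "b2"], factors)
```

After correction both samples land on the same value, which is the intended effect when the QCs track the batch shift faithfully.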
LOESS Normalization (local smoothing)
For each analyte use QCs to:
• Tune LOESS model (span or degree of smoothing)
• LOESS model to remove analytical variance from samples
raw LOESS normalized
LOESS Normalization
LOESS span has a large effect on model fit
span (α) defines the degree of smoothing and is critical for controlling overfitting
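A rough sketch of QC-based LOESS correction for one analyte follows. It is a simplified local linear smoother with tricube weights, not the implementation used in the study. The ratio-style correction rescaled to the QC mean is an assumption, and all names and data are hypothetical.

```python
import math


def loess_predict(train_x, train_y, x_new, span=0.75):
    """Local linear fit at x_new using the nearest span * n training points."""
    n = len(train_x)
    k = min(n, max(2, math.ceil(span * n)))
    dists = sorted(abs(x - x_new) for x in train_x)
    # Bandwidth: slightly beyond the k-th nearest point so it keeps some weight.
    h = dists[k - 1] * 1.001 or 1.0
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(train_x, train_y):
        dx = x - x_new  # centre on x_new so the intercept is the prediction
        u = abs(dx) / h
        w = (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0  # tricube weight
        sw += w
        swx += w * dx
        swy += w * y
        swxx += w * dx * dx
        swxy += w * dx * y
    denom = sw * swxx - swx * swx
    if denom <= 1e-12 * sw * swxx:  # degenerate window: use the weighted mean
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw


def loess_normalize(qc_x, qc_y, sample_x, sample_y, span=0.75):
    """Divide each sample by the QC trend at its position, rescaled to the QC mean."""
    center = sum(qc_y) / len(qc_y)
    return [y * center / loess_predict(qc_x, qc_y, x, span)
            for x, y in zip(sample_x, sample_y)]


# Hypothetical data: QCs drift linearly with acquisition order and the
# samples (true level = half the QC level) drift in the same way.
qc_x = [0, 10, 20, 30, 40]
qc_y = [100.0, 110.0, 120.0, 130.0, 140.0]
sample_x = [5, 15, 25, 35]
sample_y = [0.5 * (100.0 + x) for x in sample_x]
normalized = loess_normalize(qc_x, qc_y, sample_x, sample_y)
```

A smaller `span` uses fewer neighbouring QCs per fit, so the curve follows individual QC points more closely and is more prone to overfitting.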
LOESS Normalization
raw samples (red) normalized based on QCs (black)
model is trained on QCs and applied to samples
span: too high just right?
Cannot assume convergence of training and test performance because
test data has analytical + biological variance
LOESS Normalization
Avoiding overfitting is critical when using LOESS normalization
Example LOESS Normalization
raw span =0.75 span =0.005
Metabolomic Data Case Study I
GC-TOF
• 310 metabolites for 4930 samples
• 132 batches
• ~41 samples per batch
• ~1:10 QCs/samples (487 QCs or 9%)
• No Internal Standards (ISTDs)
Normalizations Implemented
• Batch ratio
• LOESS
• Sum known metabolite signal (mTIC) normalization
Batch Performance (GC-TOF Raw)
Within batch
• Median: 26
• Min: 19
• Max: 69
Median RSD   count   cumulative %
10-20        3       2
20-30        98      76
30-40        26      96
40-50        3       98
50-60        1       99
60-70        1       100
Median RSD   count   cumulative %
0-10         10      3
10-20        83      30
20-30        100     62
30-40        69      84
40-50        32      94
50-60        6       96
60-70        3       97
70-80        5       98
80-90        1       99
90-100       1       100
Analyte Performance (GC-TOF Raw)
Within Batch
• Median: 24
• Min: 7
• Max: 79
PCA (GC-TOF Raw)
Within batches
• Median: 23
• Min: 17
• Max: 69
Median RSD   count   cumulative %
10-20        25      23
20-30        67      85
30-40        15      99
40-50        1       100
60-70        1       101
Batch Performance (GC-TOF BR)
Median RSD   count   cumulative %
0-10         17      6
10-20        103     39
20-30        112     75
30-40        57      93
40-50        12      97
50-60        5       99
60-70        3       100
70-80        1       100
Across batches
• Median: 24
• Min: 7
• Max: 79
Batch Performance (GC-TOF BR)
PCA (GC-TOF BR)
BR Normalization Limitations
• Very susceptible to outliers
• Requires many QCs
• Can inflate variance when training and test set trends do not match
Within batches
• Median: 19
• Min: 11
• Max: 58
Median RSD   count   cumulative %
10-20        75      57
20-30        51      96
30-40        4       99
40-50        1       99
50-60        1       100
Batch Performance (GC-TOF LOESS)
Median RSD   count   cumulative %
0-10         17      6
10-20        103     39
20-30        112     75
30-40        57      93
40-50        12      97
50-60        5       99
60-70        3       100
70-80        1       100
Across batches
• Median: 19
• Min: 2.9
• Max: 66
Batch Performance (GC-TOF LOESS)
PCA (GC-TOF LOESS)
LOESS Normalization Limitations
raw normalized
LOESS normalization can inflate variance when:
• overtrained
• training examples do not match test set
Sum mTIC Normalization (GC-TOF)
Improved performance over raw and BR, but changes the data from
absolute magnitudes to compositional (relative) values
Sum mTIC Normalization (GC-TOF)
Poor removal of trends due to acquisition time, but limits magnitude of
outlier samples compared to other approaches
time
Raw
mTIC Normalized
Metabolomic Data Case Study II
LC-Q-TOF
• 340+ metabolites for 4930 samples
• 132 batches
• ~41 samples per batch
• ~1:10 QC/samples (524 QCs or 11%)
• NIST reference (63 or 1%)
• 14 internal standards (ISTDs)
• NOMIS (IS = ISTD)
• qcISTD
Internal Standards Normalization
Analyte
Retention time
Internal standards (ISTD)
• qcISTD(QC optimized
metabolite/ISTD)
• NOMIS(Sysi-Aho et al., 2007;
selection of optimal combination ISTDs)
• CCRMN (Redestig et al., 2009;
removal of metabolite cross contribution
to ISTDs)
NOMIS
ISTD Based Normalizations (LC/Q-TOF)
• NOMIS (linear combination of optimal ISTDs;
Sysi-Aho et al., 2007)
• qcISTD (QC optimized ISTD strategy)
PC 38:6
Poor
performance
with NOMIS
qcISTD Normalization
Use QC samples to:
1. Evaluate analyte %RSD before and after corrections using all ISTDs
2. Select analyte/ISTD combinations with %RSD improvement over raw data at some threshold (e.g. 10%)
3. Correct sample analytes with QC defined ISTD if ISTD recovery is above some minimal threshold (e.g. > 20% of median)
• Subject to overfitting
191 of 326 (60%) are ISTD corrected
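The qcISTD steps above can be sketched for one analyte as below. This is an illustrative reading, not the author's code. It assumes the improvement threshold is in percentage points of %RSD, that the raw value is kept when ISTD recovery is too low, and that corrected values are rescaled by the ISTD median. All names and data are hypothetical.

```python
from statistics import mean, median, stdev


def pct_rsd(values):
    """Percent relative standard deviation: 100 * sd / mean."""
    return 100.0 * stdev(values) / mean(values)


def select_istd(qc_analyte, qc_istds, min_improvement=10.0):
    """Pick the ISTD whose ratio lowers QC %RSD most, or None if none helps enough."""
    raw = pct_rsd(qc_analyte)
    best_name, best_rsd = None, raw - min_improvement
    for name, istd in qc_istds.items():
        rsd = pct_rsd([a / i for a, i in zip(qc_analyte, istd)])
        if rsd < best_rsd:
            best_name, best_rsd = name, rsd
    return best_name


def correct_samples(analyte, istd, min_recovery=0.2):
    """Ratio-correct samples; keep the raw value when ISTD recovery is too low."""
    scale = median(istd)  # rescale so corrected values stay on the raw scale
    floor = min_recovery * scale
    return [a / i * scale if i > floor else a for a, i in zip(analyte, istd)]


# Hypothetical QC data: ISTD "A" tracks the analyte drift, ISTD "B" is flat.
qc_analyte = [100.0, 120.0, 80.0, 110.0]
qc_istds = {"A": [10.0, 12.0, 8.0, 11.0], "B": [10.0, 10.0, 10.0, 10.0]}
chosen = select_istd(qc_analyte, qc_istds)

# Hypothetical samples: the third has very low ISTD recovery and stays raw.
corrected = correct_samples([100.0, 120.0, 50.0], [10.0, 12.0, 1.0])
```

Because the ISTD is chosen on the same QCs used to judge it, the selection can overfit; holding out part of the QCs for testing is one way to check this.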
qcISTD Normalization
ISTD used by retention time (Rt) Total number of analytes corrected by ISTD
Optimal Lipidomic ISTDS
Normalizations (LC-Q-TOF)
LOESS performs very
poorly for two
metabolites
• qcISTD performs better than LOESS
• qcISTD + LOESS leads to highest replicate
precision
PCA (LC/Q-TOF)
Raw (%RSD = 13) qcISTD (9)
LOESS (12)
qcISTD +
LOESS (8)
Only normalizations that include
LOESS effectively remove
analytical batch effects
Conclusion
• Comparison of common data normalization approaches
suggests that in addition to ISTD corrections, LOESS
(analyte-specific, non-linear adjustment based on QC
performance at various data acquisition times) is superior
to batch based corrections.
• Further validations need to be completed to confirm the
effects of normalizations on samples’ variance
• These findings suggest that inclusion of “batch” as a
covariate in statistical models will not fully account for
analytical variance
R code for all normalization functions can be found at:
https://github.com/dgrapov/devium/blob/master/R/Devium%20Normalization.r
dgrapov@ucdavis.edu
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154

 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 

Recently uploaded (20)

REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 

Data Normalization Approaches for Large-scale Biological Studies

  • 1. Data Normalization Approaches for Large-Scale Metabolomic Studies Dmitry Grapov, PhD
  • 2. Analytical Variance Variation in sample measurements stemming from sample handling, data acquisition, processing, etc. • Can modify or mask true biological variability • Calculated based on variance in replicated measurements • Can be accounted for using data normalization approaches Goal: minimize analytical variance using data normalization Drift in >400 replicated measurements across >100 batches
  • 3. Need for Normalization To remove non-biological (e.g. analytical) drift/variance/artifacts in measurements Acquisition order Processing/acquisition batches Samples Quality Controls (QCs)
  • 4. Quantifying Data Quality (precision) Calculate median inter- and intra-batch %RSD (for replicated measurements) Analyte specific performance across whole study Within batch performance
  • 5. Visualizing Performance Intra-batch (within) precision for normalization methods Inter-batch (across) precision for normalization methods RSD = relative standard deviation = standard deviation/mean
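The precision metrics on slides 4 and 5 can be sketched in a few lines. This is a minimal illustration in Python rather than the author's R code; the function names and the `(batch, analyte, intensity)` row layout are assumptions made for the example, and only QC (replicated) injections are expected as input.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


def median_within_batch_rsd(qc_rows):
    """Intra-batch precision.

    qc_rows: iterable of (batch, analyte, intensity) tuples for QC injections
    (hypothetical layout). Returns {batch: median %RSD over analytes}.
    """
    groups = defaultdict(list)
    for batch, analyte, value in qc_rows:
        groups[(batch, analyte)].append(value)
    per_batch = defaultdict(list)
    for (batch, _analyte), values in groups.items():
        if len(values) >= 2:  # a standard deviation needs two or more replicates
            per_batch[batch].append(percent_rsd(values))
    return {batch: median(rsds) for batch, rsds in per_batch.items()}


def analyte_study_wide_rsd(qc_rows):
    """Inter-batch precision: %RSD of each analyte over all QCs in the study."""
    groups = defaultdict(list)
    for _batch, analyte, value in qc_rows:
        groups[analyte].append(value)
    return {a: percent_rsd(v) for a, v in groups.items() if len(v) >= 2}
```

A batch-level shift shows up as a study-wide %RSD that is much larger than the within-batch values, which is the pattern normalization is meant to reduce.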
  • 6. Visualizing Metabolite Performance acquisition time batch Univariate Multivariate PCA
  • 7. Common Normalization Approaches Sample-wise scalar corrections • L2 norm, mean, median, sum, etc. Internal standard (ISTD) • Ratio response (metabolite/ISTD) • NOMIS (Sysi-Aho et al., 2007; selection of optimal combination ISTDs) • CCRMN (Redestig et al., 2009; removal of metabolite cross contribution to ISTDs) Quality control (QC) or reference sample • Batch ratio (mean, median) • Loess (doi:10.1038/nprot.2011.335; locally estimated scatterplot smoothing) • Hierarchical mixed effects (Jauhiainen et al. 2014) • Quantile (Bolstad et al., 2003; minimize variance in metabolite distribution) Variance Based • RUV-2 (De Livera et al., 2012; variance removal for hypothesis testing) • Variance stabilizing normalization (Huber et al. 2002)
  • 8. Evaluation of Normalizations Use QC to define: • Median within batch %RSD • Median analyte study wide %RSD • All normalization specific parameters • Split QCs into training and test set • Optimize tuning parameters using leave-one-out cross-validation • Assess performance on test set Image: http://pingax.com/regularization-implementation-r/?utm_source=rss&utm_medium=rss&utm_campaign=regularization-implementation-r
  • 9. Scalar Normalization Calculate sample-specific scalar to ensure each sample’s (sum, mean, median, etc.) signal is equivalent • Using sum signal normalization (sum norm) assumes equivalent total metabolite signal per sample • Can correct for batch effects when valid BMC Bioinformatics 2007, 8:93 doi:10.1186/1471-2105-8-93 These normalizations may hide true biological trends or create false ones. After sum norm, phospholipids seem lower in ob/ob when in reality they are the same as in wt samples
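The sum normalization on slide 9, and the artifact it warns about, can be illustrated with a small sketch. The function name, the nested-dict data layout, and the choice of the mean total signal as the common target are assumptions for this example, not details taken from the slides.

```python
from statistics import mean


def sum_normalize(samples):
    """Scale every sample so its total signal equals the mean total signal.

    samples: {sample_id: {analyte: intensity}} (hypothetical layout).
    Assumes total metabolite signal should be equivalent across samples.
    """
    totals = {sid: sum(values.values()) for sid, values in samples.items()}
    target = mean(totals.values())
    return {
        sid: {a: x * target / totals[sid] for a, x in values.items()}
        for sid, values in samples.items()
    }


# Toy data: phospholipid (PC) signal is identical in both samples,
# while triglyceride (TG) signal is three times higher in ob/ob.
raw = {"wt": {"PC": 100.0, "TG": 100.0}, "ob": {"PC": 100.0, "TG": 300.0}}
normalized = sum_normalize(raw)
```

In this toy case PC becomes 150 in wt and 75 in ob after normalization, although the raw PC signal was the same in both, which is the kind of false trend the slide describes.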
  • 10. Batch Ratio (BR) Normalization Use QCs to calculate: 1. batch/analyte specific correction factor = (batch median /global median) 2. Apply ratio to samples • simple
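The two batch ratio steps on slide 10 can be sketched as follows. The row layout and function name are hypothetical, the global median is assumed to be the median of all QC values for the analyte across the study, and applying the ratio is assumed to mean dividing each measurement by its batch factor.

```python
from collections import defaultdict
from statistics import median


def batch_ratio_normalize(rows):
    """Batch ratio normalization using QC medians.

    rows: list of dicts with keys "batch", "analyte", "is_qc", "value"
    (hypothetical layout). Returns new rows with corrected values.
    """
    qc_batch = defaultdict(list)   # QC values per (batch, analyte)
    qc_global = defaultdict(list)  # QC values per analyte over the whole study
    for r in rows:
        if r["is_qc"]:
            qc_batch[(r["batch"], r["analyte"])].append(r["value"])
            qc_global[r["analyte"]].append(r["value"])

    corrected = []
    for r in rows:
        key = (r["batch"], r["analyte"])
        if key not in qc_batch:
            raise ValueError(f"no QC measurements for {key}")
        # Step 1: correction factor = batch median / global median
        factor = median(qc_batch[key]) / median(qc_global[r["analyte"]])
        # Step 2: apply the ratio to every measurement in that batch
        corrected.append({**r, "value": r["value"] / factor})
    return corrected
```

Because each factor rests on a handful of QC injections per batch, a single outlying QC can shift every sample in that batch.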
  • 11. LOESS Normalization (local smoothing) For each analyte use QCs to: • Tune LOESS model (span or degree of smoothing) • Apply LOESS model to remove analytical variance from samples (figure panels: raw, LOESS normalized)
  • 12. LOESS Normalization LOESS span has a large effect on model fit. Span (α) defines the degree of smoothing and is critical for controlling overfitting
  • 13. LOESS Normalization Raw samples (red) normalized based on QCs (black). Model is trained on QCs and applied to samples. Span: too high or just right? Cannot assume convergence of training and test performance because test data has analytical + biological variance
  • 14. LOESS Normalization Avoiding overfitting is critical when using the LOESS normalization
  • 15. Example LOESS Normalization raw span = 0.75 span = 0.005
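The QC-based LOESS correction described on slides 11 to 15 can be sketched as below. This is a simplified stand-in, not R's `loess` and not the author's implementation: it fits a local linear model with tricube weights and no robustness iterations, the function names are hypothetical, and rescaling to the QC median is an assumption. It needs at least two QC points, and the fitted trend must stay positive for the division to be meaningful.

```python
from statistics import median


def loess_predict(x_train, y_train, x_new, span=0.75):
    """Local linear regression with tricube weights (simplified LOESS)."""
    n = len(x_train)
    k = max(2, min(n, round(span * n)))  # neighbours used for each local fit
    preds = []
    for x0 in x_new:
        dists = sorted(abs(x - x0) for x in x_train)
        h = dists[k - 1] * 1.001 + 1e-12  # bandwidth just beyond k-th neighbour
        w = [(1.0 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in x_train]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x_train)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y_train)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x_train))
        sxy = sum(wi * (xi - xm) * (yi - ym)
                  for wi, xi, yi in zip(w, x_train, y_train))
        slope = sxy / sxx if sxx > 0 else 0.0
        preds.append(ym + slope * (x0 - xm))
    return preds


def loess_normalize(qc_order, qc_values, sample_order, sample_values, span=0.75):
    """Train the trend on QCs, then divide samples by it and rescale to the QC median."""
    trend = loess_predict(qc_order, qc_values, sample_order, span)
    target = median(qc_values)
    return [v * target / t for v, t in zip(sample_values, trend)]


def loo_cv_rmse(x, y, span):
    """Leave-one-out cross-validation error on QCs, usable for comparing spans."""
    errors = []
    for i in range(len(x)):
        pred = loess_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], [x[i]], span)[0]
        errors.append((pred - y[i]) ** 2)
    return (sum(errors) / len(errors)) ** 0.5
```

Comparing `loo_cv_rmse` over a grid of candidate spans is one way to tune the model on QCs only. In this sketch a very small span such as 0.005 falls back to the two nearest QCs, which amounts to near interpolation and illustrates the overfitting risk the slides warn about.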
  • 16. Metabolomic Data Case Study I GC-TOF • 310 metabolites for 4930 samples • 132 batches • ~41 samples per batch • ~1:10 QCs/samples (487 QCs or 9%) • No Internal Standards (ISTDs) Normalizations Implemented • Batch ratio • LOESS • Sum known metabolite signal (mTIC) normalization
  • 17. Batch Performance (GC-TOF Raw) Within batch • Median: 26 • Min: 19 • Max: 69 Median RSD bin (count, cumulative %): 10-20 (3, 2); 20-30 (98, 76); 30-40 (26, 96); 40-50 (3, 98); 50-60 (1, 99); 60-70 (1, 100)
  • 18. Analyte Performance (GC-TOF Raw) Within Batch • Median: 24 • Min: 7 • Max: 79 Median RSD bin (count, cumulative %): 0-10 (10, 3); 10-20 (83, 30); 20-30 (100, 62); 30-40 (69, 84); 40-50 (32, 94); 50-60 (6, 96); 60-70 (3, 97); 70-80 (5, 98); 80-90 (1, 99); 90-100 (1, 100)
  • 20. Batch Performance (GC-TOF BR) Within batches • Median: 23 • Min: 17 • Max: 69 Median RSD bin (count, cumulative %): 10-20 (25, 23); 20-30 (67, 85); 30-40 (15, 99); 40-50 (1, 100); 60-70 (1, 101)
  • 21. Batch Performance (GC-TOF BR) Across batches • Median: 24 • Min: 7 • Max: 79 Median RSD bin (count, cumulative %): 0-10 (17, 6); 10-20 (103, 39); 20-30 (112, 75); 30-40 (57, 93); 40-50 (12, 97); 50-60 (5, 99); 60-70 (3, 100); 70-80 (1, 100)
  • 23. BR Normalization Limitations • Very susceptible to outliers • Requires many QCs • Can inflate variance when training and test set trends do not match
  • 24. Batch Performance (GC-TOF LOESS) Within batches • Median: 19 • Min: 11 • Max: 58 Median RSD bin (count, cumulative %): 10-20 (75, 57); 20-30 (51, 96); 30-40 (4, 99); 40-50 (1, 99); 50-60 (1, 100)
  • 25. Batch Performance (GC-TOF LOESS) Across batches • Median: 19 • Min: 2.9 • Max: 66 Median RSD bin (count, cumulative %): 0-10 (17, 6); 10-20 (103, 39); 20-30 (112, 75); 30-40 (57, 93); 40-50 (12, 97); 50-60 (5, 99); 60-70 (3, 100); 70-80 (1, 100)
  • 27. LOESS Normalization Limitations (figure panels: raw, normalized) LOESS normalization can inflate variance when: • overtrained • training examples do not match test set
  • 28. Sum mTIC Normalization (GC-TOF) Improved performance over raw and BR, but alters data from magnitudinal to compositional
  • 29. Sum mTIC Normalization (GC-TOF) Poor removal of trends due to acquisition time, but limits the magnitude of outlier samples compared to other approaches (figure panels over time: Raw, mTIC Normalized)
  • 30. Metabolomic Data Case Study II LC-Q-TOF • 340+ metabolites for 4930 samples • 132 batches • ~41 samples per batch • ~1:10 QC/samples (524 QCs or 11%) • NIST reference (63 or 1%) • 14 internal standards (ISTDs) • NOMIS (IS = ISTD) • qcISTD
  • 31. Internal Standards Normalization Analyte Retention time Internal standards (ISTD) • qcISTD (QC optimized metabolite/ISTD) • NOMIS (Sysi-Aho et al., 2007; selection of optimal combination ISTDs) • CCRMN (Redestig et al., 2009; removal of metabolite cross contribution to ISTDs) NOMIS
  • 32. ISTD Based Normalizations (LC/Q-TOF) • NOMIS (linear combination of optimal ISTDs; Sysi-Aho et al., 2007) • qcISTD (QC optimized ISTD strategy) PC 38:6 Poor performance with NOMIS
  • 33. qcISTD Normalization Use QC samples to: 1. Evaluate analyte %RSD before and after corrections using all ISTDs 2. Select analyte/ISTD combinations with %RSD improvement over raw data at some threshold (e.g. 10%) 3. Correct sample analytes with QC defined ISTD if ISTD recovery is above some minimal threshold (e.g. > 20% of median) • Subject to overfitting 191 of 326 (60%) are ISTD corrected
  • 34. qcISTD Normalization ISTD used by retention time (Rt) Total number of analytes corrected by ISTD
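The three qcISTD steps on slide 33 can be sketched as follows. Several details are assumptions made for this example and are not stated on the slides: the improvement threshold is read as a relative reduction in %RSD, the ISTD with the lowest resulting %RSD is chosen, analytes with low ISTD recovery are left uncorrected, and corrected values are rescaled by the QC median of the ISTD to keep the original scale. Function names are hypothetical.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation in percent."""
    return 100.0 * stdev(values) / mean(values)


def select_istd(qc_analyte, qc_istds, min_improvement=10.0):
    """Steps 1 and 2: pick the ISTD that most reduces QC %RSD for one analyte.

    qc_analyte: analyte intensities in the QC injections.
    qc_istds: {istd_name: intensities in the same QC injections}.
    Returns the ISTD name, or None when no ISTD improves %RSD by at least
    min_improvement percent relative to the raw data.
    """
    raw_rsd = percent_rsd(qc_analyte)
    best_name, best_rsd = None, raw_rsd
    for name, istd in qc_istds.items():
        ratio = [a / i for a, i in zip(qc_analyte, istd)]
        rsd = percent_rsd(ratio)
        if rsd < best_rsd:
            best_name, best_rsd = name, rsd
    if best_name is None:
        return None
    improvement = 100.0 * (raw_rsd - best_rsd) / raw_rsd
    return best_name if improvement >= min_improvement else None


def apply_istd(sample_value, sample_istd, istd_qc_median, min_recovery=0.2):
    """Step 3: correct one sample value if ISTD recovery is acceptable."""
    if sample_istd <= min_recovery * istd_qc_median:
        return sample_value  # recovery too low: leave the value uncorrected
    return sample_value * istd_qc_median / sample_istd
```

Since the ISTD is selected and evaluated on the same QCs, the reported %RSD improvement is optimistic, which is the overfitting risk noted on the slide; holding out part of the QCs for evaluation is one way to check it.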
  • 36. Normalizations (LC-Q-TOF) LOESS performs very poorly for two metabolites • qcISTD performs better than LOESS • qcISTD + LOESS leads to highest replicate precision
  • 37. PCA (LC/Q-TOF) Raw (%RSD = 13) qcISTD (9) LOESS (12) qcISTD + LOESS (8) Only normalizations that include LOESS effectively remove analytical batch effects
  • 38. Conclusion • Comparison of common data normalization approaches suggests that, in addition to ISTD corrections, LOESS (analyte-specific, non-linear adjustment based on QC performance at various data acquisition times) is superior to batch-based corrections. • Further validations need to be completed to confirm the effects of normalizations on samples’ variance. • These findings suggest that inclusion of “batch” as a covariate in statistical models will not fully account for analytical variance. R code for all normalization functions can be found at: https://github.com/dgrapov/devium/blob/master/R/Devium%20Normalization.r
  • 39. dgrapov@ucdavis.edu metabolomics.ucdavis.edu This research was supported in part by NIH 1 U24 DK097154