Multivariate Analysis and 
Visualization of ProteOmic Data 
Dmitry Grapov, PhD
State of the art facility producing massive 
amounts of biological data… 
>20-30K samples/yr 
>200 studies
Analysis at the ProteOmic Scale and Beyond 
Genomic 
Omic Multi-Omic 
Metabolomic 
Proteomic 
integration
Sample 
Data Analysis and Visualization 
Variable 
Quality Assessment 
• use replicated measurements 
and/or internal standards to 
estimate analytical variance 
Statistical and Multivariate 
• use the experimental design 
to test hypotheses and/or 
identify trends in analytes 
Functional 
• use statistical and multivariate 
results to identify impacted 
biochemical domains 
Network 
• integrate statistical and 
multivariate results with the 
experimental design and 
analyte metadata 
Sample Variable 
experimental design 
- organism, sex, age etc. 
analyte description and 
metadata 
- biochemical class, mass 
spectra, etc.
Network Mapping 
Data Quality Assessment 
Quality metrics 
•Precision (replicated 
measurements) 
•Accuracy (reference 
samples) 
Common tasks 
•normalization 
•outlier detection 
•missing values 
imputation
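Precision from replicated measurements is commonly summarized as the percent relative standard deviation (%RSD). The following is a minimal sketch, assuming made-up QC replicate intensities and hypothetical analyte names; it is not taken from the original slides.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical QC replicate intensities for two analytes (illustrative only).
qc_replicates = {
    "analyte_A": [100.0, 102.0, 98.0, 101.0, 99.0],
    "analyte_B": [100.0, 140.0, 70.0, 120.0, 80.0],
}

rsd_by_analyte = {name: percent_rsd(v) for name, v in qc_replicates.items()}
for name, rsd in rsd_by_analyte.items():
    print(f"{name}: %RSD = {rsd:.1f}")
```

A low %RSD indicates high precision; analytes with a high %RSD in the QC samples are candidates for normalization or removal.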
Batch Effects 
Drift in >400 replicated measurements across >100 analytical batches for a single analyte 
Principal Component 
Analysis (PCA) of all 
analytes, showing QC 
sample scores 
Acquisition batch 
Abundance 
QCs embedded 
among >5,500 
samples (1:10) 
collected over 
1.5 yrs 
If the biological effect 
size is smaller than the 
analytical variance, 
the experiment is likely 
to miss a real effect 
(false negative)
Analyte specific data quality 
overview 
Sample specific normalization can be used 
to estimate and remove analytical variance 
Raw Data Normalized Data 
Normalizations need to be 
numerically and visually validated 
low precision 
log mean 
%RSD 
high precision 
Samples 
QCs 
Batch Effects
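One simple way to remove batch drift and validate the result numerically can be sketched as follows. This assumes a QC-median scaling per batch, which is only one of several possible normalizations and not necessarily the one used for the slides; the data and the function name `qc_batch_normalize` are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical measurements of one analyte: (batch, is_qc, abundance).
measurements = [
    (1, True, 100.0), (1, False, 210.0), (1, True, 104.0), (1, False, 190.0),
    (2, True, 130.0), (2, False, 260.0), (2, True, 126.0), (2, False, 250.0),
    (3, True, 80.0), (3, False, 150.0), (3, True, 84.0), (3, False, 170.0),
]


def percent_rsd(values):
    return 100.0 * stdev(values) / mean(values)


def qc_batch_normalize(rows):
    """Scale each batch so its QC median matches the overall QC median."""
    target = median([v for _, is_qc, v in rows if is_qc])
    batches = sorted({b for b, _, _ in rows})
    batch_median = {
        b: median([v for bb, is_qc, v in rows if bb == b and is_qc])
        for b in batches
    }
    return [(b, is_qc, v * target / batch_median[b]) for b, is_qc, v in rows]


normalized = qc_batch_normalize(measurements)

# Numerical validation: QC precision before and after normalization.
rsd_before = percent_rsd([v for _, is_qc, v in measurements if is_qc])
rsd_after = percent_rsd([v for _, is_qc, v in normalized if is_qc])
print(f"QC %RSD raw: {rsd_before:.1f}, normalized: {rsd_after:.1f}")
```

The QC %RSD should drop after normalization; a plot of abundance against acquisition batch would provide the visual check.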
Outlier Detection 
• 1 variable 
(univariate) 
• 2 variables 
(bivariate) 
• >2 variables 
(multivariate)
bivariate vs. 
multivariate 
(scatter plot) 
outliers? 
mixed up samples 
(PCA scores plot) 
Outlier Detection
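The difference between univariate and multivariate outlier detection can be illustrated with a small sketch. It assumes simulated data for two correlated variables with one planted sample that breaks the correlation, and it uses the Mahalanobis distance as the multivariate measure (a swapped-in technique; the slides show scatter and PCA scores plots instead).

```python
import numpy as np


def mahalanobis_distances(X):
    """Distance of each row from the data centroid, scaled by the covariance."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d2)


# Simulated data: two strongly correlated variables (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.1, size=200)
data = np.column_stack([x, y])
data[0] = [1.5, -1.5]  # planted sample: ordinary on each axis, odd jointly

# Univariate view: z-score of each variable separately.
z_scores = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
# Multivariate view: distance that accounts for the correlation.
distances = mahalanobis_distances(data)

print("z-scores of planted sample:", np.round(z_scores[0], 2))
print("most extreme sample by Mahalanobis distance:", int(np.argmax(distances)))
```

The planted sample stays within three standard deviations on each variable alone, yet it is the most extreme sample once both variables are considered together.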
Statistical and Multivariate Analyses 
Group 1 vs. Group 2 
Statistics + Multivariate + Context = Network Mapping 
Network Mapping: ranked statistically 
significant differences 
within a biochemical 
context 
What analytes are 
different between the 
two groups of samples? 
Statistical 
t-Test 
significant differences 
lacking rank and 
context 
Multivariate 
O-PLS-DA 
ranked differences 
lacking significance 
and context
To see the big picture it is necessary to view the data from multiple 
different angles
Statistical Analysis: achieving ‘significance’ 
significance level (α) and power (1-β ) 
effect size (standardized difference in 
means) 
sample size (n) 
Power analyses can be used to 
optimize future experiments 
given preliminary data 
Example: use experimentally 
derived (or literature estimated) 
effect sizes, the desired significance 
level (α) and power (1-β) to 
calculate the optimal number of 
samples per group
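The sample size calculation described above can be sketched with the normal approximation for a two-sided, two-sample comparison of means, n = 2 ((z_{1-α/2} + z_{1-β}) / d)^2 per group, where d is the standardized effect size. This is an approximation under stated assumptions (equal group sizes, equal variances); a t-distribution based calculation gives slightly larger numbers. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test of means."""
    if effect_size == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance level
    z_beta = z.inv_cdf(power)           # quantile for the desired power (1 - beta)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for d in (0.5, 0.8, 1.0):
    print(f"effect size {d}: n per group = {samples_per_group(d)}")
```

Halving the effect size roughly quadruples the required number of samples, which is why preliminary estimates of the effect size matter for planning.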
Statistical Tests 
Poisson normal 
• Should be chosen based on the distribution 
(shape, type) of the data (e.g. normal, negative 
binomial, Poisson) 
• Can be optimized based on data pre-treatment 
(e.g. NSAF, Power Law Global Error 
Model (PLGEM))
False Discovery Rate (FDR) 
•Type I Error: False Positives (α) 
•Type II Error: False Negatives (β) 
•Type I risk = $1 - (1 - \text{p.value})^{m}$ 
m = number of variables tested
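The Type I risk formula, 1 - (1 - p.value)^m, can be evaluated directly to see how quickly the chance of at least one false positive grows with the number of variables tested. This is a small sketch that assumes independent tests; the function name is hypothetical, and m = 148 is borrowed from the Bonferroni example in these slides.

```python
def familywise_risk(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


for m in (1, 10, 148):
    print(f"m = {m:>3}: risk = {familywise_risk(0.05, m):.4f}")
```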
False Discovery Rate Adjustment 
FDR adjusted p-value 
p-value 
Benjamini & 
Hochberg (1995) 
(“BH”) 
•Accepted standard 
Bonferroni 
•Very conservative 
•adjusted p-value = 
p-value x # of tests 
(e.g. 0.005 x 148 = 0.74 )
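Both adjustments can be written in a few lines. The sketch below uses made-up p-values and hypothetical function names. Note that Bonferroni controls the family-wise error rate rather than the FDR, which is why it is more conservative than the Benjamini & Hochberg procedure.

```python
def bonferroni(pvalues):
    """Adjusted p-value = p-value x number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """BH adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni:", [round(p, 3) for p in bonf])
print("BH:        ", [round(p, 3) for p in bh])
```

With 148 tests, a raw p-value of 0.005 becomes 0.74 under Bonferroni, matching the example on the slide.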
Functional Analysis 
Identify changes or enrichment in biochemical domains 
• decrease 
• increase 
Nucl. Acids Res. (2008) 36 (suppl 2): W423-W426. doi: 10.1093/nar/gkn282
Functional Analysis: Enrichment 
Biochemical Pathway Biochemical Ontology
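Enrichment of a pathway or ontology term is often tested with a one-sided hypergeometric test (over-representation analysis). The following is a sketch of that general technique with made-up counts; it is not necessarily the method behind the tools cited in these slides, and the function name is hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: all measured analytes, K: analytes annotated to the pathway,
    n: significantly changed analytes, k: changed analytes in the pathway.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 300 proteins measured, 20 in the pathway,
# 30 significantly changed, 8 of those in the pathway (2 expected by chance).
p_enriched = enrichment_pvalue(k=8, n=30, K=20, N=300)
print(f"enrichment p-value: {p_enriched:.2e}")
```

When many pathways are tested, the resulting p-values need the same multiple-testing adjustment as the analyte-level tests.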
Common Multivariate Methods 
Clustering 
Projection 
Networks
Artist: Chuck Close 
Cluster Analysis 
Useful for 
•pattern recognition 
•complexity reduction 
Common Methods 
•Hierarchical 
•Model based 
•Other (k-means, k-NN, PAM, 
fuzzy) 
Linkage k-means 
Distribution Density
Hierarchical Clustering 
Similarity 
Dendrogram 
How does my metadata 
match my data structure?
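The question of how metadata matches data structure can be explored by clustering samples and comparing the top split of the dendrogram with the group labels. The sketch below implements a small average-linkage agglomerative clustering in plain Python on made-up two-variable samples; the sample values, group labels, and function name are all hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical samples (two variables each) and their metadata labels.
samples = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (5.05, 5.05)]
groups = ["control", "control", "treated", "treated", "treated"]

merges = average_linkage(samples)
top_a, top_b, _ = merges[-1]  # the last merge is the top split of the dendrogram
for side in (top_a, top_b):
    print(sorted(side), {groups[i] for i in side})
```

Here each side of the top split contains a single group, so the metadata agrees with the data structure; mixed sides would point to other sources of variance.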
Projection Methods 
single analyte all analytes 
The algorithm defines the position of the light source 
Principal Components Analysis (PCA) 
• unsupervised 
• maximize variance (X) 
Partial Least Squares Projection to 
Latent Structures (PLS) 
• supervised 
• maximize covariance (Y ~ X) 
James X. Li, 2009, VisuMap Tech.
Interpreting scores and loadings 
loadings represent how variables 
contribute to sample scores 
variables with the highest loadings have the 
greatest contribution to sample scores 
loadings 
scores 
Scores represent 
dis/similarities in samples 
based on all variables
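The relationship between scores and loadings can be made concrete with a small PCA computed by singular value decomposition. This sketch assumes simulated data in which only the first two variables differ between two groups of samples, and it autoscales the variables first (a common but not universal choice); the function name is hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of autoscaled data via SVD; returns scores, loadings, explained variance."""
    centered = X - X.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(scaled, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                   # variable contributions
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]


# Simulated data: 20 samples x 5 variables; variables 0 and 1 differ by group.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X[10:, :2] += 6.0

scores, loadings, explained = pca(X)
top_two = {int(i) for i in np.argsort(-np.abs(loadings[:, 0]))[:2]}
print("variables with the largest PC1 loadings:", sorted(top_two))
print("mean PC1 score by group:", scores[:10, 0].mean(), scores[10:, 0].mean())
```

The two groups separate along the first component of the scores, and the loadings identify the variables responsible for that separation.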
Networks 
Biochemical 
•interaction 
• enrichment 
•etc 
Empirical (dependency) 
•correlation 
•partial-correlation 
•clustering 
variable 1 
variable 2 
variable 3
Enrichment Network 
Mapping of parents through children
Interaction Networks
Empirical Networks 
• Correlation based networks (CN) 
(simple, tendency to hairball) 
• GGM or partial correlation based 
networks (advanced, preference 
of direct over indirect 
relationships) 
• *Increase in robustness with 
sample size 
doi: 10.1007/978-1-4614-1689-0_17
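The preference of partial correlations for direct relationships can be demonstrated on a simulated chain, variable 1 -> variable 2 -> variable 3. The sketch below assumes the standard estimate of partial correlations from the inverse covariance (precision) matrix, which needs more samples than variables; the data and function name are hypothetical.

```python
import numpy as np


def partial_correlation(X):
    """Partial correlation of each pair of columns given all other columns."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated chain: v1 drives v2, and v2 drives v3 (no direct v1-v3 link).
rng = np.random.default_rng(2)
v1 = rng.normal(size=500)
v2 = v1 + 0.5 * rng.normal(size=500)
v3 = v2 + 0.5 * rng.normal(size=500)
X = np.column_stack([v1, v2, v3])

cor = np.corrcoef(X, rowvar=False)
pcor = partial_correlation(X)
print(f"v1-v3 correlation:         {cor[0, 2]:.2f}")
print(f"v1-v3 partial correlation: {pcor[0, 2]:.2f}")
```

A correlation network would draw an edge between variable 1 and variable 3, while the partial correlation network keeps only the two direct edges, which reduces the hairball effect.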
Proteomic Case Study: Diabetes Markers 
• Small sample size (control =12, GDM =6); covariates (time of sample collection) 
• >600 measured colostrum proteins; ~ 300 NSAF normalized proteins retained 
• Multivariate classification with O-PLS-DA used to identify variables to test using 
PLGEM with correction for FDR 
• Partial-correlation protein-protein interaction network analysis
DeviumWeb 
https://github.com/dgrapov/DeviumWeb 
• visualization 
• statistics 
• clustering 
• PCA 
• O-PLS
Software and Resources 
•DeviumWeb: dynamic multivariate data analysis and 
visualization platform 
url: https://github.com/dgrapov/DeviumWeb 
•imDEV: Microsoft Excel add-in for multivariate analysis 
url: http://sourceforge.net/projects/imdev/ 
•MetaMapR: network analysis tools for metabolomics 
url: https://github.com/dgrapov/MetaMapR 
•TeachingDemos: tutorials and demonstrations 
url: http://sourceforge.net/projects/teachingdemos/?source=directory 
url: https://github.com/dgrapov/TeachingDemos 
•Data analysis case studies and examples 
url: http://imdevsoftware.wordpress.com/
Questions? 
dgrapov@ucdavis.edu 
This research was supported in part by NIH 1 U24 DK097154

Engler and Prantl system of classification in plant taxonomy
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 

Multivariate Analysis and Visualization of Proteomic Data

  • 1. Multivariate Analysis and Visualization of ProteOmic Data Dmitry Grapov, PhD
  • 2. State of the art facility producing massive amounts of biological data… >20-30K samples/yr >200 studies
  • 3. Analysis at the ProteOmic Scale and Beyond Genomic Omic Multi-Omic Metabolomic Proteomic integration
  • 4. Sample Data Analysis and Visualization Variable Quality Assessment • use replicated measurements and/or internal standards to estimate analytical variance Statistical and Multivariate • use the experimental design to test hypotheses and/or identify trends in analytes Functional • use statistical and multivariate results to identify impacted biochemical domains Network • integrate statistical and multivariate results with the experimental design and analyte metadata Sample Variable experimental design - organism, sex, age etc. analyte description and metadata - biochemical class, mass spectra, etc.
  • 5. Sample Data Analysis and Visualization Variable Quality Assessment • use replicated measurements and/or internal standards to estimate analytical variance Statistical and Multivariate • use the experimental design to test hypotheses and/or identify trends in analytes Functional • use statistical and multivariate results to identify impacted biochemical domains Network • integrate statistical and multivariate results with the experimental design and analyte metadata Network Mapping Sample Variable experimental design - organism, sex, age etc. analyte description and metadata - biochemical class, mass spectra, etc.
  • 6. Data Quality Assessment Quality metrics •Precision (replicated measurements) •Accuracy (reference samples) Common tasks •normalization •outlier detection •missing values imputation
  • 7. Batch Effects Drift in >400 replicated measurements across >100 analytical batches for a single analyte Principal Component Analysis (PCA) of all analytes, showing QC sample scores Acquisition batch Abundance QCs embedded among >5,5000 samples (1:10) collected over 1.5 yrs If the biological effect size is less than the analytical variance then the experiment will incorrectly yield insignificant results
  • 8. Analyte specific data quality overview Sample specific normalization can be used to estimate and remove analytical variance Raw Data Normalized Data Normalizations need to be numerically and visually validated low precision log mean %RSD high precision Samples QCs Batch Effects
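The precision metric named on this slide (%RSD) can be sketched as follows. This is a minimal illustration, assuming %RSD is computed per analyte from replicated QC measurements as 100 × standard deviation / mean; the analyte names and abundance values are made up for the example and do not come from the slides.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation (%RSD) = 100 * sample SD / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical QC replicate abundances for two analytes (illustrative only).
qc_replicates = {
    "analyte_A": [100.0, 102.0, 98.0, 101.0, 99.0],  # tight replicates
    "analyte_B": [50.0, 80.0, 30.0, 65.0, 45.0],     # scattered replicates
}

# Lower %RSD means higher precision for that analyte.
rsd = {name: percent_rsd(vals) for name, vals in qc_replicates.items()}
for name, value in rsd.items():
    print(f"{name}: %RSD = {value:.1f}")
```

Computing the same quantity before and after normalization gives one numerical check on whether a normalization actually reduced analytical variance.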
  • 9. Outlier Detection • 1 variable (univariate) • 2 variables (bivariate) • >2 variables (multivariate)
  • 10. bivariate vs. multivariate (scatter plot) outliers? mixed up samples (PCA scores plot) Outlier Detection
  • 11. Statistical and Multivariate Analyses Group 1 Statistics Multivariate Context + + = Network Mapping Ranked statistically significant differences within a biochemical context Group 2 What analytes are different between the two groups of samples? Statistical t-Test significant differences lacking rank and context Multivariate O-PLS-DA ranked differences lacking significance and context
  • 12. Statistical and Multivariate Analyses Group 1 Statistics Multivariate Context + + = Network Mapping Group 2 What analytes are different between the two groups of samples? Statistical t-Test Multivariate O-PLS-DA To see the big picture it is necessary to view the data from multiple different angles
  • 13. Statistical Analysis: achieving ‘significance’ significance level (α) and power (1-β) effect size (standardized difference in means) sample size (n) Power analyses can be used to optimize future experiments given preliminary data Example: use experimentally derived (or literature estimated) effect sizes, the desired significance level (α) and power (1-β) to calculate the optimal number of samples per group
  • 14. Statistical Tests Poisson normal • Should be chosen based on the distribution (shape, type) of the data (e.g. normal, negative binomial, Poisson) • Can be optimized based on data pre-treatment (e.g. NSAF, Power Law Global Error Model, PLGEM)
  • 15. False Discovery Rate (FDR) • Type I Error: False Positives (α) • Type II Error: False Negatives (β) • Type I risk = 1-(1-p.value)^m, where m = number of variables tested
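The sample-size calculation described on this slide can be sketched as below. This is a rough illustration, not the author's procedure: it assumes a two-sided, two-sample comparison of means and uses the normal approximation n = 2 × ((z_{1-α/2} + z_{power}) / d)², which slightly underestimates what an exact t-test calculation gives. The function name and defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison.

    effect_size is the standardized difference in means (Cohen's d).
    Uses the normal approximation, so treat the result as a lower bound.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need many more samples for the same alpha and power.
for d in (0.5, 1.0):
    print(f"d = {d}: about {samples_per_group(d)} samples per group")
```

The effect size would come from preliminary data or the literature, as the slide suggests; halving it roughly quadruples the required sample size.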
  • 16. False Discovery Rate Adjustment FDR adjusted p-value p-value Benjamini & Hochberg (1995) (“BH”) •Accepted standard Bonferroni •Very conservative •adjusted p-value = p-value x # of tests (e.g. 0.005 x 148 = 0.74 )
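The multiple-testing quantities on the two slides above can be sketched in a few lines. This is a minimal, stdlib-only illustration: the function names are hypothetical, the Benjamini-Hochberg step follows the standard step-up definition (sorted p-value × m / rank, made monotone from the largest p-value down), and the p-values in the usage lines are made up apart from the slide's 0.005 × 148 example.

```python
def family_wise_risk(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values):
    """Bonferroni adjustment: p * number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (1995) FDR-adjusted p-values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Slide example: one p-value of 0.005 among 148 tests.
print(bonferroni([0.005] * 148)[0])        # approximately 0.74
print(family_wise_risk(0.05, 148))         # close to 1
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The contrast between the two adjustments on the same p-values shows why Bonferroni is described as very conservative.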
  • 17. Functional Analysis Identify changes or enrichment in biochemical domains • decrease • increase Nucl. Acids Res. (2008) 36 (suppl 2): W423-W426.doi: 10.1093/nar/gkn282
  • 18. Functional Analysis: Enrichment Biochemical Pathway Biochemical Ontology
  • 19. Common Multivariate Methods Clustering Projection Networks
  • 20. Artist: Chuck Close Cluster Analysis Useful for •pattern recognition •complexity reduction Common Methods •Hierarchical •Model based •Other (k-means, k-NN, PAM, fuzzy) Linkage k-means Distribution Density
  • 21. Hierarchical Clustering Similarity x x x x Dendrogram How does my metadata match my data structure?
  • 22. Projection Methods single analyte all analytes The algorithm defines the position of the light source Principal Components Analysis (PCA) • unsupervised • maximize variance (X) Partial Least Squares Projection to Latent Structures (PLS) • supervised • maximize covariance (Y ~ X) James X. Li, 2009, VisuMap Tech.
  • 23. Interpreting scores and loadings loadings represent how variables contribute to sample scores variables with the highest loadings have the greatest contribution to sample scores loadings scores Scores represent dis/similarities in samples based on all variables
  • 24. Networks Biochemical •interaction • enrichment •etc Empirical (dependency) •correlation •partial-correlation •clustering variable 1 variable 2 variable 3
  • 25. Enrichment Network Mapping of parents through children
  • 27. Empirical Networks • Correlation based networks (CN) (simple, tendency to hairball) • GGM or partial correlation based networks (advanced, preference of direct over indirect relationships) • *Increase in robustness with sample size 10.1007/978-1-4614-1689-0_17
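The difference between correlation and partial correlation edges can be illustrated with three variables. This is a sketch under stated assumptions: it uses the first-order partial correlation formula r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)) rather than a full Gaussian graphical model, and the data are synthetic (x and y both follow a shared driver z plus unrelated noise).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic data: z drives both x and y; the added noise terms are unrelated.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise_x = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_y = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
x = [a + e for a, e in zip(z, noise_x)]
y = [a + e for a, e in zip(z, noise_y)]

r_xy = pearson(x, y)              # strong: would be drawn as an edge in a CN
r_xy_given_z = partial_corr(x, y, z)  # weak: the x-y link is indirect via z
print(f"correlation: {r_xy:.2f}, partial correlation: {r_xy_given_z:.2f}")
```

A correlation network would connect all three variables here, while a partial-correlation network would keep only the edges to z, which is the preference for direct over indirect relationships noted on the slide.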
  • 28. Proteomic Case Study: Diabetes Markers • Small sample size (control =12, GDM =6); covariates (time of sample collection) • >600 measured colostrum proteins; ~ 300 NSAF normalized proteins retained • Multivariate classification with O-PLS-DA used to identify variables to test using PLGEM with correction for FDR • Partial-correlation protein-protein interaction network analysis
  • 29. DeviumWeb https://github.com/dgrapov/DeviumWeb • visualization • statistics • clustering • PCA • O-PLS
  • 30. DeviumWeb https://github.com/dgrapov/DeviumWeb • visualization • statistics • clustering • PCA • O-PLS
  • 31. Software and Resources •DeviumWeb- Dynamic multivariate data analysis and visualization platform url: https://github.com/dgrapov/DeviumWeb •imDEV- Microsoft Excel add-in for multivariate analysis url: http://sourceforge.net/projects/imdev/ •MetaMapR- Network analysis tools for metabolomics url: https://github.com/dgrapov/MetaMapR •TeachingDemos- Tutorials and demonstrations •url: http://sourceforge.net/projects/teachingdemos/?source=directory •url: https://github.com/dgrapov/TeachingDemos •Data analysis case studies and Examples url: http://imdevsoftware.wordpress.com/
  • 32. Questions? dgrapov@ucdavis.edu This research was supported in part by NIH 1 U24 DK097154