- 1. Introduction to Metabolomic Data Analysis, by Dmitry Grapov, PhD
- 2. Introduction: Important
  - This is an introduction to a series of 8 tutorials for metabolomic data analysis.
  - Download all the required files and software here: https://sourceforge.net/projects/teachingdemos/files/Winter%202014%20LC-MS%20and%20Statistics%20Course/
  - Then follow the directions in software/startup.R to launch all accompanying software.
- 3. Goals?
- 4. Analysis at the Metabolomic Scale
- 5. Cycle of Scientific Discovery
  - Diagram labels: Hypothesis, Hypothesis Generation, Data Acquisition, Data Processing, Data Analysis, Data
- 6. Univariate vs. Multivariate
  - Univariate: hypothesis testing (t-test, ANOVA, etc.)
  - Multivariate: predictive modeling; PCA, O-/PLS/-DA
  - Figure labels: Group 1, Group 2
- 7. Univariate vs. Multivariate
  - univariate/bivariate vs. multivariate
  - outliers?
  - mixed-up samples?
- 8. Data Analysis Goals
  - Exploration: Are there any trends in my data?
    - analytical sources
    - meta data/covariates
    - Useful methods: matrix decomposition (PCA, ICA, NMF); cluster analysis
  - Classification: Differences/similarities between groups?
    - discrimination, classification, significant changes
    - Useful methods: analysis of variance (ANOVA), mixed effects models; partial least squares discriminant analysis (O-/PLS-DA); others: random forest, CART, SVM, ANN
  - Prediction: What is related or predictive of my variable(s) of interest?
    - regression, correlation
    - Useful methods: correlation; partial least squares (O-/PLS)
- 9. Data Complexity
  - Data: n samples by m variables; variable # = dimensionality (1-D, 2-D, m-D)
  - Meta data: experimental design = complexity
- 10. Univariate Qualities
  - length (sample size)
  - center (mean, median, geometric mean)
  - dispersion (variance, standard deviation)
  - range (min/max)
  - quantiles
  - shape (skewness, kurtosis, normality, etc.)
  - Figure labels: mean, standard deviation
- 11. Data Quality
  - Metrics: precision, accuracy
  - Remedies: normalization, outlier detection
  - *Start lab 1-statistical analysis
- 12. Univariate Analyses
  - Identify differences in sample population means
  - sensitive to distribution shape
  - parametric = assumes normality
  - error in Y, not in X (Y = mX + error)
  - optimal for long data
  - assumed independence
  - false discovery rate (FDR)
  - Figure labels: wide, long, n-of-one
- 13. False Discovery Rate (FDR)
  - Type I Error: False Positives
  - Type II Error: False Negatives
  - Type I risk = 1-(1-p.value)^m, where m = number of variables tested
  - FDR correction: p-value adjustment or estimate of FDR (Fdr, q-value)
  - Bioinformatics (2008) 24 (12):1461-1462
- 14. Achieving “significance” is a function of:
  - significance level (α) and power (1-β)
  - effect size (standardized difference in means)
  - sample size (n)
  - *finish lab 1-statistical analysis
- 15. Clustering
  - Identify patterns, group structure, relationships
  - Evaluate/refine hypothesis
  - Reduce complexity
  - Artist: Chuck Close
- 16. Cluster Analysis
  - Use the concept of similarity/dissimilarity to group a collection of samples or variables
  - Approaches: hierarchical (HCA); non-hierarchical (k-NN, k-means); distribution (mixture models); density (DBSCAN); self-organizing maps (SOM)
  - Figure labels: Linkage, k-means, Distribution, Density
- 17. Hierarchical Cluster Analysis
  - similarity/dissimilarity defines “nearness” or distance
  - Figure panels: Euclidean, Manhattan, Mahalanobis, non-Euclidean (axes X, Y)
- 18. Hierarchical Cluster Analysis
  - Agglomerative/linkage algorithm defines how points are grouped
  - Figure panels: single, complete, centroid, average
- 19. Dendrograms (figure; axis label: Similarity)
- 20. Hierarchical Cluster Analysis
  - How does my metadata match my data structure?
  - Figure labels: Exploration, Confirmation
  - *finish lab 2-Cluster Analysis
- 21. Projection of Data
  - The algorithm defines the position of the light source
  - Principal Components Analysis (PCA): unsupervised; maximize variance (X)
  - Partial Least Squares Projection to Latent Structures (PLS): supervised; maximize covariance (Y ~ X)
  - James X. Li, 2009, VisuMap Tech.
- 22. Interpreting PCA Results
  - Variance explained (eigenvalues)
  - Row (sample) scores and column (variable) loadings
- 23. How are scores and loadings related?
- 24. Centering and Scaling
  - PMID: 16762068
  - *finish lab 3-Principal Components Analysis
- 25. Use PLS to test a hypothesis
  - Partial Least Squares (PLS) is used to identify planes of maximum correlation between X measurements and Y (hypothesis)
  - Figure labels: PCA, PLS; time = 0, 120 min.
- 26. Modeling multifactorial relationships
  - ~two-way ANOVA
  - dynamic changes among groups
- 27. PLS Related Objects
  - Model
    - dimensions, latent variables (LV)
    - performance metrics (Q2, RMSEP, etc.)
    - validation (training/testing, permutation, cross-validation)
    - orthogonal correction
  - Samples
    - scores
    - predicted values
    - residuals
  - Variables
    - loadings
    - coefficients, summary of loadings based on all LVs
    - VIP, variable importance in projection
    - feature selection
- 28. “Goodness” of the model is all about the perspective
  - Determine in-sample (Q2) and out-of-sample error (RMSEP) and compare to a random model
  - permutation tests
  - training/testing
  - *finish lab 4-Partial Least Squares and lab 5-Data Analysis Case Study
- 29. Biological Interpretation
  - Projection or mapping of analysis results into a biological context
  - Visualization
  - Enrichment
  - Networks: biochemical, structural, spectral, empirical
- 30. Identification of alterations in biochemical domains
  - Organism-specific biochemical relationships and information
  - Multiple-organism DBs: KEGG, BioCyc, Reactome
  - Human: HMDB, SMPDB
  - *finish lab 6-Metabolite Enrichment Analysis
- 31. Network Mapping
  - Steps: 1. Generate Connections; 2. Calculate Mappings; 3. Create Network
  - Grapov D., Fiehn O., Multivariate and network tools for analysis and visualization of metabolomic data, ASMS, June 08, 2013, Minneapolis, MN
- 32. Connections and Contexts
  - Biochemical (substrate/product): database lookup, web query
  - Chemical (structural or spectral similarity): fingerprint generation (BMC Bioinformatics 2012, 13:99, doi:10.1186/1471-2105-13-99)
  - Empirical (dependency): correlation, partial correlation
- 33. Mapping Analysis Results
  - Diagram labels: Analysis results; Network Annotation; Mapped Network
  - *finish lab 7-Network Mapping I
- 34. Biochemical Relationships http://www.genome.jp/dbget-bin/www_bget?rn:R00975
- 35. Structural Similarity http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
- 36. Mass Spectral Connections
  - Watrous J et al. PNAS 2012;109:E1743-E1752
  - *finish lab 8-Network Mapping II
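
The Type I risk formula on slide 13 and the idea of a p-value adjustment can be sketched in a few lines of Python. This is only an illustration: the slide does not say which correction the course uses, so the Benjamini-Hochberg procedure here is an assumption, and the function names and p-values are hypothetical.

```python
def type_i_risk(p_value, m):
    """Chance of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - p_value) ** m


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used only as one common example of an FDR correction.
    """
    m = len(p_values)
    # Walk from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank 1 is the smallest p-value
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(type_i_risk(0.05, 1))    # one test
    print(type_i_risk(0.05, 100))  # 100 tests at the same threshold
    print(bh_adjust([0.001, 0.01, 0.03, 0.04, 0.20]))  # made-up p-values
```

With a threshold of 0.05, the risk of at least one false positive climbs from 5% for a single test to above 99% for 100 tests, which is why some form of correction is needed when many metabolites are tested at once.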
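
Slides 17 and 18 separate two choices in hierarchical clustering: the distance between two points, and the linkage rule that turns point distances into a distance between clusters. A minimal sketch, assuming plain tuples as samples, is shown below. Mahalanobis distance and centroid linkage are left out to keep the example short, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linkage_distance(cluster_a, cluster_b, dist, method="single"):
    """Distance between two clusters under a given linkage rule."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":    # closest pair of points
        return min(pairwise)
    if method == "complete":  # farthest pair of points
        return max(pairwise)
    if method == "average":   # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


if __name__ == "__main__":
    group_1 = [(0, 0), (0, 1)]  # made-up samples
    group_2 = [(3, 4), (6, 8)]
    for method in ("single", "complete", "average"):
        print(method, linkage_distance(group_1, group_2, euclidean, method))
```

Swapping `euclidean` for `manhattan`, or `single` for `complete`, changes which clusters look closest, so the same data can produce different dendrograms.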

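Slide 24 names centering and scaling without listing specific methods. The sketch below shows mean-centering plus two scalings that are common in metabolomics, autoscaling (unit variance) and Pareto scaling. Treating these as the intended methods is an assumption, and the function name and values are hypothetical.

```python
import math
import statistics


def center_and_scale(column, method="auto"):
    """Mean-center one variable, then optionally scale it.

    method: "center" (no scaling), "auto" (divide by the standard deviation),
    or "pareto" (divide by the square root of the standard deviation).
    """
    mean = statistics.mean(column)
    centered = [x - mean for x in column]
    if method == "center":
        return centered
    sd = statistics.stdev(column)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    if method == "auto":
        return [c / sd for c in centered]
    if method == "pareto":
        return [c / math.sqrt(sd) for c in centered]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    abundant = [1000.0, 1200.0, 1400.0]  # made-up high-abundance metabolite
    trace = [1.0, 1.2, 1.4]              # made-up low-abundance metabolite
    print(center_and_scale(abundant, "auto"))
    print(center_and_scale(trace, "auto"))
```

After autoscaling, both made-up variables end up on the same scale, so a variance-driven method such as PCA is no longer dominated by the most abundant metabolite.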