
- 1. Metabolomics Data Analysis. Johan A. Westerhuis. Swammerdam Institute for Life Sciences, University of Amsterdam; Business Mathematics and Information, North-West University, Potchefstroom, South Africa. egraSeqAhead, Barcelona, February 2013
- 2. Metabolomics pipeline: issues for biostatistics. Stages: biological question, experimental design, data acquisition, data pre-processing, statistical data analysis, metabolite identification, biological interpretation. Issues listed under the stages: power analysis, treatment design, measurement design; normalisation, quantification, QC strategy; explorative, predictive, hypothetical biomarkers; spectral matching, de novo identification; network inference, MSEA, pathway analysis
- 3. Data Analysis special issue, Metabolomics • Data preprocessing methods (make samples more comparable) • How to treat non-detects • Variable importance in multivariate models • Metabolic network analysis • Data fusion methods • Individual responses • Between-metabolite ratios. Guest editors: Jeroen J. Jansen, Johan A. Westerhuis
- 4. Multivariate metabolomics data: nontargeted profiling and targeted analysis. [Table: example values for the metabolites hipp, fum, urea, allant, TMAO, citrat.] Labels: technical correlations, biological correlations
- 5. Multivariate Metabolomics Data analysis • Explorative: find groups, cluster structure / outliers in metabolites and in samples • Supervised: discriminate two or more groups to make a predictive model and to find biomarkers • Biological interpretation: metabolite set enrichment, pathway analysis, metabolic network inference • Special topics: between-metabolite ratios, metabolomics data fusion
- 6. Metabolomics Data preprocessing • Optimize biological content of data • Correct for incorrect sampling, sample workup issues, batch effects • What is the noise level in the data? Generalized log transform, variance stabilization • High peaks more important than low peaks? • Multivariate methods love large values!
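The generalized log transform mentioned on slide 6 can be sketched as follows. This is a minimal illustration, assuming the common form ln((x + sqrt(x² + λ)) / 2); the slide does not give a formula, and the value of `lam` is a hypothetical tuning constant that would normally be chosen from the noise level of the data.

```python
import math

def glog(x, lam=1.0):
    """Generalized log: ln((x + sqrt(x^2 + lam)) / 2).

    Behaves like ln(x) for large x but stays finite at x = 0,
    which damps the variance of low-intensity peaks.
    `lam` is an assumed tuning constant, not a value from the slides.
    """
    return math.log((x + math.sqrt(x * x + lam)) / 2.0)

# Hypothetical peak intensities spanning several orders of magnitude.
intensities = [0.0, 1.0, 10.0, 1000.0, 100000.0]
transformed = [glog(v) for v in intensities]
```

Because the transform compresses large values, high peaks no longer dominate a multivariate model purely through their size.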
- 7. Metabolic changes during E. coli culture growth using k-means clustering. [Figure axes: time, metabolites.] (A) Growth curve (optical density) of unperturbed E. coli culture. Numbers of respective sampling time points are marked in the curve. Time point 0 minutes marks the application of the respective stress condition. (B) Relative changes of metabolite pools normalized to time point 1. Fold change is presented on a log10 scale. To reveal the main trends of metabolic changes, 10 k-means clusters are color coded. Szymanski, Jedrzej et al., PLoS ONE (2009), vol. 4, issue 10
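The two steps described for slide 7 (log10 fold change relative to the first time point, then k-means on the resulting profiles) can be sketched as below. The metabolite names and pool sizes are hypothetical, and the initialisation (first k profiles as centroids) is a simplification chosen to keep the example deterministic; it is not taken from the cited paper.

```python
import math

def log10_fold_change(series):
    """Express a time series as log10 fold change versus its first time point."""
    ref = series[0]
    return [math.log10(v / ref) for v in series]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical pool sizes for four metabolites at four time points.
pools = {
    "met_a": [1.0, 2.0, 4.0, 8.0],
    "met_b": [5.0, 2.5, 1.25, 0.625],
    "met_c": [2.0, 4.2, 8.1, 15.8],
    "met_d": [3.0, 1.4, 0.8, 0.4],
}
names = list(pools)
profiles = [log10_fold_change(pools[n]) for n in names]
labels, _ = kmeans(profiles, k=2)
clusters = dict(zip(names, labels))
```

Here the rising metabolites (`met_a`, `met_c`) and the falling ones (`met_b`, `met_d`) end up in separate clusters, regardless of their absolute pool sizes.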
- 8. Self Organising Map of metabolites in serum. 1H NMR spectra of 613 patients with type I diabetes and a diverse spread of complications. Nonlinear mapping method for a large number of samples. Relate position on the map to diagnostic responses. Can be made supervised. Reference: 1H NMR metabonomics approach to the disease continuum of diabetic complications and premature death, VP Mäkinen et al., Molecular Systems Biology 4:167, 2008
- 9. Multivariate Metabolomics Data analysis • Explorative: find groups, cluster structure / outliers in metabolites and in samples • Supervised (differentially expressed): discriminate two or more groups to make a predictive model and to find biomarkers • Biological interpretation: metabolite set enrichment, pathway analysis, metabolic network inference • Special topics: between-metabolite ratios, metabolomics data fusion
- 10. Supervised Metabolomics Data analysis: Case-Control (PLSDA). [Figure: score plot (PC1 vs PC2) for the groups Men and Women with a 0/1 class vector Y; PLS regression vector b plotted against chemical shift (ppm).] • Is there really a difference between the groups? Statistical validation issues • Which are the most important peaks for discrimination? Variable importance
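The two questions on slide 10 can be illustrated with a deliberately small sketch: a single-component PLS regression on a 0/1 class vector gives a regression vector `b` (one possible view of variable importance), and a label permutation test asks whether the observed separation could arise by chance. All data are hypothetical, the one-component model and the separation statistic are simplifying assumptions, and a real study would also use cross-validation.

```python
import random

def center_columns(X):
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def pls1_one_component(X, y):
    """Single-component PLS regression vector b and scores t (centered X, y)."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(xi * yi for xi, yi in zip(col, yc)) for col in zip(*Xc)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores and the inner regression coefficient.
    t = [sum(xi * wi for xi, wi in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return [wi * q for wi in w], t

def separation(X, y):
    """Absolute difference between the mean scores of the two classes."""
    _, t = pls1_one_component(X, y)
    g1 = [ti for ti, yi in zip(t, y) if yi == 1]
    g0 = [ti for ti, yi in zip(t, y) if yi == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def permutation_p_value(X, y, n_perm=200, seed=0):
    """Fraction of label permutations that separate at least as well as the real labels."""
    rng = random.Random(seed)
    observed = separation(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if separation(X, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 8 samples, 3 peaks; only the first peak differs by class.
X = [
    [5.1, 1.0, 2.2], [4.8, 1.3, 2.0], [5.3, 0.9, 2.4], [5.0, 1.1, 1.9],
    [1.2, 1.2, 2.1], [0.9, 1.0, 2.3], [1.1, 0.8, 2.0], [1.0, 1.4, 2.2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
b, t = pls1_one_component(X, y)
p_value = permutation_p_value(X, y)
```

The entry of `b` with the largest absolute value points at the peak that drives the discrimination; the permutation p-value is refitted for every shuffled label vector so that the overfitting of the model is present in the null distribution as well.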
- 11. Explain the Psyhogios example using examples from the paper and MetaboAnalyst examples. Proton NMR spectra of the urine samples were obtained on a 500 MHz 1H NMR machine.
- 12. NMR spectra of urine samples
- 13. Nonsupervised / Supervised
- 14. Experimental Design Example. Experiment: rats are given bromobenzene (BB), which affects the liver. Measurements: NMR spectroscopy of urine. Experimental design: Time: 6, 24 and 48 hours. Groups: 3 doses of BB, vehicle group, control group. Animals: 3 rats per dose per time point. [Figure: NMR spectra of rat urine at 6, 24 and 48 hours with labelled peaks; x-axis: chemical shift (ppm).]
- 15. Different contributions of the experimental design: Time, Dose, Animal. [Figure: metabolite concentration against time, with panels for the time effect, the dose effect and the animal trajectories.]
- 16. ANOVA decomposition of each variable: $x_{hki_{hk}} = \mu + \alpha_k + (\alpha\beta)_{hk} + (\alpha\beta\gamma)_{hki_{hk}}$. [Figure: the corresponding contributions plotted against time.] In matrices: $X = \mathbf{1}m^{T} + X_{\alpha} + X_{\alpha\beta} + X_{\alpha\beta\gamma}$
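The decomposition on slide 16 can be sketched for a single variable as follows. This is a minimal illustration that assumes k indexes time, h indexes dose group and i indexes the animal within a dose-time cell (consistent with the design on slide 14, but an assumption about the slide's notation); the measurements are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def anova_decompose(cells):
    """cells maps (dose, time) -> list of measurements, one per animal."""
    all_values = [v for vals in cells.values() for v in vals]
    mu = mean(all_values)
    times = sorted({k for _, k in cells})
    # Time effect alpha_k: time-point mean minus grand mean.
    alpha = {
        k: mean(v for (h, kk), vals in cells.items() if kk == k for v in vals) - mu
        for k in times
    }
    # Dose-by-time effect (alpha beta)_hk: cell mean minus time-point mean.
    alpha_beta = {hk: mean(vals) - (mu + alpha[hk[1]]) for hk, vals in cells.items()}
    # Individual effect (alpha beta gamma)_hki: animal value minus cell mean.
    alpha_beta_gamma = {hk: [v - mean(vals) for v in vals] for hk, vals in cells.items()}
    return mu, alpha, alpha_beta, alpha_beta_gamma

# Hypothetical measurements: 2 doses x 2 time points x 3 animals.
cells = {
    ("low", 6): [1.0, 2.0, 3.0],
    ("low", 24): [2.0, 3.0, 4.0],
    ("high", 6): [3.0, 4.0, 5.0],
    ("high", 24): [6.0, 7.0, 8.0],
}
mu, alpha, alpha_beta, alpha_beta_gamma = anova_decompose(cells)

# Each observation is recovered exactly as the sum of its four parts.
rebuilt = {
    hk: [mu + alpha[hk[1]] + alpha_beta[hk] + r for r in alpha_beta_gamma[hk]]
    for hk in cells
}
```

Applying the same decomposition to every column of the data matrix yields the effect matrices, each of which can then be analysed with its own PCA.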
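- 17. ANOVA and PCA: ASCA. The ANOVA step gives $X = \mathbf{1}m^{T} + X_{\alpha} + X_{\alpha\beta} + X_{\alpha\beta\gamma}$. PCA of each effect matrix gives scores $T$ and loadings $P$: $X = \mathbf{1}m^{T} + T_{\alpha}P_{\alpha}^{T} + T_{\alpha\beta}P_{\alpha\beta}^{T} + T_{\alpha\beta\gamma}P_{\alpha\beta\gamma}^{T} + E$, where $E$ holds the parts of the data not explained by the component models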
- 18. Results. [Figure: scores of the submodels Xα, Xαβ and Xαβγ against time (6, 24, 48 hours) for the groups control, vehicle, low, medium and high; annotation: 40 %.]
- 19. Results: biomarkers. [Figure: loadings of the α, αβ and αβγ submodels with labelled peaks; x-axis: chemical shift (ppm).] Annotations: unique to the α submodel; differences between submodels; interesting for biology; interesting for statistics / diagnostics
- 20. Multivariate Metabolomics Data analysis • Explorative: find groups, cluster structure / outliers in metabolites and in samples • Supervised: discriminate two or more groups to make a predictive model and to find biomarkers; method comparison • Biological interpretation: metabolite set enrichment, pathway analysis, metabolic network inference • Special topics: between-metabolite ratios, metabolomics data fusion
- 21. NONTARGETED SELDI measurements of serum samples of 20 Gaucher patients and 20 healthy controls. Gaucher is a genetic disease in which a fatty substance (lipid) accumulates in cells and certain organs
- 22. Human urine and porcine cerebrospinal fluid samples spiked with a range of peptides • Variation in number of samples and in within- and between-group variation
- 23. [Figure panels: Gaucher, Spiked]
- 24. Feature selection methods: results • Complex nontargeted Gaucher profiling data with highly variable background and varying difference between case and control: multivariate methods perform best • Spiked LCMS targeted data with less variation in effect size: univariate and semi-univariate methods are best at selecting biomarkers
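A univariate feature screen of the kind compared on slide 24 could look like the sketch below. The slides do not name the specific tests used, so Welch's t-statistic is an assumption, and the peak names and intensities are hypothetical.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def rank_features(cases, controls):
    """Return feature names ordered by |t| (largest first) and the t-values."""
    scores = {name: welch_t(cases[name], controls[name]) for name in cases}
    ranking = sorted(scores, key=lambda n: abs(scores[n]), reverse=True)
    return ranking, scores

# Hypothetical intensities; only peak_2 differs between the groups.
cases = {"peak_1": [10.1, 9.8, 10.3, 10.0], "peak_2": [5.0, 5.2, 4.9, 5.1]}
controls = {"peak_1": [10.0, 10.2, 9.9, 10.1], "peak_2": [7.0, 7.1, 6.8, 7.2]}
ranking, scores = rank_features(cases, controls)
```

Each feature is scored on its own, so correlations between peaks are ignored; that is the property that separates this approach from the multivariate methods on the slide.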
- 26. Biomarkers: A: Univariate. B: Multivariate. C: Change in group correlation
- 27. BMR (between-metabolite ratios) of green tea intervention study. 186 human subjects with abdominal obesity. Validation shows significant changes in BMR between placebo and green tea treatment, together with the most important triacylglycerols, TG28-29 and TG41-42.
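Between-metabolite ratios for one sample can be computed as in the sketch below. Taking the natural log of each ratio is an assumption (it makes a/b and b/a symmetric around zero), and the metabolite names and concentrations are hypothetical.

```python
import math
from itertools import combinations

def between_metabolite_ratios(sample):
    """sample maps metabolite name -> concentration.

    Returns the log ratio for every unordered pair of metabolites.
    """
    return {
        f"{a}/{b}": math.log(sample[a] / sample[b])
        for a, b in combinations(sorted(sample), 2)
    }

# Hypothetical concentrations for three metabolites in one sample.
sample = {"TG_x": 2.0, "TG_y": 4.0, "TG_z": 8.0}
ratios = between_metabolite_ratios(sample)
```

With J metabolites this produces J(J-1)/2 ratios per sample, so the number of candidate features grows quadratically and validation becomes correspondingly more important.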
- 29. Plasma
- 30. Differences in blood metabolites due to aging
- 31. Aging biomarker metabolites in liver
- 33. Special topic: Metabolic networks. Biochemical network vs association network. Figure 7: Marginal correlation network for a set of metabolites in tomato. Volatiles in red, derivatized metabolites in yellow. Solid lines represent positive correlations, dashed lines negative ones. Thickness of line corresponds to magnitude of ... Margriet M.W.B. Hendriks, Data-processing strategies for metabolomics studies, Trends in Analytical Chemistry, 20212
- 34. Metabolomics, 2005. Data from potato tubers. Metabolic neighbors. Do not participate in common reactions. High correlation due to e.g. chemical equilibrium, mass conservation, ... “a systematic relationship between observed correlation networks and the underlying biochemical pathways.” Ralf Steuer: Observing and interpreting correlations in metabolomic networks, Bioinformatics, 2003
- 35. Metabolic Network Inference. Search for the link between metabolome data and underlying metabolic networks. As an example: can we distinguish healthy from diseased networks? [Figure: example networks over glucose and metabolites A to G, one labelled HEALTHY and one labelled DISEASE.]
- 36. From data to network. Goal: network topology, directions. Problems: noise, missing metabolites, huge number of possible network structures
- 37. Inference from static data. 1. Data collection under A. enzymatic variability, B. intrinsic variability, C. environmental variability. 2. Similarity score calculation over all possible pairwise interactions. 2a. Relevance networks: Pearson correlation (PC, linear), mutual information (MI, non-linear). 2b. Conditioned networks: partial Pearson correlation (PPC, linear), conditional mutual information (CMI, non-linear). [Figure: simulated data and the example network over metabolites A to F for each score.]
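The difference between a relevance network (Pearson correlation) and a conditioned network (partial Pearson correlation) on slide 37 can be shown with three variables. The sketch uses the standard first-order partial correlation formula; the data are constructed (hypothetical) so that A and C are related only through B.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_pearson(x, y, z):
    """First-order partial correlation of x and y, conditioned on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

# Hypothetical chain A - B - C: A and C each equal B plus independent noise.
B = [1, 2, 3, 4, 5, 6, 7, 8]
A = [2, 1, 2, 5, 6, 5, 6, 9]   # B + [1, -1, -1, 1, 1, -1, -1, 1]
C = [2, 3, 2, 3, 4, 5, 8, 9]   # B + [1, 1, -1, -1, -1, -1, 1, 1]

r_ac = pearson(A, C)                     # high: A and C look directly linked
r_ac_given_b = partial_pearson(A, C, B)  # near zero: the link runs through B
```

A relevance network would draw an edge between A and C, while conditioning on B removes it, which is why conditioned scores can recover the topology more faithfully.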
- 38. Estimation of correlation networks. Real pathway: 1. ASPP, 2. ASA, 3. HS, 4. HSP. [Figure: networks over ASPP, ASA, HS and HSP estimated with PC, MI, PPC1, CMI1 and PPCn under Vmax variability, intrinsic variability and environmental variability; edge legend: 100%, > 90%, 10% … 90%, < 10%.] PC: Pearson correlation (linear measure). MI: entropy-based mutual information (non-linear measure). PPC: partial Pearson correlation (linear conditioning measure). CMI: conditional mutual information (non-linear conditioning measure). Cakir, Metabolomics 2009
- 40. Metabolomics data fusion • Account for between-block differences in quality of measurements to improve data fusion • For example, multi-platform data fusion, with differences in quantification, (non)targeted, error structure. [Figure: blocks Amino acids and Lipids combined into Fused data.] • How to quantify the quality of measurements with many metabolites and many samples?
- 41. Error model for 1 metabolite. QC sample -> RSD • Error models: RSD using 1 QC sample; 2-component using study samples • Good error description: sufficient number of samples; large intensity range of study samples. [Figure: standard deviation (St.D) against mean intensity I, with regions marked A and M.]
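The two error descriptions on slide 41 can be sketched as below. The RSD is the usual standard deviation divided by the mean of repeated QC measurements. The functional form of the two-component model (additive and multiplicative parts combined in quadrature) is an assumption, since the slide only names the model, and all parameter values are hypothetical.

```python
from statistics import mean, stdev

def rsd(qc_intensities):
    """Relative standard deviation of repeated QC measurements of one metabolite."""
    return stdev(qc_intensities) / mean(qc_intensities)

def two_component_sd(intensity, additive_sd, multiplicative_rsd):
    """Standard deviation under an assumed additive-plus-multiplicative error model.

    At low intensity the additive part dominates; at high intensity the
    standard deviation grows proportionally with the intensity.
    """
    return (additive_sd ** 2 + (multiplicative_rsd * intensity) ** 2) ** 0.5

# Hypothetical repeated QC injections of one metabolite.
qc = [100.0, 110.0, 90.0]
qc_rsd = rsd(qc)

# Hypothetical model parameters evaluated at two intensities.
sd_low = two_component_sd(0.0, additive_sd=3.0, multiplicative_rsd=0.1)
sd_high = two_component_sd(40.0, additive_sd=3.0, multiplicative_rsd=0.1)
```

A single RSD describes the error at one intensity only, whereas fitting both components needs study samples that span a wide intensity range, as the slide points out.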
- 42. Figure of merit for data from 1 platform. Median: F-50 = 0.1. 90th percentile: F-90 = 0.35. [Figure: standard deviation (St.D) against intensity I for example variables 15, 365, 118 and 213; histogram of number of peaks with F-50 and F-90 marked.] (Van Batenburg et al., Analytical Chemistry, 2011)
- 43. Two-step data fusion. GC/MS: J1 = 82 peaks. LC/MS: J2 = 49 peaks • Step 1: Compute figures of merit for each platform
- 44. Two-step data fusion: MB-MLPCA • Step 2: Multi-block PCA with weighting by figures of merit. [Figure: blocks X1 (amino acids) and X2 (lipids) with the fused error covariance.] • Method needs good estimation of error variance by repeats or QC samples
- 45. Realistic simulations using GCMS and LCMS data • Error variance estimated from duplicates • True error variance • Estimating variance from duplicates is problematic • Use a mix of QC samples and repeats
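Estimating the error variance from duplicates, as mentioned on slide 45, can be sketched with the standard pooled estimator: the sum of squared within-pair differences divided by twice the number of pairs. The measurement values are hypothetical.

```python
def variance_from_duplicates(pairs):
    """Pooled error variance from duplicate measurements: sum(d^2) / (2n).

    Each pair contributes a single degree of freedom, so the estimate is
    noisy when only a few duplicates are available.
    """
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Hypothetical duplicate measurements of one metabolite in three samples.
duplicates = [(10.0, 12.0), (20.0, 18.0), (15.0, 15.0)]
error_variance = variance_from_duplicates(duplicates)
```

With one degree of freedom per pair, a handful of duplicates gives a very uncertain variance, which is consistent with the slide's advice to combine repeats with QC samples.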
