License: CC Attribution-NonCommercial License


- 1. ERVA: a novel binning method that allows chemical information to be highlighted from 1H-NMR metabolomics data. Daniel Jacob (1), Catherine Deborde (1), Annick Moing (1). (1) PMFB, UMR 1332, INRA, F-33140 Villenave d’Ornon
- 2. Metabolic fingerprinting. Aims: classification of samples and highlighting of the metabolic biomarkers. Workflow: Experiment → NMR spectra → Spectra processing → Data matrix (Samples × Features) → Statistical analyses. D. Jacob – 7 RFMF - Amiens, 10 June 2013
- 3. Metabolic fingerprinting. Aims: classification of samples and highlighting of the metabolic biomarkers. Same workflow, annotated: the NMR spectra are the raw data, and spectra processing turns them into the relevant information held in the data matrix (Samples × Features) used for the statistical analyses.
- 4. Metabolic fingerprinting: spectra processing.
- 5. Data reduction: bucketing. Comparison of the buckets produced by the equidistant and AIBIN (1) binning methods. Drawbacks of the AIBIN binning method:
  - It takes the full data into account, including noise areas.
  - It generates asymmetric buckets that are not centered on the peaks.
  (1) AIBIN: Adaptive, Intelligent Binning Algorithm, de Meyer T et al. (2008) Anal. Chem 80:3783–3790
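
For reference, equidistant bucketing can be sketched in a few lines. This is only a minimal illustration, not code from the presentation: the function name `equidistant_buckets`, the default width of 0.04 ppm, and the use of summed intensities as the bucket value are all assumptions.

```python
import numpy as np

def equidistant_buckets(ppm, intensity, width=0.04):
    """Sum intensities over fixed-width ppm intervals (hypothetical helper)."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    lo, hi = ppm.min(), ppm.max()
    n_bins = max(1, int(np.ceil((hi - lo) / width)))
    edges = lo + width * np.arange(n_bins + 1)
    # Bucket index of every point; the last point is clipped into the last bucket.
    idx = np.clip(((ppm - lo) / width).astype(int), 0, n_bins - 1)
    sums = np.bincount(idx, weights=intensity, minlength=n_bins)
    return edges, sums

# Toy example: a flat spectrum on [0, 1] ppm split into 0.25 ppm buckets.
demo_edges, demo_sums = equidistant_buckets(np.linspace(0.0, 1.0, 101),
                                            np.ones(101), width=0.25)
```

Because the bucket edges ignore where the peaks are, a peak can be split across two buckets, which is the limitation that peak-aware methods try to avoid.
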
- 6. Data reduction: bucketing. New approach called ERVA, for Extraction of Relevant Variables for Analysis. Jacob D. et al (March 2013) Analytical and Bioanalytical Chemistry, 405, 5049-5061
  - A convolution product is computed between a spectrum (S) and the second-order derivative of the Lorentzian function (SDL).
  - The convolution product gives a signal (shown in blue on the slide).
  - The zero crossings of the resulting signal, extended on each side by the value of σ (the full width at half maximum of the Lorentzian function), give the bounds of the buckets.
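
The three steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the published ERVA implementation: the names `sdl_kernel` and `erva_like_buckets`, the kernel length (10 σ on each side), the sign convention (the convolution is negated so that peaks give a positive response), the edge padding, and the small relative threshold used in place of an exact zero test are all choices made here. σ is expressed in data points and is assumed to be an integer.

```python
import numpy as np

def sdl_kernel(sigma, half_width):
    """Sampled second derivative of a unit-height Lorentzian with FWHM sigma."""
    gamma = sigma / 2.0  # half width at half maximum
    x = np.arange(-half_width, half_width + 1, dtype=float)
    kernel = 2 * gamma**2 * (3 * x**2 - gamma**2) / (x**2 + gamma**2) ** 3
    # Force the truncated, sampled kernel to sum to zero (assumption of this sketch).
    return kernel - kernel.mean()

def erva_like_buckets(spectrum, sigma=4, rel_threshold=1e-3):
    """Return (start, end) index pairs of buckets, one per detected peak region."""
    spectrum = np.asarray(spectrum, dtype=float)
    half_width = 10 * sigma
    kernel = sdl_kernel(sigma, half_width)
    # Pad with edge values so the spectrum borders do not create artificial steps.
    padded = np.pad(spectrum, half_width, mode="edge")
    # The second derivative is negative at a peak centre, so flip the sign.
    response = -np.convolve(padded, kernel, mode="valid")
    above = response > rel_threshold * response.max()
    # Locate runs of consecutive points lying between two (near-)zero crossings.
    steps = np.diff(np.concatenate(([0], above.astype(int), [0])))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1) - 1
    last = len(spectrum) - 1
    # Extend every run by sigma points on each side to get the bucket bounds.
    return [(max(0, int(s) - sigma), min(last, int(e) + sigma))
            for s, e in zip(starts, ends)]

# Synthetic spectrum: two Lorentzian peaks centred at points 200 and 400.
x = np.arange(600)
demo = 9.0 / ((x - 200.0) ** 2 + 9.0) + 0.5 * 9.0 / ((x - 400.0) ** 2 + 9.0)
buckets = erva_like_buckets(demo, sigma=4)
```

On this noise-free example each bucket is a narrow window around one peak centre. Real spectra would also need a noise-aware threshold, which this sketch does not attempt to choose.
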
- 7. ERVA: Extraction of Relevant Variables for Analysis. Why the SDL?
  - An NMR spectrum is a sum of Lorentzians, plus noise and distortion.
  - The second derivative of a Lorentzian is symmetric, and its integral is zero.
  - Mathematically, applying such a convolution product to a spectrum is similar to a partial wavelet decomposition.
  - In the case of a full experimental design, the convolution product is applied to the average spectrum obtained by summation of all spectra.
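
The two properties of the SDL quoted above can be checked directly. The unit-height parametrisation below, with half width γ = σ/2, is an assumption made for this derivation; the slides do not state which normalisation is used.

$$
\begin{aligned}
L_\gamma(x) &= \frac{\gamma^2}{x^2 + \gamma^2}, \qquad \gamma = \frac{\sigma}{2} \\
L_\gamma'(x) &= -\frac{2\gamma^2 x}{(x^2 + \gamma^2)^2} \\
L_\gamma''(x) &= \frac{2\gamma^2\,(3x^2 - \gamma^2)}{(x^2 + \gamma^2)^3} \\
L_\gamma''(-x) &= L_\gamma''(x) \\
\int_{-\infty}^{+\infty} L_\gamma''(x)\,dx &= \Big[\, L_\gamma'(x) \,\Big]_{-\infty}^{+\infty} = 0
\end{aligned}
$$

The second derivative is negative at the peak centre, where it equals $-2/\gamma^2$, and changes sign at $x = \pm\gamma/\sqrt{3}$. The zero crossings therefore bracket the central part of a peak, and the zero integral means that a constant baseline contributes nothing to the convolution product.
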
- 8. Comparison of the buckets produced by the ERVA and AIBIN (1) binning methods. Test signal: the sum of three identical Lorentzians shifted from one another by a ppm interval. A1, A2, A3: bins produced by the AIBIN method, delimited by the dotted lines. E1, E2, E3: bins produced by the ERVA method, shown as superposed grey boxes.
  - 1/ Integration of the ERVA buckets provides values closer together than those obtained with the AIBIN method.
  - 2/ With the ERVA method, unlike the AIBIN method, the centres of the buckets correspond to the centres of the resonance peaks.
  (1) AIBIN: Adaptive, Intelligent Binning Algorithm, de Meyer T et al. (2008) Anal. Chem 80:3783–3790
- 9. Illustration of the effect of the alignment process. Example: the “citrate-malate” zone from a set of tomato NMR spectra, with regions labelled A1, A2 and A3 on the slide. The A1 region was aligned first, and the A2 and A3 regions were then aligned in turn. When spectral peak alignment is required in a misaligned region and alters the lower part of the peaks, the impact remains relatively minor with the ERVA data reduction method, because the buckets produced by ERVA are mainly based on the central part of the peaks.
- 10. Clustering of buckets. Buckets now have a strong chemical meaning thanks to their exact matching with the resonance peaks, since the resonance peaks are the fingerprints of chemical compounds.
  - Realistic assumption: compounds involved in the same biochemical pathway may present high correlations between their resonances, but usually not as high as those between resonances of the same molecule.
  - To generate relevant clusters (i.e. chemical compounds), an appropriate correlation threshold has to be applied to the correlation matrix before its decomposition into clusters.
  - The approach applied is similar to clustering of latent variables (*) (CLV) and involves two steps: a hierarchical clustering analysis based on correlations between buckets, then a partitioning algorithm (R igraph package).
  (*) Vigneau E et al. (2005) Clustering of variables to analyze spectral data. J Chemom 19:122-128
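
A simplified stand-in for this clustering step is sketched below. It is not the CLV procedure or the igraph-based partitioning used by the authors: it simply thresholds the correlation matrix and takes the connected components of the resulting graph. The function name `correlation_clusters`, the use of positive correlations only, and the choice to discard single-bucket groups are assumptions of this sketch.

```python
import numpy as np

def correlation_clusters(data, threshold=0.98):
    """Group buckets (columns of a samples x buckets matrix) whose pairwise
    correlation reaches the threshold, using connected components."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    n = corr.shape[0]
    adjacent = corr >= threshold
    seen = np.zeros(n, dtype=bool)
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        # Depth-first walk over buckets linked by a high correlation.
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbour in np.flatnonzero(adjacent[node] & ~seen):
                seen[neighbour] = True
                stack.append(int(neighbour))
        if len(members) > 1:  # isolated buckets are not reported as clusters
            clusters.append(sorted(members))
    return clusters

# Toy data: buckets 0-1 and 2-3 are perfectly correlated pairs, bucket 4 is independent.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 50))
toy = np.column_stack([a, 2 * a + 1, b, 3 * b, c])
toy_clusters = correlation_clusters(toy, threshold=0.98)
```

Lowering the threshold merges more buckets into fewer, larger groups, which is why the choice of threshold matters for separating one compound from another.
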
- 11. Effect of the correlation threshold on the size and number of bucket clusters. The correlation threshold allowing maximal discrimination of compounds (3) is the one that gives the maximum number of clusters within the optimum range (grey area on the slide), which is defined by (1) the upper limit on the size of the biggest cluster (40) and (2) the highest value of the criterion, where $\text{Criterion} = \dfrac{\text{Total number of clusters}}{\text{Size of the biggest cluster}}$. Data set: PhenoTom, UR 1052 Unité Génétique et Amélioration des Fruits et Légumes, INRA, Montfavet (France). Characterization of tomato fruits at two stages (expansion and red orange fruit) from 12 contrasting genotypes (8 lines and 4 derived F1 hybrids).
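
As a worked example of the criterion, the snippet below applies it to the three thresholds reported on slide 21 of this deck. The selection rule coded here (keep thresholds whose biggest cluster has at most 40 buckets, take the largest number of clusters, break ties with the higher criterion) is one possible reading of the slide, and the function names are hypothetical.

```python
def criterion(n_clusters, biggest_cluster):
    """Total number of clusters divided by the size of the biggest cluster."""
    return n_clusters / biggest_cluster

def pick_threshold(stats, max_biggest=40):
    """stats maps a correlation threshold to (number of clusters, biggest cluster size)."""
    eligible = {t: v for t, v in stats.items() if v[1] <= max_biggest}
    # Most clusters first; ties are broken by the higher criterion (assumed rule).
    return max(eligible, key=lambda t: (eligible[t][0], criterion(*eligible[t])))

# Figures taken from slide 21.
stats = {0.969: (58, 40), 0.98: (58, 18), 0.99: (46, 12)}
ratios = {t: round(criterion(*v), 2) for t, v in stats.items()}
best = pick_threshold(stats)
```

With these figures the criterion is 1.45, 3.22 and 3.83 for thresholds 0.969, 0.98 and 0.99, and the rule above selects 0.98, the threshold used on slide 13.
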
- 12. Clustering the buckets greatly helps the interpretation of discriminant analyses such as PCA, PLS, ... Data set: PhenoTom (tomato fruits at two stages, 12 contrasting genotypes), as described on slide 11.
- 13. Clustering the buckets greatly helps the interpretation of discriminant analyses such as PCA, PLS, ... Correlation threshold = 0.98, 623 buckets. Number of clusters = 58, covering 254 buckets; biggest cluster = 18 buckets. The clusters are mainly located at the periphery of a circle, so the biomarkers are highlighted.
- 14. Highlighting biomarkers. [Figure: two cases labelled 1 and 2 compared by their R² values, with $R_2^2 > R_1^2$.] By choosing a good correlation threshold, clusters mainly link the buckets that have a "between-groups" variance, in the hope that these "groups" correspond to factor levels.
- 15. Matching the bucket clusters with compounds. Reference compound library: HMDB, MMCD, BMRB, … or a home-made library. The scoring function is based on the concept of "valid cluster" introduced in Chenomx NMR Suite 6.0.
- 16. Bucketing + Clustering + Matching: focus on a small example. Data from Mounet et al (2006) Metabolomics, 2007, 3:273-288. [Figure panels labelled d1 to d4.]
  - CLUSTER PPM: 3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436, 3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491, 3.499, 3.735, 3.745, 4.646, 4.662, 5.238, 5.245
  - DBREF0014 (Glucose): Score=0.878068, CLUSTER: 23/23 matches
  - Matching ppm: 3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436, 3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491, 3.499, 3.735, 3.745, 4.646, 4.662, 5.238, 5.245
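
The "23/23 matches" count above can be illustrated with a simple tolerance-based comparison. This sketch does not reproduce the score of 0.878068, because the slides do not define the scoring function. The helper `match_cluster`, the 0.005 ppm tolerance, and the slightly shifted reference list are hypothetical and only stand in for a real library entry.

```python
def match_cluster(cluster_ppm, reference_ppm, tol=0.005):
    """Return the cluster positions that have a reference peak within tol ppm."""
    return [p for p in cluster_ppm
            if any(abs(p - r) <= tol for r in reference_ppm)]

# Cluster positions listed on the slide.
cluster = [3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436,
           3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491, 3.499, 3.735,
           3.745, 4.646, 4.662, 5.238, 5.245]

# Hypothetical library entry: the same peaks shifted by 0.002 ppm.
reference = [p + 0.002 for p in cluster]

matched = match_cluster(cluster, reference)
match_fraction = len(matched) / len(cluster)
unmatched_example = match_cluster([1.000], reference)
```

A real scoring function would also have to weigh reference peaks that are missing from the cluster, which a plain match count ignores.
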
- 17. Tomato. Mounet et al (2006) Quantitative metabolic profiles of tomato flesh and seeds during fruit development: complementary analysis with ANN and PCA. Metabolomics, 2007, 3:273-288. Global approach to characterize changes in metabolic profiles in two interdependent tissues, seed and flesh, from the same tomato fruits during fruit development. Results:
  - 25 true positive compounds (more than 80 % of the 31 compounds identified by the expert user),
  - including 21 compounds at rank 1 (nearly 70 %).
- 18. To summarize
- 19. Conclusions and perspectives
  - The "Bucketing" and "Clustering" steps are very efficient at extracting relevant information from raw data and at allowing the metabolic biomarkers to be highlighted from 1H-NMR metabolomics data.
  - The "Matching clusters" step is very efficient provided that the relevant reference NMR spectra libraries are available.
  - To address this need, MetaboHub aims to provide a bioinformatics framework with centralized databases for managing metabolite spectral libraries, i.e. those of the metabolites most commonly observed in a metabolomics experiment, (i) in the various domains (nutrition, medicine, environment, plant) and (ii) for several analytical techniques.
- 20. Acknowledgements: UMR1332 BFP / PMFB: Stéphane Bernillon, Catherine Deborde, Yves Gibon, Mickaël Maucourt, Annick Moing, Dominique Rolin. Links: http://bit.ly/meryb, http://bit.ly/biostatflow, https://code.google.com/p/nmr-viewer/
- 21. Effect of the correlation threshold on the number of bucket clusters (PCA loadings)
  - Correlation threshold = 0.969: 58 clusters covering 316 buckets; biggest cluster = 40 buckets.
  - Correlation threshold = 0.98: 58 clusters covering 254 buckets; biggest cluster = 18 buckets.
  - Correlation threshold = 0.99: 46 clusters covering 176 buckets; biggest cluster = 12 buckets.
