ERVA-NMR
Spectra processing is crucial in metabolomics approaches, especially for proton NMR metabolomic profiling, since each processing step may affect the steps that follow. Among the processing steps, data reduction (binning or bucketing) strongly affects the subsequent statistical analysis and potential biomarker discovery. Based on a recently published work, we propose an improved data reduction method called ERVA, which stands for Extraction of Relevant Variables for Analysis. By providing buckets that are centred on resonance peaks and free of non-significant signal, this method helps to recover the chemical fingerprints of metabolites. Moreover, we take advantage of the concentration variability of each compound across a series of samples of a complex mixture to highlight chemical information. This is done by linking the buckets into clusters based on significant correlations, which supports compound identification. As a proof of concept, the method has been applied to a tomato 1H-NMR dataset to test its ability to recover the fruit extract composition.

Usage Rights

CC Attribution-NonCommercial License

ERVA-NMR Presentation Transcript

  • ERVA: a novel method of binning, allowing chemical information to be highlighted, from 1H-NMR metabolomics data
    Daniel Jacob (1), Catherine Deborde (1), Annick Moing (1)
    (1) PMFB – UMR 1332, INRA, F-33140 Villenave d’Ornon
  • Metabolic fingerprinting
    Aims: classification of samples and highlighting of the metabolic biomarkers
    [Workflow diagram: Experiment, NMR spectra, Spectra processing, Data matrix (Samples × Features), Statistical analyses]
    D. Jacob – 7 RFMF – Amiens, 10 June 2013
  • Metabolic fingerprinting
    Aims: classification of samples and highlighting of the metabolic biomarkers
    [Same workflow diagram, with the added labels "RAW DATA" and "Relevant Information"]
  • Metabolic fingerprinting: spectra processing
  • Data reduction: bucketing
    Comparison of the buckets produced by the equidistant and AIBIN (1) binning methods.
    Drawbacks of the AIBIN binning method:
    - It takes the full data into account, including noise areas.
    - It generates asymmetric buckets that are not centred on the peaks.
    (1) AIBIN: Adaptive, Intelligent Binning Algorithm, De Meyer T et al. (2008) Anal. Chem. 80:3783–3790
  • Data reduction: bucketing
    New approach called ERVA, for Extraction of Relevant Variables for Analysis:
    - A convolution product is computed between a spectrum (S) and the second-order derivative of the Lorentzian function (SDL).
    - The convolution product gives a signal (shown in blue on the slide).
    - The zero crossings of the resulting signal, extended on each side by σ (the full width at half maximum of the Lorentzian function), give the bounds of the buckets.
    Jacob D. et al (March 2013) Analytical and Bioanalytical Chemistry, 405, 5049-5061
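The bucketing step described on this slide can be illustrated with a minimal Python sketch. It is not the published implementation. Several choices are assumptions made for the example: the kernel is the negated second derivative (so that peaks give positive lobes), it is truncated at `10 * sigma_pts` points, the spectrum is zero-padded at its ends, and lobes weaker than a hypothetical `rel_height` fraction of the strongest lobe are discarded as non-significant. Bucket bounds are returned as point indices, and overlapping buckets are not merged.

```python
def sdl_kernel(sigma_pts, half_len):
    """Negated second derivative of a Lorentzian whose FWHM is sigma_pts points."""
    g = sigma_pts / 2.0  # half width at half maximum
    return [-2 * g * g * (3 * k * k - g * g) / (k * k + g * g) ** 3
            for k in range(-half_len, half_len + 1)]


def convolve_same(signal, kernel):
    """Zero-padded discrete convolution; output has the same length as signal."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < n:
                acc += signal[idx] * w
        out.append(acc)
    return out


def erva_buckets(spectrum, sigma_pts, rel_height=0.05):
    """Return (first, last) point indices of buckets built from positive lobes."""
    conv = convolve_same(spectrum, sdl_kernel(sigma_pts, 10 * sigma_pts))
    floor = rel_height * max(conv)  # assumed significance rule, not from the slides
    buckets, start = [], None
    for i, v in enumerate(conv + [0.0]):  # trailing 0.0 closes a lobe at the edge
        if v > 0 and start is None:
            start = i  # zero crossing: a positive lobe begins
        elif v <= 0 and start is not None:
            if max(conv[start:i]) >= floor:
                # extend the zero-crossing bounds by sigma on each side
                lo = max(0, start - sigma_pts)
                hi = min(len(spectrum) - 1, i - 1 + sigma_pts)
                buckets.append((lo, hi))
            start = None
    return buckets


def lorentzian(n, center, fwhm):
    """Synthetic unit-height Lorentzian peak sampled on n points."""
    g = fwhm / 2.0
    return [g * g / ((i - center) ** 2 + g * g) for i in range(n)]


# Synthetic, noise-free spectrum with two peaks at points 100 and 300.
spectrum = [a + b for a, b in zip(lorentzian(400, 100, 8), lorentzian(400, 300, 8))]
buckets = erva_buckets(spectrum, sigma_pts=8)
```

On this synthetic input each bucket is centred on its peak, which is the property the slide emphasises. Real spectra contain noise, so the significance rule and the value of σ would need to be tuned to the data.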
  • ERVA: Extraction of Relevant Variables for Analysis
    Why SDL?
    - An NMR spectrum is a sum of Lorentzians, plus noise and distortion.
    - The second derivative of a Lorentzian is symmetric, and its integral is zero.
    Mathematically, applying such a convolution product to a spectrum is similar to a partial wavelet decomposition.
    In the case of a full experimental design, the convolution product is applied to the average spectrum obtained by summing all the spectra.
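The two properties of the SDL quoted on this slide can be checked directly. The derivation below assumes a unit-height Lorentzian written with its half width at half maximum, γ = σ/2; this notation is introduced here for convenience and does not appear on the slides.

```latex
\begin{align*}
L(x)   &= \frac{\gamma^{2}}{x^{2}+\gamma^{2}}, \qquad \gamma = \sigma/2 \\
L'(x)  &= -\frac{2\gamma^{2}x}{(x^{2}+\gamma^{2})^{2}} \\
L''(x) &= \frac{2\gamma^{2}\,(3x^{2}-\gamma^{2})}{(x^{2}+\gamma^{2})^{3}} \\[4pt]
L''(-x) &= L''(x) && \text{(symmetric)} \\
\int_{-\infty}^{\infty} L''(x)\,dx &= \bigl[L'(x)\bigr]_{-\infty}^{\infty} = 0 && \text{(zero integral)} \\
\int_{-\infty}^{\infty} x\,L''(x)\,dx &= 0 && \text{(odd integrand)}
\end{align*}
```

One consequence is that a constant offset or a linear baseline a + bx gives zero when convolved with L'', so only peak-like features survive. L'' changes sign at x = ±γ/√3.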
  • Comparison of the buckets produced by the ERVA and AIBIN (1) binning methods
    [Figure: sum of three identical Lorentzians shifted from one another by a ppm interval]
    - A1, A2, A3: the bins produced by the AIBIN method, delimited by the dotted lines.
    - E1, E2, E3: the bins produced by the ERVA method, shown as superposed grey boxes.
    1/ Integration of the ERVA buckets provides values closer together than those obtained with the AIBIN method.
    2/ With the ERVA method, unlike the AIBIN method, the centres of the buckets correspond to the centres of the resonance peaks.
    (1) AIBIN: Adaptive, Intelligent Binning Algorithm, De Meyer T et al. (2008) Anal. Chem. 80:3783–3790
  • Illustration of the effect of the alignment process
    Example of the “citrate-malate” zone from a set of tomato NMR spectra. In the figure, the A1 region was aligned first, and the A2 and A3 regions were then aligned in turn.
    - When spectral peak alignment is required in a misaligned region and alters the lower part of the peaks, the impact remains relatively minor with the ERVA data reduction method.
    - Indeed, the buckets produced by the ERVA method are mainly based on the central part of the peaks.
  • Clustering of buckets
    Buckets now have a strong chemical meaning because they match the resonance peaks exactly, and the resonance peaks are the fingerprints of chemical compounds.
    Realistic assumption:
    - Compounds involved in the same biochemical pathway may show high correlations between their resonances,
    - but usually not as high as resonances belonging to the same molecule.
    To generate relevant clusters (i.e. chemical compounds), an appropriate correlation threshold has to be applied to the correlation matrix before its decomposition into clusters.
    An approach similar to clustering of latent variables (CLV) (*) is applied, in two steps:
    - a hierarchical clustering analysis based on correlations between buckets,
    - a partitioning algorithm (R igraph package).
    (*) Vigneau E et al. (2005) Clustering of variables to analyze spectral data. J Chemom 19:122-128
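The idea of thresholding the correlation matrix before decomposing it into clusters can be sketched as follows. This is a simplified stand-in, not the CLV-like procedure of the slides: instead of hierarchical clustering followed by an igraph partitioning, it links every pair of buckets whose Pearson correlation reaches the threshold and returns the connected groups. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_buckets(columns, threshold):
    """Group bucket indices linked by correlations >= threshold.

    columns: one intensity vector per bucket (one value per sample).
    Only groups with at least two buckets are returned.
    """
    n = len(columns)
    parent = list(range(n))  # union-find forest over bucket indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pearson(columns[i], columns[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted((g for g in groups.values() if len(g) > 1), key=lambda g: g[0])


# Toy data: buckets 0, 1 and 3 vary together across five samples, bucket 2 does not.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.1, 2.0, 3.2, 3.9, 5.1],
]
clusters = cluster_buckets(columns, threshold=0.98)
```

Connected components merge buckets transitively, so a low threshold can chain unrelated compounds into one large cluster. This is one reason the choice of threshold matters.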
  • Effect of the correlation threshold on the size and number of bucket clusters
    The correlation threshold that allows maximal discrimination of compounds (3) is the one that gives the maximum number of clusters within the optimum range (grey area), defined by:
    (1) the upper limit on the size of the biggest cluster (40),
    (2) the highest value of the criterion ratio.
    Criterion = (total number of clusters) / (size of the biggest cluster)
    Dataset: PhenoTom. – UR 1052 Unité Génétique et Amélioration des Fruits et Légumes - INRA - Montfavet (France). Characterization of tomato fruits at two stages (expansion and red-orange fruit) from 12 contrasting genotypes (8 lines and 4 derived F1 hybrids).
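The selection rule can be made concrete with the three thresholds whose cluster statistics are reported in the slides (0.969, 0.98 and 0.99). The sketch below is one possible reading of the rule: keep thresholds whose biggest cluster does not exceed the limit, then prefer the largest number of clusters. Breaking ties with the higher criterion value is an assumption made here. The function names are hypothetical.

```python
def criterion(n_clusters, biggest):
    """Criterion from the slide: number of clusters / size of the biggest cluster."""
    return n_clusters / biggest


def pick_threshold(stats, max_biggest=40):
    """stats maps a correlation threshold to (number of clusters, biggest cluster size)."""
    eligible = {t: s for t, s in stats.items() if s[1] <= max_biggest}
    # Most clusters first; ties broken by the higher criterion (assumed rule).
    return max(eligible, key=lambda t: (eligible[t][0], criterion(*eligible[t])))


# Values reported in the slides for the tomato dataset.
stats = {
    0.969: (58, 40),
    0.98: (58, 18),
    0.99: (46, 12),
}
best = pick_threshold(stats)
```

With these values the rule selects 0.98, which is the threshold used for the PCA loadings slide.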
  • Clustering of buckets greatly helps the interpretation of multivariate analyses such as PCA, PLS, ...
    Dataset: PhenoTom. – UR 1052 Unité Génétique et Amélioration des Fruits et Légumes - INRA - Montfavet (France). Characterization of tomato fruits at two stages (expansion and red-orange fruit) from 12 contrasting genotypes (8 lines and 4 derived F1 hybrids).
  • Clustering of buckets greatly helps the interpretation of multivariate analyses such as PCA, PLS, ...
    Correlation threshold = 0.98, 623 buckets:
    - Number of clusters = 58, covering 254 buckets
    - Biggest cluster = 18 buckets
    Clusters are mainly located at the periphery of a circle, so biomarkers are highlighted.
  • Highlighting biomarkers
    [Figure: two grouping cases labelled 1 and 2, with R2² > R1²]
    - By choosing a good correlation threshold, clusters mainly link the buckets that have a "between-groups" variance,
    - in the hope that these "groups" correspond to factor levels.
  • Matching the bucket clusters with compounds
    Reference compound library: HMDB, MMCD, BMRB, … or a home-made library.
    The scoring function is based on the concept of "valid cluster" introduced in Chenomx NMR Suite 6.0.
  • Bucketing + Clustering + Matching: focus on a small example
    [Figure: spectral regions labelled d1 to d4]
    Dataset: Mounet et al. (2007) Metabolomics 3:273-288
    CLUSTER PPM: 3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436, 3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491, 3.499, 3.735, 3.745, 4.646, 4.662, 5.238, 5.245
    # DBREF0014 (Glucose): Score=0.878068 : CLUSTER: 23/23 matches
    Matching ppm: 3.235, 3.252, 3.269, 3.387, 3.398, 3.406, 3.417, 3.425, 3.436, 3.456, 3.461, 3.468, 3.472, 3.481, 3.487, 3.491, 3.499, 3.735, 3.745, 4.646, 4.662, 5.238, 5.245
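The "23/23 matches" count in this example suggests a comparison of cluster positions with the peak positions of a reference compound. The sketch below shows only that counting step, under stated assumptions: a peak matches if a reference peak lies within a hypothetical tolerance `tol` (in ppm), and the returned fraction is a simple stand-in. The actual scoring function is not given in the slides, and it clearly differs, since the reported score is 0.878068 for a full match. The reference positions are invented for illustration and are not taken from any library.

```python
def match_cluster(cluster_ppm, reference_ppm, tol=0.005):
    """Return the cluster peaks found in the reference and the matched fraction."""
    matched = [p for p in cluster_ppm
               if any(abs(p - r) <= tol for r in reference_ppm)]
    return matched, len(matched) / len(cluster_ppm)


# A few positions from the cluster shown on the slide.
cluster = [3.235, 3.252, 4.646, 5.238]

# Hypothetical reference peak lists (illustrative values only).
reference_a = [3.236, 3.25, 3.9, 4.645, 5.24]
reference_b = [1.33, 4.11]

matched_a, fraction_a = match_cluster(cluster, reference_a)
matched_b, fraction_b = match_cluster(cluster, reference_b)
```

Ranking the candidate compounds by such a fraction gives a first shortlist. A production score would also need to account for reference peaks that are missing from the cluster.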
  • Tomato
    Mounet et al. (2007) Quantitative metabolic profiles of tomato flesh and seeds during fruit development: complementary analysis with ANN and PCA. Metabolomics 3:273-288
    Global approach to characterize changes in metabolic profiles in two interdependent tissues, seed and flesh, taken from the same fruits during tomato fruit development.
    - 25 true positive compounds (more than 80 % of the 31 compounds identified by the expert user),
    - including 21 compounds at rank 1 (nearly 70 %).
  • To summarize
  • Conclusions - Perspectives
    - The « Bucketing » and « Clustering » steps are very efficient at extracting relevant information from raw data and at highlighting metabolic biomarkers in 1H-NMR metabolomics data.
    - The « Matching clusters » step is very efficient provided that the relevant reference NMR spectra libraries are available.
    To address this need, MetaboHub aims to provide a bioinformatics framework with centralized databases for managing spectral libraries of metabolites, i.e. those most commonly observed in a metabolomics experiment, i) across the various domains (nutrition, medicine, environment, plant) and ii) for several analytical techniques.
  • Acknowledgements: UMR1332 BFP / PMFB: Stéphane Bernillon, Catherine Deborde, Yves Gibon, Mickaël Maucourt, Annick Moing, Dominique Rolin
    http://bit.ly/meryb http://bit.ly/biostatflow https://code.google.com/p/nmr-viewer/
  • Effect of the correlation threshold on the number of bucket clusters (PCA loadings)
    - Correlation threshold = 0.969: number of clusters = 58, covering 316 buckets; biggest cluster = 40 buckets
    - Correlation threshold = 0.98: number of clusters = 58, covering 254 buckets; biggest cluster = 18 buckets
    - Correlation threshold = 0.99: number of clusters = 46, covering 176 buckets; biggest cluster = 12 buckets