1. Representation of metabolomic data with wavelets
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Toulouse School of Economics
Workgroup BioPuces, INRA de Castanet
June 5th, 2009
BioPuces (05/06/09) Nathalie Villa Metabolomic data 1 / 16
2. Outline
1 Database presentation
2 Wavelet representation
3 Perspective of work
6. Database presentation
Basics about the database
The database was provided by Alain Paris (INRA) and consists of
metabolomic recordings (¹H NMR spectra) from the urine of mice.
950 variables from 0.505 ppm to 9.995 ppm.
Baseline has been removed and peaks have been aligned.
8. Database presentation
Purpose of the work
Study the effects of the ingestion of Hypochoeris radicata (HR) on
metabolism: the inflorescences of this plant are known to cause
a horse disease, Australian stringhalt.
As it is hard to obtain several dozen horses to sacrifice, the
experiments were conducted on 72 mice.
10. Database presentation
Description of the experiment
72 mice, split by:
• sex: 36 males, 36 females;
• HR dose: 0 (control): 24 mice; 3%: 24 mice; 9%: 24 mice.
15. Database presentation
Measurement days
The urine was collected on the following days:
Day                  0   1   4   8  11  15  18  21
Nb of observations  68  68  68  66  46  44  19  18
For each mouse, from 2 to 22 measurements were made.
In total, 397 observations of 950 variables.
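The total can be checked directly from the per-day counts. The short Python sketch below only re-adds the numbers given in the table; the variable names are illustrative.

```python
# Observation counts per collection day, copied from the table above.
days = [0, 1, 4, 8, 11, 15, 18, 21]
n_obs = [68, 68, 68, 66, 46, 44, 19, 18]

counts = dict(zip(days, n_obs))   # day -> number of spectra collected
total = sum(n_obs)                # total number of observations
n_variables = 950                 # chemical-shift variables per spectrum

print(f"{total} observations x {n_variables} variables")
```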
19. Wavelet representation
Basic principle of wavelets
For a given integer J, the spectra can be expressed at level J as:
$$f(x) = \sum_k \alpha_k\, 2^{-J/2}\, \Psi\left(2^{-J}x - k\right) + \sum_{j=1}^{J} \sum_k \beta_{jk}\, 2^{-j/2}\, \Phi\left(2^{-j}x - k\right)$$
• First sum: trend, based on the father wavelet Ψ
• Second sum: details at levels 1,...,J, based on the mother wavelet Φ
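To make the trend/details split concrete, here is a minimal sketch of a single decomposition step in Python. It assumes the Haar wavelet, which is only an illustrative choice: the slides do not say which wavelet is used. The function names `haar_step` and `haar_inverse_step` and the toy signal are hypothetical.

```python
import math

def haar_step(signal):
    """One Haar analysis step on an even-length signal.

    Returns (trend, detail): scaled pairwise sums and differences.
    """
    s = 1 / math.sqrt(2)
    trend = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return trend, detail

def haar_inverse_step(trend, detail):
    """Rebuild the signal from the trend and detail coefficients of one step."""
    s = 1 / math.sqrt(2)
    out = []
    for t, d in zip(trend, detail):
        out.extend([s * (t + d), s * (t - d)])
    return out

# Toy signal (hypothetical values, not taken from the NMR data).
toy = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
trend, detail = haar_step(toy)
rebuilt = haar_inverse_step(trend, detail)
```

The trend holds a half-length smoothed version of the signal, the details hold what is needed to recover the original exactly, so `rebuilt` matches `toy` up to rounding.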
24. Wavelet representation
Hierarchical decomposition
We add 74 zero values at the end of each spectrum to obtain a dyadic discrete
sampling (950 + 74 = 1024 = 2^10).
Original data: f observed at t1, ..., t1024, equally spaced
↓
Level 1: Trend + Details
↓
Level 2: Trend + Details
. . .
↓
Level 9: Trend + Details
⇒ At level 9 (maximum level with 1024 length discrete sampling), we
obtain 1025 coefficients.
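The padding and the level-by-level cascade can be sketched as follows in Python. This is an illustration under stated assumptions: it uses the Haar wavelet (the slides do not name the wavelet), a synthetic stand-in for a spectrum, and hypothetical helper names. With Haar and no boundary extension the coefficient count equals the padded length; the exact count obtained in practice depends on the wavelet filter and the boundary handling of the software used.

```python
import math

def haar_step(signal):
    """One Haar analysis step: (trend, detail) of an even-length signal."""
    s = 1 / math.sqrt(2)
    trend = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return trend, detail

def haar_decompose(signal, levels):
    """Apply haar_step repeatedly to the trend.

    Returns (final_trend, [details_level_1, ..., details_level_J]).
    """
    trend = list(signal)
    details = []
    for _ in range(levels):
        trend, d = haar_step(trend)
        details.append(d)
    return trend, details

# Synthetic stand-in for one spectrum of 950 variables (not real data).
spectrum = [math.sin(0.05 * i) ** 2 for i in range(950)]

# Pad with 74 zeros at the end to reach the dyadic length 1024.
padded = spectrum + [0.0] * 74

trend, details = haar_decompose(padded, levels=9)
sizes = [len(d) for d in details]          # detail coefficients per level
n_coeffs = len(trend) + sum(sizes)         # total for this Haar sketch
```

Each level halves the trend, so the detail vectors shrink from 512 coefficients at level 1 to 2 at level 9.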
28. Wavelet representation
Denoising
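For coefficients corresponding to detail levels greater than J (with J large
enough), a hard thresholding is applied:
$$c^{*} = \begin{cases} 0 & \text{if } |c| < 2 \log 10\, \hat{\sigma} \\ c & \text{if } |c| \geq 2 \log 10\, \hat{\sigma} \end{cases}$$
(Donoho and Johnstone)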
Two choices have to be tuned:
• Which wavelet should be used?
• Which J should be used?
The aim is a trade-off between the quality of the reconstruction of the function
(what values do the functions rebuilt from the filtered
coefficients take?) and the number of non-zero coefficients.
Minimization of an empirical (self-created) quality criterion:
$$\frac{1}{n} \sum_i \frac{1}{D} \sum_j \left( f_i(t_j) - \hat{f}_i(t_j) \right)^2 + \frac{\text{Nb of non-zero coefficients}}{\text{Nb of coefficients}}$$
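The thresholding rule and the criterion above can be sketched in a few lines of Python. Several points are assumptions made for illustration: the threshold is passed in as a free parameter `lam` rather than computed from the noise estimate, the coefficient count is read as the number of coefficients left non-zero after thresholding, the ratio is taken over all coefficients at once, and the toy values and function names are hypothetical.

```python
def hard_threshold(coeffs, lam):
    """Keep-or-kill rule: zero every coefficient whose magnitude is below lam."""
    return [0.0 if abs(c) < lam else c for c in coeffs]

def quality_criterion(originals, rebuilt, n_nonzero, n_coeffs):
    """Mean squared reconstruction error plus the fraction of non-zero coefficients.

    originals, rebuilt: lists of curves sampled at the same D points.
    """
    n = len(originals)
    mse = 0.0
    for f, f_hat in zip(originals, rebuilt):
        D = len(f)
        mse += sum((a - b) ** 2 for a, b in zip(f, f_hat)) / D
    return mse / n + n_nonzero / n_coeffs

# Toy coefficients and a hypothetical threshold.
coeffs = [5.0, -0.2, 0.1, -3.0]
kept = hard_threshold(coeffs, lam=1.0)
n_nonzero = sum(1 for c in kept if c != 0.0)

# Toy original curve and a hypothetical reconstruction of it.
originals = [[1.0, 2.0, 3.0, 4.0]]
rebuilt = [[1.0, 2.0, 2.5, 4.0]]
score = quality_criterion(originals, rebuilt, n_nonzero, len(kept))
```

A lower score means a better compromise: the first term penalizes reconstruction error, the second penalizes keeping many coefficients. Comparing the score across candidate wavelets and values of J is one way to carry out the tuning described above.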
35. Perspective of work
Using random forests
The idea is to use random forests to make predictions and also to extract the
main coefficients responsible for explaining the target variables.
Proposed regression: the scale coefficients will be the explanatory
variables. The variable of interest could be:
• the dose (either as a number or as a class, leading to a classification
problem);
• the total dose ingested (i.e., the dose multiplied by the number of
days of ingestion);
• any other interesting idea?
The idea is then to rebuild the individuals from the main coefficients (setting the
others to zero) to see which peaks differ from one group to the
others.
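The candidate target variables listed above could be derived from each observation as in the Python sketch below. It is only an illustration: the `Observation` record, the class labels, and the choice of the collection day as the number of days of ingestion are all assumptions, not part of the original study design.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    mouse_id: int   # hypothetical identifier
    dose_pct: int   # HR dose: 0 (control), 3 or 9
    day: int        # collection day, assumed equal to days of ingestion

# Hypothetical class labels for the classification variant.
DOSE_CLASS = {0: "control", 3: "low", 9: "high"}

def targets(obs):
    """Build the three candidate target variables for one observation."""
    return {
        "dose_numeric": float(obs.dose_pct),    # regression target
        "dose_class": DOSE_CLASS[obs.dose_pct],  # classification target
        "total_dose": obs.dose_pct * obs.day,    # dose x days of ingestion
    }

example = targets(Observation(mouse_id=1, dose_pct=3, day=8))
```

Each dictionary would be paired with the wavelet coefficients of the same spectrum, which serve as the explanatory variables.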