Shockomics milano april_2016_v2
1. Problem definition Methods Results Conclusions
Data Mining Activities for Shockomics
Vicent J. Ribas
Custom Software and Electronics
Barcelona, Catalonia
April 19, 2016
Contents
1 Problem definition
2 Methods
3 Results
4 Conclusions
Problem definition
Objective: detect the clinical variables, proteins and
metabolites that have the highest impact on the three
different types of shock under study at three different
time points (T1, T2 and discharge). Given the
nature of this study, we will have many more variables
observed than patients.
Definition: "small n, large p" is a data mining problem that
tries to predict response to treatment from data with a
very small sample size (n) but very large dimensionality (p).
This problem needs to be tackled through data visualisation,
dimensionality reduction techniques (shrinkage methods) or a
combination of both.
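To illustrate why a small n, large p problem calls for shrinkage, here is a minimal sketch with made-up numbers (n = 2 samples, p = 3 variables). The data and the helper names `predict` and `squared_error` are illustrative assumptions, not study data. Two different weight vectors fit the observations exactly, so least squares alone cannot single out the relevant variables.

```python
# Toy "small n, large p" data: 2 samples, 3 variables (hypothetical values).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]


def predict(w, X):
    """Linear model g(x) = w . x evaluated on every row of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def squared_error(w, X, y):
    """Sum of squared residuals of the linear model on (X, y)."""
    return sum((yi - gi) ** 2 for yi, gi in zip(y, predict(w, X)))


# Two different weight vectors that both reproduce y exactly.
w_a = [1.0, 2.0, 0.0]
w_b = [0.0, 1.0, 1.0]

print(squared_error(w_a, X, y))  # zero residual
print(squared_error(w_b, X, y))  # zero residual as well
```

Because both fits are perfect, the data cannot tell us whether the third variable matters; a penalty on the weights is one way to break the tie.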
small n large p
Methods
In line with the description of WP8, the following methods are
considered for studying our p >> n problem:
Graphical models:
Factorisation of the probability density function
(Hammersley-Clifford theorem).
Conditional independence maps (CIM - mutual information
between sets of variables).
Graphical LASSO.
Ensembles of classification and regression trees (bagged trees).
Dendrograms, Kohonen Maps and Heat Maps.
Methods
Regularised models:
Regularised linear regression.
LASSO.
Relevance vector machines (regularised support vector
machines with a-priori distributions).
Support Vector Machines with generative kernels (the
Quotient Basis Kernel).
Neural Networks (Restricted Boltzmann Machines/Factor
Analysis Models).
Conditional Independence for Gaussian Distribution
Let $X = (X_1, \ldots, X_p)$ be a random vector with joint normal
distribution $N_p(\mu, \Sigma)$ with mean vector $\mu \in \mathbb{R}^p$ and positive
definite covariance matrix $\Sigma$. For three pairwise disjoint index sets
$A, B, C \subseteq \{1, \ldots, p\}$, where A and B each contain a single index, the
sub-vectors $X_A$ and $X_B$ are conditionally independent given $X_C$ iff
$$\det \Sigma_{A \cup C \times B \cup C} = 0.$$
For larger A and B, the general condition is that $\Sigma_{A \cup C \times B \cup C}$ has rank $|C|$.
Here $\Sigma_{A \cup C \times B \cup C}$ is the submatrix of $\Sigma$ with rows indexed by
$A \cup C$ and columns indexed by $B \cup C$, i.e. the minor that remains after
removing the rows and columns that take no part in the
conditional independence statement.
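The determinant criterion can be checked numerically on a small example. The sketch below assumes a hypothetical three-variable Gaussian in which X3 drives both X1 and X2 (X1 = 2 X3 + noise, X2 = 3 X3 + noise, unit variances), so X1 and X2 should be conditionally independent given X3. The covariance values and the helper names `submatrix`, `det` and `ci_determinant` are illustrative assumptions.

```python
def submatrix(sigma, rows, cols):
    """Entries of sigma restricted to the given row and column indices."""
    return [[sigma[r][c] for c in cols] for r in rows]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


def ci_determinant(sigma, a, b, c):
    """det of Sigma with rows A u C and columns B u C (a, b are single indices)."""
    return det(submatrix(sigma, [a] + c, [b] + c))


# Indices 0, 1, 2 stand for X1, X2, X3.
# Covariance implied by X1 = 2*X3 + e1, X2 = 3*X3 + e2 with unit variances.
sigma_chain = [[5.0, 6.0, 2.0],
               [6.0, 10.0, 3.0],
               [2.0, 3.0, 1.0]]

# Same matrix with Cov(X1, X2) changed, which breaks the independence.
sigma_dependent = [[5.0, 5.5, 2.0],
                   [5.5, 10.0, 3.0],
                   [2.0, 3.0, 1.0]]

print(ci_determinant(sigma_chain, 0, 1, [2]))      # zero: X1 and X2 independent given X3
print(ci_determinant(sigma_dependent, 0, 1, [2]))  # non-zero: dependence remains
```

The Laplace expansion is only practical for a handful of variables, which is the scaling problem that motivates the numerical methods discussed later in the deck.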
Markov Random Fields
Given an undirected graph G, a Markov Random Field (MRF) is
defined as a set of probability distributions
$\mathrm{MRF}_G := \{p(x) : p(x) > 0 \;\forall x\}$ such that, for every $p \in \mathrm{MRF}_G$ and for
any three disjoint subsets A, B, C of the nodes of G, if C separates A from B
then p satisfies $X_A \perp\!\!\!\perp X_B \mid X_C$. If $p \in \mathrm{MRF}_G$, we often say that p
respects G.
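Hammersley-Clifford theorem
If a pdf $p \in \mathrm{MRF}_G$, then p(x) must also factorise according to G,
i.e. there exist functions $\psi_c(x_c)$ on the cliques $c \in \mathcal{C}$ of G such that
$$p(x) = \frac{1}{Z} \exp\left( \sum_{c \in \mathcal{C}} \psi_c(x_c) \right).$$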
Statistical methods
The method above is exact but it requires the calculation of
the inverse of covariance matrices, which is not viable for the
dimensionality that we are considering.
However, some numerical methods based on the same
framework will help.
CIM is an efficient method for finding the structure of a
graphical model. In particular, it is based on the measure of
mutual information between sets of variables. A mutual
information of zero corresponds to statistical
independence.
The graphical lasso is an algorithm to estimate the precision
matrix (inverse of the covariance matrix) from the available data.
The algorithm applies a Lasso (L1) penalty to the entries of the
estimated precision matrix.
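For reference, the graphical lasso is usually written as a penalised maximum-likelihood problem. The notation below is the standard formulation and is assumed here rather than taken from the slides: S is the empirical covariance matrix, $\Theta$ the precision matrix and $\rho \ge 0$ the penalty weight.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0} \;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \rho\,\lVert \Theta \rVert_1
```

For Gaussian data, a zero entry $\hat{\Theta}_{ij} = 0$ is read as conditional independence of $X_i$ and $X_j$ given all the other variables, which is why the sparse estimate can be drawn directly as a graph.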
CIM
$$I(A, B) = \sum_{a,b} P(a, b) \log \frac{P(a, b)}{P(a)\,P(b)}$$
and the conditional mutual information, with respect to the
condition set C,
$$I(A, B \mid C) = \sum_{a,b,c} P(a, b, c) \log \frac{P(a, b \mid c)}{P(a \mid c)\,P(b \mid c)}.$$
The conditional mutual information is consistent with the
Hammersley-Clifford theorem and factorisation because it is 0 iff
$A \perp\!\!\!\perp B \mid C$.
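The two quantities above can be computed directly from a discrete joint distribution. The sketch below builds a hypothetical binary chain A - C - B whose joint factorises as P(c) P(a|c) P(b|c); the probability values and function names are illustrative assumptions. A and B are then dependent marginally but independent given C.

```python
from itertools import product
from math import log

# Hypothetical conditional tables, indexed as table[c][a] and table[c][b].
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_b_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

# Joint P(a, b, c) that factorises over the chain A - C - B.
joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product((0, 1), repeat=3)}


def marginal(joint, keep):
    """Sum the joint over every position not listed in keep."""
    out = {}
    for key, prob in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out


def mutual_information(joint):
    """I(A, B) = sum_{a,b} P(a,b) log[P(a,b) / (P(a) P(b))]."""
    p_ab = marginal(joint, (0, 1))
    p_a = marginal(joint, (0,))
    p_b = marginal(joint, (1,))
    return sum(p * log(p / (p_a[(a,)] * p_b[(b,)]))
               for (a, b), p in p_ab.items() if p > 0)


def conditional_mutual_information(joint):
    """I(A, B | C) = sum_{a,b,c} P(a,b,c) log[P(a,b|c) / (P(a|c) P(b|c))]."""
    p_c_only = marginal(joint, (2,))
    p_ac = marginal(joint, (0, 2))
    p_bc = marginal(joint, (1, 2))
    total = 0.0
    for (a, b, c), p in joint.items():
        if p > 0:
            # P(a,b|c) / (P(a|c) P(b|c)) = P(a,b,c) P(c) / (P(a,c) P(b,c))
            total += p * log(p * p_c_only[(c,)] / (p_ac[(a, c)] * p_bc[(b, c)]))
    return total


print(mutual_information(joint))              # clearly positive: A and B are dependent
print(conditional_mutual_information(joint))  # zero up to rounding: A independent of B given C
```

With real measurements the probabilities have to be estimated from few samples, so an estimated value is compared against a threshold or a significance test rather than against exact zero.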
Shrinkage methods
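Linear regression:
$$\min_w L_\lambda(w, y, x) = \min_w \sum_{i=1}^{M} (y_i - g(x_i))^2,$$
Regularised linear regression:
$$\min_w L_\lambda(w, y, x) = \min_w \; \lambda \lVert w \rVert^2 + \sum_{i=1}^{M} (y_i - g(x_i))^2,$$
Lasso (L1 penalty):
$$\min_w L_\lambda(w, y, x) = \min_w \; \lambda \lVert w \rVert_1 + \sum_{i=1}^{M} (y_i - g(x_i))^2,$$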
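Regularised support vector machines with a-priori
distributions:
$$\ln p(t \mid s, \sigma^{-2}) = \frac{1}{2} \sum_{i=1}^{M} \ln s_i - \frac{N}{2} \left( \ln \sigma^{-2} + \ln(2\pi) \right) - \frac{1}{2} \left( \sigma^{-2} t^T t - \mu^T \Sigma^{-1} \mu + \ln|\Sigma| \right),$$
which has to be maximized w.r.t. $\sigma^{-2}$ and s.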
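The difference between the quadratic and the L1 penalty in the regression objectives above is easiest to see with a single weight. The sketch below assumes a one-feature linear model g(x) = w x, for which both penalised problems have closed-form minimisers; the data and the function names are illustrative assumptions.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_one_feature(x, y, lam, penalty):
    """Minimise penalty(w) + sum_i (y_i - w * x_i)^2 over a scalar weight w."""
    xx = sum(xi * xi for xi in x)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    if penalty == "none":   # ordinary least squares
        return xy / xx
    if penalty == "ridge":  # penalty lam * w^2
        return xy / (xx + lam)
    if penalty == "lasso":  # penalty lam * |w|
        return soft_threshold(xy, lam / 2.0) / xx
    raise ValueError(f"unknown penalty: {penalty}")


# Hypothetical data generated by y = 2 * x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print(fit_one_feature(x, y, 0.0, "none"))    # unpenalised slope
print(fit_one_feature(x, y, 14.0, "ridge"))  # shrunk, but never exactly zero
print(fit_one_feature(x, y, 14.0, "lasso"))  # shrunk by a constant offset
print(fit_one_feature(x, y, 60.0, "lasso"))  # large penalty: weight removed entirely
```

The quadratic penalty only rescales the weight, while the L1 penalty can set it exactly to zero, which is the property that makes the lasso usable for variable selection when p is much larger than n.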
ProteoSepsis
Objective: assess the metabolites of the ProteoSepsis biobank
showing the strongest interaction with organ dysfunction
assessed through the SOFA score and Lactate levels.
Methodology: graphical models with Conditional
Independence Maps.
ProteoSepsis
At ICU admission, there was a significant interaction between
C3-DC / C4-OH (Hydroxybutyrylcarnitine) and C5
(Valerylcarnitine).
We also found a significant interaction between C3-DC /
C4-OH (Hydroxybutyrylcarnitine) and C5 (Valerylcarnitine)
and Isoleucine (Ile) at 48h and ICU discharge.
A separate conditional analysis for each metabolite family
yielded the interactions in the following table.
Metabolite family         Metabolites
Phosphatidylcholines      LysoPC a C18:1, PC aa C28:1, PC aa C38:1, MUPC / SPC
Amines and Amino-acids    Creatinine, Kynurenine, Putrescine, t4-OH-Pro, Taurine, Total DMA, Glu, Ile, Ala, Leu, Arg
Sphingolipids             SM C24:1, SM C18:0, SM C20:2, SM C24:0, SM C24:1, SM C26:1
Acylcarnitines            C14:1, C14:2, C2, C5
Albios
Objective: assess the proteins of the Albios biobank showing
the strongest interaction with organ dysfunction assessed
through Lactate levels.
Methodology: Graphical Lasso and Elastic Net (Lasso)
over a set of 660 proteins (graph with 662 nodes and 28370
edges).
Albios
At ICU admission, there was a significant interaction with:
Angiotensinogen P01019 (p = 0.01).
Neural cell adhesion molecule L1-like protein O00533
(p = 0.02).
Complement C3 P01024 (p = 0.01).
Albios
At day 7, there was a significant interaction with:
Phosphatidylinositol-glycan biosynthesis class X protein
Q8TBF5 (p = 0.024).
Myosin 9 P35579 (p = 0.078).
Beta-2-Glycoprotein P02749 (p = 0.0614).
Conclusions
Methodology
We propose to analyse the interactions between the different
"omics" and clinical data for each shock.
If possible, we will provide a stratification framework to study
the evolution of patients over time.