Kernel methods for data integration in systems biology
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
Séminaire CBI
February 17, 2020 – Toulouse
Nathalie Vialaneix, MIAT, INRAE Toulouse | Kernel methods for data integration in systems biology 1/37
A short bio
trained as a mathematician and statistician
application: research applied to human health (obesity) and animal genomics
data: mostly transcriptome, but also Hi-C and metabolome and (to a lesser extent) scRNAseq, metagenomics, ATACseq, ...
methods: networks (inference, mining), omics data integration, machine learning (including random forests, SVM and neural networks)
Examples of past works
inferring and understanding the relations between gene expression, lipids and phenotypes (weight, waist circumference, ...) in adipose tissue (Diogenes)
⇒ network inference and mining, data integration, missing data, ... [Montastier et al., 2015, Imbert et al., 2018] and R package RNAseqNet
Examples of past works
integrating expression and location (3D DNA FISH) for network inference in fetal pig tissues [Marti-Marimon et al., 2018]
Other activities
including training for biologists in RNAseq data
analysis, basic statistics, graphics with R...
organizer of the working group “Biopuces”
http://www.nathalievialaneix.eu/biopuces and active member of
“Chrocogen” https://groupes.renater.fr/sympa/info/chrocogen
In this talk...
How to integrate multiple omics data from various sources and various
types with kernels?
Disclaimer: equations included (not necessary to understand the talk but
necessary for the speaker to understand her own work during the talk)
A primer on kernel methods for biology
Before we start: context and motivations
Data characteristics
a few (paired) samples
information at various levels
... but of heterogeneous types
and, when numeric, with a large
dimension
What we want to achieve
integrative analysis
to predict a phenotype, to
understand the typology of the
samples, ...
In short: what are kernels?
Data we are used to...
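n samples on which p variables are measured: $(x_i)_{i=1,\ldots,n}$ with $x_i \in \mathbb{R}^p$
From that, we can compute:
centers of gravity: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
distances and dot products: $d(x_i, x_{i'}) = \sqrt{\sum_{j=1}^{p} (x_{ij} - x_{i'j})^2}$ and $\langle x_i, x_{i'} \rangle = \sum_{j=1}^{p} x_{ij} x_{i'j}$
Kernels...
The characteristics of the n samples $(x_i)_i$ are summarized by pairwise similarities
More formally: an $n \times n$ matrix $K$ such that $K$ is symmetric and positive definite (in the kernel sense: $\sum_{i,i'=1}^{n} c_i c_{i'} K_{ii'} \geq 0$ for all $c \in \mathbb{R}^n$)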
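Representer Theorem: there exist a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ and a feature map $\phi$ such that $K_{ii'} = \langle \phi(x_i), \phi(x_{i'}) \rangle_{\mathcal{H}}$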
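The definition above can be checked on a toy example. The following is a minimal sketch in Python (standard library only). The data and the helper names `linear_kernel`, `quadratic_form` and `looks_like_kernel` are ours, for illustration only. The sketch builds the Gram matrix of the linear kernel $K_{ii'} = \langle x_i, x_{i'} \rangle$, then checks symmetry and $c^\top K c \geq 0$ on random vectors $c$. This random check is only a heuristic: it can reveal that a matrix is not a kernel, but it can never prove that it is one.

```python
import random


def linear_kernel(X):
    """Gram matrix K[i][k] = <x_i, x_k> for a list X of equal-length numeric vectors."""
    return [[sum(a * b for a, b in zip(xi, xk)) for xk in X] for xi in X]


def quadratic_form(K, c):
    """Return c^T K c."""
    n = len(K)
    return sum(c[i] * K[i][k] * c[k] for i in range(n) for k in range(n))


def looks_like_kernel(K, n_trials=200, tol=1e-9, seed=0):
    """Heuristic check that K is symmetric and that c^T K c >= 0 for random c.

    A random test can only detect a violation; it cannot prove that K is
    positive (semi-)definite.
    """
    n = len(K)
    if any(abs(K[i][k] - K[k][i]) > tol for i in range(n) for k in range(n)):
        return False
    rng = random.Random(seed)
    for _ in range(n_trials):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if quadratic_form(K, c) < -tol:
            return False
    return True


X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # toy data: n = 3 samples, p = 2 variables
K = linear_kernel(X)
print(K)                     # [[5.0, -1.5, 3.0], [-1.5, 1.25, 1.5], [3.0, 1.5, 9.0]]
print(looks_like_kernel(K))  # True
```

A symmetric matrix such as `[[1.0, 2.0], [2.0, 1.0]]` fails the check: it has a negative eigenvalue, so it is not a valid kernel.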
Why are kernels interesting?
1 because they can reduce high-dimensional data to small similarity matrices
2 because they are not restricted to data in $\mathbb{R}^p$ (kernels on graphs, between graphs, on text, ...): some examples to come
3 because they can embed expert knowledge (e.g., phylogeny between taxa): some examples to come
4 because they offer a rigorous framework to extend many statistical methods: basic principles to come just after
5 because they offer a clean and common framework for data integration: topic of this talk
but:
1 the choice of the relevant kernel is still up to you...
2 they can strongly increase computational time when n is large...
Kernel examples
1 $\mathbb{R}^p$ observations: Gaussian kernel $K_{ii'} = e^{-\gamma \|x_i - x_{i'}\|^2}$
2 nodes of a graph: diffusion kernel [Kondor and Lafferty, 2002]
3 sequence kernels (used, for instance, to compute similarities between proteins): Fisher kernel [Jaakkola et al., 2000] (with HMM), convolution kernel [Saigo et al., 2004]
4 kernels between graphs (or “structured data”; used in metabolomics to compute similarities between metabolites based on their fragmentation trees): [Shen et al., 2014, Brouard et al., 2016]
More examples: [Mariette and Vialaneix, 2019]
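The first example in the list above, the Gaussian kernel on $\mathbb{R}^p$ observations, takes only a few lines to compute. This is a minimal sketch; the function name and the toy points are ours. Here `gamma` is the bandwidth $\gamma > 0$, which has to be chosen by the user, in line with the remark that the choice of the kernel "is still up to you".

```python
import math


def gaussian_kernel(X, gamma):
    """Gaussian kernel matrix K[i][k] = exp(-gamma * ||x_i - x_k||^2).

    gamma > 0 is a bandwidth parameter chosen by the user.
    """
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [[math.exp(-gamma * sq_dist(a, b)) for b in X] for a in X]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gaussian_kernel(X, gamma=0.5)
# the diagonal is always 1 and off-diagonal entries shrink as points move apart
for row in K:
    print([round(v, 3) for v in row])
```

A larger `gamma` makes the similarity more local: only very close observations keep a similarity far from 0.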
Principles for learning from kernels
Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using:
K to compute distances and dot products: the dot product is $K_{ii'}$ and the distance is $\sqrt{K_{ii} + K_{i'i'} - 2K_{ii'}}$
(implicit) linear or convex combinations of $(\phi(x_i))_i$ to describe all unobserved elements (centers of gravity and so on...)
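Both principles can be illustrated with a short sketch; the helper names are ours. The first function computes distances from $K$ only. The second computes the squared distance between $\phi(x_i)$ and the center of gravity of a set $S$ of observations. That center is an unobserved convex combination $\frac{1}{|S|}\sum_{l \in S} \phi(x_l)$, so it is expanded with $K$ instead of being computed. With the linear kernel, both quantities reduce to the usual Euclidean ones, which makes them easy to check by hand.

```python
import math


def kernel_sq_distance(K, i, k):
    """Squared distance ||phi(x_i) - phi(x_k)||^2 computed from K only."""
    return K[i][i] + K[k][k] - 2.0 * K[i][k]


def kernel_distance(K, i, k):
    # clip tiny negative values caused by floating-point rounding
    return math.sqrt(max(kernel_sq_distance(K, i, k), 0.0))


def sq_distance_to_center(K, i, S):
    """||phi(x_i) - g_S||^2, where g_S is the (never computed) center of gravity
    of {phi(x_l) : l in S}, i.e. a convex combination with weights 1/|S|."""
    s = len(S)
    return (K[i][i]
            - 2.0 / s * sum(K[i][l] for l in S)
            + sum(K[l][m] for l in S for m in S) / s ** 2)


# With the linear kernel, these reduce to usual Euclidean quantities
X = [[1.0, 2.0], [4.0, 6.0], [0.0, 0.0]]
K = [[sum(a * b for a, b in zip(xi, xk)) for xk in X] for xi in X]
print(kernel_distance(K, 0, 1))             # 5.0 = ||(1, 2) - (4, 6)||
print(sq_distance_to_center(K, 2, [0, 1]))  # 22.25 = ||(0, 0) - (2.5, 4)||^2
```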
A simple example: k-means
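1: Initialization: random initialization of P centers $\bar{x}_{C_j^1} \in \mathbb{R}^p$
2: for t = 1 to T do
3: Assignment step: $\forall\, i = 1, \ldots, n$, $f^{t+1}(x_i) = \arg\min_{j=1,\ldots,P} d(x_i, \bar{x}_{C_j^t})$
4: Representation step: $\forall\, j = 1, \ldots, P$, $\bar{x}_{C_j^{t+1}} = \frac{1}{|C_j^{t+1}|} \sum_{x_l \in C_j^{t+1}} x_l$, where $C_j^{t+1} = \{x_i : f^{t+1}(x_i) = j\}$
5: end for (convergence)
6: return partition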
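As a reference point before the kernel version, the standard algorithm above can be sketched in plain Python. This is a minimal sketch under our own assumptions. The initial centers are P observations drawn at random. The loop stops as soon as the partition no longer changes. An empty cluster keeps its previous center.

```python
import random


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(X, P, T=100, seed=0):
    """Standard k-means in R^p (Lloyd-type iterations)."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, P)]  # initialization: P random observations
    labels = None
    for _ in range(T):
        # assignment step: each observation goes to its closest center
        new_labels = [min(range(P), key=lambda j: sq_dist(x, centers[j])) for x in X]
        if new_labels == labels:  # partition unchanged: convergence
            break
        labels = new_labels
        # representation step: each center becomes the mean of its cluster
        for j in range(P):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(X, P=2)
print(labels)
```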
A simple example: k-means
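1: Initialization: random initialization of a partition $(C_j^1)_{j=1,\ldots,P}$ of $(x_i)_i$ and $\bar{x}_{C_j^1} = \frac{1}{|C_j^1|} \sum_{x_i \in C_j^1} \phi(x_i)$
2: for t = 1 to T do
3: Assignment step: $\forall\, i = 1, \ldots, n$, $f^{t+1}(x_i) = \arg\min_{j=1,\ldots,P} \|\phi(x_i) - \bar{x}_{C_j^t}\|^2_{\mathcal{H}}$, with $\|\phi(x_i) - \bar{x}_{C_j^t}\|^2_{\mathcal{H}} = K_{ii} - \frac{2}{|C_j^t|} \sum_{x_l \in C_j^t} K_{il} + \frac{1}{|C_j^t|^2} \sum_{x_l, x_{l'} \in C_j^t} K_{ll'}$
4: Representation step: $\forall\, j = 1, \ldots, P$, $\bar{x}_{C_j^{t+1}} = \frac{1}{|C_j^{t+1}|} \sum_{x_l \in C_j^{t+1}} \phi(x_l)$ (never computed explicitly: step 3 only needs $K$)
5: end for (convergence)
6: return partition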
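The kernel version of the algorithm only needs the $n \times n$ matrix $K$. The sketch below is our own minimal illustration, not the implementation used in the talk. Centers are never stored, and the assignment step uses the expansion of $\|\phi(x_i) - \bar{x}_{C_j^t}\|^2_{\mathcal{H}}$ given above. The initial partition is drawn at random and dealt round-robin, an assumption made here so that no cluster starts empty.

```python
import math
import random


def kernel_kmeans(K, P, T=100, seed=0):
    """Kernel k-means using only the n x n kernel matrix K.

    Cluster centers are never computed: they stay implicit convex combinations
    of the phi(x_l), and distances to them are expanded with K.
    """
    n = len(K)
    rng = random.Random(seed)
    # initialization: random partition, dealt round-robin so that no cluster is empty
    order = list(range(n))
    rng.shuffle(order)
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank % P
    for _ in range(T):
        clusters = [[l for l in range(n) if labels[l] == j] for j in range(P)]
        # third term of the expansion: depends only on the cluster
        within = [sum(K[l][m] for l in C for m in C) / len(C) ** 2 if C else 0.0
                  for C in clusters]

        def sq_dist(i, j):
            C = clusters[j]
            if not C:
                return math.inf  # empty cluster: never selected
            return K[i][i] - 2.0 / len(C) * sum(K[i][l] for l in C) + within[j]

        # assignment step (the representation step is implicit)
        new_labels = [min(range(P), key=lambda j: sq_dist(i, j)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
K = [[math.exp(-0.1 * sum((u - v) ** 2 for u, v in zip(a, b))) for b in X] for a in X]
labels = kernel_kmeans(K, P=2)
print(labels)
```

Only `K` enters `kernel_kmeans`. The same function therefore runs unchanged on any of the kernels listed earlier (graphs, sequences, ...).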
Beyond kernels: relational data
DNA barcoding
Astraptes fulgerator
optimal matching
(edit) distances to
differentiate species
Hi-C data
pairwise measure (similarity) related to
the physical 3D distance between loci in
the cell, at genome scale
[Ambroise et al., 2019,
Randriamihamison et al., 2019]
Metagenomics
dissimilarity between samples is better captured when the phylogeny between species is taken into account (UniFrac distances)
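Relational data such as edit distances, Hi-C matrices or UniFrac distances come as dissimilarities or similarities, not as kernels. One standard way to turn a dissimilarity matrix $D$ into a similarity matrix is the double centering of classical multidimensional scaling: $K = -\frac{1}{2} J D^{(2)} J$, where $D^{(2)}$ contains the squared dissimilarities and $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. We use it here only as an illustration; the slides do not say which transformation is used. When $D$ is Euclidean, $K$ is a valid kernel. Otherwise $K$ may have negative eigenvalues and then needs a correction before being used as a kernel.

```python
def dissimilarity_to_similarity(D):
    """Classical MDS double centering: K = -1/2 J D^(2) J with J = I - (1/n) 1 1^T.

    D^(2) is the matrix of squared dissimilarities. If D is Euclidean, K is a
    valid kernel; otherwise K may have negative eigenvalues.
    """
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_means = [sum(row) / n for row in D2]
    col_means = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[-0.5 * (D2[i][j] - row_means[i] - col_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


# Euclidean distances between the points 0, 1 and 3 on a line
D = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
K = dissimilarity_to_similarity(D)
# K is the linear kernel of the centered points -4/3, -1/3 and 5/3
for row in K:
    print([round(v, 4) for v in row])
```

The transformation can be inverted: $K_{ii} + K_{jj} - 2K_{ij}$ gives back $D_{ij}^2$, consistent with the kernel distance used earlier in the talk.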
Combining relational data in an unsupervised setting
What are metagenomic data?
Source: [Sommer et al., 2010]
abundance data: sparse n × p matrices of count data, with samples in rows and descriptors (species, OTUs, KEGG groups, k-mers, ...) in columns. Generally p ≫ n.
phylogenetic tree (evolutionary history between species, OTUs, ...): one tree with p leaves, built from the sequences collected in the n samples.
What are metagenomic data used for?
produce a profile of the diversity of a given sample ⇒ allows comparing diversity between various conditions
used in various fields: environmental science, microbiota, ...
Processed by computing a relevant dissimilarity between samples (the standard Euclidean distance is not relevant) and by using this dissimilarity in subsequent analyses.
β-diversity data: dissimilarities between count data
Compositional dissimilarities: $n_{ig}$ is the count of species g in sample i
Jaccard: the fraction of species specific to either sample i or sample j:
$d_{\mathrm{jac}} = \frac{\sum_g \left(\mathbb{I}_{\{n_{ig}>0,\, n_{jg}=0\}} + \mathbb{I}_{\{n_{jg}>0,\, n_{ig}=0\}}\right)}{\sum_g \mathbb{I}_{\{n_{ig}+n_{jg}>0\}}}$
Bray-Curtis: the fraction of the sample that is specific to either sample i or sample j:
$d_{\mathrm{BC}} = \frac{\sum_g |n_{ig} - n_{jg}|}{\sum_g (n_{ig} + n_{jg})}$
Other dissimilarities are available in the R package phyloseq; most of them are not Euclidean.
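Both compositional dissimilarities above take a few lines each. This is a minimal sketch; the function names and the toy counts are ours. For non-negative counts, `(a > 0) != (b > 0)` is exactly $\mathbb{I}_{\{n_{ig}>0,\, n_{jg}=0\}} + \mathbb{I}_{\{n_{jg}>0,\, n_{ig}=0\}}$.

```python
def jaccard(ni, nj):
    """Fraction of observed species present in only one of the two samples."""
    specific = sum((a > 0) != (b > 0) for a, b in zip(ni, nj))
    observed = sum(a + b > 0 for a, b in zip(ni, nj))
    return specific / observed  # undefined if both samples are empty


def bray_curtis(ni, nj):
    """Fraction of the total counts that is not shared by the two samples."""
    return sum(abs(a - b) for a, b in zip(ni, nj)) / sum(a + b for a, b in zip(ni, nj))


n1 = [3, 0, 2, 5]  # counts of 4 species in sample 1
n2 = [1, 4, 0, 5]  # counts of the same species in sample 2
print(jaccard(n1, n2))      # 0.5: 2 of the 4 observed species are sample-specific
print(bray_curtis(n1, n2))  # 0.4 = (2 + 4 + 2 + 0) / (4 + 4 + 2 + 10)
```

Jaccard only looks at presence/absence, whereas Bray-Curtis also uses abundances. Here species 2 and 3 drive both values, but Bray-Curtis also weighs how many counts differ.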
β-diversity data: phylogenetic dissimilarities
Phylogenetic dissimilarities
For each branch e, let $l_e$ denote its length and $p_{ei}$ the fraction of counts in sample i corresponding to species below branch e.
UniFrac: the fraction of the tree specific to either sample i or sample j:
$d_{\mathrm{UF}} = \frac{\sum_e l_e \left(\mathbb{I}_{\{p_{ei}>0,\, p_{ej}=0\}} + \mathbb{I}_{\{p_{ej}>0,\, p_{ei}=0\}}\right)}{\sum_e l_e \mathbb{I}_{\{p_{ei}+p_{ej}>0\}}}$
Weighted UniFrac: the fraction of the diversity specific to either sample i or sample j:
$d_{\mathrm{wUF}} = \frac{\sum_e l_e |p_{ei} - p_{ej}|}{\sum_e l_e (p_{ei} + p_{ej})}$
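The phylogenetic dissimilarities can be sketched on a tiny tree. Here we assume a hypothetical encoding of the tree as a list of branches, each given as (length $l_e$, set of leaves below it). Real analyses would rather rely on a dedicated package such as phyloseq. The weighted version weights the denominator by $l_e$ as well, which keeps the result between 0 and 1.

```python
def branch_fractions(counts, branches):
    """p_e for one sample: fraction of its counts in the leaves below each branch."""
    total = sum(counts)
    return [sum(counts[s] for s in below) / total for _, below in branches]


def unifrac(ci, cj, branches):
    pi, pj = branch_fractions(ci, branches), branch_fractions(cj, branches)
    lengths = [length for length, _ in branches]
    # numerator: length of branches leading to species present in only one sample
    num = sum(le * ((a > 0) != (b > 0)) for le, a, b in zip(lengths, pi, pj))
    den = sum(le * (a + b > 0) for le, a, b in zip(lengths, pi, pj))
    return num / den


def weighted_unifrac(ci, cj, branches):
    pi, pj = branch_fractions(ci, branches), branch_fractions(cj, branches)
    lengths = [length for length, _ in branches]
    num = sum(le * abs(a - b) for le, a, b in zip(lengths, pi, pj))
    den = sum(le * (a + b) for le, a, b in zip(lengths, pi, pj))
    return num / den


# Hypothetical tree on 3 species (leaves 0, 1, 2): ((0:0.5, 1:0.5):1.0, 2:2.0);
# each branch is (length l_e, set of leaves below it)
branches = [(0.5, {0}), (0.5, {1}), (1.0, {0, 1}), (2.0, {2})]
c1 = [4, 0, 4]
c2 = [0, 2, 2]
print(unifrac(c1, c2, branches))  # 0.25: only the two short leaf branches differ
print(weighted_unifrac(c1, c2, branches))
```

Species 0 and 1 are distinct but phylogenetically close. Swapping one for the other therefore changes both distances only a little, which is the point of using the tree.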
TARA Oceans datasets
The 2009-2013 expedition
Co-directed by Étienne Bourgois
and Éric Karsenti.
7,012 datasets collected from
35,000 samples of plankton and
water (11,535 Gb of data).
Study the plankton: bacteria,
protists, metazoans and viruses
representing more than 90% of the
biomass in the ocean.
TARA Oceans datasets
Science (May 2015) - Studies on:
eukaryotic plankton diversity
[de Vargas et al., 2015],
ocean viral communities
[Brum et al., 2015],
global plankton interactome
[Lima-Mendez et al., 2015],
global ocean microbiome
[Sunagawa et al., 2015],
...
→ datasets of different types and from different sources, analyzed separately.
TARA Oceans datasets that we used
Datasets used
environmental dataset: 22 numeric features (temperature, salinity, ...) [Sunagawa et al., 2015].
bacteria phylogenomic tree: computed from ∼35,000 OTUs [Sunagawa et al., 2015].
bacteria functional composition: ∼63,000 KEGG orthologous groups [Sunagawa et al., 2015].
eukaryotic plankton composition split into 4 groups: pico (0.8–5 µm), nano (5–20 µm), micro (20–180 µm) and meso (180–2000 µm) [de Vargas et al., 2015].
virus composition: ∼867 virus clusters based on shared gene content [Brum et al., 2015].
TARA Oceans datasets that we used
Common samples
48 samples,
2 depth layers: surface
(SRF) and deep chlorophyll
maximum (DCM),
31 different sampling
stations.
From multiple kernels to an integrated kernel
How to combine multiple kernels?
naive approach: $K^* = \frac{1}{M}\sum_{m} K^m$
supervised framework: $K^* = \sum_m \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]
unsupervised framework, but the input space is $\mathbb{R}^p$ [Zhuang et al., 2011]: $K^* = \sum_m \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to
minimize the distortion between all training data, $\sum_{ij} K^*(x_i, x_j) \|x_i - x_j\|^2$;
AND minimize the approximation of the original data by the kernel embedding, $\sum_i \left\|x_i - \sum_j K^*(x_i, x_j)\, x_j\right\|^2$.
Our proposal: 2 UMKL (unsupervised multiple kernel learning) frameworks which do not require data to have values in $\mathbb{R}^p$.
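All the approaches above produce a non-negative combination of the input kernels. A non-negative combination of positive semi-definite matrices is itself positive semi-definite, so $K^*$ is still a valid kernel. This is the reason for the constraint $\beta_m \geq 0$; the constraint $\sum_m \beta_m = 1$ is a normalization. A minimal sketch follows; the function name is ours. It does not rescale the kernels, although in practice one may want to normalize each $K^m$ first, for instance by its Frobenius norm.

```python
def combine_kernels(kernels, weights=None):
    """Convex combination K* = sum_m beta_m K^m of same-size kernel matrices.

    weights=None gives the naive approach beta_m = 1/M.
    """
    M = len(kernels)
    if weights is None:
        weights = [1.0 / M] * M
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * K[i][k] for w, K in zip(weights, kernels)) for k in range(n)]
            for i in range(n)]


K1 = [[1.0, 0.0], [0.0, 1.0]]
K2 = [[2.0, 2.0], [2.0, 2.0]]
print(combine_kernels([K1, K2]))                # naive average: [[1.5, 1.0], [1.0, 1.5]]
print(combine_kernels([K1, K2], [0.25, 0.75]))  # weighted: [[1.75, 1.5], [1.5, 1.75]]
```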
Multi-kernel/distances integration
How to “optimally” combine several relational datasets in an unsupervised setting?
for kernels $K^1, \ldots, K^M$ obtained on the same n objects, search: $K_\beta = \sum_{m=1}^{M} \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$
[Mariette and Villa-Vialaneix, 2018]
R package mixKernel: https://cran.r-project.org/package=mixKernel
STATIS like framework
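[L’Hermier des Plantes, 1976, Lavit et al., 1994]
Similarities between kernels:
$C_{mm'} = \frac{\langle K^m, K^{m'} \rangle_F}{\|K^m\|_F \, \|K^{m'}\|_F} = \frac{\mathrm{Trace}(K^m K^{m'})}{\sqrt{\mathrm{Trace}((K^m)^2)\,\mathrm{Trace}((K^{m'})^2)}}$.
($C_{mm'}$ is an extension of the RV-coefficient [Robert and Escoufier, 1976] to the kernel framework)
$\mathrm{maximize}_v \sum_{m=1}^{M} v_m \left\langle K^*(v), \frac{K^m}{\|K^m\|_F} \right\rangle_F = v^\top C v$ for $K^*(v) = \sum_{m=1}^{M} v_m \frac{K^m}{\|K^m\|_F}$ and $v \in \mathbb{R}^M$ such that $\|v\|_2 = 1$.
Solution: first eigenvector of $C$ ⇒ set $\beta = \frac{v}{\sum_{m=1}^{M} v_m}$ (consensual kernel).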
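The STATIS-like weights can be sketched in three steps: build the cosine-similarity matrix $C$ between kernels, take its first eigenvector, and rescale it to sum to 1. In the sketch below, plain power iteration is a stand-in for an eigen-solver so that only the standard library is needed; the function names are ours. For positive semi-definite kernels, $\mathrm{Trace}(K^m K^{m'}) \geq 0$, so $C$ has non-negative entries. By Perron–Frobenius, its leading eigenvector can then be chosen with non-negative entries, and starting the iteration from a positive vector keeps it so.

```python
import math


def frobenius_inner(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def similarity_matrix(kernels):
    """C[m][q] = <K^m, K^q>_F / (||K^m||_F ||K^q||_F)."""
    norms = [math.sqrt(frobenius_inner(K, K)) for K in kernels]
    M = len(kernels)
    return [[frobenius_inner(kernels[m], kernels[q]) / (norms[m] * norms[q])
             for q in range(M)] for m in range(M)]


def first_eigenvector(C, n_iter=1000, tol=1e-12):
    """Power iteration from a positive starting vector."""
    M = len(C)
    v = [1.0 / math.sqrt(M)] * M
    for _ in range(n_iter):
        w = [sum(C[m][q] * v[q] for q in range(M)) for m in range(M)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


def statis_weights(kernels):
    """beta = v / sum(v), with v the first eigenvector of C."""
    v = first_eigenvector(similarity_matrix(kernels))
    s = sum(v)
    return [x / s for x in v]


K1 = [[1.0, 0.0], [0.0, 1.0]]
K3 = [[0.0, 0.0], [0.0, 1.0]]
beta = statis_weights([K1, K1, K3])  # the kernel unlike the others gets less weight
print(beta)
```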
A kernel preserving the original topology of the data I
Similarly to [Lin et al., 2010], preserve the local geometry of the data in the feature space.
Proxy of the local geometry:
$K^m \longrightarrow G^m_k$ ($k$-nearest neighbors graph) $\longrightarrow A^m_k$ (adjacency matrix)
⇒ $W = \sum_m \mathbb{I}_{\{A^m_k > 0\}}$ or $W = \sum_m A^m_k$
Feature space geometry measured by
$\Delta_i(\beta) = \left\langle \phi^*_\beta(x_i), \begin{pmatrix} \phi^*_\beta(x_1) \\ \vdots \\ \phi^*_\beta(x_n) \end{pmatrix} \right\rangle = \begin{pmatrix} K^*_\beta(x_i, x_1) \\ \vdots \\ K^*_\beta(x_i, x_n) \end{pmatrix}$
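The ingredients of this criterion can be sketched on toy kernels. We make two assumptions here. Neighbours are found with the kernel distance $K_{ii} + K_{jj} - 2K_{ij}$. The $k$-NN graph is symmetrised: $i \sim j$ if either one is among the $k$ nearest neighbours of the other. With such 0/1 adjacency matrices, the two choices of $W$ on the slide coincide: $W_{ij}$ is the number of kernels in which $i$ and $j$ are neighbours. $\Delta_i(\beta)$ is simply the $i$-th row of $K^*_\beta$. All function names are ours.

```python
def knn_adjacency(K, k):
    """0/1 adjacency matrix of the k-nearest-neighbours graph built from kernel K.

    Neighbours are found with the kernel distance K_ii + K_jj - 2 K_ij, and the
    graph is symmetrised (i ~ j if j is among the k nearest of i, or vice versa).
    """
    n = len(K)

    def sq_dist(i, j):
        return K[i][i] + K[j][j] - 2 * K[i][j]

    A = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: sq_dist(i, j))
        for j in others[:k]:
            A[i][j] = A[j][i] = 1
    return A


def neighbourhood_weights(kernels, k):
    """W = sum_m I{A^m_k > 0}: number of kernels in which i and j are neighbours."""
    adjacencies = [knn_adjacency(K, k) for K in kernels]
    n = len(kernels[0])
    return [[sum(A[i][j] > 0 for A in adjacencies) for j in range(n)] for i in range(n)]


def delta(kernels, beta, i):
    """Delta_i(beta): i-th row of K*_beta = sum_m beta_m K^m."""
    n = len(kernels[0])
    return [sum(b * K[i][j] for b, K in zip(beta, kernels)) for j in range(n)]


def linear_kernel_1d(points):
    return [[float(a * b) for b in points] for a in points]


K1 = linear_kernel_1d([0, 1, 10, 11])  # nearest neighbours: (0, 1) and (2, 3)
K2 = linear_kernel_1d([0, 10, 1, 11])  # nearest neighbours: (0, 2) and (1, 3)
W = neighbourhood_weights([K1, K2], k=1)
print(W)
```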
A kernel preserving the original topology of the data II
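Sparse version (quadprog in R)
$\mathrm{minimize}_\beta \sum_{i,j=1}^{n} W_{ij} \|\Delta_i(\beta) - \Delta_j(\beta)\|^2$ for $K^*_\beta = \sum_{m=1}^{M} \beta_m K^m$ and $\beta \in \mathbb{R}^M$ such that $\beta_m \geq 0$ and $\sum_{m=1}^{M} \beta_m = 1$.
Non-sparse version (ADMM optimization [Boyd et al., 2011])
$\mathrm{minimize}_v \sum_{i,j=1}^{n} W_{ij} \|\Delta_i(v) - \Delta_j(v)\|^2$ for $K^*_v = \sum_{m=1}^{M} v_m K^m$ and $v \in \mathbb{R}^M$ such that $v_m \geq 0$ and $\|v\|_2 = 1$.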
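The slides solve the sparse problem with quadprog and the non-sparse one with ADMM. The sketch below uses neither. It only evaluates the criterion $\sum_{i,j} W_{ij} \|\Delta_i(\beta) - \Delta_j(\beta)\|^2$ and, for $M = 2$, scans a grid of weights $\beta = (s, 1 - s)$. This naive stand-in just makes the objective concrete. The function names and toy kernels are ours, and the kernels were chosen so that the minimiser can be checked by hand.

```python
def combined_kernel(kernels, beta):
    n = len(kernels[0])
    return [[sum(b * K[i][j] for b, K in zip(beta, kernels)) for j in range(n)]
            for i in range(n)]


def topology_objective(kernels, W, beta):
    """sum_{i,j} W_ij ||Delta_i(beta) - Delta_j(beta)||^2, where Delta_i(beta)
    is the i-th row of K*_beta."""
    Ks = combined_kernel(kernels, beta)
    n = len(W)
    return sum(W[i][j] * sum((Ks[i][l] - Ks[j][l]) ** 2 for l in range(n))
               for i in range(n) for j in range(n) if W[i][j])


def grid_search_two_kernels(K1, K2, W, steps=100):
    """Naive stand-in for the quadratic program when M = 2: scan beta = (s, 1 - s)."""
    best_beta, best_value = None, float("inf")
    for s in range(steps + 1):
        beta = (s / steps, 1 - s / steps)
        value = topology_objective([K1, K2], W, beta)
        if value < best_value:
            best_beta, best_value = beta, value
    return best_beta, best_value


# toy example: K1 = I, K2 = 2 I, and every pair of samples is a neighbour pair
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
K2 = [[2.0 * v for v in row] for row in I3]
W = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
print(grid_search_two_kernels(I3, K2, W))  # ((1.0, 0.0), 12.0)
```

The criterion depends on the scale of each $K^m$. In this toy example, $K^*_\beta = (2 - \beta_1) I$, so the objective equals $12(2 - \beta_1)^2$ and the kernel with the smaller entries wins.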
Application to TARA oceans
Similarity between datasets (STATIS)
Low similarities between meso-plankton (euk.meso) and other
datasets: strong geographical structure of mesoplanktonic
communities [de Vargas et al., 2015].
Stronger similarities between environmental variables and small organisms than between environmental variables and larger ones [de Vargas et al., 2015, Sunagawa et al., 2015].
Integrating all Tara Oceans data sets
No particular pattern in terms of depth layers, but a pattern in terms of geography.
Application to TARA oceans
Important variables
Rhizaria abundance strongly structures the differences between samples (analyses restricted to some organisms found differences mostly based on water depth), and waters from the Arctic Ocean and the Pacific Ocean differ in terms of Rhizaria abundance
Conclusions
Kernel methods are useful for:
dealing with different types of data
even when they are high-dimensional
combining them
However, they can be:
computationally intensive to train
not easy to interpret (work-in-progress with Jérôme Mariette and
Céline Brouard on variable selection in unsupervised setting)
SOMbrero
Madalina Olteanu,
Fabrice Rossi, Marie Cottrell,
Laura Bendhaïba and
Julien Boelaert
SOMbrero and mixKernel
Jérôme Mariette
adjclust and Hi-C
Pierre Neuvial, Nathanaël Randriamihamison,
Sylvain Foissac, Guillem Rigaill, Christophe Ambroise and
Shubham Chaturvedi
Credits for pictures
Slide 3: image based on ENCODE project, by Darryl Leja (NHGRI), Ian Dunham
(EBI) and Michael Pazin (NHGRI)
Slide 8: k-means image from Wikimedia Commons by Weston.pace
Slide 10: Astraptes picture is from
https://www.flickr.com/photos/39139121@N00/2045403823/ by Anne Toal
(CC BY-SA 2.0), Hi-C experiment is taken from the article Matharu et al., 2015
DOI:10.1371/journal.pgen.1005640 (CC BY-SA 4.0) and metagenomics illustration is
taken from the article Sommer et al., 2010 DOI:10.1038/msb.2010.16 (CC BY-NC-SA
3.0)
Other pictures are from articles that I co-authored.
References
Ambroise, C., Dehman, A., Neuvial, P., Rigaill, G., and Vialaneix, N. (2019).
Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics.
Algorithms for Molecular Biology, 14:22.
Bach, F. (2013).
Sharp analysis of low-rank kernel matrix approximations.
Journal of Machine Learning Research, Workshop and Conference Proceedings, 30:185–209.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011).
Distributed optimization and statistical learning via the alternating direction method of multipliers.
Foundations and Trends in Machine Learning, 3(1):1–122.
Brouard, C., Shen, H., Dührkop, K., d’Alché-Buc, F., Böcker, S., and Rousu, J. (2016).
Fast metabolite identification with input output kernel regression.
Bioinformatics, 32(12):i28–i36.
Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J.,
Gorsky, G., Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S.,
Dimier, C., Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker,
P., Karsenti, E., and Sullivan, M. (2015).
Patterns and ecological drivers of ocean viral communities.
Science, 348(6237).
Cortes, C., Mohri, M., and Talwalkar, A. (2010).
On the impact of kernel approximation on learning accuracy.
Journal of Machine Learning Research, Workshop and Conference Proceedings, 9:113–120.
Crone, L. and Crosby, D. (1995).
Statistical applications of a metric on subspaces to satellite meteorology.
Technometrics, 37(3):324–328.
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I.,
Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O.,
Guidi, L., Horák, A., Jaillon, O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F.,
Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C.,
Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S.,
Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and Karsenti, E. (2015).
Eukaryotic plankton diversity in the sunlit ocean.
Science, 348(6237).
Drineas, P. and Mahoney, M. (2005).
On the Nyström method for approximating a Gram matrix for improved kernel-based learning.
Journal of Machine Learning Research, 6:2153–2175.
Goldfarb, L. (1984).
A unified approach to pattern recognition.
Pattern Recognition, 17(5):575–582.
Gönen, M. and Alpaydin, E. (2011).
Multiple kernel learning algorithms.
Journal of Machine Learning Research, 12:2211–2268.
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G., Gourraud, P., Viguerie, N., and Villa-Vialaneix, N. (2018).
Multiple hot-deck imputation for network inference from RNA sequencing data.
Bioinformatics, 34(10):1726–1732.
Jaakkola, T., Diekhans, M., and Haussler, D. (2000).
A discriminative framework for detecting remote protein homologies.
Journal of Computational Biology, 7(1-2):95–114.
Kohonen, T. (2001).
Self-Organizing Maps, 3rd Edition, volume 30.
Springer, Berlin, Heidelberg, New York.
Kondor, R. and Lafferty, J. (2002).
Diffusion kernels on graphs and other discrete structures.
In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages
315–322, Sydney, Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).
The ACT (STATIS method).
Computational Statistics and Data Analysis, 18(1):97–119.
L’Hermier des Plantes, H. (1976).
Structuration des tableaux à trois indices de la statistique.
PhD thesis, Université de Montpellier.
Thèse de troisième cycle.
Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J., Roux, S., Vincent, F.,
Bittner, L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F.,
de Meester, L., Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P.,
Sebastian, M., Souffreau, C., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G.,
Not, F., Ogata, H., Speich, S., Stemmann, L., Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M.,
Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015).
Determinants of community structure in the global plankton interactome.
Science, 348(6237).
Lin, Y., Liu, T., and Fuh, C.-S. (2010).
Multiple kernel learning for dimensionality reduction.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:1147–1160.
Mariette, J., Olteanu, M., and Villa-Vialaneix, N. (2017a).
Efficient interpretable variants of online SOM for large dissimilarity data.
Neurocomputing, 225:31–48.
Mariette, J., Rossi, F., Olteanu, M., and Villa-Vialaneix, N. (2017b).
Accelerating stochastic kernel SOM.
In Verleysen, M., editor, XXVth European Symposium on Artificial Neural Networks, Computational Intelligence and Machine
Learning (ESANN 2017), pages 269–274, Bruges, Belgium. i6doc.
Mariette, J. and Vialaneix, N. (2019).
Approches à noyau pour l’analyse et l’intégration de données omiques en biologie des systèmes.
Forthcoming (book chapter).
Mariette, J. and Villa-Vialaneix, N. (2018).
Unsupervised multiple kernel learning for heterogeneous data integration.
Bioinformatics, 34(6):1009–1015.
Marti-Marimon, M., Vialaneix, N., Voillet, V., Yerle-Bouissou, M., Lahbib-Mansais, Y., and Liaubet, L. (2018).
A new approach of gene co-expression network inference reveals significant biological processes involved in porcine muscle
development in late gestation.
Scientific Reports, 8:10150.
Montastier, E., Villa-Vialaneix, N., Caspar-Bauguil, S., Hlavaty, P., Tvrzicka, E., Gonzalez, I., Saris, W., Langin, D., Kunesova, M.,
and Viguerie, N. (2015).
System model network for adipose tissue signatures related to weight changes in response to calorie restriction and subsequent
weight maintenance.
PLoS Computational Biology, 11(1):e1004047.
Olteanu, M. and Villa-Vialaneix, N. (2015).
On-line relational and multiple relational SOM.
Neurocomputing, 147:15–30.
Randriamihamison, N., Vialaneix, N., and Neuvial, P. (2019).
Applicability and interpretability of hierarchical agglomerative clustering with or without contiguity constraints.
Submitted for publication. Preprint arXiv 1909.10923.
Robert, P. and Escoufier, Y. (1976).
A unifying tool for linear multivariate statistical methods: the RV-coefficient.
Applied Statistics, 25(3):257–265.
Rossi, F., Hasenfuss, A., and Hammer, B. (2007).
Accelerating relational clustering algorithms with sparse prototype representation.
In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, Germany. Neuroinformatics Group,
Bielefeld University.
Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004).
Protein homology detection using string alignment kernels.
Bioinformatics, 20(11):1682–1689.
Shen, H., Dührkop, K., Böcher, S., and Rousu, J. (2014).
Metabolite identification through multiple kernel learning on fragmentation trees.
Bioinformatics, 30(12):i157–i164.
Sommer, M., Church, G., and Dantas, G. (2010).
A functional metagenomic approach for expanding the synthetic biology toolbox for biomass conversion.
Molecular Systems Biology, 6(360).
Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A.,
Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka,
F., Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral,
M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P.,
Iudicone, D., Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P.,
Karsenti, E., Raes, J., Acinas, S., and Bork, P. (2015).
Structure and function of the global ocean microbiome.
Science, 348(6237).
Villa, N. and Rossi, F. (2007).
A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph.
In 6th International Workshop on Self-Organizing Maps (WSOM 2007), Bielefeld, Germany. Neuroinformatics Group, Bielefeld
University.
Williams, C. and Seeger, M. (2000).
Using the Nyström method to speed up kernel machines.
In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in Neural Information Processing Systems (Proceedings of NIPS
2000), volume 13, Denver, CO, USA. Neural Information Processing Systems Foundation.
Zhuang, J., Wang, J., Hoi, S., and Lan, X. (2011).
Unsupervised multiple kernel clustering.
Journal of Machine Learning Research: Workshop and Conference Proceedings, 20:129–144.
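Optimization issues
Sparse version writes $\min_\beta \beta^\top S \beta$ s.t. $\beta \geq 0$ and $\|\beta\|_1 = \sum_m \beta_m = 1$ ⇒ standard QP problem with linear constraints (e.g., package quadprog in R).
Non-sparse version writes $\min_\beta \beta^\top S \beta$ s.t. $\beta \geq 0$ and $\|\beta\|_2 = 1$ ⇒ QCQP (quadratically constrained quadratic program), non-convex because of the unit-norm equality constraint, hence hard to solve.
Solved using the Alternating Direction Method of Multipliers (ADMM [Boyd et al., 2011]) by replacing the previous optimization problem with
$$\min_{x,z}\; x^\top S x + \mathbb{1}_{\{x \geq 0\}}(x) + \mathbb{1}_{\{\|z\|_2^2 \geq 1\}}(z)$$
under the constraint $x - z = 0$.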
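The slide points to a QP solver (quadprog in R) for the sparse version. As a dependency-light alternative, the Python sketch below (assuming NumPy) solves the same simplex-constrained problem by projected gradient descent, using the sort-based Euclidean projection onto the simplex. This is a swapped-in technique, not the authors' implementation; `S` is assumed symmetric positive semidefinite and nonzero, and the names `project_simplex` and `sparse_weights` are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    # last (0-based) index rho with u[rho] - (css[rho] - 1) / (rho + 1) > 0
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sparse_weights(S, n_iter=5000, tol=1e-10):
    """Minimise b^T S b subject to b >= 0 and sum(b) = 1 by projected gradient descent."""
    S = np.asarray(S, dtype=float)
    m = S.shape[0]
    beta = np.full(m, 1.0 / m)                 # start from uniform kernel weights
    # step 1/L, where L = 2 * largest eigenvalue of S is the gradient's Lipschitz constant
    step = 1.0 / (2.0 * np.linalg.eigvalsh(S).max())
    for _ in range(n_iter):
        grad = 2.0 * S @ beta                  # gradient of b^T S b
        new = project_simplex(beta - step * grad)
        if np.linalg.norm(new - beta) < tol:
            return new
        beta = new
    return beta
```

For instance, `sparse_weights(np.array([[1.0, 1.0], [1.0, 2.0]]))` should put (almost) all the weight on the first kernel, which illustrates why the $\ell_1$ constraint tends to produce sparse weights.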
ADMM iterations [Boyd et al., 2011]:
1. $\min_x\; x^\top S x + y^\top (x - z) + \frac{\lambda}{2}\|x - z\|_2^2$ under the constraint $x \geq 0$ (standard QP problem)
2. project onto the set $\{z : \|z\|_2 \geq 1\}$: $z = \frac{x}{\min\{\|x\|_2, 1\}}$
3. update the auxiliary variable: $y = y + \lambda (x - z)$
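A minimal NumPy sketch of the three ADMM steps, under stated assumptions: step 1 is solved approximately by projected gradient descent (any nonnegative QP solver could be used instead), and step 2 uses the textbook z-update of Boyd et al. (2011), which projects $x + y/\lambda$ rather than $x$ (the two coincide when $y = 0$). The problem is non-convex, so convergence is not guaranteed and the penalty `lam` (here 10, an arbitrary choice) may need tuning. All function names are hypothetical.

```python
import numpy as np

def solve_x_step(S, y, z, lam, x0, n_inner=500):
    """Step 1: minimise x^T S x + y^T (x - z) + lam/2 ||x - z||^2 subject to x >= 0."""
    H = 2.0 * S + lam * np.eye(len(z))          # Hessian of the step-1 objective
    step = 1.0 / np.linalg.eigvalsh(H).max()     # 1 / Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(n_inner):
        grad = H @ x + y - lam * z               # equals 2 S x + y + lam (x - z)
        x = np.maximum(x - step * grad, 0.0)     # project onto the nonnegative orthant
    return x

def project_outside_unit_ball(v):
    """Euclidean projection onto {z : ||z||_2 >= 1}, i.e. v / min(||v||_2, 1)."""
    norm = np.linalg.norm(v)
    if norm == 0.0:                              # every unit vector is a projection: pick one
        return np.full(len(v), 1.0 / np.sqrt(len(v)))
    return v / min(norm, 1.0)

def admm_full_weights(S, lam=10.0, n_iter=1000, tol=1e-9):
    """ADMM for min b^T S b s.t. b >= 0 and ||b||_2 = 1 (non-convex: no convergence guarantee)."""
    S = np.asarray(S, dtype=float)
    m = S.shape[0]
    x = np.full(m, 1.0 / np.sqrt(m))             # nonnegative, unit-norm starting point
    z = x.copy()
    y = np.zeros(m)
    for _ in range(n_iter):
        x = solve_x_step(S, y, z, lam, x)        # step 1 (warm-started)
        z_old = z
        z = project_outside_unit_ball(x + y / lam)   # step 2 (textbook z-update)
        y = y + lam * (x - z)                    # step 3: dual (auxiliary) variable update
        if np.linalg.norm(x - z) < tol and np.linalg.norm(z - z_old) < tol:
            break
    return x
```

Returning `x` rather than `z` keeps the weights nonnegative; at convergence the two agree.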
A proposal to improve the interpretability of K-PCA in our framework
Issue: how can we assess the importance of a given species in the K-PCA?
Our datasets are either numeric (environmental) or built from an n × p count matrix
⇒ for a given species, randomly permute its counts across samples and redo the analysis (kernel computation, with the same optimized weights, and K-PCA)
The influence of a given species in a given dataset on a given PC subspace is assessed by computing the Crone-Crosby distance between the two PC subspaces (before and after permutation) [Crone and Crosby, 1995] (∼ Frobenius norm between the projectors)
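A hedged sketch of this permutation procedure, assuming NumPy. Several choices are illustrative assumptions rather than the method used in the talk: the PC subspace is represented by the span of the top-k eigenvectors of the centred combined kernel, the count table is turned into a kernel with a hypothetical linear kernel on relative abundances (rows are assumed to have nonzero totals), and the distance is the plain Frobenius norm between projectors (Crone and Crosby's normalisation may differ). As on the slide, only the kernel of the permuted dataset is recomputed and the combination weights `beta` stay fixed.

```python
import numpy as np

def center_kernel(K):
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    return J @ K @ J

def kpca_projector(K, k):
    """Orthogonal projector onto the span of the top-k eigenvectors of the centred kernel."""
    _, vecs = np.linalg.eigh(center_kernel(K))       # eigenvalues in ascending order
    V = vecs[:, -k:]                                 # top-k eigenvectors (orthonormal columns)
    return V @ V.T

def subspace_distance(P1, P2):
    """Frobenius norm between two projectors."""
    return np.linalg.norm(P1 - P2, ord="fro")

def linear_kernel(X):
    # Hypothetical kernel for a count table: linear kernel on relative abundances.
    R = X / X.sum(axis=1, keepdims=True)
    return R @ R.T

def species_importance(counts, other_kernels, beta, k=2, rng=None):
    """Projector distance between K-PCA subspaces before/after permuting each species.

    counts        : n x p count matrix (the dataset whose kernel is recomputed)
    other_kernels : list of n x n kernels of the other datasets (left unchanged)
    beta          : fixed combination weights, beta[0] being the weight of `counts`
    """
    rng = np.random.default_rng(rng)

    def combined(X):
        return beta[0] * linear_kernel(X) + sum(b * K for b, K in zip(beta[1:], other_kernels))

    P_ref = kpca_projector(combined(counts), k)
    scores = np.empty(counts.shape[1])
    for j in range(counts.shape[1]):
        permuted = counts.copy()
        permuted[:, j] = rng.permutation(permuted[:, j])   # shuffle species j across samples
        scores[j] = subspace_distance(P_ref, kpca_projector(combined(permuted), k))
    return scores
```

A species whose counts are constant across samples gets a score of 0, since permuting it leaves the kernel unchanged. In practice, averaging the score over several permutations would make it less dependent on a single random draw.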
Nathalie Vialaneix, MIAT, INRAE Toulouse | Kernel methods for data integration in systems biology
37/37

More Related Content

What's hot

Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity data
tuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
tuxette
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
tuxette
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...
tuxette
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
tuxette
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
tuxette
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networks
tuxette
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learningbutest
 
Kernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational BiologyKernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational BiologyMichiel Stock
 
Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...
eSAT Journals
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process
Ha Phuong
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
tuxette
 
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
The Statistical and Applied Mathematical Sciences Institute
 
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
wl820609
 
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
The Statistical and Applied Mathematical Sciences Institute
 
Self-organizing map
Self-organizing mapSelf-organizing map
Self-organizing map
Tarat Diloksawatdikul
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
Ha Phuong
 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature Map
Ha Phuong
 
Lecture7 xing fei-fei
Lecture7 xing fei-feiLecture7 xing fei-fei
Lecture7 xing fei-fei
Tianlu Wang
 

What's hot (20)

Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity data
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networks
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learning
 
Kernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational BiologyKernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational Biology
 
Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
 
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
 
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
Dimension Reduction And Visualization Of Large High Dimensional Data Via Inte...
 
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
 
Self-organizing map
Self-organizing mapSelf-organizing map
Self-organizing map
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature Map
 
Lecture7 xing fei-fei
Lecture7 xing fei-feiLecture7 xing fei-fei
Lecture7 xing fei-fei
 

Similar to Kernel methods for data integration in systems biology

Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Umberto Picchini
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
tuxette
 
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...
cscpconf
 
MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...
MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...
MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...
csandit
 
Haoying1999
Haoying1999Haoying1999
Haoying1999
Alieska Waye
 
Basen Network
Basen NetworkBasen Network
Basen Network
guestf7d226
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Miningbutest
 
Grouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataGrouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big Data
Facultad de Informática UCM
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
Composite repetition-aware data structures
Composite repetition-aware data structuresComposite repetition-aware data structures
Composite repetition-aware data structures
Fabio Cunial
 
Computational of Bioinformatics
Computational of BioinformaticsComputational of Bioinformatics
Computational of Bioinformatics
ijtsrd
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
eSAT Journals
 
Credal Fusion of Classifications for Noisy and Uncertain Data
Credal Fusion of Classifications for Noisy and Uncertain DataCredal Fusion of Classifications for Noisy and Uncertain Data
Credal Fusion of Classifications for Noisy and Uncertain Data
IJECEIAES
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
Xin-She Yang
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
ShwetaPatil174
 
Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction
Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction
Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction
IJERA Editor
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
IJERA Editor
 

Similar to Kernel methods for data integration in systems biology (20)

Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU...
 
MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...
MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...
MOCANAR: A Multi-Objective Cuckoo Search Algorithm for Numeric Association Ru...
 
Haoying1999
Haoying1999Haoying1999
Haoying1999
 
Basen Network
Basen NetworkBasen Network
Basen Network
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
Grouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataGrouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big Data
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
Composite repetition-aware data structures
Composite repetition-aware data structuresComposite repetition-aware data structures
Composite repetition-aware data structures
 
Computational of Bioinformatics
Computational of BioinformaticsComputational of Bioinformatics
Computational of Bioinformatics
 
Data reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological dataData reduction techniques for high dimensional biological data
Data reduction techniques for high dimensional biological data
 
Credal Fusion of Classifications for Noisy and Uncertain Data
Credal Fusion of Classifications for Noisy and Uncertain DataCredal Fusion of Classifications for Noisy and Uncertain Data
Credal Fusion of Classifications for Noisy and Uncertain Data
 
08 entropie
08 entropie08 entropie
08 entropie
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction
Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction
Dynamic Evolving Neuro-Fuzzy Inference System for Mortality Prediction
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
 

More from tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
tuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
tuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
tuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
tuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
tuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
tuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
tuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
tuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
tuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
tuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
tuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
tuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
tuxette
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
tuxette
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
tuxette
 
La famille *down
La famille *downLa famille *down
La famille *down
tuxette
 

More from tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
 
La famille *down
La famille *downLa famille *down
La famille *down
 

Kernel methods for data integration in systems biology

  • 1. Kernel methods for data integration in systems biology Nathalie Vialaneix nathalie.vialaneix@inrae.fr http://www.nathalievialaneix.eu Séminaire CBI February 17, 2020 – Toulouse Nathalie Vialaneix, MIAT, INRAE Toulouse | Kernel methods for data integration in systems biology 1/37
  • 2. A short bio. Trained as a mathematician and statistician. Application: research applied to human health (obesity) and animal genomics. Data: mostly transcriptome, but also Hi-C and metabolome and (to a lesser extent) scRNA-seq, metagenomics, ATAC-seq, ... Methods: networks (inference, mining), omics data integration, machine learning (including random forests, SVM and neural networks).
  • 3. Examples of past works: inferring and understanding the relations between gene expression, lipids and phenotypes (weight, waist circumference, ...) in adipose tissue (Diogenes) ⇒ network inference and mining, data integration, missing data, ... [Montastier et al., 2015, Imbert et al., 2018] and R package RNAseqNet.
  • 4. Examples of past works: integrating expression and location (3D DNA FISH) for network inference in fetal pig tissues [Marti-Marimon et al., 2018].
  • 5. Other activities, including training for biologists in RNA-seq data analysis, basic statistics, graphics with R... Organizer of the working group “Biopuces” (http://www.nathalievialaneix.eu/biopuces) and active member of “Chrocogen” (https://groupes.renater.fr/sympa/info/chrocogen).
  • 7. In this talk... How to integrate multiple omics data from various sources and of various types with kernels? Disclaimer: equations included (not necessary to understand the talk, but necessary for the speaker to understand her own work during the talk).
  • 8. A primer on kernel methods for biology
  • 10. Before we start: context and motivations. Data characteristics: a small number of (paired) samples; information at various levels... but of heterogeneous types and, when numeric, of large dimension. What we want to achieve: integrative analysis to predict a phenotype, to understand the typology of the samples, ...
  • 14. In short: what are kernels? Data we are used to... n samples on which p variables are measured: $(x_i)_{i=1,\dots,n}$ with $x_i \in \mathbb{R}^p$. From that, we can compute: centers of gravity, $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$; distances and dot products, $d(x_i, x_{i'}) = \sqrt{\sum_{j=1}^p (x_{ij} - x_{i'j})^2}$ and $\langle x_i, x_{i'} \rangle = \sum_{j=1}^p x_{ij} x_{i'j}$. Kernels... The characteristics of the n samples $(x_i)_i$ are summarized by pairwise similarities. More formally: an $n \times n$ matrix K such that K is symmetric and positive semi-definite. Representer Theorem: there exist a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ and a feature map $\phi$ such that $K_{ii'} = \langle \phi(x_i), \phi(x_{i'}) \rangle_{\mathcal{H}}$.
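The quantities listed on this slide can be computed directly. The sketch below (plain Python, no third-party library; the function names and the toy data are illustrative assumptions) computes a center of gravity, a Euclidean distance and a dot product, and then the linear kernel $K_{ii'} = \langle x_i, x_{i'} \rangle$, which summarizes the same n samples as an n × n symmetric matrix.

```python
import math


def center_of_gravity(X):
    """Mean vector (1/n) * sum_i x_i of an n x p data matrix (list of rows)."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_j (x_j - y_j)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dot_product(x, y):
    """<x, y> = sum_j x_j * y_j."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(X):
    """n x n Gram matrix with K[i][i2] = <x_i, x_i2> (symmetric, PSD)."""
    return [[dot_product(xi, xj) for xj in X] for xi in X]


# Toy data: n = 3 samples, p = 2 variables (made-up values)
X = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
print(center_of_gravity(X))
print(euclidean_distance(X[0], X[1]))
print(linear_kernel(X))  # [[5.0, 3.0, 2.0], [3.0, 9.0, 0.0], [2.0, 0.0, 1.0]]
```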
  • 16. Why are kernels interesting? 1. because they can reduce high-dimensional data to small similarity matrices; 2. because they are not restricted to data in $\mathbb{R}^p$ (kernels on graphs, between graphs, on text, ...; some examples to come); 3. because they can embed expert knowledge (e.g., phylogeny between taxa; some examples to come); 4. because they offer a rigorous framework to extend many statistical methods (basic principles to come just after); 5. because they offer a clean and common framework for data integration (topic of this talk). But: 1. the choice of the relevant kernel is still up to you...; 2. they can strongly increase computational time when n is large...
  • 19. Kernel examples: 1. observations in $\mathbb{R}^p$: Gaussian kernel $K_{ii'} = e^{-\gamma \|x_i - x_{i'}\|^2}$; 2. nodes of a graph: [Kondor and Lafferty, 2002]; 3. sequence kernels (used to compute similarities between proteins, for instance): spectrum kernel [Jaakkola et al., 2000] (with HMM), convolution kernel [Saigo et al., 2004]; 4. kernels between graphs (or “structured data”; used in metabolomics to compute similarities between metabolites based on their fragmentation trees): [Shen et al., 2014, Brouard et al., 2016]. More examples: [Mariette and Vialaneix, 2019].
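As an illustration of the first two kernel examples, here is a minimal sketch in plain Python (function names, parameter values and toy inputs are assumptions, not taken from the talk): the Gaussian kernel on vectors of $\mathbb{R}^p$, and a diffusion kernel on the nodes of a graph in the form $K = e^{-\beta L}$, with $L = D - A$ the graph Laplacian, following [Kondor and Lafferty, 2002]. The matrix exponential is approximated by a truncated Taylor series, which is only reasonable for small graphs; a dedicated routine (e.g., based on an eigendecomposition) would be preferable in practice.

```python
import math


def gaussian_kernel(X, gamma):
    """Gaussian kernel matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    def sq_dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    return [[math.exp(-gamma * sq_dist(xi, xj)) for xj in X] for xi in X]


def mat_mult(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def diffusion_kernel(adjacency, beta, n_terms=40):
    """Diffusion kernel K = exp(-beta * L) on the nodes of an undirected graph,
    with L = D - A (D: diagonal matrix of node degrees).
    exp is approximated by sum_{k < n_terms} (-beta * L)^k / k!,
    adequate for small graphs and moderate beta only."""
    n = len(adjacency)
    L = [[(sum(adjacency[i]) if i == j else 0) - adjacency[i][j]
          for j in range(n)] for i in range(n)]
    M = [[-beta * L[i][j] for j in range(n)] for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0 / 0!
    K = [row[:] for row in term]
    for k in range(1, n_terms):
        term = [[v / k for v in row] for row in mat_mult(term, M)]  # M^k / k!
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Toy usage (made-up inputs)
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K_gauss = gaussian_kernel(X, gamma=0.5)
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
K_diff = diffusion_kernel(A, beta=0.5)
```

Since $L\mathbf{1} = 0$, every row of the diffusion kernel sums to 1, which gives a convenient sanity check.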
  • 20. Principles for learning from kernels: start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using: K to compute distances and dot products (the dot product is $K_{ii'}$ and the distance is $\sqrt{K_{ii} + K_{i'i'} - 2K_{ii'}}$); (implicit) linear or convex combinations of $(\phi(x_i))_i$ to describe all unobserved elements (centers of gravity and so on...).
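The rule “distance = $\sqrt{K_{ii} + K_{i'i'} - 2K_{ii'}}$” can be checked on the linear kernel, for which the feature map is the identity, so the kernel distance must coincide with the Euclidean distance. A minimal sketch in plain Python (`kernel_distance` is a hypothetical helper name, and the data are made up):

```python
import math


def kernel_distance(K, i, j):
    """||phi(x_i) - phi(x_j)||_H computed from the kernel matrix alone:
    sqrt(K_ii + K_jj - 2 K_ij)."""
    sq = K[i][i] + K[j][j] - 2.0 * K[i][j]
    return math.sqrt(max(sq, 0.0))  # clip tiny negative values due to rounding


# Linear kernel K_ij = <x_i, x_j> on made-up data: phi is the identity here,
# so the kernel distance is the ordinary Euclidean distance.
X = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
K = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(X[0], X[1])))
print(kernel_distance(K, 0, 1), euclid)  # both equal sqrt(8)
```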
  • 22. A simple example: k-means. 1: Initialization: random initialization of P centers $\bar{x}_{C_j^1} \in \mathbb{R}^p$; 2: for t = 1 to T do; 3: Assignment step: $\forall\, i = 1, \dots, n$, $f^{t+1}(x_i) = \arg\min_{j=1,\dots,P} d(x_i, \bar{x}_{C_j^t})$; 4: Representation step: $\forall\, j = 1, \dots, P$, $\bar{x}_{C_j^{t+1}} = \frac{1}{|C_j^{t+1}|} \sum_{x_l \in C_j^{t+1}} x_l$, where $C_j^{t+1} = \{x_i : f^{t+1}(x_i) = j\}$; 5: end for (until convergence); 6: return the partition.
  • 26. A simple example: k-means (kernel version). 1: Initialization: random initialization of a partition $(C_j^1)_{j=1,\dots,P}$ of $(x_i)_i$ and $\bar{x}_{C_j^1} = \frac{1}{|C_j^1|} \sum_{x_i \in C_j^1} \phi(x_i)$; 2: for t = 1 to T do; 3: Assignment step: $f^{t+1}(x_i) = \arg\min_{j=1,\dots,P} \|\phi(x_i) - \bar{x}_{C_j^t}\|_{\mathcal{H}}^2 = \arg\min_{j=1,\dots,P} \Big( K_{ii} - \frac{2}{|C_j^t|} \sum_{x_l \in C_j^t} K_{il} + \frac{1}{|C_j^t|^2} \sum_{x_l, x_{l'} \in C_j^t} K_{ll'} \Big)$; 4: Representation step: $\forall\, j = 1, \dots, P$, $\bar{x}_{C_j^{t+1}} = \frac{1}{|C_j^{t+1}|} \sum_{x_l \in C_j^{t+1}} \phi(x_l)$ (implicit: only the entries of K are needed); 5: end for (until convergence); 6: return the partition.
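The kernel k-means steps above translate into a short loop that only ever reads entries of K. This is a minimal sketch in plain Python, not the implementation used by the author: the initial partition is passed explicitly instead of being drawn at random, empty clusters are simply skipped, and `kernel_kmeans` is a hypothetical name.

```python
def kernel_kmeans(K, labels, n_clusters, max_iter=100):
    """Kernel k-means on an n x n kernel matrix K (list of lists).

    labels: initial partition, a list of cluster indices in range(n_clusters).
    Centers are never built: ||phi(x_i) - center_j||^2 is computed as
    K_ii - (2/|C_j|) sum_{l in C_j} K_il + (1/|C_j|^2) sum_{l,l' in C_j} K_ll'.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == j] for j in range(n_clusters)]
        # The last term depends only on the cluster: compute it once per cluster.
        within = [sum(K[a][b] for a in c for b in c) / len(c) ** 2 if c else None
                  for c in clusters]
        new_labels = []
        for i in range(n):
            best_j, best_d = labels[i], float("inf")
            for j, c in enumerate(clusters):
                if not c:  # skip empty clusters
                    continue
                d = K[i][i] - 2.0 * sum(K[i][col] for col in c) / len(c) + within[j]
                if d < best_d:
                    best_j, best_d = j, d
            new_labels.append(best_j)
        if new_labels == labels:  # convergence: the partition no longer changes
            break
        labels = new_labels
    return labels


# Made-up kernel with two groups of samples: {0, 1} and {2, 3}
K = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9],
     [0.1, 0.1, 0.9, 1.0]]
print(kernel_kmeans(K, [0, 0, 0, 1], n_clusters=2))  # [0, 0, 1, 1]
```

As with standard k-means, the result depends on the initialization (starting from [0, 1, 0, 1] on the same kernel, the loop stops immediately at a poor partition), so several random starts are usually compared.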
  • 29. Beyond kernels: relational data. DNA barcoding (Astraptes fulgerator): optimal matching (edit) distances to differentiate species. Hi-C data: pairwise measure (similarity) related to the physical 3D distance between loci in the cell, at genome scale [Ambroise et al., 2019, Randriamihamison et al., 2019]. Metagenomics: dissimilarity between samples is better captured when the phylogeny between species is taken into account (UniFrac distances).
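Relational data such as DNA barcodes come as dissimilarities rather than kernels. The sketch below (plain Python; names and sequences are made up) computes an edit (Levenshtein) distance between sequences and then applies the double-centering transform of classical multidimensional scaling, $S = -\frac{1}{2} J D^{(2)} J$ with $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ and $D^{(2)}$ the matrix of squared dissimilarities. This transform is one standard way to turn a dissimilarity into a similarity; it is shown here as an assumption, not as the exact procedure of the talk, and S is guaranteed to be positive semi-definite only when D is Euclidean.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimal number of insertions, deletions and
    substitutions turning s into t (dynamic programming, two rows)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution or match
        prev = curr
    return prev[-1]


def double_centering(D):
    """S = -1/2 * J D^2 J for a symmetric dissimilarity matrix D
    (row means are also column means because D is symmetric)."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_means = [sum(row) / n for row in D2]
    grand_mean = sum(row_means) / n
    return [[-0.5 * (D2[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


# Made-up "barcodes"
seqs = ["ACGTTA", "ACGTA", "TCGTTA"]
D = [[float(edit_distance(a, b)) for b in seqs] for a in seqs]
S = double_centering(D)
```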
  • 30. Combining relational data in an unsupervised setting
  • 33. What are metagenomic data? Source: [Sommer et al., 2010]. Abundance data: sparse n × p matrices with count data, samples in rows and descriptors (species, OTUs, KEGG groups, k-mers, ...) in columns. Generally $p \gg n$. Phylogenetic tree (evolutionary history between species, OTUs...): one tree with p leaves built from the sequences collected in the n samples.
  • 35. What are metagenomic data used for? To produce a profile of the diversity of a given sample ⇒ allows comparing diversity between various conditions; used in various fields: environmental science, microbiota, ... Processed by computing a relevant dissimilarity between samples (the standard Euclidean distance is not relevant) and by using this dissimilarity in subsequent analyses.
  • 36. β-diversity data: dissimilarities between count data. Compositional dissimilarities, with $n_{ig}$ the count of species g in sample i. Jaccard: the fraction of species specific to either sample i or sample j, $d_{\mathrm{jac}} = \frac{\sum_g \left( \mathbb{I}\{n_{ig} > 0, n_{jg} = 0\} + \mathbb{I}\{n_{jg} > 0, n_{ig} = 0\} \right)}{\sum_g \mathbb{I}\{n_{ig} + n_{jg} > 0\}}$. Bray-Curtis: the fraction of the sample which is specific to either sample i or sample j, $d_{\mathrm{BC}} = \frac{\sum_g |n_{ig} - n_{jg}|}{\sum_g (n_{ig} + n_{jg})}$. Other dissimilarities are available in the R package phyloseq; most of them are not Euclidean.
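Both compositional dissimilarities are short to compute from two count vectors. A minimal sketch in plain Python (function names and counts are made up; samples with no counts at all are not handled, since both denominators would then be zero):

```python
def jaccard_dissimilarity(ni, nj):
    """Fraction of the species present in sample i or j that are found in only one of them."""
    specific = sum((a > 0 and b == 0) or (b > 0 and a == 0) for a, b in zip(ni, nj))
    present = sum(a + b > 0 for a, b in zip(ni, nj))
    return specific / present


def bray_curtis_dissimilarity(ni, nj):
    """sum_g |n_ig - n_jg| / sum_g (n_ig + n_jg)."""
    return sum(abs(a - b) for a, b in zip(ni, nj)) / sum(a + b for a, b in zip(ni, nj))


# Made-up counts for 4 species in two samples
ni = [3, 0, 2, 0]
nj = [1, 4, 0, 0]
print(jaccard_dissimilarity(ni, nj), bray_curtis_dissimilarity(ni, nj))  # 2/3 and 8/10
```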
  • 40. β-diversity data: phylogenetic dissimilarities. For each branch e, denote by $l_e$ its length and by $p_{ei}$ the fraction of counts in sample i corresponding to species below branch e. UniFrac: the fraction of the tree specific to either sample i or sample j, $d_{\mathrm{UF}} = \frac{\sum_e l_e \left( \mathbb{I}\{p_{ei} > 0, p_{ej} = 0\} + \mathbb{I}\{p_{ej} > 0, p_{ei} = 0\} \right)}{\sum_e l_e \, \mathbb{I}\{p_{ei} + p_{ej} > 0\}}$. Weighted UniFrac: the fraction of the diversity specific to sample i or to sample j, $d_{\mathrm{wUF}} = \frac{\sum_e l_e |p_{ei} - p_{ej}|}{\sum_e l_e (p_{ei} + p_{ej})}$.
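The two UniFrac formulas can be sketched on a toy tree, encoded here (an assumption made for brevity) as a list of branches, each given by its length $l_e$ and the set of leaves below it; $p_{ei}$ is then the fraction of the counts of sample i found under branch e. Names, tree and counts are illustrative only; packages such as phyloseq provide tested implementations.

```python
def branch_fractions(branches, counts):
    """p_e for each branch: fraction of the sample's counts found below branch e."""
    total = sum(counts.values())
    return [sum(counts.get(leaf, 0) for leaf in leaves) / total
            for _, leaves in branches]


def unifrac(branches, counts_i, counts_j):
    """Unweighted UniFrac: length of the branches covered by exactly one sample,
    divided by the length of the branches covered by at least one sample."""
    p_i = branch_fractions(branches, counts_i)
    p_j = branch_fractions(branches, counts_j)
    num = sum(l for (l, _), a, b in zip(branches, p_i, p_j) if (a > 0) != (b > 0))
    den = sum(l for (l, _), a, b in zip(branches, p_i, p_j) if a + b > 0)
    return num / den


def weighted_unifrac(branches, counts_i, counts_j):
    """sum_e l_e |p_ei - p_ej| / sum_e l_e (p_ei + p_ej)."""
    p_i = branch_fractions(branches, counts_i)
    p_j = branch_fractions(branches, counts_j)
    num = sum(l * abs(a - b) for (l, _), a, b in zip(branches, p_i, p_j))
    den = sum(l * (a + b) for (l, _), a, b in zip(branches, p_i, p_j))
    return num / den


# Toy tree: an internal branch above leaves A and B, plus one branch per leaf
branches = [(1.0, {"A", "B"}), (2.0, {"A"}), (1.0, {"B"}), (3.0, {"C"})]
counts_i = {"A": 2, "B": 2}
counts_j = {"B": 1, "C": 1}
print(unifrac(branches, counts_i, counts_j), weighted_unifrac(branches, counts_i, counts_j))
```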
  • 41. TARA Oceans datasets. The 2009-2013 expedition, co-directed by Étienne Bourgois and Éric Karsenti: 7,012 datasets collected from 35,000 samples of plankton and water (11,535 Gb of data). Goal: study the plankton (bacteria, protists, metazoans and viruses), which represents more than 90% of the biomass in the ocean.
  • 42. TARA Oceans datasets. Science (May 2015), studies on: eukaryotic plankton diversity [de Vargas et al., 2015], ocean viral communities [Brum et al., 2015], global plankton interactome [Lima-Mendez et al., 2015], global ocean microbiome [Sunagawa et al., 2015], ... → datasets of different types and from different sources, analyzed separately.
  • 47. TARA Oceans datasets that we used: environmental dataset: 22 numeric features (temperature, salinity, ...) [Sunagawa et al., 2015]; bacteria phylogenomic tree: computed from ∼35,000 OTUs [Sunagawa et al., 2015]; bacteria functional composition: ∼63,000 KEGG orthologous groups [Sunagawa et al., 2015]; eukaryotic plankton composition split into 4 groups: pico (0.8–5 µm), nano (5–20 µm), micro (20–180 µm) and meso (180–2000 µm) [de Vargas et al., 2015]; virus composition: ∼867 virus clusters based on shared gene content [Brum et al., 2015].
  • 48. TARA Oceans datasets that we used: common samples. 48 samples, 2 depth layers (surface (SRF) and deep chlorophyll maximum (DCM)), 31 different sampling stations.
  • 52. From multiple kernels to an integrated kernel. How to combine multiple kernels? Naive approach: $K^* = \frac{1}{M} \sum_{m=1}^M K^m$. Supervised framework: $K^* = \sum_m \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]. Unsupervised framework, but the input space is $\mathbb{R}^p$ [Zhuang et al., 2011]: $K^* = \sum_m \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$, with $\beta_m$ chosen so as to minimize the distortion between all training data, $\sum_{ij} K^*(x_i, x_j) \|x_i - x_j\|^2$, AND to minimize the approximation error of the original data by the kernel embedding, $\sum_i \|x_i - \sum_j K^*(x_i, x_j) x_j\|^2$. Our proposal: 2 UMKL (unsupervised multiple kernel learning) frameworks which do not require the data to take values in $\mathbb{R}^p$.
  • 53. Multi-kernel/distance integration. How to “optimally” combine several relational datasets in an unsupervised setting? For kernels $K^1, \dots, K^M$ obtained on the same n objects, search for $K^\beta = \sum_{m=1}^M \beta_m K^m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$ [Mariette and Villa-Vialaneix, 2018]. R package mixKernel: https://cran.r-project.org/package=mixKernel
  • 54. STATIS-like framework [L’Hermier des Plantes, 1976, Lavit et al., 1994]. Similarities between kernels: $C_{mm'} = \frac{\langle K^m, K^{m'} \rangle_F}{\|K^m\|_F \, \|K^{m'}\|_F} = \frac{\mathrm{Trace}(K^m K^{m'})}{\sqrt{\mathrm{Trace}((K^m)^2) \, \mathrm{Trace}((K^{m'})^2)}}$ ($C_{mm'}$ is an extension of the RV-coefficient [Robert and Escoufier, 1976] to the kernel framework). Maximize $\|K^*(v)\|_F^2 = v^\top C v$ for $K^*(v) = \sum_{m=1}^M v_m \frac{K^m}{\|K^m\|_F}$ and $v \in \mathbb{R}^M$ such that $\|v\|_2 = 1$. Solution: the first eigenvector of C ⇒ set $\beta = v / \sum_{m=1}^M v_m$ (consensual kernel).
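The STATIS-like weights can be sketched in a few lines: compute C from Frobenius inner products (for symmetric kernels, $\mathrm{Trace}(K^m K^{m'}) = \langle K^m, K^{m'} \rangle_F$), take its leading eigenvector by power iteration, and rescale it to sum to 1. This is a plain-Python illustration with hypothetical names, not the mixKernel code; power iteration assumes that the leading eigenvalue of C is dominant, which holds for instance when all entries of C are positive.

```python
import math


def frobenius(A, B):
    """Frobenius inner product <A, B>_F = sum_{i,j} A_ij B_ij."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def statis_weights(kernels, n_iter=500):
    """Return (C, beta): the RV-like similarity matrix between kernels and the
    consensus weights beta = v / sum(v), v = leading eigenvector of C."""
    M = len(kernels)
    norms = [math.sqrt(frobenius(K, K)) for K in kernels]
    C = [[frobenius(kernels[m], kernels[q]) / (norms[m] * norms[q])
          for q in range(M)] for m in range(M)]
    v = [1.0] * M  # positive starting vector
    for _ in range(n_iter):
        w = [sum(C[m][q] * v[q] for q in range(M)) for m in range(M)]
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]  # keep ||v||_2 = 1
    total = sum(v)
    return C, [x / total for x in v]


# Made-up 2 x 2 kernels: two identical kernels and a different one
I2 = [[1.0, 0.0], [0.0, 1.0]]
J2 = [[1.0, 1.0], [1.0, 1.0]]
C, beta = statis_weights([I2, I2, J2])
print(beta)  # the two identical kernels get equal, larger weights
```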
  • 55. A kernel preserving the original topology of the data I. Similarly to [Lin et al., 2010], preserve the local geometry of the data in the feature space. Proxy of the local geometry: $K^m \longrightarrow G_k^m$ (k-nearest neighbors graph) $\longrightarrow A_k^m$ (adjacency matrix) ⇒ $W = \mathbb{I}\{\sum_m A_k^m > 0\}$ or $W = \sum_m A_k^m$. Feature space geometry measured by $\Delta_i(\beta) = \left( \langle \phi^*_\beta(x_i), \phi^*_\beta(x_1) \rangle, \dots, \langle \phi^*_\beta(x_i), \phi^*_\beta(x_n) \rangle \right)^\top = \left( K^*_\beta(x_i, x_1), \dots, K^*_\beta(x_i, x_n) \right)^\top$.
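The proxy of the local geometry can be sketched as follows: build a k-nearest-neighbor graph from each kernel, using the kernel-induced distance, and merge the graphs into W. This plain-Python sketch uses hypothetical names and made-up kernels, and makes two assumptions not stated on the slide: the k-NN graph is symmetrized (an edge is kept when either node is among the k nearest neighbors of the other), and ties are broken by sample index.

```python
def knn_adjacency(K, k):
    """Symmetrized k-NN adjacency matrix, neighbors ranked by the kernel-induced
    squared distance K_ii + K_jj - 2 K_ij."""
    n = len(K)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: K[i][i] + K[j][j] - 2.0 * K[i][j])
        for j in others[:k]:
            A[i][j] = A[j][i] = 1
    return A


def combine_graphs(adjacencies, binary=True):
    """W = I{sum_m A^m > 0} (binary=True) or W = sum_m A^m (binary=False)."""
    n = len(adjacencies[0])
    total = [[sum(A[i][j] for A in adjacencies) for j in range(n)] for i in range(n)]
    if binary:
        return [[1 if x > 0 else 0 for x in row] for row in total]
    return total


# Made-up kernels: K1 groups {0, 1} and {2, 3}; K2 groups {0, 2} and {1, 3}
K1 = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
      [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
K2 = [[1.0, 0.0, 0.9, 0.0], [0.0, 1.0, 0.0, 0.9],
      [0.9, 0.0, 1.0, 0.0], [0.0, 0.9, 0.0, 1.0]]
W = combine_graphs([knn_adjacency(K1, 1), knn_adjacency(K2, 1)])
```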
  • 56. A kernel preserving the original topology of the data (II). Sparse version (quadprog in R): minimize$_\beta$ $\sum_{i,j=1}^n W_{ij}\,\|\Delta_i(\beta) - \Delta_j(\beta)\|^2$ for $K^*_\beta = \sum_{m=1}^M \beta_m K^m$ and $\beta \in \mathbb{R}^M$ s.t. $\beta_m \geq 0$ and $\sum_{m=1}^M \beta_m = 1$. Non-sparse version (ADMM optimization [Boyd et al., 2011]): minimize$_v$ $\sum_{i,j=1}^n W_{ij}\,\|\Delta_i(v) - \Delta_j(v)\|^2$ for $K^*_v = \sum_{m=1}^M v_m K^m$ and $v \in \mathbb{R}^M$ s.t. $v_m \geq 0$ and $\|v\|_2 = 1$.
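Because $\Delta_i(\beta) = \sum_m \beta_m K^m_{i\cdot}$ is linear in $\beta$, the objective is a quadratic form $\beta^\top S \beta$. For a symmetric $W$ with Laplacian $L = \mathrm{diag}(W\mathbf{1}) - W$, expanding the squares gives $S_{mm'} = 2\,\mathrm{Trace}(K^m L K^{m'})$. The sketch below computes $S$ and then, in place of R's quadprog, solves the sparse version with a swapped-in technique: projected gradient descent onto the simplex. Function names are hypothetical and the iteration count is arbitrary.

```python
import numpy as np


def topology_matrix(kernels, W):
    """Return S (M x M) such that sum_ij W_ij ||Delta_i(b) - Delta_j(b)||^2 = b^T S b."""
    W = np.asarray(W, dtype=float)
    # The summand is symmetric in (i, j), so symmetrizing W leaves the objective unchanged
    W = 0.5 * (W + W.T)
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian of W
    M = len(kernels)
    S = np.empty((M, M))
    for m in range(M):
        for mp in range(M):
            S[m, mp] = 2.0 * np.trace(kernels[m] @ L @ kernels[mp])
    return S


def project_simplex(v):
    """Euclidean projection onto {b : b >= 0, sum(b) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def solve_sparse(S, n_iter=5000):
    """Projected gradient for min b^T S b on the simplex (S is PSD here)."""
    M = S.shape[0]
    step = 1.0 / max(2.0 * np.linalg.eigvalsh(S).max(), 1e-12)  # 1 / Lipschitz constant
    beta = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        beta = project_simplex(beta - step * 2.0 * S @ beta)
    return beta


# Usage on a toy quadratic form: min b1^2 + 4 b2^2 on the simplex
print(solve_sparse(np.diag([1.0, 4.0])))
```

The simplex constraint tends to produce exact zeros in $\beta$ (hence "sparse"), which is why the $\ell_1$-type constraint is paired with a standard QP solver.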
  • 58. Application to TARA Oceans. Similarity between datasets (STATIS): low similarities between meso-plankton (euk.meso) and the other datasets, consistent with the strong geographical structure of mesoplanktonic communities [de Vargas et al., 2015]. Stronger similarities between environmental variables and small organisms than between environmental variables and the largest ones [de Vargas et al., 2015, Sunagawa et al., 2015].
  • 59. Integrating all Tara Oceans datasets: no particular pattern in terms of depth layers, but a pattern in terms of geography.
  • 60. Application to TARA Oceans: important variables. Rhizaria abundance strongly structures the differences between samples (analyses restricted to some organisms found differences mostly driven by water depth), and waters from the Arctic Ocean and the Pacific Ocean differ in terms of Rhizaria abundance.
  • 61. Conclusions. Kernel methods are useful for: dealing with different types of data, even when they are high-dimensional; combining them. However, they can be: computationally intensive to train; not easy to interpret (work in progress with Jérôme Mariette and Céline Brouard on variable selection in the unsupervised setting).
  • 62. SOMbrero: Madalina Olteanu, Fabrice Rossi, Marie Cottrell, Laura Bendhaïba and Julien Boelaert. SOMbrero and mixKernel: Jérôme Mariette. adjclust and Hi-C: Pierre Neuvial, Nathanaël Randriamihamison, Sylvain Foissac, Guillem Rigaill, Christophe Ambroise and Shubham Chaturvedi.
  • 63. Credits for pictures. Slide 3: image based on the ENCODE project, by Darryl Leja (NHGRI), Ian Dunham (EBI) and Michael Pazin (NHGRI). Slide 8: k-means image from Wikimedia Commons by Weston.pace. Slide 10: the Astraptes picture is from https://www.flickr.com/photos/39139121@N00/2045403823/ by Anne Toal (CC BY-SA 2.0); the Hi-C experiment is taken from the article Matharu et al., 2015, DOI:10.1371/journal.pgen.1005640 (CC BY-SA 4.0); and the metagenomics illustration is taken from the article Sommer et al., 2010, DOI:10.1038/msb.2010.16 (CC BY-NC-SA 3.0). Other pictures are from articles that I co-authored.
  • 64. References
Ambroise, C., Dehman, A., Neuvial, P., Rigaill, G., and Vialaneix, N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algorithms for Molecular Biology, 14:22.
Bach, F. (2013). Sharp analysis of low-rank kernel matrix approximations. Journal of Machine Learning Research, Workshop and Conference Proceedings, 30:185–209.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122.
Brouard, C., Shen, H., Dührkop, K., d’Alché-Buc, F., Böcker, S., and Rousu, J. (2016). Fast metabolite identification with input output kernel regression. Bioinformatics, 32(12):i28–i36.
Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J., Gorsky, G., Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S., Dimier, C., Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker, P., Karsenti, E., and Sullivan, M. (2015). Patterns and ecological drivers of ocean viral communities. Science, 348(6237).
Cortes, C., Mohri, M., and Talwalkar, A. (2010). On the impact of kernel approximation on learning accuracy. Journal of Machine Learning Research, Workshop and Conference Proceedings, 9:113–120.
Crone, L. and Crosby, D. (1995). Statistical applications of a metric on subspaces to satellite meteorology. Technometrics, 37(3):324–328.
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horák, A., Jaillon, O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and Karsenti, E. (2015). Eukaryotic plankton diversity in the sunlit ocean. Science, 348(6237).
Drineas, P. and Mahoney, M. (2005). On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6:2153–2175.
Goldfarb, L. (1984). A unified approach to pattern recognition. Pattern Recognition, 17(5):575–582.
Gönen, M. and Alpaydin, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12:2211–2268.
Imbert, A., Valsesia, A., Le Gall, C., Armenise, C., Lefebvre, G., Gourraud, P., Viguerie, N., and Villa-Vialaneix, N. (2018). Multiple hot-deck imputation for network inference from RNA sequencing data. Bioinformatics, 34(10):1726–1732.
Jaakkola, T., Diekhans, M., and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7(1–2):95–114.
Kohonen, T. (2001). Self-Organizing Maps, 3rd Edition, volume 30. Springer, Berlin, Heidelberg, New York.
Kondor, R. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney, Australia. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119.
L’Hermier des Plantes, H. (1976). Structuration des tableaux à trois indices de la statistique. PhD thesis, Université de Montpellier. Thèse de troisième cycle.
Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinoza, J., Roux, S., Vincent, F., Bittner, L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F., de Meester, L., Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P., Sebastian, M., Souffreau, C., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G., Not, F., Ogata, H., Speich, S., Stemmann, L., Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M., Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015). Determinants of community structure in the global plankton interactome. Science, 348(6237).
Lin, Y., Liu, T., and Fuh, C.-S. (2010). Multiple kernel learning for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:1147–1160.
Mariette, J., Olteanu, M., and Villa-Vialaneix, N. (2017a). Efficient interpretable variants of online SOM for large dissimilarity data. Neurocomputing, 225:31–48.
Mariette, J., Rossi, F., Olteanu, M., and Villa-Vialaneix, N. (2017b). Accelerating stochastic kernel SOM. In Verleysen, M., editor, XXVth European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017), pages 269–274, Bruges, Belgium. i6doc.
Mariette, J. and Vialaneix, N. (2019). Approches à noyau pour l’analyse et l’intégration de données omiques en biologie des systèmes. Forthcoming (book chapter).
Mariette, J. and Villa-Vialaneix, N. (2018). Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics, 34(6):1009–1015.
Marti-Marimon, M., Vialaneix, N., Voillet, V., Yerle-Bouissou, M., Lahbib-Mansais, Y., and Liaubet, L. (2018). A new approach of gene co-expression network inference reveals significant biological processes involved in porcine muscle development in late gestation. Scientific Reports, 8:10150.
Montastier, E., Villa-Vialaneix, N., Caspar-Bauguil, S., Hlavaty, P., Tvrzicka, E., Gonzalez, I., Saris, W., Langin, D., Kunesova, M., and Viguerie, N. (2015). System model network for adipose tissue signatures related to weight changes in response to calorie restriction and subsequent weight maintenance. PLoS Computational Biology, 11(1):e1004047.
Olteanu, M. and Villa-Vialaneix, N. (2015). On-line relational and multiple relational SOM. Neurocomputing, 147:15–30.
Randriamihamison, N., Vialaneix, N., and Neuvial, P. (2019). Applicability and interpretability of hierarchical agglomerative clustering with or without contiguity constraints. Submitted for publication. Preprint arXiv 1909.10923.
Robert, P. and Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: the RV-coefficient. Applied Statistics, 25(3):257–265.
Rossi, F., Hasenfuss, A., and Hammer, B. (2007). Accelerating relational clustering algorithms with sparse prototype representation. In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.
Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004). Protein homology detection using string alignment kernels. Bioinformatics, 20(11):1682–1689.
Shen, H., Dührkop, K., Böcker, S., and Rousu, J. (2014). Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30(12):i157–i164.
Sommer, M., Church, G., and Dantas, G. (2010). A functional metagenomic approach for expanding the synthetic biology toolbox for biomass conversion. Molecular Systems Biology, 6(360).
Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A., Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka, F., Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S., and Bork, P. (2015). Structure and function of the global ocean microbiome. Science, 348(6237).
Villa, N. and Rossi, F. (2007). A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph. In 6th International Workshop on Self-Organizing Maps (WSOM 2007), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.
Williams, C. and Seeger, M. (2000). Using the Nyström method to speed up kernel machines. In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in Neural Information Processing Systems (Proceedings of NIPS 2000), volume 13, Denver, CO, USA. Neural Information Processing Systems Foundation.
Zhuang, J., Wang, J., Hoi, S., and Lan, X. (2011). Unsupervised multiple kernel clustering. Journal of Machine Learning Research: Workshop and Conference Proceedings, 20:129–144.
  • 72. Optimization issues. With $S$ the $M \times M$ matrix such that $\sum_{i,j} W_{ij}\|\Delta_i(\beta) - \Delta_j(\beta)\|^2 = \beta^\top S \beta$, the sparse version writes $\min_\beta \beta^\top S \beta$ s.t. $\beta \geq 0$ and $\|\beta\|_1 = \sum_m \beta_m = 1$ ⇒ standard QP problem with linear constraints (e.g., package quadprog in R). The non-sparse version writes $\min_\beta \beta^\top S \beta$ s.t. $\beta \geq 0$ and $\|\beta\|_2 = 1$ ⇒ QCQP problem (hard to solve). It is solved using the Alternating Direction Method of Multipliers (ADMM [Boyd et al., 2011]) by replacing the previous optimization problem with $\min_{x,z}\; x^\top S x + \mathbb{1}_{\{x \geq 0\}}(x) + \mathbb{1}_{\{\|z\|_2^2 \geq 1\}}(z)$ under the constraint $x - z = 0$.
  • 73. Optimization issues. ADMM iterations for the non-sparse version [Boyd et al., 2011]: (1) $x \leftarrow \arg\min_x\; x^\top S x + y^\top(x - z) + \frac{\lambda}{2}\|x - z\|_2^2$ under the constraint $x \geq 0$ (standard QP problem); (2) project onto $\{z : \|z\|_2 \geq 1\}$: $z = \frac{x}{\min\{\|x\|_2, 1\}}$ (i.e., $x$ is rescaled to the unit sphere when $\|x\|_2 < 1$); (3) update the auxiliary variable: $y \leftarrow y + \lambda(x - z)$.
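A minimal NumPy sketch of these three steps, under our own assumptions: the $x$-subproblem (a nonnegative QP with Hessian $2S + \lambda I$) is solved by a few warm-started projected-gradient iterations; $\lambda$, the starting point and the iteration counts are arbitrary; and the $z$-update projects $x + y/\lambda$, which is the textbook unscaled ADMM update [Boyd et al., 2011] for the augmented Lagrangian of step (1). This may differ from what mixKernel actually implements. The problem is nonconvex, so this ADMM is a heuristic without a global-optimality guarantee. The function name is hypothetical.

```python
import numpy as np


def admm_nonsparse(S, lam=10.0, n_outer=300, n_inner=100):
    """Heuristic ADMM for min v^T S v s.t. v >= 0 and ||v||_2 = 1."""
    M = S.shape[0]
    H = 2.0 * S + lam * np.eye(M)              # Hessian of the x-subproblem
    step = 1.0 / np.linalg.eigvalsh(H).max()   # 1 / Lipschitz constant of its gradient
    z = np.ones(M) / np.sqrt(M)
    x = z.copy()
    y = np.zeros(M)
    for _ in range(n_outer):
        # (1) x-update: nonnegative QP solved by projected gradient (warm start)
        for _ in range(n_inner):
            grad = 2.0 * S @ x + y + lam * (x - z)
            x = np.maximum(x - step * grad, 0.0)
        # (2) z-update: project x + y / lam onto {z : ||z||_2 >= 1}
        u = x + y / lam
        z = u / min(max(np.linalg.norm(u), 1e-12), 1.0)
        # (3) dual (auxiliary variable) update
        y = y + lam * (x - z)
    return z


# Usage on a toy problem whose solution is the unit vector (1, 0)
print(admm_nonsparse(np.diag([1.0, 4.0])))
```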
  • 76. A proposal to improve interpretability of K-PCA in our framework. Issue: how to assess the importance of a given species in the K-PCA? Our datasets are either numeric (environmental) or built from an $n \times p$ count matrix ⇒ for a given species, randomly permute its counts and redo the analysis (kernel computation, with the same optimized weights, and K-PCA). The influence of a given species in a given dataset on a given PC subspace is assessed by computing the Crone–Crosby distance between the PC subspaces obtained before and after the permutation [Crone and Crosby, 1995] (∼ Frobenius norm of the difference between the projectors).
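A minimal NumPy sketch of this permutation scheme. Assumptions (ours, not from the slides): each dataset is an $n \times p$ array, `kernel_fn` is a hypothetical user-supplied function turning a dataset into a kernel, the PC subspace is spanned by the top-$k$ eigenvectors of the centered combined kernel, and the Crone–Crosby distance is normalized as $\|P_U - P_V\|_F/\sqrt{2}$, which for orthonormal bases equals $\sqrt{k - \|U^\top V\|_F^2}$. Function names are hypothetical.

```python
import numpy as np


def center_kernel(K):
    """Double-center a kernel matrix (centering in feature space)."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J


def kpca_subspace(K, k):
    """Orthonormal n x k basis of the subspace spanned by the first k kernel PCs."""
    _, vecs = np.linalg.eigh(center_kernel(K))
    return vecs[:, ::-1][:, :k]  # eigh sorts eigenvalues in ascending order


def crone_crosby(U, V):
    """||U U^T - V V^T||_F / sqrt(2) for two orthonormal n x k bases."""
    k = U.shape[1]
    return np.sqrt(max(k - np.sum((U.T @ V) ** 2), 0.0))


def species_importance(datasets, weights, kernel_fn, d, j, k=2, seed=None):
    """Permute column j of dataset d, recompute the combined kernel with the
    same weights, and compare the K-PCA subspaces before and after."""
    rng = np.random.default_rng(seed)

    def combined(ds):
        return sum(w * kernel_fn(X) for w, X in zip(weights, ds))

    U = kpca_subspace(combined(datasets), k)
    permuted = [np.array(X, dtype=float) for X in datasets]
    permuted[d][:, j] = rng.permutation(permuted[d][:, j])
    V = kpca_subspace(combined(permuted), k)
    return crone_crosby(U, V)


# Usage with a linear kernel on a hypothetical count table A and numeric table B
rng = np.random.default_rng(3)
A = rng.poisson(3.0, size=(8, 4)).astype(float)
B = rng.normal(size=(8, 3))
print(species_importance([A, B], [0.5, 0.5], lambda X: X @ X.T, d=0, j=0, seed=0))
```

A species whose counts are constant across samples gets an importance of 0, since permuting it leaves the kernel unchanged; in practice, averaging over several permutations would reduce the randomness of a single draw.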