SlideShare a Scribd company logo
1 of 54
Download to read offline
Multi-omics data integration methods: kernel and other
machine learning approaches
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
Consortium MIMS
December 15th, 2022
Who am I?
▶ researcher at INRAE (France)
▶ trained as a mathematician −→ statistics and machine learning for
computational biology (omics)
▶ mostly involved in projects on animal genomics (transcriptomics,
Hi-C, ATAC-seq...)
▶ kernel & network methods
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 2
Collected data at genomic level are increasingly publicly
available
the different levels are not always compatible
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 3
Collected data at genomic level are increasingly publicly
available
the different levels are not always compatible
[Foissac et al., 2019]
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 3
Analysis bottlenecks
▶ very large dimensionality and big data (both scaling and statistical issues)
n ∼ {5 − 1000}
p ∼ 10{3−5}
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 4
Analysis bottlenecks
▶ very large dimensionality and big data (both scaling and statistical issues)
▶ missing values and incomplete designs
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 4
Analysis bottlenecks
▶ very large dimensionality and big data (both scaling and statistical issues)
▶ missing values and incomplete designs
▶ highly non Gaussian data: skewed distributions, count data, zero-inflated data, ...
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 4
Analysis bottlenecks
▶ very large dimensionality and big data (both scaling and statistical issues)
▶ missing values and incomplete designs
▶ highly non Gaussian data: skewed distributions, count data, zero-inflated data, ...
▶ non Euclidean data: compositional data (metagenomics), similarity matrices
(Hi-C), spectra (metabolomics), ...
(*)
(**)
(*) image from [Dumuid et al., 2020] (**) image by courtesy of Gaëlle Lefort
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 4
Analysis bottlenecks
▶ very large dimensionality and big data (both scaling and statistical issues)
▶ missing values and incomplete designs
▶ highly non Gaussian data: skewed distributions, count data, zero-inflated data, ...
▶ non Euclidean data: compositional data (metagenomics), similarity matrices
(Hi-C), spectra (metabolomics), ...
In addition: in plant & animal sciences, less discriminative phenotypes, interest is
indirect (in breeding not directly in the individual itself), much less data with poorer
quality annotation (inter-species transfer might be useful).
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 4
Omics data integration
Type of data to integrate
vertical:
horizontal:
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 5
Omics data integration
Type of data to integrate
vertical:
horizontal:
Type of analysis to perform
supervised:
unsupervised:
Left pictures courtesy Harold Duruflé
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 5
Multiple table analyses
(CCA, MFA, PLS, STATIS, ...)
Image from https://dimensionless.in
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 6
Types of data integration methods [Ritchie et al., 2015]
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 7
Want to know them all?
https://github.com/mikelove/awesome-multi-omics
Some specificities: can account for structure in data (network), are dedicated to a
specific omic (single-cell), can account for temporal/spatial information, can include
biological knowledge (mostly GO), ...
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 8
Making methods available for biologists
http://asterics.miat.inrae.fr
Ambition:
▶ easy-to-use and interactive
▶ helps user to know which analysis to
use and how to interpret results
▶ integrates domain expertise
▶ usable online or can be installed
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 9
Scope of the rest of the talk
Unsupervised transformation based integration
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 10
Overview of the talk
Kernel methods
Integrating data with kernels
Improve interpretability
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 11
Main ideas behind kernel methods
Standard (omics) data analyses:
▶ data are (numeric) tables
▶ analyses are based on operations (distances, means, ...) in the variable space
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 12
Main ideas behind kernel methods
Kernel data analyses:
▶ data are arbitrary
▶ analyses are based on transformations of data to “similarities” between samples
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 13
More formally...
n samples (xi )i ∈ X
kernels: symmetric and positive definite (n × n)-matrix
K that measures a “similarity” between n entities in X
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 14
More formally...
n samples (xi )i ∈ X
kernels: symmetric and positive definite (n × n)-matrix
K that measures a “similarity” between n entities in X
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 14
More formally...
n samples (xi )i ∈ X
kernels: symmetric and positive definite (n × n)-matrix
K that measures a “similarity” between n entities in X
K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 14
Principles of learning from kernels
Start from any statistical method (PCA, regression, k-means clustering) and rewrite all
quantities using:
▶ K to compute distances and dot products
dot product is: Kii′ and distance is:
√
Kii + Ki′i′ − 2Kii′
▶ (implicit) linear or convex combinations of (ϕ(xi ))i to describe all unobserved
elements (centers of gravity and so on...)
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 15
Kernel examples
1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 16
Kernel examples
1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2
2. nodes of a graph: [Kondor and Lafferty, 2002]
3. sequence kernels (between proteins: spectrum kernel [Jaakkola et al., 2000] or
convolution kernel [Saigo et al., 2004])
4. kernel between graphs (used between metabolites based on their fragmentation
trees): [Shen et al., 2014, Brouard et al., 2016]
5. kernel embedding phylogeny information for metagenomics
[Mariette and Villa-Vialaneix, 2018]
6. ...
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 16
Overview of the talk
Kernel methods
Integrating data with kernels
Improve interpretability
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 17
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels K1, . . . , KM obtained on the same n objects, search: Kβ =
PM
m=1 βmKm
with βm ≥ 0 and
P
m βm = 1
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 18
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels K1, . . . , KM obtained on the same n objects, search: Kβ =
PM
m=1 βmKm
with βm ≥ 0 and
P
m βm = 1
▶ naive approach: K∗ = 1
M
P
m Km
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 18
Multiple kernel (or distance) integration
How to “optimally” combine several kernel datasets?
For kernels K1, . . . , KM obtained on the same n objects, search: Kβ =
PM
m=1 βmKm
with βm ≥ 0 and
P
m βm = 1
▶ naive approach: K∗ = 1
M
P
m Km
▶ supervised framework: K∗ =
P
m βmKm with βm ≥ 0 and
P
m βm = 1 with βm
chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011]
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 18
Combining kernels in an unsupervised setting
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 19
Multiple kernel integration
Ideas of kernel consensus: find a kernel that performs a consensus of all kernels
[Mariette and Villa-Vialaneix, 2018] - R package mixKernel
with consensus based on:
▶ STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994]
▶ criterion that preserves local geometry
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 20
STATIS like framework
Similarities between kernels:
Cmm′ =
⟨Km, Km′
⟩F
∥Km∥F ∥Km′
∥F
=
Trace(KmKm′
)
p
Trace((Km)2)Trace((Km′
)2)
.
[Robert and Escoufier, 1976]
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 21
STATIS like framework
Similarities between kernels:
Cmm′ =
⟨Km, Km′
⟩F
∥Km∥F ∥Km′
∥F
=
Trace(KmKm′
)
p
Trace((Km)2)Trace((Km′
)2)
.
[Robert and Escoufier, 1976]
maximizeβ
M
X
m=1

K∗
(β),
Km
∥Km∥F

F
= maximizeβ β⊤
Cβ
s.t. ∥β∥2 = 1
Solution: first eigenvector of C
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 21
TARA Oceans expedition
Science (May 2015) - Studies on:
▶ eukaryotic plankton diversity
[de Vargas et al., 2015],
▶ ocean viral communities
[Brum et al., 2015],
▶ global plankton interactome
[Lima-Mendez et al., 2015],
▶ global ocean microbiome
[Sunagawa et al., 2015],
▶ . . . .
→ datasets from different types and
different sources analyzed separately.
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 22
Integrating TARA Oceans datasets
▶ For all compositional datasets, include
phylogenetic information (rather than
CLR and alike): weighted Unifrac
distance
▶ Perform PCA (could have been
clustering, linear model, . . . ) in the
feature space.
+ combine with a shuffling approach
to identify most influencing variables
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 23
Application to TARA oceans
Main facts
▶ Oceans typology related to longitude
▶ Rhizaria abundance structure the differences, especially between Arctic Oceans
and Pacific Oceans
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 24
Overview of the talk
Kernel methods
Integrating data with kernels
Improve interpretability
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 25
Selecting features in kernels
Which features are important? (for numerical features only):
▶ label each feature j with a weight wj ∈ {0, 1} (selected or not)
▶ new kernel: Kw(xi , xi′ ) = K(w · xi , w · xi′ )
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 26
Selecting features in kernels
Which features are important? (for numerical features only):
▶ label each feature j with a weight wj ∈ {0, 1} (selected or not)
▶ new kernel: Kw(xi , xi′ ) = K(w · xi , w · xi′ )
▶ find the best weights to optimize a quality criterion
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 26
Selecting features in kernels
Which features are important? (for numerical features only):
▶ label each feature j with a weight wj ∈ {0, 1} (selected or not)
▶ new kernel: Kw(xi , xi′ ) = K(w · xi , w · xi′ )
▶ find the best weights to optimize a quality criterion
How to do it?:
▶ supervised framework: learn w to compute a kernel Kw
x best to predict Y ∈ R
[Allen, 2013, Grandvalet and Canu, 2002]
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 26
Extensions
[Brouard et al., 2022] and mixKernel
▶ extended to unsupervised (exploratory) learning
argmin
w∈{0,1}p
∥Kw
x − Kx ∥2
F s.t
p
X
j=1
wj ≤ d
Continuous relaxation (non-convex and non-smooth), solved with proximal gradient
descent.
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 27
Extensions
[Brouard et al., 2022] and mixKernel
▶ extended to unsupervised (exploratory) learning
argmin
w∈{0,1}p
∥Kw
x − Kx ∥2
F s.t
p
X
j=1
wj ≤ d
▶ extended to predict arbitrary outputs (kernel outputs, including multiple output
regression) optimization problem
Continuous relaxation (non-convex and non-smooth), solved with proximal gradient
descent.
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 27
KOKFS illustration
Used with covid19 dataset from [Haug et al., 2020]:
▶ inputs: 46 government measures (0/1 encoding)
▶ outputs: R time series (with a kernel based on Fréchet distances)
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 28
KOKFS illustration
Used with covid19 dataset from [Haug et al., 2020]:
▶ inputs: 46 government measures (0/1 encoding)
▶ outputs: R time series (with a kernel based on Fréchet distances)
More important measures:
Increase In Medical Supplies And Equipment
Border Health Check
Public Transport Restriction
The Government Provides Assistance To Vulnerable Populations
Individual Movement Restrictions
Increase Availability Of Ppe
Activate Case Notification
Border Restriction
Measures To Ensure Security Of Supply
Port And Ship Restriction
Activate Or Establish Emergency Response
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 28
Future needs for data integration
▶ improve interpretability of methods (integrate more biological knowledge)
▶ generic vs specific: omics are always evolving...
▶ reduce computational needs to achieve the challenge of a more sustainable
research
▶ make them more widely used by the biological community: omics data are still
produced at a much faster rate than their use!
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 29
Thank you for your attention!
Questions?
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 30
References
(unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau:
https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae
Allen, G. I. (2013).
Automatic feature selection via weighted kernels and regularization.
Journal of Computational and Graphical Statistics, 22(2):284–299.
Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2022).
Feature selection for kernel methods in systems biology.
NAR Genomics and Bioinformatics, 4(1):lqac014.
Brouard, C., Shen, H., Dürkop, K., d’Alché Buc, F., Böcker, S., and Rousu, J. (2016).
Fast metabolite identification with input output kernel regression.
Bioinformatics, 32(12):i28–i36.
Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J., Gorsky, G.,
Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S., Dimier, C.,
Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker, P., Karsenti, E., and
Sullivan, M. (2015).
Patterns and ecological drivers of ocean viral communities.
Science, 348(6237).
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M.,
Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horák, A., Jaillon,
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 30
O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M.,
Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone,
D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and
Karsenti, E. (2015).
Eukaryotic plankton diversity in the sunlit ocean.
Science, 348(6237).
Dumuid, D., Pedišić, v., Palarea-Albaladejo, J., Martı́n-Fernández, J. A., Hron, K., and Olds, T. (2020).
Compositional data analysis in time-use epidemiology: what, why, how.
International Journal of Environmental Research and Public Health, 17(7):2220.
Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau,
C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S.,
Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S.,
Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019).
Multi-species annotation of transcriptome and chromatin structure in domesticated animals.
BMC Biology, 17:108.
Gönen, M. and Alpaydin, E. (2011).
Multiple kernel learning algorithms.
Journal of Machine Learning Research, 12:2211–2268.
Grandvalet, Y. and Canu, S. (2002).
Adaptive scaling for feature selection in SVMs.
In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages
569–576. MIT Press.
Haug, N., Geyrhofer, L., Londei, A., Dervic, E., Desvars-Larrive, A., Loreto, V., Pinior, B., Thurner, S., and Klimek, P. (2020).
Ranking the effectiveness of worldwide COVID-19 government interventions.
Nature Human Behaviour, 4(4):1303–1312.
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 30
Jaakkola, T., Diekhans, M., and Haussler, D. (2000).
A discriminative framework for detecting remote protein homologies.
Journal of Computational Biology, 7(1-2):95–114.
Kondor, R. and Lafferty, J. (2002).
Diffusion kernels on graphs and other discrete structures.
In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney,
Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).
The ACT (STATIS method).
Computational Statistics and Data Analysis, 18(1):97–119.
L’Hermier des Plantes, H. (1976).
Structuration des tableaux à trois indices de la statistique.
PhD thesis, Université de Montpellier.
Thèse de troisième cycle.
Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J., Roux, S., Vincent, F., Bittner,
L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F., de Meester, L.,
Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P., Sebastian, M., Souffreau, C., Dimier,
C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G., Not, F., Ogata, H., Speich, S., Stemmann, L.,
Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M., Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015).
Determinants of community structure in the global plankton interactome.
Science, 348(6237).
Mariette, J. and Villa-Vialaneix, N. (2018).
Unsupervised multiple kernel learning for heterogeneous data integration.
Bioinformatics, 34(6):1009–1015.
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 30
Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015).
Methods of integrating data to uncover genotype-phenotype interactions.
Nature Reviews Genetics, 16(2):85–97.
Robert, P. and Escoufier, Y. (1976).
A unifying tool for linear multivariate statistical methods: the rv-coefficient.
Applied Statistics, 25(3):257–265.
Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004).
Protein homology detection using string alignment kernels.
Bioinformatics, 20(11):1682–1689.
Shen, H., Dührkop, K., Böcher, S., and Rousu, J. (2014).
Metabolite identification through multiple kernel learning on fragmentation trees.
Bioinformatics, 30(12):i157–i64.
Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A.,
Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka, F., Lepoivre,
C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S.,
Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Not,
F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S., and Bork,
P. (2015).
Structure and function of the global ocean microbiome.
Science, 348(6237).
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 31
Technical details: Unsupervised Kernel Feature Selection
(UKFS)
Reduced kernel for X ∈ Rp
Learn weights w ∈ {0, 1}p such that the kernel based on w · xi is very similar to the
kernel based on xi
Optimization problem
argmin
w∈{0,1}p
∥Kw
x − Kx ∥2
F s.t
p
X
j=1
wj ≤ d
Continuous relaxation (non-convex and non-smooth)
w∗ := argmin
w∈(R+)p
∥Kw
x − Kx ∥2
F + λ∥w∥1, solved with proximal gradient descent
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 31
UKFS performances
lapl SPEC MCFS NDFS UDFS Autoenc. UKFS
“Carcinom” (n = 174, p = 9 182)
ACC 164.02 106.52 184.17 200.88 138.48 143.13 206.55
COR 28.14 30.75 29.56 27.49 30.30 33.18 24.75
CPU 0.25 2.47 11.69 6,162 99,138  4 days 326
“Glioma” (n = 50, p = 4 434)
ACC 166.31 140.72 172.78 147.77 147.50 132.76 178.57
COR 81.70 70.70 76.43 68.02 72.33 45.96 52.14
CPU 0.02 0.63 1.05 368 2,636 ∼ 12h 23.74
“Koren” (n = 43, p = 980)
ACC 172.90 225.25 233.94 263.04 263.48 239.76 242.39
COR 48.18 52.34 49.94 48.48 48.69 32.60 47.77
CPU 0.01 0.07 1.11 5.88 9.70 ∼ 30 min 10.69
[Brouard et al., 2022]
▶ non redundant features
▶ can incorporate information on relations between variables
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 32
Technical details on kernel output feature selection (KOKFS)
Learn weights w ∈ {0, 1}p such that the kernel based on w · xi best explains the way
the (yi )i relate to each other as described by Ky :
min
h∈H,w∈(R+)p
f (h, w) + λ1∥h∥2
H + λ2∥w∥1,
where f (h, w) =
Pn
i=1 ∥h(w · xi ) − ψ(yi )∥2
Fy
and h of the following form:
h(xi ) = V ϕ(xi ).
This optimization problem is solved using an iterative algorithm alterning optimization
of w (similar to unsupervised framework) and optimization of h (using kernel trick).
Back
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 33
General summary of KOKFS
Multi-omics data integration methods
2022-12-15, Consortium MIMS / Nathalie Vialaneix
p. 34

More Related Content

What's hot

Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsPragya Pai
 
MCQs on DNA MicroArray.pdf
MCQs on DNA MicroArray.pdfMCQs on DNA MicroArray.pdf
MCQs on DNA MicroArray.pdfRajendraChavhan3
 
Nucleic acid database
Nucleic acid database Nucleic acid database
Nucleic acid database bhargvi sharma
 
Sequencing, Alignment and Assembly
Sequencing, Alignment and AssemblySequencing, Alignment and Assembly
Sequencing, Alignment and AssemblyShaun Jackman
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsDenis C. Bauer
 
A Brief Introduction to Metabolomics
A Brief Introduction to Metabolomics A Brief Introduction to Metabolomics
A Brief Introduction to Metabolomics Ranjith Raj V
 
Analysis of ChIP-Seq Data
Analysis of ChIP-Seq DataAnalysis of ChIP-Seq Data
Analysis of ChIP-Seq DataPhil Ewels
 
Multi Omics Approach in Medicine
Multi Omics Approach in MedicineMulti Omics Approach in Medicine
Multi Omics Approach in MedicineShreya Gupta
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
Genomic mapping by kk sahu
Genomic mapping by kk sahuGenomic mapping by kk sahu
Genomic mapping by kk sahuKAUSHAL SAHU
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p collegeSKUASTKashmir
 

What's hot (20)

Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in Bioinformatics
 
Similarity
SimilaritySimilarity
Similarity
 
Genome Assembly
Genome AssemblyGenome Assembly
Genome Assembly
 
MCQs on DNA MicroArray.pdf
MCQs on DNA MicroArray.pdfMCQs on DNA MicroArray.pdf
MCQs on DNA MicroArray.pdf
 
Nucleic acid database
Nucleic acid database Nucleic acid database
Nucleic acid database
 
Sequencing, Alignment and Assembly
Sequencing, Alignment and AssemblySequencing, Alignment and Assembly
Sequencing, Alignment and Assembly
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
A Brief Introduction to Metabolomics
A Brief Introduction to Metabolomics A Brief Introduction to Metabolomics
A Brief Introduction to Metabolomics
 
Analysis of ChIP-Seq Data
Analysis of ChIP-Seq DataAnalysis of ChIP-Seq Data
Analysis of ChIP-Seq Data
 
Sequence file formats
Sequence file formatsSequence file formats
Sequence file formats
 
Multi Omics Approach in Medicine
Multi Omics Approach in MedicineMulti Omics Approach in Medicine
Multi Omics Approach in Medicine
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Genomic mapping by kk sahu
Genomic mapping by kk sahuGenomic mapping by kk sahu
Genomic mapping by kk sahu
 
Gene Expression Omnibus (GEO)
Gene Expression Omnibus (GEO)Gene Expression Omnibus (GEO)
Gene Expression Omnibus (GEO)
 
Microarray by dr.prabhash
Microarray by dr.prabhashMicroarray by dr.prabhash
Microarray by dr.prabhash
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1
 
Bioinformatics issues and challanges presentation at s p college
Bioinformatics  issues and challanges  presentation at s p collegeBioinformatics  issues and challanges  presentation at s p college
Bioinformatics issues and challanges presentation at s p college
 
Protein database
Protein databaseProtein database
Protein database
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 

Similar to Multi-omics data integration methods: kernel and other machine learning approaches

Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquestuxette
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biologytuxette
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology tuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
 
Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Bayesian network-based predictive analytics applied to invasive species distr...
Bayesian network-based predictive analytics applied to invasive species distr...Bayesian network-based predictive analytics applied to invasive species distr...
Bayesian network-based predictive analytics applied to invasive species distr...Wisdom Dlamini
 
[ADBIS 2021] - Optimizing Execution Plans in a Multistore
[ADBIS 2021] - Optimizing Execution Plans in a Multistore[ADBIS 2021] - Optimizing Execution Plans in a Multistore
[ADBIS 2021] - Optimizing Execution Plans in a MultistoreChiara Forresi
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data FusionIRJET Journal
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...multimediaeval
 
10.1.1.21.5883
10.1.1.21.588310.1.1.21.5883
10.1.1.21.5883paserv
 
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORINGMACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING VisionGEOMATIQUE2014
 
A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...IJECEIAES
 

Similar to Multi-omics data integration methods: kernel and other machine learning approaches (20)

Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Bayesian network-based predictive analytics applied to invasive species distr...
Bayesian network-based predictive analytics applied to invasive species distr...Bayesian network-based predictive analytics applied to invasive species distr...
Bayesian network-based predictive analytics applied to invasive species distr...
 
[ADBIS 2021] - Optimizing Execution Plans in a Multistore
[ADBIS 2021] - Optimizing Execution Plans in a Multistore[ADBIS 2021] - Optimizing Execution Plans in a Multistore
[ADBIS 2021] - Optimizing Execution Plans in a Multistore
 
UNIT 1_2.ppt
UNIT 1_2.pptUNIT 1_2.ppt
UNIT 1_2.ppt
 
Cross Domain Data Fusion
Cross Domain Data FusionCross Domain Data Fusion
Cross Domain Data Fusion
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
 
10.1.1.21.5883
10.1.1.21.588310.1.1.21.5883
10.1.1.21.5883
 
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORINGMACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
MACHINE LEARNING FOR SATELLITE-GUIDED WATER QUALITY MONITORING
 
A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...A new model for iris data set classification based on linear support vector m...
A new model for iris data set classification based on linear support vector m...
 

More from tuxette

Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNNtuxette
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practicetuxette
 
La famille *down
La famille *downLa famille *down
La famille *downtuxette
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC datatuxette
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysistuxette
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphstuxette
 

More from tuxette (20)

Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
Graph Neural Network in practice
Graph Neural Network in practiceGraph Neural Network in practice
Graph Neural Network in practice
 
La famille *down
La famille *downLa famille *down
La famille *down
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphs
 

Recently uploaded

Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 

Recently uploaded (20)

Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 

Multi-omics data integration methods: kernel and other machine learning approaches

  • 1. Multi-omics data integration methods: kernel and other machine learning approaches Nathalie Vialaneix nathalie.vialaneix@inrae.fr http://www.nathalievialaneix.eu Consortium MIMS December 15th, 2022
  • 2. Who am I? ▶ researcher at INRAE (France) ▶ trained as a mathematician −→ statistics and machine learning for computational biology (omics) ▶ mostly involved in projects on animal genomics (transcriptomics, Hi-C, ATAC-seq...) ▶ kernel & network methods Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 2
  • 3. Collected data at genomic level are increasingly publicly available the different levels are not always compatible Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 3
  • 4. Collected data at genomic level are increasingly publicly available the different levels are not always compatible [Foissac et al., 2019] Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 3
  • 5. Analysis bottlenecks ▶ very large dimensionality and big data (both scaling and statistical issues) n ∼ {5 − 1000} p ∼ 10{3−5} Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 4
  • 6. Analysis bottlenecks ▶ very large dimensionality and big data (both scaling and statistical issues) ▶ missing values and incomplete designs Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 4
  • 7. Analysis bottlenecks ▶ very large dimensionality and big data (both scaling and statistical issues) ▶ missing values and incomplete designs ▶ highly non Gaussian data: skewed distributions, count data, zero-inflated data, ... Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 4
  • 8. Analysis bottlenecks ▶ very large dimensionality and big data (both scaling and statistical issues) ▶ missing values and incomplete designs ▶ highly non Gaussian data: skewed distributions, count data, zero-inflated data, ... ▶ non Euclidean data: compositional data (metagenomics), similarity matrices (Hi-C), spectra (metabolomics), ... (*) (**) (*) image from [Dumuid et al., 2020] (**) image by courtesy of Gaëlle Lefort Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 4
  • 9. Analysis bottlenecks ▶ very large dimensionality and big data (both scaling and statistical issues) ▶ missing values and incomplete designs ▶ highly non Gaussian data: skewed distributions, count data, zero-inflated data, ... ▶ non Euclidean data: compositional data (metagenomics), similarity matrices (Hi-C), spectra (metabolomics), ... In addition: in plant & animal sciences, less discriminative phenotypes, interest is indirect (in breeding not directly in the individual itself), much less data with poorer quality annotation (inter-species transfer might be useful). Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 4
  • 10. Omics data integration Type of data to integrate vertical: horizontal: Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 5
  • 11. Omics data integration Type of data to integrate vertical: horizontal: Type of analysis to perform supervised: unsupervised: Left pictures courtesy Harold Duruflé Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 5
  • 12. Multiple table analyses (CCA, MFA, PLS, STATIS, ...) Image from https://dimensionless.in Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 6
  • 13. Types of data integration methods [Ritchie et al., 2015] Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 7
  • 14. Want to know them all? https://github.com/mikelove/awesome-multi-omics Some specificities: can account for structure in data (network), are dedicated to a specific omic (single-cell), can account for temporal/spatial information, can include biological knowledge (mostly GO), ... Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 8
  • 15. Making methods available for biologists http://asterics.miat.inrae.fr Ambition: ▶ easy-to-use and interactive ▶ helps user to know which analysis to use and how to interpret results ▶ integrates domain expertise ▶ usable online or can be installed Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 9
  • 16. Scope of the rest of the talk Unsupervised transformation based integration Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 10
  • 17. Overview of the talk Kernel methods Integrating data with kernels Improve interpretability Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 11
  • 18. Main ideas behind kernel methods Standard (omics) data analyses: ▶ data are (numeric) tables ▶ analyses are based on operations (distances, means, ...) in the variable space Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 12
  • 19. Main ideas behind kernel methods Kernel data analyses: ▶ data are arbitrary ▶ analyses are based on transformations of data to “similarities” between samples Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 13
  • 20. More formally... n samples (xi )i ∈ X kernels: symmetric and positive definite (n × n)-matrix K that measures a “similarity” between n entities in X Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 14
  • 21. More formally... n samples (xi )i ∈ X kernels: symmetric and positive definite (n × n)-matrix K that measures a “similarity” between n entities in X Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 14
  • 22. More formally... n samples (xi )i ∈ X kernels: symmetric and positive definite (n × n)-matrix K that measures a “similarity” between n entities in X K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩ Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 14
  • 23. Principles of learning from kernels Start from any statistical method (PCA, regression, k-means clustering) and rewrite all quantities using: ▶ K to compute distances and dot products dot product is: Kii′ and distance is: √ Kii + Ki′i′ − 2Kii′ ▶ (implicit) linear or convex combinations of (ϕ(xi ))i to describe all unobserved elements (centers of gravity and so on...) Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 15
  • 24. Kernel examples 1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2 Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 16
  • 25. Kernel examples 1. Rp observations: Gaussian kernel Kii′ = e−γ∥xi −xi′ ∥2 2. nodes of a graph: [Kondor and Lafferty, 2002] 3. sequence kernels (between proteins: spectrum kernel [Jaakkola et al., 2000] or convolution kernel [Saigo et al., 2004]) 4. kernel between graphs (used between metabolites based on their fragmentation trees): [Shen et al., 2014, Brouard et al., 2016] 5. kernel embedding phylogeny information for metagenomics [Mariette and Villa-Vialaneix, 2018] 6. ... Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 16
  • 26. Overview of the talk Kernel methods Integrating data with kernels Improve interpretability Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 17
  • 27. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels K1, . . . , KM obtained on the same n objects, search: Kβ = PM m=1 βmKm with βm ≥ 0 and P m βm = 1 Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 18
  • 28. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels K1, . . . , KM obtained on the same n objects, search: Kβ = PM m=1 βmKm with βm ≥ 0 and P m βm = 1 ▶ naive approach: K∗ = 1 M P m Km Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 18
  • 29. Multiple kernel (or distance) integration How to “optimally” combine several kernel datasets? For kernels K1, . . . , KM obtained on the same n objects, search: Kβ = PM m=1 βmKm with βm ≥ 0 and P m βm = 1 ▶ naive approach: K∗ = 1 M P m Km ▶ supervised framework: K∗ = P m βmKm with βm ≥ 0 and P m βm = 1 with βm chosen so as to minimize the prediction error [Gönen and Alpaydin, 2011] Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 18
  • 30. Combining kernels in an unsupervised setting Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 19
  • 31. Multiple kernel integration Ideas of kernel consensus: find a kernel that performs a consensus of all kernels [Mariette and Villa-Vialaneix, 2018] - R package mixKernel with consensus based on: ▶ STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994] ▶ criterion that preserves local geometry Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 20
  • 32. STATIS like framework Similarities between kernels: Cmm′ = ⟨Km, Km′ ⟩F ∥Km∥F ∥Km′ ∥F = Trace(KmKm′ ) p Trace((Km)2)Trace((Km′ )2) . [Robert and Escoufier, 1976] Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 21
  • 33. STATIS like framework Similarities between kernels: Cmm′ = ⟨Km, Km′ ⟩F ∥Km∥F ∥Km′ ∥F = Trace(KmKm′ ) p Trace((Km)2)Trace((Km′ )2) . [Robert and Escoufier, 1976] maximizeβ M X m=1 K∗ (β), Km ∥Km∥F F = maximizeβ β⊤ Cβ s.t. ∥β∥2 = 1 Solution: first eigenvector of C Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 21
  • 34. TARA Oceans expedition Science (May 2015) - Studies on: ▶ eukaryotic plankton diversity [de Vargas et al., 2015], ▶ ocean viral communities [Brum et al., 2015], ▶ global plankton interactome [Lima-Mendez et al., 2015], ▶ global ocean microbiome [Sunagawa et al., 2015], ▶ . . . . → datasets from different types and different sources analyzed separately. Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 22
  • 35. Integrating TARA Oceans datasets ▶ For all compositional datasets, include phylogenetic information (rather than CLR and alike): weighted Unifrac distance ▶ Perform PCA (could have been clustering, linear model, . . . ) in the feature space. + combine with a shuffling approach to identify most influencing variables Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 23
  • 36. Application to TARA oceans Main facts ▶ Oceans typology related to longitude ▶ Rhizaria abundance structure the differences, especially between Arctic Oceans and Pacific Oceans Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 24
  • 37. Overview of the talk Kernel methods Integrating data with kernels Improve interpretability Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 25
  • 38. Selecting features in kernels Which features are important? (for numerical features only): ▶ label each feature j with a weight wj ∈ {0, 1} (selected or not) ▶ new kernel: Kw(xi , xi′ ) = K(w · xi , w · xi′ ) Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 26
  • 39. Selecting features in kernels Which features are important? (for numerical features only): ▶ label each feature j with a weight wj ∈ {0, 1} (selected or not) ▶ new kernel: Kw(xi , xi′ ) = K(w · xi , w · xi′ ) ▶ find the best weights to optimize a quality criterion Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 26
  • 40. Selecting features in kernels Which features are important? (for numerical features only): ▶ label each feature j with a weight wj ∈ {0, 1} (selected or not) ▶ new kernel: Kw(xi , xi′ ) = K(w · xi , w · xi′ ) ▶ find the best weights to optimize a quality criterion How to do it?: ▶ supervised framework: learn w to compute a kernel Kw x best to predict Y ∈ R [Allen, 2013, Grandvalet and Canu, 2002] Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 26
  • 41. Extensions [Brouard et al., 2022] and mixKernel ▶ extended to unsupervised (exploratory) learning argmin w∈{0,1}p ∥Kw x − Kx ∥2 F s.t p X j=1 wj ≤ d Continuous relaxation (non-convex and non-smooth), solved with proximal gradient descent. Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 27
  • 42. Extensions [Brouard et al., 2022] and mixKernel ▶ extended to unsupervised (exploratory) learning argmin w∈{0,1}p ∥Kw x − Kx ∥2 F s.t p X j=1 wj ≤ d ▶ extended to predict arbitrary outputs (kernel outputs, including multiple output regression) optimization problem Continuous relaxation (non-convex and non-smooth), solved with proximal gradient descent. Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 27
  • 43. KOKFS illustration Used with covid19 dataset from [Haug et al., 2020]: ▶ inputs: 46 government measures (0/1 encoding) ▶ outputs: R time series (with a kernel based on Fréchet distances) Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 28
  • 44. KOKFS illustration Used with covid19 dataset from [Haug et al., 2020]: ▶ inputs: 46 government measures (0/1 encoding) ▶ outputs: R time series (with a kernel based on Fréchet distances) More important measures: Increase In Medical Supplies And Equipment Border Health Check Public Transport Restriction The Government Provides Assistance To Vulnerable Populations Individual Movement Restrictions Increase Availability Of Ppe Activate Case Notification Border Restriction Measures To Ensure Security Of Supply Port And Ship Restriction Activate Or Establish Emergency Response Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 28
  • 45. Future needs for data integration ▶ improve interpretability of methods (integrate more biological knowledge) ▶ generic vs specific: omics are always evolving... ▶ reduce computational needs to achieve the challenge of a more sustainable research ▶ make them more widely used by the biological community: omics data are still produced at a much faster rate than their use! Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 29
  • 46. Thank you for your attention! Questions? Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 30
  • 47. References (unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau: https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae Allen, G. I. (2013). Automatic feature selection via weighted kernels and regularization. Journal of Computational and Graphical Statistics, 22(2):284–299. Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2022). Feature selection for kernel methods in systems biology. NAR Genomics and Bioinformatics, 4(1):lqac014. Brouard, C., Shen, H., Dürkop, K., d’Alché Buc, F., Böcker, S., and Rousu, J. (2016). Fast metabolite identification with input output kernel regression. Bioinformatics, 32(12):i28–i36. Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J., Gorsky, G., Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S., Dimier, C., Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker, P., Karsenti, E., and Sullivan, M. (2015). Patterns and ecological drivers of ocean viral communities. Science, 348(6237). de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horák, A., Jaillon, Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 30
  • 48. O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and Karsenti, E. (2015). Eukaryotic plankton diversity in the sunlit ocean. Science, 348(6237). Dumuid, D., Pedišić, v., Palarea-Albaladejo, J., Martı́n-Fernández, J. A., Hron, K., and Olds, T. (2020). Compositional data analysis in time-use epidemiology: what, why, how. International Journal of Environmental Research and Public Health, 17(7):2220. Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau, C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S., Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S., Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019). Multi-species annotation of transcriptome and chromatin structure in domesticated animals. BMC Biology, 17:108. Gönen, M. and Alpaydin, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12:2211–2268. Grandvalet, Y. and Canu, S. (2002). Adaptive scaling for feature selection in SVMs. In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages 569–576. MIT Press. Haug, N., Geyrhofer, L., Londei, A., Dervic, E., Desvars-Larrive, A., Loreto, V., Pinior, B., Thurner, S., and Klimek, P. (2020). Ranking the effectiveness of worldwide COVID-19 government interventions. Nature Human Behaviour, 4(4):1303–1312. Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 30
  • 49. Jaakkola, T., Diekhans, M., and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7(1-2):95–114. Kondor, R. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Sammut, C. and Hoffmann, A., editors, Proceedings of the 19th International Conference on Machine Learning, pages 315–322, Sydney, Australia. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA. Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119. L’Hermier des Plantes, H. (1976). Structuration des tableaux à trois indices de la statistique. PhD thesis, Université de Montpellier. Thèse de troisième cycle. Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J., Roux, S., Vincent, F., Bittner, L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F., de Meester, L., Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P., Sebastian, M., Souffreau, C., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G., Not, F., Ogata, H., Speich, S., Stemmann, L., Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M., Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015). Determinants of community structure in the global plankton interactome. Science, 348(6237). Mariette, J. and Villa-Vialaneix, N. (2018). Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics, 34(6):1009–1015. Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 30
  • 50. Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015). Methods of integrating data to uncover genotype-phenotype interactions. Nature Reviews Genetics, 16(2):85–97. Robert, P. and Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: the rv-coefficient. Applied Statistics, 25(3):257–265. Saigo, H., Vert, J.-P., Ueda, N., and Akutsu, T. (2004). Protein homology detection using string alignment kernels. Bioinformatics, 20(11):1682–1689. Shen, H., Dührkop, K., Böcher, S., and Rousu, J. (2014). Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30(12):i157–i64. Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A., Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka, F., Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S., and Bork, P. (2015). Structure and function of the global ocean microbiome. Science, 348(6237). Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 31
  • 51. Technical details: Unsupervised Kernel Feature Selection (UKFS) Reduced kernel for X ∈ Rp Learn weights w ∈ {0, 1}p such that the kernel based on w · xi is very similar to the kernel based on xi Optimization problem argmin w∈{0,1}p ∥Kw x − Kx ∥2 F s.t p X j=1 wj ≤ d Continuous relaxation (non-convex and non-smooth) w∗ := argmin w∈(R+)p ∥Kw x − Kx ∥2 F + λ∥w∥1, solved with proximal gradient descent Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 31
  • 52. UKFS performances lapl SPEC MCFS NDFS UDFS Autoenc. UKFS “Carcinom” (n = 174, p = 9 182) ACC 164.02 106.52 184.17 200.88 138.48 143.13 206.55 COR 28.14 30.75 29.56 27.49 30.30 33.18 24.75 CPU 0.25 2.47 11.69 6,162 99,138 4 days 326 “Glioma” (n = 50, p = 4 434) ACC 166.31 140.72 172.78 147.77 147.50 132.76 178.57 COR 81.70 70.70 76.43 68.02 72.33 45.96 52.14 CPU 0.02 0.63 1.05 368 2,636 ∼ 12h 23.74 “Koren” (n = 43, p = 980) ACC 172.90 225.25 233.94 263.04 263.48 239.76 242.39 COR 48.18 52.34 49.94 48.48 48.69 32.60 47.77 CPU 0.01 0.07 1.11 5.88 9.70 ∼ 30 min 10.69 [Brouard et al., 2022] ▶ non redundant features ▶ can incorporate information on relations between variables Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 32
  • 53. Technical details on kernel output feature selection (KOKFS) Learn weights w ∈ {0, 1}p such that the kernel based on w · xi best explains the way the (yi )i relate to each other as described by Ky : min h∈H,w∈(R+)p f (h, w) + λ1∥h∥2 H + λ2∥w∥1, where f (h, w) = Pn i=1 ∥h(w · xi ) − ψ(yi )∥2 Fy and h of the following form: h(xi ) = V ϕ(xi ). This optimization problem is solved using an iterative algorithm alterning optimization of w (similar to unsupervised framework) and optimization of h (using kernel trick). Back Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 33
  • 54. General summary of KOKFS Multi-omics data integration methods 2022-12-15, Consortium MIMS / Nathalie Vialaneix p. 34