Statistics and machine learning for the integration of
high-throughput biology data
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
Kick-off seminar of the DIGIT-BIO Metaprogramme
31 March 2021
Data collected at the genomic level are huge, multi-scale, and of
heterogeneous types... and increasingly publicly available
[Foissac et al., 2019]
Statistics & ML for high throughput data integration
31 March 2021 / MP DIGIT-BIO seminar / Nathalie Vialaneix
p. 2
Analysis bottleneck
- As in humans, data are under-used because of: various experimental conditions
(meta-analyses needed), various scales (not always compatible), various types (not
always simply numeric), imperfect matching between experiments (missing
data/individuals), ...
- Beyond the issues shared with human data: less discriminative phenotypes, indirect
interest (in breeding, not directly in the individual itself), much less data with
poorer-quality annotation (inter-species transfer might be useful)
Multiple table analyses
(CCA, MFA, PLS, STATIS, ...)
Image from https://dimensionless.in
Types of data integration methods [Ritchie et al., 2015]
Common representations using relational data
$n$ samples $(x_i)_i \in \mathcal{X}$, summarized by pairwise similarities or dissimilarities
kernels: symmetric and positive definite $(n \times n)$-matrix
$K$ that measures a “similarity” between $n$ entities in $\mathcal{X}$
$K(x, x') = \langle \phi(x), \phi(x') \rangle$
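As an illustration of the definition above, here is a minimal sketch that builds a kernel matrix from numeric samples and checks that it is symmetric and positive semi-definite. The Gaussian kernel, the `gamma` value, and the toy data are assumptions chosen for the example; the slides do not prescribe a specific kernel.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point rounding
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def is_valid_kernel(K, tol=1e-8):
    """Check symmetry and positive semi-definiteness up to a tolerance."""
    symmetric = np.allclose(K, K.T)
    min_eig = np.linalg.eigvalsh((K + K.T) / 2.0).min()
    return bool(symmetric and min_eig >= -tol)

# Toy data (hypothetical): 6 samples described by 3 numeric variables
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
K = gaussian_kernel(X, gamma=0.5)
print(K.shape, is_valid_kernel(K))
```

Any downstream method that only needs pairwise similarities can then work on `K` instead of on the raw samples, which is what makes kernels a common representation for heterogeneous data.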
Multiple kernel (or distance) integration
How to “optimally” combine several relational datasets in an unsupervised
setting?
For kernels $K_1, \dots, K_M$ obtained on the same $n$ objects, search: $K_\beta = \sum_{m=1}^{M} \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$
Ideas of kernel consensus: find a kernel that performs a consensus of all kernels
[Mariette and Villa-Vialaneix, 2018] - R package mixKernel
with consensus based on:
- STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994]
- criterion that preserves local geometry
$$\operatorname*{minimize}_{\beta} \sum_{i,j=1}^{n} \underbrace{W_{ij}}_{\text{local proximity}} \times \underbrace{\|\Delta_i(\beta) - \Delta_j(\beta)\|^2}_{\text{feature space distance}} \quad \text{for } \beta_m \geq 0 \text{ and } \|\beta\|_{1,2} = 1$$
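The convex combination $K_\beta = \sum_m \beta_m K_m$ can be sketched as follows. The weighting rule used here (leading eigenvector of the matrix of cosine similarities between kernels, rescaled to sum to one) is a simplified consensus in the spirit of STATIS; it is an assumption made for illustration and not the mixKernel implementation. The function names and toy data are hypothetical.

```python
import numpy as np

def cosine_normalize(K):
    """Scale a kernel so that every sample has unit self-similarity."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def consensus_weights(kernels):
    """Weights from the leading eigenvector of the kernel similarity matrix."""
    M = len(kernels)
    C = np.empty((M, M))
    for a in range(M):
        for b in range(M):
            # Frobenius inner product, normalized by the Frobenius norms
            C[a, b] = np.sum(kernels[a] * kernels[b]) / (
                np.linalg.norm(kernels[a]) * np.linalg.norm(kernels[b])
            )
    _, eigvecs = np.linalg.eigh(C)
    v = np.abs(eigvecs[:, -1])  # eigh sorts eigenvalues in ascending order
    return v / v.sum()          # beta_m >= 0 and sum_m beta_m = 1

# Toy data (hypothetical): three views of the same 10 objects
rng = np.random.default_rng(0)
X1 = rng.standard_normal((10, 4))
X2 = X1 + 0.1 * rng.standard_normal((10, 4))  # view close to X1
X3 = rng.standard_normal((10, 4))             # unrelated view
kernels = [cosine_normalize(X @ X.T) for X in (X1, X2, X3)]

beta = consensus_weights(kernels)
K_beta = sum(b * K for b, K in zip(beta, kernels))
print(beta)
```

Because the weights are non-negative and each `K_m` is positive semi-definite, `K_beta` is itself a valid kernel and can be passed to any kernel method.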
TARA Oceans expedition
Science (May 2015) - Studies on:
- eukaryotic plankton diversity [de Vargas et al., 2015],
- ocean viral communities [Brum et al., 2015],
- global plankton interactome [Lima-Mendez et al., 2015],
- global ocean microbiome [Sunagawa et al., 2015],
- ...
→ datasets of different types and from different sources, analyzed separately.
Integrating TARA Oceans datasets
- For all compositional datasets, include phylogeny information rather than
standard compositional transformations (CLR and the like): weighted UniFrac distance
- Perform PCA (could have been clustering, a linear model, ...) in the feature space,
combined with a shuffling approach to identify the most influential variables
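PCA in the feature space only needs the kernel matrix. The sketch below shows the standard kernel PCA steps (double centering, then eigendecomposition); the linear kernel and the toy data are assumptions made for the example, and the shuffling step used to rank variables is not reproduced here.

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Kernel PCA from a precomputed (n x n) kernel matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = H @ K @ H                            # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.maximum(eigvals[order], 0.0)     # guard against rounding below zero
    scores = eigvecs[:, order] * np.sqrt(lam) # sample coordinates on the axes
    return scores, lam

# Toy data (hypothetical): with a linear kernel, kernel PCA reduces to ordinary PCA
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
K = X @ X.T
scores, eigvals = kernel_pca(K, n_components=2)
print(scores.shape, eigvals)
```

The same function applies unchanged to a combined kernel, since it never accesses the original variables.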
Application to TARA oceans
Main facts
- Ocean typology is related to longitude
- Rhizaria abundance structures the differences, especially between the Arctic and
Pacific Oceans
What’s next? Improve interpretability
Select the variables most relevant for a given kernel
- builds on work that selects variables in $X$ to compute a kernel $K_x$ that best predicts
$Y \in \mathbb{R}$ [Allen, 2013, Grandvalet and Canu, 2002]
- extended to unsupervised (exploratory) learning and arbitrary outputs (kernel
outputs, including multiple output regression)
Continuous relaxation of a discrete problem
non-convex and non-smooth optimization solved with proximal gradient descent
UKFS performances
Metric    lapl     SPEC     MCFS     NDFS     UDFS     Autoenc.    UKFS
“Carcinom” (n = 174, p = 9,182)
ACC       164.02   106.52   184.17   200.88   138.48   143.13      206.55
COR       28.14    30.75    29.56    27.49    30.30    33.18       24.75
CPU       0.25     2.47     11.69    6,162    99,138   > 4 days    326
“Glioma” (n = 50, p = 4,434)
ACC       166.31   140.72   172.78   147.77   147.50   132.76      178.57
COR       81.70    70.70    76.43    68.02    72.33    45.96       52.14
CPU       0.02     0.63     1.05     368      2,636    ∼ 12h       23.74
“Koren” (n = 43, p = 980)
ACC       172.90   225.25   233.94   263.04   263.48   239.76      242.39
COR       48.18    52.34    49.94    48.48    48.69    32.60       47.77
CPU       0.01     0.07     1.11     5.88     9.70     ∼ 30 min    10.69
[Brouard et al., 2021]
- non-redundant features
- can incorporate information on relations between variables
- also works well for multiple output regression and kernel output regression (tested
on COVID-19 time series)
What about deep learning?
- Cons: computational cost; requires a large number of observations; not easily
interpretable; mostly designed for numeric inputs
- Pros: can be good for prediction
[Grattarola and Alippi, 2020]
Ex: information on relations between genes + expression to predict a phenotype
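To make this example concrete, here is a minimal sketch of one graph convolution step that mixes gene expression with a gene-gene relation graph. It uses the widely used normalized-adjacency propagation rule written in plain NumPy, not the Spektral API; the graph, the weights (random and untrained), and the mean-pooling readout are all assumptions made for illustration.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy gene network (hypothetical): 5 genes linked in a chain
rng = np.random.default_rng(0)
n_genes = 5
A = np.zeros((n_genes, n_genes))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

expression = rng.standard_normal((n_genes, 1))  # one expression value per gene
W1 = rng.standard_normal((1, 4))                # untrained layer weights
w_out = rng.standard_normal(4)                  # untrained readout weights

A_norm = normalized_adjacency(A)
H1 = gcn_layer(A_norm, expression, W1)          # gene embeddings using the graph
prediction = float(H1.mean(axis=0) @ w_out)     # pooled phenotype score
print(H1.shape, prediction)
```

In practice the weights would be learned from labelled samples; the sketch only shows how the relation graph enters the forward pass.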
Can we predict AND understand?
[Figure: study design with three clinical investigation days (CID1, CID2, CID3; CID: Clinical Investigation Day). Low-calorie diet phase of 8 weeks (goal: loss of more than 8% of body weight), followed by a weight follow-up phase of ≈ 6 months; 5 groups with different diets; adipose tissue and blood samples.]
Data:
- quantification of transcripts from adipose tissue, measured by QuantSeq
- phenotypic and clinical data
[Jin et al., 2020] ⇒ outcome of metabolic syndrome
Thank you for your attention!
Questions?
References
(unofficial) Beamer template made with the help of Thomas Schiex: https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae
Allen, G. I. (2013).
Automatic feature selection via weighted kernels and regularization.
Journal of Computational and Graphical Statistics, 22(2):284–299.
Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2021).
Feature selection for kernel methods in systems biology.
In preparation.
Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J., Gorsky, G.,
Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S., Dimier, C.,
Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker, P., Karsenti, E., and
Sullivan, M. (2015).
Patterns and ecological drivers of ocean viral communities.
Science, 348(6237).
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M.,
Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horák, A., Jaillon,
O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M.,
Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone,
D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and
Karsenti, E. (2015).
Eukaryotic plankton diversity in the sunlit ocean.
Science, 348(6237).
Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau,
C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S.,
Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S.,
Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019).
Multi-species annotation of transcriptome and chromatin structure in domesticated animals.
BMC Biology, 17:108.
Grandvalet, Y. and Canu, S. (2002).
Adaptive scaling for feature selection in SVMs.
In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages
569–576. MIT Press.
Grattarola, D. and Alippi, C. (2020).
Graph neural networks in TensorFlow and Keras with Spektral.
In Proceedings of the Graph Representation Learning and Beyond – ICML 2020 Workshop.
Jin, W., Ma, Y., Liu, X., Tang, X., Wang, S., and Tang, J. (2020).
Graph structure learning for robust graph neural networks.
In Gupta, R., Liu, Y., Tang, J., and Prakash, B. A., editors, Proceedings of the 26th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining (KDD ’20), pages 66–74, New York, NY, USA. Association for Computing Machinery (ACM).
Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).
The ACT (STATIS method).
Computational Statistics and Data Analysis, 18(1):97–119.
L’Hermier des Plantes, H. (1976).
Structuration des tableaux à trois indices de la statistique.
PhD thesis, Université de Montpellier.
Thèse de troisième cycle.
Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J., Roux, S., Vincent, F., Bittner,
L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F., de Meester, L.,
Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P., Sebastian, M., Souffreau, C., Dimier,
C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G., Not, F., Ogata, H., Speich, S., Stemmann, L.,
Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M., Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015).
Determinants of community structure in the global plankton interactome.
Science, 348(6237).
Mariette, J. and Villa-Vialaneix, N. (2018).
Unsupervised multiple kernel learning for heterogeneous data integration.
Bioinformatics, 34(6):1009–1015.
Ritchie, M., Phipson, B., Wu, D., Hu, Y., Law, C., Shi, W., and Smyth, G. (2015).
limma powers differential expression analyses for RNA-sequencing and microarray studies.
Nucleic Acids Research, 43(7):e47.
Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A.,
Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka, F., Lepoivre,
C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S.,
Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Not,
F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S., and Bork,
P. (2015).
Structure and function of the global ocean microbiome.
Science, 348(6237).
Technical details: Unsupervised Kernel Feature Selection
(UKFS)
Reduced kernel for $X \in \mathbb{R}^p$
Tune weights $w \in \{0, 1\}^p$ such that the kernel based on $w \cdot X$ is very similar to the
kernel based on $X$
Optimization problem
$$\operatorname*{argmin}_{w \in \{0,1\}^p} \|K_x^w - K_x\|_F^2 \quad \text{s.t.} \quad \sum_{j=1}^{p} w_j \leq d$$
Continuous relaxation (non-convex and non-smooth)
$$w^* := \operatorname*{argmin}_{w \in (\mathbb{R}^+)^p} \|K_x^w - K_x\|_F^2 + \lambda \|w\|_1,$$
solved with proximal gradient descent
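The relaxed problem can be sketched with a basic proximal gradient loop. This is a simplified illustration under stated assumptions, not the authors' UKFS implementation: the kernel is taken to be linear, so that $K_x^w = X \operatorname{diag}(w)^2 X^\top$ and the gradient has a closed form, the step size is chosen by backtracking, and the data, `lam`, and function names are hypothetical.

```python
import numpy as np

def weighted_linear_kernel(X, w):
    """Linear kernel computed on the reweighted variables w * X."""
    Xw = X * w  # scales column j by w[j]
    return Xw @ Xw.T

def smooth_part(X, K, w):
    """Squared Frobenius distance between the weighted and the full kernel."""
    R = weighted_linear_kernel(X, w) - K
    return np.sum(R ** 2)

def smooth_grad(X, K, w):
    """Gradient of smooth_part: d/dw_j = 4 * w_j * x_j^T R x_j."""
    R = weighted_linear_kernel(X, w) - K
    return 4.0 * w * np.sum(X * (R @ X), axis=0)

def prox_gradient_ukfs(X, lam, n_iter=300):
    K = X @ X.T
    w = np.ones(X.shape[1])  # start from the full kernel (all variables kept)
    history = [smooth_part(X, K, w) + lam * w.sum()]
    for _ in range(n_iter):
        f_w, g = smooth_part(X, K, w), smooth_grad(X, K, w)
        t = 1.0
        for _attempt in range(50):  # backtracking line search on the step t
            # Proximal step for lam * ||w||_1 restricted to w >= 0
            w_new = np.maximum(w - t * (g + lam), 0.0)
            diff = w_new - w
            bound = f_w + g @ diff + diff @ diff / (2.0 * t)
            if smooth_part(X, K, w_new) <= bound + 1e-12:
                break
            t *= 0.5
        w = w_new
        history.append(smooth_part(X, K, w) + lam * w.sum())
    return w, history

# Toy data (hypothetical): the last three variables carry little signal
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 6)) / np.sqrt(30)
X[:, 3:] *= 0.1
w_hat, history = prox_gradient_ukfs(X, lam=0.1)
print(np.round(w_hat, 3))
```

The soft-thresholding step is what can drive some weights exactly to zero, which is how the $\ell_1$ penalty performs variable selection; because the problem is non-convex, the result depends on the starting point.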
Statistics & ML for high throughput data integration
31 mars 2021 / Séminaire MP DIGIT-BIO / Nathalie Vialaneix
p. 16

More Related Content

What's hot

A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learningtuxette
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysistuxette
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networkstuxette
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learningUniversity of Groningen
 
Multiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetsMultiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetstuxette
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural networktuxette
 
Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...eSAT Journals
 
The Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - PhdassistanceThe Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - PhdassistancePhD Assistance
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpymustafa sarac
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learningbutest
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataGael Varoquaux
 
Kernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational BiologyKernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational BiologyMichiel Stock
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)neeraj7svp
 

What's hot (20)

A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learning
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
 
Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networks
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learning
 
Multiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetsMultiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasets
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
 
Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...Dimensionality reduction by matrix factorization using concept lattice in dat...
Dimensionality reduction by matrix factorization using concept lattice in dat...
 
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
Deep Learning Opening Workshop - Domain Adaptation Challenges in Genomics: a ...
 
The Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - PhdassistanceThe Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - Phdassistance
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
 
On the Mining of Numerical Data with Formal Concept Analysis
On the Mining of Numerical Data with Formal Concept AnalysisOn the Mining of Numerical Data with Formal Concept Analysis
On the Mining of Numerical Data with Formal Concept Analysis
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learning
 
Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4
 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
 
Kernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational BiologyKernel Methods and Relational Learning in Computational Biology
Kernel Methods and Relational Learning in Computational Biology
 
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
Deep Learning Opening Workshop - Horseshoe Regularization for Machine Learnin...
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
 

Similar to La statistique et le machine learning pour l'intégration de données de la biologie haut débit

Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
 
Grouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataGrouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataFacultad de Informática UCM
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancerpaperpublications3
 
Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016Nicola Amoroso
 
Survival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive ModelsSurvival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive ModelsChristos Argyropoulos
 
Application of Exponential Gamma Distribution in Modeling Queuing Data
Application of Exponential Gamma Distribution in Modeling Queuing DataApplication of Exponential Gamma Distribution in Modeling Queuing Data
Application of Exponential Gamma Distribution in Modeling Queuing Dataijtsrd
 
Knowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applicationsKnowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applicationsCatia Pesquita
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
UNIT1-2.pptx
UNIT1-2.pptxUNIT1-2.pptx
UNIT1-2.pptxcsecem
 
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...Cagatay Turkay
 
Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...
Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...
Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...luizcelsojr
 
Zhao_Danton_SR16_Poster
Zhao_Danton_SR16_PosterZhao_Danton_SR16_Poster
Zhao_Danton_SR16_PosterDanton Zhao
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationKan Yuenyong
 

Similar to La statistique et le machine learning pour l'intégration de données de la biologie haut débit (20)

Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
Grouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big DataGrouping techniques for facing Volume and Velocity in the Big Data
Grouping techniques for facing Volume and Velocity in the Big Data
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancer
 
Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016
 
Survival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive ModelsSurvival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive Models
 
Application of Exponential Gamma Distribution in Modeling Queuing Data
Application of Exponential Gamma Distribution in Modeling Queuing DataApplication of Exponential Gamma Distribution in Modeling Queuing Data
Application of Exponential Gamma Distribution in Modeling Queuing Data
 
Knowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applicationsKnowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applications
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
AI Math Agents
AI Math AgentsAI Math Agents
AI Math Agents
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
UNIT1-2.pptx
UNIT1-2.pptxUNIT1-2.pptx
UNIT1-2.pptx
 
2017 11 cascd
2017 11 cascd2017 11 cascd
2017 11 cascd
 
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
 
Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...
Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...
Ciência de Dados: definição, desafios de modelagem e aplicações multidiscipli...
 
Ml in genomics
Ml in genomicsMl in genomics
Ml in genomics
 
Zhao_Danton_SR16_Poster
Zhao_Danton_SR16_PosterZhao_Danton_SR16_Poster
Zhao_Danton_SR16_Poster
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendation
 

Statistics and machine learning for the integration of high-throughput biology data

  • 1. Statistics and machine learning for the integration of high-throughput biology data. Nathalie Vialaneix, nathalie.vialaneix@inrae.fr, http://www.nathalievialaneix.eu. Launch seminar of the DIGIT-BIO Metaprogramme (Séminaire de lancement Métaprogramme DIGIT-BIO), 31 March 2021.
  • 2. Data collected at the genomic level are huge, multi-scale and of heterogeneous types... and increasingly publicly available [Foissac et al., 2019].
  • 4. Analysis bottleneck
      • As in humans, data are under-used because of: various experimental conditions (meta-analyses needed), various scales (not always compatible), various types (not always simply numeric), imperfect matching between experiments (missing data/individuals), ...
      • In addition, compared with humans: less discriminative phenotypes, indirect interest (in breeding, not directly in the individual itself), much less data with poorer-quality annotation (inter-species transfer might be useful)
  • 5. Multiple table analyses (CCA, MFA, PLS, STATIS, ...). Image from https://dimensionless.in
  • 6. Types of data integration methods [Ritchie et al., 2015]
  • 8. Common representations using relational data. $n$ samples $(x_i)_i \in \mathcal{X}$, summarized by pairwise similarities or dissimilarities. Kernels: a symmetric and positive definite $(n \times n)$-matrix $K$ that measures a “similarity” between $n$ entities in $\mathcal{X}$: $K(x, x') = \langle \phi(x), \phi(x') \rangle$
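
The kernel definition on this slide can be illustrated with a minimal NumPy sketch. The function names `gaussian_kernel` and `is_valid_kernel`, the choice of a Gaussian kernel and the parameter `gamma` are assumptions made for illustration; the slides do not prescribe a particular kernel.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

def is_valid_kernel(K, tol=1e-8):
    """Check symmetry and positive semi-definiteness of an (n x n) matrix."""
    symmetric = np.allclose(K, K.T)
    eigenvalues = np.linalg.eigvalsh((K + K.T) / 2.0)
    return bool(symmetric and eigenvalues.min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))      # 10 samples described by 4 numeric variables
K = gaussian_kernel(X, gamma=0.5)
print(K.shape, is_valid_kernel(K))
```

Any dataset for which such a matrix can be computed (numeric tables, compositional data with a phylogeny-aware distance, graphs) is then handled through the same $(n \times n)$ representation.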
  • 10. Multiple kernel (or distance) integration. How to “optimally” combine several relational datasets in an unsupervised setting? For kernels $K_1, \ldots, K_M$ obtained on the same $n$ objects, search: $K_\beta = \sum_{m=1}^{M} \beta_m K_m$ with $\beta_m \geq 0$ and $\sum_m \beta_m = 1$. Idea of kernel consensus: find a kernel that performs a consensus of all kernels [Mariette and Villa-Vialaneix, 2018] (R package mixKernel), with consensus based on:
      • STATIS [L’Hermier des Plantes, 1976, Lavit et al., 1994]
      • a criterion that preserves local geometry: $\min_{\beta} \sum_{i,j=1}^{n} W_{ij} \, \|\Delta_i(\beta) - \Delta_j(\beta)\|^2$ for $\beta_m \geq 0$ and $\|\beta\|_{1,2} = 1$, where $W_{ij}$ measures local proximity and $\|\Delta_i(\beta) - \Delta_j(\beta)\|^2$ is a feature space distance
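
The convex combination $K_\beta = \sum_m \beta_m K_m$ can be sketched as follows. The weighting heuristic shown here (leading eigenvector of the matrix of cosine similarities between kernels, rescaled to sum to 1) is a STATIS-style assumption chosen for illustration; it is not claimed to reproduce what the mixKernel package does, and the function names are hypothetical.

```python
import numpy as np

def combine_kernels(kernels, beta):
    """Return K_beta = sum_m beta_m K_m for weights beta_m >= 0 summing to 1."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must be non-negative and sum to 1")
    return sum(b * K for b, K in zip(beta, kernels))

def statis_like_weights(kernels):
    """Illustrative consensus weights (assumption, not the mixKernel API).

    Builds the (M x M) matrix of Frobenius cosine similarities between kernels
    and uses its leading eigenvector, rescaled to sum to 1, as weights.
    """
    M = len(kernels)
    C = np.empty((M, M))
    for a in range(M):
        for b in range(M):
            inner = np.sum(kernels[a] * kernels[b])
            C[a, b] = inner / (np.linalg.norm(kernels[a]) * np.linalg.norm(kernels[b]))
    _, eigvecs = np.linalg.eigh(C)   # eigenvalues returned in ascending order
    v = np.abs(eigvecs[:, -1])       # leading eigenvector, sign made positive
    return v / v.sum()

rng = np.random.default_rng(1)
tables = [rng.normal(size=(8, p)) for p in (3, 5, 2)]  # 3 datasets, same 8 objects
kernels = [T @ T.T for T in tables]                    # linear kernels
beta = statis_like_weights(kernels)
K_beta = combine_kernels(kernels, beta)
print(beta, K_beta.shape)
```

Because every $\beta_m$ is non-negative, the combined matrix stays symmetric and positive semi-definite, so it can be used wherever a single kernel would be.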
  • 11. TARA Oceans expedition. Science (May 2015), studies on:
      • eukaryotic plankton diversity [de Vargas et al., 2015]
      • ocean viral communities [Brum et al., 2015]
      • global plankton interactome [Lima-Mendez et al., 2015]
      • global ocean microbiome [Sunagawa et al., 2015]
      • ...
    → datasets of different types and from different sources, analyzed separately.
  • 12. Integrating TARA Oceans datasets
      • For all compositional datasets, include phylogeny information rather than standard compositional transformations (CLR and the like): weighted UniFrac distance
      • Perform PCA (could have been clustering, a linear model, ...) in the feature space, combined with a shuffling approach to identify the most influential variables
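
PCA in the feature space only needs the kernel matrix. Below is a minimal sketch under the assumption that a (combined) kernel `K` is already available; the helper names are hypothetical and the shuffling step used to rank variables is not reproduced here.

```python
import numpy as np

def center_kernel(K):
    """Double-center K, which centers the (implicit) feature map."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def kernel_pca(K, n_components=2):
    """Coordinates of the n objects on the leading kernel principal axes."""
    Kc = center_kernel(K)
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # remove round-off negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
K = X @ X.T                        # linear kernel: kernel PCA reduces to standard PCA
scores = kernel_pca(K, n_components=2)
print(scores.shape)
```

With a linear kernel the coordinates coincide with ordinary PCA scores up to sign, which gives a simple sanity check before switching to other kernels.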
  • 13. Application to TARA Oceans. Main facts:
      • Ocean typology is related to longitude
      • Rhizaria abundance structures the differences, especially between Arctic Oceans and Pacific Oceans
  • 15. What’s next? Improve interpretability. Select the variables most relevant for a given kernel:
      • built on works that select variables in $X$ to compute a kernel $K_x$ best suited to predict $Y \in \mathbb{R}$ [Allen, 2013, Grandvalet and Canu, 2002]
      • extended to unsupervised (exploratory) learning and arbitrary outputs (kernel outputs, including multiple output regression)
    Continuous relaxation of a discrete problem: non-convex and non-smooth optimization, solved with proximal gradient descent
  • 16. UKFS performances [Brouard et al., 2021]

                  lapl     SPEC     MCFS     NDFS     UDFS   Autoenc.     UKFS
      “Carcinom” (n = 174, p = 9 182)
        ACC     164.02   106.52   184.17   200.88   138.48     143.13   206.55
        COR      28.14    30.75    29.56    27.49    30.30      33.18    24.75
        CPU       0.25     2.47    11.69    6,162   99,138   > 4 days      326
      “Glioma” (n = 50, p = 4 434)
        ACC     166.31   140.72   172.78   147.77   147.50     132.76   178.57
        COR      81.70    70.70    76.43    68.02    72.33      45.96    52.14
        CPU       0.02     0.63     1.05      368    2,636      ∼ 12h    23.74
      “Koren” (n = 43, p = 980)
        ACC     172.90   225.25   233.94   263.04   263.48     239.76   242.39
        COR      48.18    52.34    49.94    48.48    48.69      32.60    47.77
        CPU       0.01     0.07     1.11     5.88     9.70   ∼ 30 min    10.69

      • non-redundant features
      • can incorporate information on relations between variables
      • also works well for multiple output regression and kernel output regression (tested on covid19 time series)
  • 18. What about deep learning?
      • Cons: computational cost; data quantity requirement (observations); not easily interpretable; mostly designed for numeric inputs
      • Pros: can be good for prediction
    [Grattarola and Alippi, 2020] Example: information on relations between genes + expression to predict a phenotype
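
To make the example on this slide concrete, here is a minimal NumPy sketch of one graph-convolution layer followed by mean pooling, in which nodes are genes, the adjacency matrix encodes relations between genes, and the single node feature is the expression level. The propagation rule is the common symmetric normalization with self-loops; the function names, layer sizes and random weights are assumptions for illustration and do not use the Spektral API.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph-convolution layer with ReLU activation."""
    return np.maximum(A_norm @ H @ W, 0.0)

def predict_phenotype(A, expression, W1, w_out):
    """Graph-level prediction for one individual (untrained, illustrative)."""
    A_norm = normalize_adjacency(A)
    H = gcn_layer(A_norm, expression[:, None], W1)  # (n_genes, n_hidden)
    pooled = H.mean(axis=0)                         # average over genes
    return float(pooled @ w_out)

# Toy gene network with 3 genes: gene 1 interacts with genes 0 and 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
rng = np.random.default_rng(3)
expression = rng.normal(size=3)   # expression of the 3 genes for one individual
W1 = rng.normal(size=(1, 4))      # hypothetical hidden layer of size 4
w_out = rng.normal(size=4)
y_hat = predict_phenotype(A, expression, W1, w_out)
print(y_hat)
```

In practice the weights would be learned from many individuals, which is where the data quantity requirement listed among the cons comes in.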
  • 19. Can we predict AND understand? Study design (figure): three clinical investigation days (CID1, CID2, CID3); a low-calorie diet phase (8 weeks; objective: loss of > 8% of body weight) and a weight follow-up phase (≈ 6 months; 5 groups with different diets). Samples: adipose tissue and blood samples. Data: quantification of transcripts from adipose tissue measured by QuantSeq; phenotypic and clinical data. [Jin et al., 2020] ⇒ outcome of metabolic syndrome
  • 20. Thank you for your attention! Questions?
  • 21. References
    (unofficial) Beamer template made with the help of Thomas Schiex: https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae

    Allen, G. I. (2013). Automatic feature selection via weighted kernels and regularization. Journal of Computational and Graphical Statistics, 22(2):284–299.

    Brouard, C., Mariette, J., Flamary, R., and Vialaneix, N. (2021). Feature selection for kernel methods in systems biology. In preparation.

    Brum, J., Ignacio-Espinoza, J., Roux, S., Doulcier, G., Acinas, S., Alberti, A., Chaffron, S., Cruaud, C., de Vargas, C., Gasol, J., Gorsky, G., Gregory, A., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B., Schwenck, S., Speich, S., Dimier, C., Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker, P., Karsenti, E., and Sullivan, M. (2015). Patterns and ecological drivers of ocean viral communities. Science, 348(6237).

    de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, P., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horák, A., Jaillon, O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Acinas, S., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M. E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., and Karsenti, E. (2015). Eukaryotic plankton diversity in the sunlit ocean. Science, 348(6237).

    Foissac, S., Djebali, S., Munyard, K., Vialaneix, N., Rau, A., Muret, K., Esquerre, D., Zytnicki, M., Derrien, T., Bardou, P., Blanc, F., Cabau, C., Crisci, E., Dhorne-Pollet, S., Drouet, F., Faraut, T., Gonzáles, I., Goubil, A., Lacroix-Lamande, S., Laurent, F., Marthey, S., Marti-Marimon, M., Mormal-Leisenring, R., Mompart, F., Quere, P., Robelin, D., SanCristobal, M., Tosser-Klopp, G., Vincent-Naulleau, S., Fabre, S., Pinard-Van der Laan, M.-H., Klopp, C., Tixier-Boichard, M., Acloque, H., Lagarrigue, S., and Giuffra, E. (2019). Multi-species annotation of transcriptome and chromatin structure in domesticated animals. BMC Biology, 17:108.

    Grandvalet, Y. and Canu, S. (2002). Adaptive scaling for feature selection in SVMs. In Becker, S., Thrun, S., and Obermayer, K., editors, Proceedings of Advances in Neural Information Processing Systems (NIPS 2002), pages 569–576. MIT Press.

    Grattarola, D. and Alippi, C. (2020). Graph neural networks in TensorFlow and Keras with Spektral. In Proceedings of the Graph Representation Learning and Beyond – ICML 2020 Workshop.

    Jin, W., Ma, Y., Liu, X., Tang, X., Wang, S., and Tang, J. (2020). Graph structure learning for robust graph neural networks. In Gupta, R., Liu, Y., Tang, J., and Prakash, B. A., editors, Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’20), pages 66–74, New York, NY, USA. Association for Computing Machinery (ACM).

    Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119.

    L’Hermier des Plantes, H. (1976). Structuration des tableaux à trois indices de la statistique. PhD thesis (Thèse de troisième cycle), Université de Montpellier.

    Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J., Roux, S., Vincent, F., Bittner, L., Darzi, Y., Wang, B., Audic, S., Berline, L., Bontempi, G., Cabello, A., Coppola, L., Cornejo-Castillo, F., d’Oviedo, F., de Meester, L., Ferrera, I., Garet-Delmas, M., Guidi, L., Lara, E., Pesant, S., Royo-Llonch, M., Salazar, F., Sánchez, P., Sebastian, M., Souffreau, C., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Gorsky, G., Not, F., Ogata, H., Speich, S., Stemmann, L., Weissenbach, J., Wincker, P., Acinas, S., Sunagawa, S., Bork, P., Sullivan, M., Karsenti, E., Bowler, C., de Vargas, C., and Raes, J. (2015). Determinants of community structure in the global plankton interactome. Science, 348(6237).

    Mariette, J. and Villa-Vialaneix, N. (2018). Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics, 34(6):1009–1015.

    Ritchie, M., Phipson, B., Wu, D., Hu, Y., Law, C., Shi, W., and Smyth, G. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47.

    Sunagawa, S., Coelho, L., Chaffron, S., Kultima, J., Labadie, K., Salazar, F., Djahanschiri, B., Zeller, G., Mende, D., Alberti, A., Cornejo-Castillo, F., Costea, P., Cruaud, C., d’Oviedo, F., Engelen, S., Ferrera, I., Gasol, J., Guidi, L., Hildebrand, F., Kokoszka, F., Lepoivre, C., Lima-Mendez, G., Poulain, J., Poulos, B., Royo-Llonch, M., Sarmento, H., Vieira-Silva, S., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans coordinators, Bowler, C., de Vargas, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Not, F., Ogata, H., Pesant, S., Speich, S., Stemmann, L., Sullivan, M., Weissenbach, J., Wincker, P., Karsenti, E., Raes, J., Acinas, S., and Bork, P. (2015). Structure and function of the global ocean microbiome. Science, 348(6237).
  • 24. Technical details: Unsupervised Kernel Feature Selection (UKFS)
    Reduced kernel for $X \in \mathbb{R}^p$: tune weights $w \in \{0, 1\}^p$ such that the kernel based on $w \cdot X$ is very similar to the kernel based on $X$.
    Optimization problem: $\operatorname{argmin}_{w \in \{0,1\}^p} \|K_x^w - K_x\|_F^2$ s.t. $\sum_{j=1}^{p} w_j \leq d$
    Continuous relaxation (non-convex and non-smooth): $w^* := \operatorname{argmin}_{w \in (\mathbb{R}^+)^p} \|K_x^w - K_x\|_F^2 + \lambda \|w\|_1$, solved with proximal gradient descent
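
The relaxed problem above can be sketched with a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a linear kernel $K_x^w = (X \operatorname{diag}(w)) (X \operatorname{diag}(w))^\top$ so that the gradient has a closed form, and it uses a simple backtracking rule for the step size. All function names and parameter values are hypothetical.

```python
import numpy as np

def weighted_linear_kernel(X, w):
    """Kernel computed on the reweighted variables w * X (linear kernel assumed)."""
    Xw = X * w
    return Xw @ Xw.T

def smooth_loss(X, K, w):
    """Smooth part of the objective: ||K^w - K||_F^2."""
    R = weighted_linear_kernel(X, w) - K
    return float(np.sum(R ** 2))

def smooth_grad(X, K, w):
    """Gradient of the smooth part: d/dw_j = 4 w_j x_j^T (K^w - K) x_j."""
    R = weighted_linear_kernel(X, w) - K
    return 4.0 * w * np.sum(X * (R @ X), axis=0)

def ukfs_linear(X, lam=1.0, n_iter=100, step=1.0):
    """Proximal gradient descent on ||K^w - K||_F^2 + lam * ||w||_1 with w >= 0."""
    K = X @ X.T
    w = np.ones(X.shape[1])
    for _ in range(n_iter):
        g = smooth_grad(X, K, w)
        f_w = smooth_loss(X, K, w)
        t, accepted = step, False
        for _ in range(60):  # backtracking on the step size
            # Proximal step: soft-thresholding combined with projection on w >= 0.
            w_new = np.maximum(w - t * g - t * lam, 0.0)
            d = w_new - w
            if smooth_loss(X, K, w_new) <= f_w + g @ d + (d @ d) / (2.0 * t):
                accepted = True
                break
            t *= 0.5
        if not accepted:
            break
        w = w_new
    return w

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
w_star = ukfs_linear(X, lam=5.0, n_iter=50)
print(np.round(w_star, 3))
```

Variables whose weight is driven to zero are discarded; larger values of `lam` push more weights to zero, which plays the role of the cardinality constraint $\sum_j w_j \leq d$ in the discrete problem.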