Selective inference and single-cell differential analysis
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
Club Single-Cell
February 7th, 2022
Outline
Introduction: what is selective inference and why should we bother?
Sketch of basic ideas developed to answer this issue
Club Single-Cell
February 7th, 2022 / Nathalie Vialaneix
p. 2
Standard single-cell analysis pipeline and double dipping
Image taken from [Fang et al., 2021]
Here: differential analysis. The dataset is used twice (first for clustering, then for differential analysis).
Why is it a problem? Example on simulations...
How can we show the problem?
- simulate dummy data with no signal (e.g., $n$ i.i.d. observations from $\mathcal{N}_d(0_d, \sigma^2 I_d)$)
- perform the test procedure: clustering, then differential analysis between the clusters (Wald test), and obtain p-values
- What do we expect? Since there is no signal in the data (no true clusters, hence no marker genes), the p-values should follow $\mathcal{U}[0, 1]$.
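The simulation described above can be sketched as follows. This is a minimal sketch under assumptions of our own: 2-means (Lloyd's algorithm) stands in for the clustering step, the data have $d = 2$ genes with $\sigma = 1$, the Wald test uses a normal reference distribution, and only gene 0 is tested. All function names are hypothetical.

```python
import math
import random
import statistics


def simulate_null(n, d, sigma, rng):
    """n i.i.d. rows from N_d(0_d, sigma^2 I_d): no clusters, no marker genes."""
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n)]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def two_means(X, rng, n_iter=25):
    """Lloyd's algorithm with k = 2 (hypothetical stand-in for the clustering step)."""
    centers = [list(c) for c in rng.sample(X, 2)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        labels = [0 if sq_dist(x, centers[0]) <= sq_dist(x, centers[1]) else 1
                  for x in X]
        for k in (0, 1):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def wald_pvalue(a, b):
    """Two-sided Wald test of equal means with a normal reference distribution."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def double_dipping_pvalues(n_rep=200, n=100, d=2, sigma=1.0, seed=1):
    """Cluster each null dataset, then test gene 0 between the two clusters."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_rep):
        X = simulate_null(n, d, sigma, rng)
        labels = two_means(X, rng)
        g0 = [x[0] for x, lab in zip(X, labels) if lab == 0]
        g1 = [x[0] for x, lab in zip(X, labels) if lab == 1]
        if min(len(g0), len(g1)) >= 2:  # skip degenerate clusterings
            pvals.append(wald_pvalue(g0, g1))
    return pvals


pvals = double_dipping_pvalues()
print(f"fraction of p-values < 0.05: {sum(p < 0.05 for p in pvals) / len(pvals):.2f}")
```

If these p-values were valid, roughly 5% of them would fall below 0.05. Because the clusters are estimated from the same data that are then tested, the printed fraction is expected to be much larger.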
First question [Gao et al., 2021]
Is the average value of vector $X$ in the first cluster different from its average value in the second cluster?
First question using a train/test approach [Gao et al., 2021]
Is the average value of vector $X$ in the first cluster different from its average value in the second cluster?
Second question (at the level of marker genes)
[Zhang et al., 2019]
Is the average expression of a given gene, $x_j$, in the first cluster different from its average expression in the second cluster?
Why do we have this problem?
Main idea:
Clustering “forces” separation between expression measurements whatever the true
underlying signal (or absence of signal).
Sketch of basic ideas developed to answer this issue
Question 1 [Gao et al., 2021]
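Denoting by $D := \|\bar{X}^{(1)} - \bar{X}^{(2)}\|$ the distance between the two cluster means and by $\varphi$ a random variable such that $\varphi^2$ follows a scaled $\chi^2$ distribution (with parameters depending on $X$), define a perturbed version of the data that:
- pulls the clusters apart if $\varphi > D$;
- pushes the clusters together if $\varphi < D$.
A valid p-value can then be obtained from the distribution of $\varphi$, restricted to the values for which clustering the perturbed data yields the same clusters.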
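Written out, the construction reads as below. This is our transcription of [Gao et al., 2021], restricted to the comparison of two clusters $\hat G_1, \hat G_2$ estimated on the $n \times d$ data matrix $X$ (rows are observations). The contrast vector $\nu$, $\operatorname{dir}(u) = u / \|u\|_2$ and the set $\mathcal{S}$ are notation introduced here, and $\sigma$ is assumed known.

```latex
\nu_i = \begin{cases} 1/|\hat G_1| & \text{if } i \in \hat G_1,\\ -1/|\hat G_2| & \text{if } i \in \hat G_2,\\ 0 & \text{otherwise,} \end{cases}
\qquad X^\top \nu = \bar{X}^{(1)} - \bar{X}^{(2)}, \qquad D = \|X^\top \nu\|_2,

X'(\varphi) = X + \frac{\varphi - D}{\|\nu\|_2^2}\, \nu\, \operatorname{dir}(X^\top \nu)^\top
\quad\Longrightarrow\quad X'(\varphi)^\top \nu = \varphi\, \operatorname{dir}(X^\top \nu),

\mathcal{S} = \{\varphi \ge 0 : \text{clustering } X'(\varphi) \text{ again yields } \hat G_1 \text{ and } \hat G_2\},
\qquad
p = \mathbb{P}\left(\varphi \ge D \mid \varphi \in \mathcal{S}\right), \quad \varphi \sim \sigma \|\nu\|_2\, \chi_d .
```

With noise $\mathcal{N}_d(0_d, \sigma^2 I_d)$ and equal cluster means, $D / (\sigma \|\nu\|_2)$ would follow a $\chi_d$ distribution if the clusters had been fixed in advance; conditioning on $\varphi \in \mathcal{S}$ is what accounts for the clusters having been estimated from $X$.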
Question 1 [Gao et al., 2021]
Is it usable? More or less...
1. either you have an explicit description of the clustering of the perturbed data (i.e., of the set of values of $\varphi$ that leave the clusters unchanged)
Only available for hierarchical clustering (HC) in [Gao et al., 2021].
2. or you approximate the distribution by simulation (using random draws of $\varphi$)
But this requires plenty of computing time.
The method is available as an R package: clusterpval
https://www.lucylgao.com/clusterpval/
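Option 2 can be sketched as below. This is a minimal sketch under our own assumptions, not the clusterpval implementation: a deterministic 2-means stands in for the clustering (the exact method of [Gao et al., 2021] targets HC), $\sigma$ is treated as known, and $\varphi$ is drawn from a normal proposal centered at $D$ and reweighted by importance sampling (the proposal is an assumed choice). All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_means(X, n_iter=50):
    """Deterministic 2-means (hypothetical stand-in for the clustering step):
    start from X[0] and the point farthest from it, then run Lloyd's updates."""
    far = max(range(len(X)), key=lambda i: sq_dist(X[i], X[0]))
    centers = [list(X[0]), list(X[far])]
    labels = None
    for _ in range(n_iter):
        new = [0 if sq_dist(x, centers[0]) <= sq_dist(x, centers[1]) else 1 for x in X]
        if new == labels:
            break  # assignments are stable
        labels = new
        for k in (0, 1):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:
                centers[k] = mean_vec(members)
    return labels


def mean_gap(X, g1, g2):
    """Difference of the two cluster means (X^T nu) and its norm D."""
    m1, m2 = mean_vec([X[i] for i in g1]), mean_vec([X[i] for i in g2])
    diff = [a - b for a, b in zip(m1, m2)]
    return diff, math.sqrt(sum(v * v for v in diff))


def perturb(X, g1, g2, phi):
    """X'(phi) = X + (phi - D) / ||nu||^2 * nu dir^T: the mean gap becomes phi."""
    diff, D = mean_gap(X, g1, g2)
    nu2 = 1 / len(g1) + 1 / len(g2)  # ||nu||_2^2
    shift = [(phi - D) / (D * nu2) * v for v in diff]
    Xp = [list(x) for x in X]
    for i in g1:
        Xp[i] = [a + s / len(g1) for a, s in zip(Xp[i], shift)]
    for i in g2:
        Xp[i] = [a - s / len(g2) for a, s in zip(Xp[i], shift)]
    return Xp


def selective_pvalue_mc(X, labels, sigma, n_draws=300, seed=0):
    """Importance-sampling estimate of P(phi >= D | phi in S), phi ~ sigma*||nu||*chi_d."""
    rng = random.Random(seed)
    g1 = [i for i, lab in enumerate(labels) if lab == 0]
    g2 = [i for i, lab in enumerate(labels) if lab == 1]
    if not g1 or not g2:
        return None
    d = len(X[0])
    _, D = mean_gap(X, g1, g2)
    s = sigma * math.sqrt(1 / len(g1) + 1 / len(g2))  # scale of the null law of phi
    log_w, above = [], []
    for _ in range(n_draws):
        phi = rng.gauss(D, s)  # proposal N(D, s^2), centered on the observed gap
        if phi <= 0:
            continue  # the target density is zero for phi <= 0
        new = two_means(perturb(X, g1, g2, phi))
        if {i for i, lab in enumerate(new) if lab == new[g1[0]]} != set(g1):
            continue  # phi is not in S: the clusters changed
        # log target (unnormalised scaled chi_d) minus log proposal (unnormalised normal)
        log_w.append((d - 1) * math.log(phi) - phi ** 2 / (2 * s ** 2)
                     + (phi - D) ** 2 / (2 * s ** 2))
        above.append(phi >= D)
    if not log_w:
        return None  # no draw fell in S: increase n_draws
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi for wi, a in zip(w, above) if a) / sum(w)


rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(60)]  # null data, d = 2
labels = two_means(X)
print(selective_pvalue_mc(X, labels, sigma=1.0))
```

Every draw of $\varphi$ requires re-running the clustering on the perturbed data, which is where the computing time goes.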
Experiment
Data from [Zheng et al., 2017], in which peripheral blood mononuclear cells were sorted into cell populations prior to sequencing (antibody-based bead enrichment + fluorescence-activated cell sorting) ⇒ ground truth
Derivation of:
- a negative control (selection of 600 memory T cells)
- a positive control (selection of 200 memory T cells + 200 B cells + monocytes)
Method: clustering with HAC (3 clusters), then differential analysis (Wald test versus the test of [Gao et al., 2021])
Experiment
Further discussion
- an extension of this approach to marker gene detection is ongoing (work by Benjamin Hivert, Boris Hejblum & Rodolphe Thiébaut)
- but extending it beyond pairwise (2-by-2) cluster comparisons remains challenging, as does estimating the variance parameter $\sigma^2$ that the method requires
Question 2 [Zhang et al., 2019]
Use a test based on a truncated normal distribution (the TN test)
Remarks on this approach:
- the separating hyperplane is assumed to be given ⇒ this constrains the clustering method and requires that the clustering be performed on a separate dataset
- genes are assumed to be uncorrelated (a very, very strong assumption...)
- the method is available as a Python tool at
https://github.com/jessemzhang/tn_test
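The TN test itself is beyond a short sketch. The code below only illustrates, under the same assumptions as above (independent Gaussian genes, a hyperplane $a^\top x = b$ fixed in advance), why a naive comparison fails: splitting pure-noise data by the hyperplane already creates a mean difference for every gene with a nonzero coefficient $a_j$, which is what the truncated distribution accounts for. The function names and the numerical example are hypothetical.

```python
import math
import random


def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def null_mean_gap(a, b, mu, sd, j):
    """E[x_j | a'x > b] - E[x_j | a'x <= b] for x ~ N(mu, diag(sd^2)).

    Uses E[x_j | y] = mu_j + Cov(x_j, y) / Var(y) * (y - E[y]) with y = a'x,
    Cov(x_j, y) = a_j * sd_j^2, and the means of a normal truncated at b.
    """
    m = sum(ai * mi for ai, mi in zip(a, mu))
    s = math.sqrt(sum((ai * si) ** 2 for ai, si in zip(a, sd)))
    alpha = (b - m) / s
    tails = norm_pdf(alpha) * (1.0 / (1.0 - norm_cdf(alpha)) + 1.0 / norm_cdf(alpha))
    return a[j] * sd[j] ** 2 / s * tails


def simulated_mean_gap(a, b, mu, sd, j, n=20000, seed=0):
    """Monte Carlo check: split null data by the hyperplane, compare gene-j means."""
    rng = random.Random(seed)
    above, below = [], []
    for _ in range(n):
        x = [rng.gauss(m_, s_) for m_, s_ in zip(mu, sd)]
        side = above if sum(ai * xi for ai, xi in zip(a, x)) > b else below
        side.append(x[j])
    return sum(above) / len(above) - sum(below) / len(below)


# Three independent genes, no signal; the hyperplane uses genes 0 and 1 only.
a, b, mu, sd = [1.0, 1.0, 0.0], 0.0, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
for j in range(3):
    print(j, round(null_mean_gap(a, b, mu, sd, j), 3),
          round(simulated_mean_gap(a, b, mu, sd, j), 3))
```

In this example, genes 0 and 1 show a mean gap of $2/\sqrt{\pi} \approx 1.13$ between the two sides although no gene carries any signal, while gene 2 (not used by the hyperplane) shows none.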
Experiment 1
Again... data from [Zheng et al., 2017]...
Method:
- use Seurat for clustering (9 clusters)
- use Seurat and the TN test for differential analysis between the first two clusters
Results
Experiment 2
Data from [Kolodziejczyk et al., 2015]
Impact of overclustering on results
References
Fang, R., Preissl, S., Li, Y., Hou, X., Lucero, J., Wang, X., Motamedi, A., Shiau, A. K., Zhou, X., Xie, F., Mukamel, E. A., Zhang, K.,
Zhang, Y., Behrens, M. M., Ecker, J. R., and Ren, B. (2021).
Comprehensive analysis of single cell ATAC-seq data with SnapATAC.
Nature Communications, 12:1337.
Gao, L. L., Bien, J., and Witten, D. (2021).
Selective inference for hierarchical clustering.
Preprint arXiv 2012.02936.
Kolodziejczyk, A. A., Kim, J. K., Tsang, J. C., Ilicic, T., Henriksson, J., Natarajan, K. N., Tuck, A. C., Gao, X., Bühler, M., Liu, P., Marioni,
J. C., and Teichmann, S. A. (2015).
Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation.
Cell Stem Cell, 17(4):471–485.
Zhang, J. M., Kamath, G. M., and Tse, D. N. (2019).
Valid post-clustering differential analysis for single-cell RNA-seq.
Cell Systems, 9(4):383–392.e6.
Zheng, G. X., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W., Wilson, R., Ziraldo, S. B., Wheeler, T. D., McDermott, G. P., Zhu, J.,
Gregory, M. T., Shuga, J., Montesclaros, L., Underwood, J. G., Masquelier, D. A., Nishimura, S. Y., Schnall-Levin, M., Wyatt, P. W.,
Hindson, C. M., Bharadwaj, R., Wong, A., Ness, K. D., Beppu, L. W., Deeg, H. J., McFarland, C., Loeb, K. R., Valente, W. J., Ericson,
N. G., Stevens, E. A., Radich, J. P., Mikkelsen, T. S., Hindson, B. J., and Bielas, J. H. (2017).
Massively parallel digital transcriptional profiling of single cells.
Nature Communications, 8:14049.
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 

Recently uploaded (20)

Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 

Selective inference and single-cell differential analysis

  • 1. Selective inference and single-cell differential analysis. Nathalie Vialaneix, nathalie.vialaneix@inrae.fr, http://www.nathalievialaneix.eu. Club Single-Cell, February 7th, 2022.
  • 2. Outline. ▸ Introduction: what is selective inference and why should we bother? ▸ Sketch of the basic ideas developed to address this issue.
  • 4. Standard single-cell analysis pipeline and double dipping. Image taken from [Fang et al., 2021]. Here, the downstream step is differential analysis: the same dataset is used twice (first for clustering, then for differential analysis).
  • 7. Why is it a problem? An example on simulations. How can we show the problem? ▸ Simulate dummy data with no signal (e.g., n i.i.d. observations from N_d(0_d, σ²I_d)). ▸ Perform the test procedure: clustering, then differential analysis between clusters (Wald test), and collect the p-values. ▸ What do we expect? Since there is no signal in the data (no true clusters, hence no marker genes), the p-values should follow U[0, 1].
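The simulation described above can be sketched with the Python standard library alone. This is a minimal illustration under our own assumptions, not the code used for the slides: the clustering step is a plain 2-means (Lloyd's algorithm), the differential analysis is a two-sided Wald test on the first gene only, and the sizes (`n=100` cells, `d=2` genes, 200 replicates) are arbitrary choices.

```python
import math
import random
import statistics


def kmeans2(points, rng, n_iter=50):
    """Plain 2-means (Lloyd's algorithm) started from two random data points."""
    centres = [list(p) for p in rng.sample(points, 2)]
    labels = None
    for _ in range(n_iter):
        new = [0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1
               for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster empties
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def wald_pvalue(a, b):
    """Two-sided Wald test for equal means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


def null_experiment(n=100, d=2, sigma=1.0, n_rep=200, seed=1):
    """Data with no signal -> cluster -> test gene 1 between the two clusters."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_rep):
        x = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n)]
        labels = kmeans2(x, rng)  # the same data are used to define the clusters...
        g1 = [p[0] for p, lab in zip(x, labels) if lab == 0]
        g2 = [p[0] for p, lab in zip(x, labels) if lab == 1]
        if len(g1) < 2 or len(g2) < 2:
            continue  # degenerate split: skip this replicate
        pvals.append(wald_pvalue(g1, g2))  # ...and to test them
    return pvals


pvals = null_experiment()
print(sum(p < 0.05 for p in pvals) / len(pvals))
```

If the p-values were uniform, the printed fraction of p-values below 0.05 would be close to 0.05; in this sketch it is far larger, because the clusters were chosen on the very data that are then tested.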
  • 8. First question [Gao et al., 2021]: is the average value of the vector X in the first cluster different from what it is in the second cluster?
  • 9. First question, using a train/test approach [Gao et al., 2021]: is the average value of the vector X in the first cluster different from what it is in the second cluster?
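A train/test variant of the same null simulation can be sketched as follows. This is our own minimal illustration, not the procedure of [Gao et al., 2021]: we assume the clusters are learnt on one half of the cells and that the other half is labelled by assigning each cell to the nearest training centroid (the slide does not spell out the assignment rule). Because these test labels are still a function of the test cells' own values, the Wald p-values computed on the test half remain far from uniform here.

```python
import math
import random
import statistics


def nearest(p, centres):
    """Index of the closest centre (assumed rule for labelling test cells)."""
    return 0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1


def kmeans2(points, rng, n_iter=50):
    """Plain 2-means (Lloyd's algorithm); returns (labels, centres)."""
    centres = [list(p) for p in rng.sample(points, 2)]
    labels = None
    for _ in range(n_iter):
        new = [nearest(p, centres) for p in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def wald_pvalue(a, b):
    """Two-sided Wald test for equal means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def split_experiment(n=100, d=2, sigma=1.0, n_rep=200, seed=2):
    """No-signal data: cluster the training half, test gene 1 on the test half."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_rep):
        x = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n)]
        train, test = x[: n // 2], x[n // 2:]
        _, centres = kmeans2(train, rng)                    # clusters from training cells only
        test_labels = [nearest(p, centres) for p in test]   # depends on the test values!
        g1 = [p[0] for p, lab in zip(test, test_labels) if lab == 0]
        g2 = [p[0] for p, lab in zip(test, test_labels) if lab == 1]
        if len(g1) < 2 or len(g2) < 2:
            continue
        pvals.append(wald_pvalue(g1, g2))
    return pvals


pvals = split_experiment()
print(sum(p < 0.05 for p in pvals) / len(pvals))
```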
  • 10. Second question (at the level of marker genes) [Zhang et al., 2019]: is the average expression of a given gene, x_j, in the first cluster different from what it is in the second cluster?
  • 11. Why do we have this problem? Main idea: clustering “forces” a separation between expression measurements, whatever the true underlying signal (or absence of signal).
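A back-of-the-envelope calculation makes this concrete. Assume (a simplification of ours) a single gene with X_1, …, X_n i.i.d. N(0, σ²), and a clustering that splits the cells exactly at 0 into two groups of size n/2. Standard half-normal moments then give the expected cluster means and the resulting Wald statistic:

```latex
\begin{aligned}
\mathbb{E}[X \mid X > 0] &= \sigma\sqrt{2/\pi}, \qquad
\mathbb{E}[X \mid X < 0] = -\sigma\sqrt{2/\pi}, \qquad
\operatorname{Var}(X \mid X > 0) = \sigma^2\left(1 - \tfrac{2}{\pi}\right),\\[4pt]
z &\approx \frac{2\sigma\sqrt{2/\pi}}{\sqrt{2\,\sigma^2\,(1 - 2/\pi)\,/\,(n/2)}}
   = \sqrt{n}\,\sqrt{\frac{2}{\pi - 2}} \approx 1.32\,\sqrt{n}.
\end{aligned}
```

The two clusters are about 1.6σ apart on average even though no gene is differentially expressed, and the Wald statistic grows like √n: for n = 100 cells, z ≈ 13.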
  • 12. Outline. ▸ Introduction: what is selective inference and why should we bother? ▸ Sketch of the basic ideas developed to address this issue.
  • 13. Question 1 [Gao et al., 2021]: denote by D := ‖X(1) − X(2)‖ the distance between the means of X in the two clusters, and by φ a random variable following a (scaled) χ distribution whose parameters depend on X. Define a perturbed version of the data that ▸ pulls the clusters apart if φ > D; ▸ pushes the clusters together if φ < D. A valid p-value can then be obtained from the distribution of φ, restricted to the values of φ for which clustering the perturbed data yields the same clusters.
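For reference, here is how we read this construction in [Gao et al., 2021] (notation ours; please check the paper for the exact statement). With n cells, q genes, a data matrix x ∈ ℝ^{n×q}, two clusters C1 and C2, and a noise level σ assumed known:

```latex
\begin{aligned}
\nu &= \frac{\mathbf{1}_{C_1}}{|C_1|} - \frac{\mathbf{1}_{C_2}}{|C_2|} \in \mathbb{R}^n,
\qquad x^\top\nu = \bar{x}^{(1)} - \bar{x}^{(2)},
\qquad D = \|x^\top\nu\|_2,
\qquad \|\nu\|_2^2 = \frac{1}{|C_1|} + \frac{1}{|C_2|},\\[4pt]
x'(\phi) &= x + \frac{\phi - D}{\|\nu\|_2^2}\,\nu\left(\frac{x^\top\nu}{D}\right)^{\!\top}
\quad\Longrightarrow\quad x'(\phi)^\top\nu = \frac{\phi}{D}\,x^\top\nu,
\qquad \|x'(\phi)^\top\nu\|_2 = \phi,\\[4pt]
\mathcal{S} &= \{\phi \ge 0 : \text{clustering } x'(\phi) \text{ returns } C_1 \text{ and } C_2\},\\[4pt]
p &= \mathbb{P}\left(\phi \ge D \mid \phi \in \mathcal{S}\right),
\qquad \phi \sim \sigma\,\|\nu\|_2\,\chi_q .
\end{aligned}
```

Setting φ = D leaves the data unchanged, so D always belongs to 𝒮; the whole difficulty lies in characterising 𝒮 for a given clustering algorithm.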
  • 18. Question 1 [Gao et al., 2021]: is it usable? More or less... 1. Either you have an explicit description of the clusters obtained on the perturbed data (as a function of φ): this is only available for hierarchical clustering (HC) in [Gao et al., 2021]. 2. Or you approximate the distribution by simulation (using random draws of φ): but then you need plenty of computing time. The method is available as an R package, clusterpval: https://www.lucylgao.com/clusterpval/
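Option 2 can be sketched with naive Monte Carlo: draw φ from its null distribution (as we read [Gao et al., 2021], a χ distribution with q degrees of freedom, q the number of genes, scaled by σ√(1/|C1| + 1/|C2|)), rebuild the perturbed data, re-run the clustering, and keep only the draws that recover the same two clusters. This is a minimal sketch under our own assumptions, not the clusterpval implementation: the clustering rule (`kmeans2`, a deterministic 2-means) and all function names are hypothetical, σ is assumed known, and plain rejection sampling is used where a dedicated implementation would rely on a more efficient sampling scheme.

```python
import math
import random


def kmeans2(points, n_iter=50):
    """Deterministic 2-means (hypothetical clustering rule for this sketch).

    Centres start at the points with the smallest and largest first coordinate,
    so that re-clustering perturbed data does not depend on a random seed.
    """
    idx = range(len(points))
    centres = [list(points[min(idx, key=lambda i: points[i][0])]),
               list(points[max(idx, key=lambda i: points[i][0])])]
    labels = None
    for _ in range(n_iter):
        new = tuple(0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1
                    for p in points)
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def same_partition(a, b):
    """True if two 0/1 labellings define the same two clusters (labels may be swapped)."""
    return a == b or a == tuple(1 - lab for lab in b)


def mean_difference(x, labels):
    """Return the difference of cluster means, its norm D, and 1/|C1| + 1/|C2|."""
    n1, n2 = labels.count(0), labels.count(1)
    q = len(x[0])
    m1 = [sum(p[j] for p, lab in zip(x, labels) if lab == 0) / n1 for j in range(q)]
    m2 = [sum(p[j] for p, lab in zip(x, labels) if lab == 1) / n2 for j in range(q)]
    diff = [a - b for a, b in zip(m1, m2)]
    return diff, math.hypot(*diff), 1 / n1 + 1 / n2


def perturb(x, labels, phi):
    """Perturbed data: move the two cluster means so that their distance becomes phi."""
    diff, dist, nu_sq = mean_difference(x, labels)
    n1, n2 = labels.count(0), labels.count(1)
    out = []
    for p, lab in zip(x, labels):
        nu_i = 1 / n1 if lab == 0 else -1 / n2
        step = (phi - dist) / nu_sq * nu_i / dist
        out.append([pj + step * dj for pj, dj in zip(p, diff)])
    return out


def selective_pvalue_mc(x, sigma, n_draws=2000, seed=0):
    """Rejection estimate of P(phi >= D | same clusters); None if no draw is kept."""
    rng = random.Random(seed)
    labels = kmeans2(x)
    _, dist, nu_sq = mean_difference(x, labels)
    q = len(x[0])
    accepted = hits = 0
    for _ in range(n_draws):
        # phi ~ sigma * sqrt(1/|C1| + 1/|C2|) * chi_q (null distribution of D)
        phi = sigma * math.sqrt(nu_sq) * math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(q)))
        if same_partition(kmeans2(perturb(x, labels, phi)), labels):
            accepted += 1
            hits += phi >= dist
    return hits / accepted if accepted else None
```

Each draw requires a full re-clustering, which is why this route quickly becomes expensive on real single-cell datasets.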
  • 21. Experiment: data from [Zheng et al., 2017], in which peripheral blood mononuclear cells were sorted into cell populations prior to sequencing (antibody-based bead enrichment + fluorescence-activated cell sorting) ⇒ ground truth. Derivation of: ▸ a negative control (selection of 600 memory T cells); ▸ a positive control (selection of 200 memory T cells + 200 B cells + monocytes). Method: clustering with HAC (3 clusters), then differential analysis (Wald test versus the test of [Gao et al., 2021]).
  • 22. Experiment
  • 23. Further discussion: ▸ an extension of this approach to marker gene detection is ongoing (work by Benjamin Hivert, Boris Hejblum & Rodolphe Thiébaut); ▸ but extending it beyond pairwise (2-by-2) cluster comparisons is still challenging, as is the estimation of the variance parameter that the method requires.
  • 24. Question 2 [Zhang et al., 2019]: use a test based on a truncated distribution.
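The building block of such a test is the truncated normal distribution. The helper below only recalls its standard moments and survival function (which gives a one-sided p-value); it is not the test statistic of [Zhang et al., 2019], whose construction, including how the separating hyperplane defines the truncation, is described in the paper and in the tn_test repository. All function names are ours.

```python
import math


def norm_pdf(z):
    """Standard normal density (evaluates to 0.0 at +/- infinity)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def truncnorm_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) truncated to [a, b] (a or b may be infinite)."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = norm_cdf(beta) - norm_cdf(alpha)
    pa, pb = norm_pdf(alpha), norm_pdf(beta)
    # alpha * pdf(alpha) -> 0 when alpha is infinite; avoid inf * 0 = nan
    apa = alpha * pa if math.isfinite(alpha) else 0.0
    bpb = beta * pb if math.isfinite(beta) else 0.0
    mean = mu + sigma * (pa - pb) / z
    var = sigma ** 2 * (1 + (apa - bpb) / z - ((pa - pb) / z) ** 2)
    return mean, var


def truncnorm_sf(y, mu, sigma, a, b):
    """P(Y >= y) for Y ~ N(mu, sigma^2) truncated to [a, b]: a one-sided p-value."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    t = (min(max(y, a), b) - mu) / sigma
    return (norm_cdf(beta) - norm_cdf(t)) / (norm_cdf(beta) - norm_cdf(alpha))


print(truncnorm_moments(0.0, 1.0, 0.0, math.inf))
```

For a standard normal truncated to [0, ∞), the mean is √(2/π) ≈ 0.80 and the variance is 1 − 2/π: conditioning on one side of a cut is enough to shift the mean away from zero, which is exactly what a naive test ignores.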
  • 25. Question 2 [Zhang et al., 2019]
  • 26. Question 2 [Zhang et al., 2019]. Remarks on this approach: ▸ the separating hyperplane is assumed to be given ⇒ this constrains the clustering method and requires that clustering be performed on a separate dataset; ▸ genes are assumed to be uncorrelated (a very, very strong assumption...); ▸ the method is available as a Python tool at https://github.com/jessemzhang/tn_test
  • 27. Experiment 1: again, data from [Zheng et al., 2017]. Method: ▸ use Seurat for clustering (9 clusters); ▸ use Seurat and the TN (truncated normal) test for differential analysis between the first two clusters.
  • 28. Results
  • 29. Experiment 2: data from [Kolodziejczyk et al., 2015]; impact of overclustering on the results.
  • 30. References
    Fang, R., Preissl, S., Li, Y., Hou, X., Lucero, J., Wang, X., Motamedi, A., Shiau, A. K., Zhou, X., Xie, F., Mukamel, E. A., Zhang, K., Zhang, Y., Behrens, M. M., Ecker, J. R., and Ren, B. (2021). Comprehensive analysis of single cell ATAC-seq data with SnapATAC. Nature Communications, 12:1337.
    Gao, L. L., Bien, J., and Witten, D. (2021). Selective inference for hierarchical clustering. Preprint arXiv 2012.02936.
    Kolodziejczyk, A. A., Kim, J. K., Tsang, J. C., Ilicic, T., Henriksson, J., Natarajan, K. N., Tuck, A. C., Gao, X., Bühler, M., Liu, P., Marioni, J. C., and Teichmann, S. A. (2015). Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell, 17(4):471–485.
    Zhang, J. M., Kamath, G. M., and Tse, D. N. (2019). Valid post-clustering differential analysis for single-cell RNA-seq. Cell Systems, 9(4):383–392.e6.
    Zheng, G. X., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W., Wilson, R., Ziraldo, S. B., Wheeler, T. D., McDermott, G. P., Zhu, J., Gregory, M. T., Shuga, J., Montesclaros, L., Underwood, J. G., Masquelier, D. A., Nishimura, S. Y., Schnall-Levin, M., Wyatt, P. W., Hindson, C. M., Bharadwaj, R., Wong, A., Ness, K. D., Beppu, L. W., Deeg, H. J., McFarland, C., Loeb, K. R., Valente, W. J., Ericson, N. G., Stevens, E. A., Radich, J. P., Mikkelsen, T. S., Hindson, B. J., and Bielas, J. H. (2017). Massively parallel digital transcriptional profiling of single cells. Nature Communications, 8:14049.