SlideShare a Scribd company logo
1 of 29
Download to read offline
Inference of species interaction networks from
abundances
Rapha¨elle Momal
Supervision: S. Robin1
and C. Ambroise2
1UMR AgroParisTech / INRA MIA-Paris
2LaMME, Evry
September 26th, 2019
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 1 / 23
The thesis project
In a few words
A project using mathematics, statistical modelling and machine learning
techniques for applications in microbiology, metagenomics, or ecology.
Direction:
St´ephane Robin Chistophe Ambroise
Supports:
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 2 / 23
The thesis project
Network example in ecology
Pocock et. al 2012
Tool to better
understand species
interactions,
eco-systems
organizations
Allows for resilience
analyses, pathogens
control, ecosystem
comparison, response
prediction...
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 3 / 23
The thesis project
Aim of network inference from abundance data
date site
apr93 km03
apr93 km03
apr93 km03
apr93 km03
apr93 km17
apr93 km17
...
...
EFI ELA GDE GME HFA
71 1 5 6 0
118 2 3 0 0
69 0 6 2 0
56 0 0 0 0
0 1 1 0 0
0 0 2 0 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
(a) covariates X (b) species abundances Y (c) inferred network
Data sample from the Fatala river dataset (ade4 R package).
Unknown underlying structure
Unobserved interaction data
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 4 / 23
Statistical modelling The dependency structure
Graphical models: a statistical framework for conditional
dependence
Example:
A1
A2
A3
A4
Connected: all variables are
dependant
Direct dependence or
conditional independence
A4 is independent from (A1, A3)
conditionally on A2
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 5 / 23
Statistical modelling The dependency structure
Graphical models: a statistical framework for conditional
dependence
Example:
A1
A2
A3
A4
Connected: all variables are
dependant
Direct dependence or
conditional independence
A4 is independent from (A1, A3)
conditionally on A2
P(A1, . . . , Ap) ∝
C∈CG
ψC (AC )
where CG = set of maximal cliques of G.
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 5 / 23
Statistical modelling The observed counts
P N model
Yij ∼ P (exp(oij + xi θj + Zij)) .
A latent variable model
easy handling of multi-variate data, offsets and covariates
Random effects Z add dependence among species. Classically (Aitchison
and Ho, 1989):
Z ∼ N(0, Σ)
We foster sparsity with a mixture of tree structures:
Z ∼ p(T)N(0, ΣT ), T ∼
jk
βjk/B
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 6 / 23
Inference EM algorithm
Maximum likelihood with hidden data
observations Y
hidden parameters H
⇒ log p(Y ) intractable.
EM algorithm maximizes a surrogate for the log-likelihood :
Q = E[log p(Y , H)|Y ] = log p(Y , h)p(h|Y )dh
In most cases the conditional density p(h|Y ) is intractable.
Variational EM (VEM) resorts to a proxy q(h) = ˜p(h|Y ).
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 7 / 23
Inference EM algorithm
Two hidden quantities
Our model includes two hidden layers of parameters. We need to compute
conditional probabilities:
p(T|Y ): computationally complex but tractable thanks to an
algebraic mathematical tool (E: Kirshner (2008), M: Meil˘a and
Jaakkola (2006)).
p(Z|Y ): no close form, a VEM gives ˆΣ and ˆθ (VEM: Chiquet et al.
(2017)).
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 8 / 23
Inference Using trees
Mixture of trees: sparse and efficient
Sparse structures:
#Gp = 2
p(p−1)
2 reduced to #Tp = p(p−2)
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 9 / 23
Inference Using trees
Mixture of trees: sparse and efficient
Sparse structures:
#Gp = 2
p(p−1)
2 reduced to #Tp = p(p−2)
Suitable algebraic tool:
Matrix tree theorem (Chaiken and Kleitman, 1978)
T∈T (k,l)∈T
ψk,l (Y ) = det(Lψ(Y )) → Θ(p3
)
Approach: infer the network by averaging spanning trees
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 9 / 23
Inference Using trees
Concept of tree averaging
Z1 Z2
Z3Z4
P{T = t1|Y }
Z1 Z2
Z3Z4
P{T = t2|Y }
Z1 Z2
Z3Z4
P{T = t3|Y }
Z1 Z2
Z3Z4
P{T = t4|Y }
...
Compute edge
probabilities: Z1 Z2
Z3Z4
P{(j, k) ∈ T|Y }
Thresholding
probabilities:
Z1 Z2
Z3Z4
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 10 / 23
Inference Using trees
EMtree algorithm
Input: Abundance data, covariates, offsets
1rst step: VEM algorithm to fit PLN model ⇒ ˆθ, ˆΣZ .
2nd step: EM algorithm to update the βjk ⇒ conditional probabilities
for all edges.
Yij ∼ P exp(oij + xi θj + Zij ) .
Z ∼ p(T)N(0, ΣT ), T ∼
jk
βjk/B
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 11 / 23
Inference Using trees
EMtree algorithm
Input: Abundance data, covariates, offsets
1rst step: VEM algorithm to fit PLN model ⇒ ˆθ, ˆΣZ .
2nd step: EM algorithm to update the βjk ⇒ conditional probabilities
for all edges.
Thresholding: Select edges with probability above the probability of
edges in a tree drawn uniformly (2/p)
Resampling: Strengthen the results: only edges selected in more than
80% of S sub-samples are kept.
Available for download at https://github.com/Rmomal/EMtree
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 11 / 23
Application Fatala River fishes
Inferred networks
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 12 / 23
Evaluating performances
Evaluation strategy
Alternatives:
Two methods on transformed counts, no covariates:
SpiecEasi algorithm Kurtz et al. (2015)
gCoda Fang et al. (2017)
One taking raw counts and covariates:
MInt Biswas et al. (2016) (uses PLN model)
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 13 / 23
Evaluating performances
Evaluation strategy
Alternatives:
Two methods on transformed counts, no covariates:
SpiecEasi algorithm Kurtz et al. (2015)
gCoda Fang et al. (2017)
One taking raw counts and covariates:
MInt Biswas et al. (2016) (uses PLN model)
Simulation design:
1 Choose G and define ΣG accordingly
2 Sample count data Y from P N(X, ΣG )
3 Infer the network with EMtree, SpiecEasi, gCoda, and MInt
4 Compare results with presence/absence of edges (FDR, AUC)
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 13 / 23
Evaluating performances Results
Difficulty level
False Discovery Rate (FDR): how many false edges there is among what is detected ?
ratio: number of detections over the number of true edges
EMtree is a sparser approach than MInt
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 14 / 23
Evaluating performances Results
Network density
Area under the (ROC) curve (AUC): ”how good is a classifier to rank true positives
higher”
100 observations, 20 species:
Effect of graph density on the evolution of AUC median and inter-quartile intervals in Erd¨os and
Cluster structures.
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 15 / 23
Evaluating performances Results
To be published soon
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 16 / 23
Conclusion
Contributions:
Formal probabilistic model for network inference from
count data
R package: https://github.com/Rmomal/EMtree
Preprint: Tree-based Reconstruction of Ecological Network from
Abundance Data. https://arxiv.org/pdf/1905.02452.pdf
Perspectives:
Sign and strength of interactions according to graphical
models theory
Missing major actor (species/covariates)
More collaborations with experts in macro-ecology field
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 17 / 23
Thank you
Contact :
email raphaelle.momal@agroparistech.fr
Web Rmomal.github.io
Twitter @MomalRaphaelle
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 18 / 23
References I
Aitchison, J. and Ho, C. (1989). The multivariate Poisson-log normal distribution. Biometrika, 76(4):643–653.
Biswas, S., McDonald, M., Lundberg, D. S., Dangl, J. L., and Jojic, V. (2016). Learning microbial interaction networks from
metagenomic count data. Journal of Computational Biology, 23(6):526–535.
Chaiken, S. and Kleitman, D. J. (1978). Matrix tree theorems. Journal of combinatorial theory, Series A, 24(3):377–381.
Chiquet, J., Mariadassou, M., and Robin, S. (2017). Variational inference for probabilistic Poisson PCA. Technical report,
arXiv:1703.06633. to appear in Annals of Applied Statistics.
Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on
Information Theory, 14(3):462–467.
Fang, H., Huang, C., Zhao, H., and Deng, M. (2017). gcoda: conditional dependence network inference for compositional data.
Journal of Computational Biology, 24(7):699–708.
Kirshner, S. (2008). Learning with tree-averaged densities and distributions. In Advances in Neural Information Processing
Systems, pages 761–768.
Kurtz, Z. D., M¨uller, C. L., Miraldi, E. R., Littman, D. R., Blaser, M. J., and Bonneau, R. A. (2015). Sparse and
compositionally robust inference of microbial ecological networks. PLoS computational biology, 11(5):e1004226.
Meil˘a, M. and Jaakkola, T. (2006). Tractable bayesian learning of tree belief networks. Statistics and Computing, 16(1):77–92.
Meil˘a, M. and Jordan, M. I. (2000). Learning with mixtures of trees. Journal of Machine Learning Research, 1:1–48.
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 19 / 23
Conditional probability computation
Kirchhoff’s theorem (matrix tree, Aitchison and Ho (1989))
For all W = (akl )k,l a symmetric matrix, the corresponding Laplacian Q(W ) is defined
as follows:
Quv (W ) =
−auv 1 ≤ u < v ≤ n
n
i=1 avi 1 ≤ u = v ≤ n.
Then for all u et v:
|Q∗
uv (W )| =
T∈T {k,l}∈ET
akl
P((k, l) ∈ T|Z) =
T∈T :(k,l)∈T
P(T|Z) =
(k,l)∈T P(T)P(Z|T)
T P(T)P(Z|T)
= 1 −
|Q∗
uv (βΨ−kl
)|
|Q∗
uv (βΨ)|
= τkl
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 19 / 23
Tree structured data
Data dependency structure relies on a tree
Likelihood factorizes on nodes and edges
(Chow and Liu, 1968):
P(Z|T) =
d
j=1
P(Zj )
k,l∈T
ψkl (Z) ,
Where
ψkl (Z) =
P(Zk, Zl )
P(Zk) × P(Zl )
.
Rmq : with standardised gaussian data, ˆΨ = [ ˆψkl ] ∝ (1 − ˆρZ
2
)−1/2
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 20 / 23
Direct EM algorithm ?
Complete likelihood :
P(Y , Z, T) = P(T) × P(Z|T) × P(Y |Z)
log(P(Y , Z, T)) =
k,l
1{(k,l)∈T}(log(βkl ) + log(ψkl (Z))) − log(B)
+
k
(log(P(Zk )) + log(P(Yk |Zk )))
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 21 / 23
Direct EM algorithm ?
Complete likelihood :
P(Y , Z, T) = P(T) × P(Z|T) × P(Y |Z)
log(P(Y , Z, T)) =
k,l
1{(k,l)∈T}(log(βkl ) + log(ψkl (Z))) − log(B)
+
k
(log(P(Zk )) + log(P(Yk |Zk )))
Conditional expectation :
Eθ[log(P(Y , Z, T))|Y ] =
k,l∈V
P((k, l) ∈ T|Y ) log(βkl ) + E[1{(k,l)∈T}log(ψkl (Z)|Y )]
+
k
E[log(P(Zk ))|Y ] + E[log(P(Yk |Zk ))|Y ] − log(B)
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 21 / 23
M step
M step
Goal : optimization of weights βkl .
argmax
βkl



k,l∈V
τkl (log(βkl ) + log(ψkl )) − log(B) +
k
log(P(Zk))



With high combinatorial complexity of B =
T∈T k,l∈T
βkl
How to compute
∂B
∂βkl
?
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 22 / 23
M step
βkl update
A result from Meil˘a Meil˘a and Jordan (2000)
Inverting a minor of the laplacien Q, we define M :



Muv = [Q∗−1
]uu + [Q∗−1
]vv − 2[Q∗−1
]uv u, v < n
Mnv = Mvn = [Q∗−1
]vv v < n
Mvv = 0.
On peut montrer que :
∂|Q∗
uv (W )|
∂βkl
= Mkl × |Q∗
uv (W )|
∂Eθ[log(P(Z, T))|Z]
∂βkl
=
τkl
βkl
−
1
B
∂B
∂βkl
ˆβh+1
kl =
τh
kl
Mh
kl
Rapha¨elle Momal WiMLDS Meetup 2019 September 26th
, 2019 23 / 23

More Related Content

Similar to Reconstruction of ecological interaction networks by Raphaëlle Momal-Leisenring, PhD Candidate in Applied Mathematics @AgroParisTech

A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...
aimsnist
 
Application of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatchApplication of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatch
Cemal Ardil
 
Null-values imputation using different modification random forest algorithm
Null-values imputation using different modification random forest algorithmNull-values imputation using different modification random forest algorithm
Null-values imputation using different modification random forest algorithm
IAESIJAI
 

Similar to Reconstruction of ecological interaction networks by Raphaëlle Momal-Leisenring, PhD Candidate in Applied Mathematics @AgroParisTech (20)

Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN Deep Domain Adaptation using Adversarial Learning and GAN
Deep Domain Adaptation using Adversarial Learning and GAN
 
A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
 
A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...A Framework and Infrastructure for Uncertainty Quantification and Management ...
A Framework and Infrastructure for Uncertainty Quantification and Management ...
 
AI Math Agents
AI Math AgentsAI Math Agents
AI Math Agents
 
New Research Articles 2019 April Issue International Journal on Computational...
New Research Articles 2019 April Issue International Journal on Computational...New Research Articles 2019 April Issue International Journal on Computational...
New Research Articles 2019 April Issue International Journal on Computational...
 
PMED Transition Workshop - Virtual Tumor Populations from a Randomized Reacti...
PMED Transition Workshop - Virtual Tumor Populations from a Randomized Reacti...PMED Transition Workshop - Virtual Tumor Populations from a Randomized Reacti...
PMED Transition Workshop - Virtual Tumor Populations from a Randomized Reacti...
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
Scikit-Learn in Particle Physics
Scikit-Learn in Particle PhysicsScikit-Learn in Particle Physics
Scikit-Learn in Particle Physics
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 
'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis
 
AI that/for matters
AI that/for mattersAI that/for matters
AI that/for matters
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
 
Application of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatchApplication of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatch
 
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
 
PRIOR BUSH FIRE IDENTIFICATION MECHANISM BASED ON MACHINE LEARNING ALGORITHMS
PRIOR BUSH FIRE IDENTIFICATION MECHANISM BASED ON MACHINE LEARNING ALGORITHMSPRIOR BUSH FIRE IDENTIFICATION MECHANISM BASED ON MACHINE LEARNING ALGORITHMS
PRIOR BUSH FIRE IDENTIFICATION MECHANISM BASED ON MACHINE LEARNING ALGORITHMS
 
GIS-Based Adaptive Neuro Fuzzy Inference System (ANFIS) Method for Vulnerabil...
GIS-Based Adaptive Neuro Fuzzy Inference System (ANFIS) Method for Vulnerabil...GIS-Based Adaptive Neuro Fuzzy Inference System (ANFIS) Method for Vulnerabil...
GIS-Based Adaptive Neuro Fuzzy Inference System (ANFIS) Method for Vulnerabil...
 
Support fuzzy set machines tutorial
Support fuzzy set machines tutorialSupport fuzzy set machines tutorial
Support fuzzy set machines tutorial
 
Null-values imputation using different modification random forest algorithm
Null-values imputation using different modification random forest algorithmNull-values imputation using different modification random forest algorithm
Null-values imputation using different modification random forest algorithm
 
AIAA Future of Fluids 2018 Balaji
AIAA Future of Fluids 2018 BalajiAIAA Future of Fluids 2018 Balaji
AIAA Future of Fluids 2018 Balaji
 

More from Paris Women in Machine Learning and Data Science

More from Paris Women in Machine Learning and Data Science (20)

Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
 
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria KnorpsOptimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Perspectives, by M. Pannegeon
 
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
 

Recently uploaded

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 

Reconstruction of ecological interaction networks by Raphaëlle Momal-Leisenring, PhD Candidate in Applied Mathematics @AgroParisTech

  • 1. Inference of species interaction networks from abundances Rapha¨elle Momal Supervision: S. Robin1 and C. Ambroise2 1UMR AgroParisTech / INRA MIA-Paris 2LaMME, Evry September 26th, 2019 Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 1 / 23
  • 2. The thesis project In a few words A project using mathematics, statistical modelling and machine learning techniques for applications in microbiology, metagenomics, or ecology. Direction: St´ephane Robin Chistophe Ambroise Supports: Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 2 / 23
  • 3. The thesis project Network example in ecology Pocock et. al 2012 Tool to better understand species interactions, eco-systems organizations Allows for resilience analyses, pathogens control, ecosystem comparison, response prediction... Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 3 / 23
  • 4. The thesis project Aim of network inference from abundance data date site apr93 km03 apr93 km03 apr93 km03 apr93 km03 apr93 km17 apr93 km17 ... ... EFI ELA GDE GME HFA 71 1 5 6 0 118 2 3 0 0 69 0 6 2 0 56 0 0 0 0 0 1 1 0 0 0 0 2 0 0 . . . . . . . . . . . . . . . (a) covariates X (b) species abundances Y (c) inferred network Data sample from the Fatala river dataset (ade4 R package). Unknown underlying structure Unobserved interaction data Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 4 / 23
  • 5. Statistical modelling The dependency structure Graphical models: a statistical framework for conditional dependence Example: A1 A2 A3 A4 Connected: all variables are dependant Direct dependence or conditional independence A4 is independent from (A1, A3) conditionally on A2 Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 5 / 23
  • 6. Statistical modelling The dependency structure Graphical models: a statistical framework for conditional dependence Example: A1 A2 A3 A4 Connected: all variables are dependant Direct dependence or conditional independence A4 is independent from (A1, A3) conditionally on A2 P(A1, . . . , Ap) ∝ C∈CG ψC (AC ) where CG = set of maximal cliques of G. Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 5 / 23
  • 7. Statistical modelling The observed counts P N model Yij ∼ P (exp(oij + xi θj + Zij)) . A latent variable model easy handling of multi-variate data, offsets and covariates Random effects Z add dependence among species. Classically (Aitchison and Ho, 1989): Z ∼ N(0, Σ) We foster sparsity with a mixture of tree structures: Z ∼ p(T)N(0, ΣT ), T ∼ jk βjk/B Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 6 / 23
  • 8. Inference EM algorithm Maximum likelihood with hidden data observations Y hidden parameters H ⇒ log p(Y ) intractable. EM algorithm maximizes a surrogate for the log-likelihood : Q = E[log p(Y , H)|Y ] = log p(Y , h)p(h|Y )dh In most cases the conditional density p(h|Y ) is intractable. Variational EM (VEM) resorts to a proxy q(h) = ˜p(h|Y ). Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 7 / 23
  • 9. Inference EM algorithm Two hidden quantities Our model includes two hidden layers of parameters. We need to compute conditional probabilities: p(T|Y ): computationally complex but tractable thanks to an algebraic mathematical tool (E: Kirshner (2008), M: Meil˘a and Jaakkola (2006)). p(Z|Y ): no close form, a VEM gives ˆΣ and ˆθ (VEM: Chiquet et al. (2017)). Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 8 / 23
  • 10. Inference Using trees Mixture of trees: sparse and efficient Sparse structures: #Gp = 2 p(p−1) 2 reduced to #Tp = p(p−2) Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 9 / 23
  • 11. Inference Using trees Mixture of trees: sparse and efficient Sparse structures: #Gp = 2 p(p−1) 2 reduced to #Tp = p(p−2) Suitable algebraic tool: Matrix tree theorem (Chaiken and Kleitman, 1978) T∈T (k,l)∈T ψk,l (Y ) = det(Lψ(Y )) → Θ(p3 ) Approach: infer the network by averaging spanning trees Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 9 / 23
  • 12. Inference Using trees Concept of tree averaging Z1 Z2 Z3Z4 P{T = t1|Y } Z1 Z2 Z3Z4 P{T = t2|Y } Z1 Z2 Z3Z4 P{T = t3|Y } Z1 Z2 Z3Z4 P{T = t4|Y } ... Compute edge probabilities: Z1 Z2 Z3Z4 P{(j, k) ∈ T|Y } Thresholding probabilities: Z1 Z2 Z3Z4 Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 10 / 23
  • 13. Inference Using trees EMtree algorithm Input: Abundance data, covariates, offsets 1rst step: VEM algorithm to fit PLN model ⇒ ˆθ, ˆΣZ . 2nd step: EM algorithm to update the βjk ⇒ conditional probabilities for all edges. Yij ∼ P exp(oij + xi θj + Zij ) . Z ∼ p(T)N(0, ΣT ), T ∼ jk βjk/B Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 11 / 23
  • 14. Inference Using trees EMtree algorithm Input: Abundance data, covariates, offsets 1rst step: VEM algorithm to fit PLN model ⇒ ˆθ, ˆΣZ . 2nd step: EM algorithm to update the βjk ⇒ conditional probabilities for all edges. Thresholding: Select edges with probability above the probability of edges in a tree drawn uniformly (2/p) Resampling: Strengthen the results: only edges selected in more than 80% of S sub-samples are kept. Available for download at https://github.com/Rmomal/EMtree Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 11 / 23
  • 15. Application Fatala River fishes Inferred networks Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 12 / 23
  • 16. Evaluating performances Evaluation strategy Alternatives: Two methods on transformed counts, no covariates: SpiecEasi algorithm Kurtz et al. (2015) gCoda Fang et al. (2017) One taking raw counts and covariates: MInt Biswas et al. (2016) (uses PLN model) Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 13 / 23
  • 17. Evaluating performances Evaluation strategy Alternatives: Two methods on transformed counts, no covariates: SpiecEasi algorithm Kurtz et al. (2015) gCoda Fang et al. (2017) One taking raw counts and covariates: MInt Biswas et al. (2016) (uses PLN model) Simulation design: 1 Choose G and define ΣG accordingly 2 Sample count data Y from P N(X, ΣG ) 3 Infer the network with EMtree, SpiecEasi, gCoda, and MInt 4 Compare results with presence/absence of edges (FDR, AUC) Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 13 / 23
  • 18. Evaluating performances Results Difficulty level False Discovery Rate (FDR): how many false edges there is among what is detected ? ratio: number of detections over the number of true edges EMtree is a sparser approach than MInt Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 14 / 23
  • 19. Evaluating performances Results Network density Area under the (ROC) curve (AUC): ”how good is a classifier to rank true positives higher” 100 observations, 20 species: Effect of graph density on the evolution of AUC median and inter-quartile intervals in Erd¨os and Cluster structures. Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 15 / 23
  • 20. Evaluating performances Results To be published soon Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 16 / 23
  • 21. Conclusion Contributions: Formal probabilistic model for network inference from count data R package: https://github.com/Rmomal/EMtree Preprint: Tree-based Reconstruction of Ecological Network from Abundance Data. https://arxiv.org/pdf/1905.02452.pdf Perspectives: Sign and strength of interactions according to graphical models theory Missing major actor (species/covariates) More collaborations with experts in macro-ecology field Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 17 / 23
  • 22. Thank you Contact : email raphaelle.momal@agroparistech.fr Web Rmomal.github.io Twitter @MomalRaphaelle Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 18 / 23
  • 23. References I Aitchison, J. and Ho, C. (1989). The multivariate Poisson-log normal distribution. Biometrika, 76(4):643–653. Biswas, S., McDonald, M., Lundberg, D. S., Dangl, J. L., and Jojic, V. (2016). Learning microbial interaction networks from metagenomic count data. Journal of Computational Biology, 23(6):526–535. Chaiken, S. and Kleitman, D. J. (1978). Matrix tree theorems. Journal of combinatorial theory, Series A, 24(3):377–381. Chiquet, J., Mariadassou, M., and Robin, S. (2017). Variational inference for probabilistic Poisson PCA. Technical report, arXiv:1703.06633. to appear in Annals of Applied Statistics. Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467. Fang, H., Huang, C., Zhao, H., and Deng, M. (2017). gcoda: conditional dependence network inference for compositional data. Journal of Computational Biology, 24(7):699–708. Kirshner, S. (2008). Learning with tree-averaged densities and distributions. In Advances in Neural Information Processing Systems, pages 761–768. Kurtz, Z. D., M¨uller, C. L., Miraldi, E. R., Littman, D. R., Blaser, M. J., and Bonneau, R. A. (2015). Sparse and compositionally robust inference of microbial ecological networks. PLoS computational biology, 11(5):e1004226. Meil˘a, M. and Jaakkola, T. (2006). Tractable bayesian learning of tree belief networks. Statistics and Computing, 16(1):77–92. Meil˘a, M. and Jordan, M. I. (2000). Learning with mixtures of trees. Journal of Machine Learning Research, 1:1–48. Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 19 / 23
  • 24. Conditional probability computation Kirchhoff’s theorem (matrix tree, Aitchison and Ho (1989)) For all W = (akl )k,l a symmetric matrix, the corresponding Laplacian Q(W ) is defined as follows: Quv (W ) = −auv 1 ≤ u < v ≤ n n i=1 avi 1 ≤ u = v ≤ n. Then for all u et v: |Q∗ uv (W )| = T∈T {k,l}∈ET akl P((k, l) ∈ T|Z) = T∈T :(k,l)∈T P(T|Z) = (k,l)∈T P(T)P(Z|T) T P(T)P(Z|T) = 1 − |Q∗ uv (βΨ−kl )| |Q∗ uv (βΨ)| = τkl Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 19 / 23
  • 25. Tree structured data Data dependency structure relies on a tree Likelihood factorizes on nodes and edges (Chow and Liu, 1968): P(Z|T) = d j=1 P(Zj ) k,l∈T ψkl (Z) , Where ψkl (Z) = P(Zk, Zl ) P(Zk) × P(Zl ) . Rmq : with standardised gaussian data, ˆΨ = [ ˆψkl ] ∝ (1 − ˆρZ 2 )−1/2 Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 20 / 23
  • 26. Direct EM algorithm ? Complete likelihood : P(Y , Z, T) = P(T) × P(Z|T) × P(Y |Z) log(P(Y , Z, T)) = k,l 1{(k,l)∈T}(log(βkl ) + log(ψkl (Z))) − log(B) + k (log(P(Zk )) + log(P(Yk |Zk ))) Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 21 / 23
  • 27. Direct EM algorithm ? Complete likelihood : P(Y , Z, T) = P(T) × P(Z|T) × P(Y |Z) log(P(Y , Z, T)) = k,l 1{(k,l)∈T}(log(βkl ) + log(ψkl (Z))) − log(B) + k (log(P(Zk )) + log(P(Yk |Zk ))) Conditional expectation : Eθ[log(P(Y , Z, T))|Y ] = k,l∈V P((k, l) ∈ T|Y ) log(βkl ) + E[1{(k,l)∈T}log(ψkl (Z)|Y )] + k E[log(P(Zk ))|Y ] + E[log(P(Yk |Zk ))|Y ] − log(B) Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 21 / 23
  • 28. M step M step Goal : optimization of weights βkl . argmax βkl    k,l∈V τkl (log(βkl ) + log(ψkl )) − log(B) + k log(P(Zk))    With high combinatorial complexity of B = T∈T k,l∈T βkl How to compute ∂B ∂βkl ? Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 22 / 23
  • 29. M step βkl update A result from Meil˘a Meil˘a and Jordan (2000) Inverting a minor of the laplacien Q, we define M :    Muv = [Q∗−1 ]uu + [Q∗−1 ]vv − 2[Q∗−1 ]uv u, v < n Mnv = Mvn = [Q∗−1 ]vv v < n Mvv = 0. On peut montrer que : ∂|Q∗ uv (W )| ∂βkl = Mkl × |Q∗ uv (W )| ∂Eθ[log(P(Z, T))|Z] ∂βkl = τkl βkl − 1 B ∂B ∂βkl ˆβh+1 kl = τh kl Mh kl Rapha¨elle Momal WiMLDS Meetup 2019 September 26th , 2019 23 / 23