Boosting probabilistic graphical model inference by incorporating prior knowledge from multiple sources
1. Praveen, Paurush, and Holger Fröhlich. "Boosting probabilistic
graphical model inference by incorporating prior knowledge
from multiple sources." PLoS ONE 8.6 (2013): e67410.
@St_Hakky
This is a summary written after reading the paper; it is not a paper I wrote.
2. Introduction
• Graphical models are useful for extracting meaningful
biological insights from experimental data in life science
research.
• (Dynamic) Bayesian Networks
• Gaussian Graphical Models
• Poisson Graphical Models
• And so many other methods…
• These models can infer features of cellular networks in a
data driven manner.
3. Problem
• Network inference from experimental data is
challenging because of …
• the typically low signal-to-noise ratio
• high dimensionality
• the low number of replicates and noisy measurements (the
sample size is very small)
• The inference of regulatory networks from such data is
hence challenging and often fails to reach the desired
level of accuracy.
4. How to deal with this problem?
• One can work at the experimental level by
increasing the sample size.
• This would be the best way, but in practice it is very
difficult because of cost and time limitations.
• Alternatively, we can deal with the problem at the
inference level, which is what we consider here.
5. There is a lot of prior knowledge in biology
• Much biological knowledge is available: pathway
databases [6–8], Gene Ontology [9] and others.
• Integrating known information from databases
and the biological literature as prior knowledge thus
appears to be beneficial.
6. Integrating heterogeneous information into the
learning process is not straightforward
• However, biological knowledge covers many
different aspects and is widely distributed across
multiple knowledge resources.
• So, integrating heterogeneous information into the
learning process is not straightforward.
7. Useful databases
• Pathway databases [6–8]
• Gene Ontology [9]
• Promoter sequences
• Protein-protein interactions
• Evolutionary information
• KEGG pathway
• GO annotation
8. How to deal with the problem at the
inference level
• From a single information source
• Generate sample data
• Integrate one or more particular kinds of information
into the learning process
• Gene regulatory networks were inferred from a
combination of gene expression data with transcription
factor binding motifs
• promoter sequences, protein-protein interactions,
evolutionary information, KEGG pathways, GO annotation
9. Several approaches have been
proposed
• On the technical side several approaches for
integrating prior knowledge into the inference
of probabilistic graphical models have been
published.
10. Prior Research
• In [16] and [17] the authors only generate candidate structures with
significance above a certain threshold according to prior knowledge.
• [16] Larsen, Peter, et al. "A statistical method to incorporate biological knowledge for
generating testable novel gene regulatory interactions from microarray
experiments." BMC bioinformatics 8.1 (2007): 1.
• [17] Almasri, Eyad, et al. “Incorporating literature knowledge in Bayesian network for
inferring gene networks with gene expression data.” International Symposium on
Bioinformatics Research and Applications. Springer Berlin Heidelberg, 2008.
• Another idea is to introduce a probabilistic Bayesian prior over network
structures. [18] introduced a prior for individual edges based on an
a priori assumed degree of belief.
• [18] Fröhlich, Holger, et al. “Large scale statistical inference of signaling pathways
from rnai and microarray data.” BMC bioinformatics 8.1 (2007): 1.
11. Prior Research
• [19] describes a more general set of priors, which can also capture
global network properties, such as scale-free behavior.
• [19] Mukherjee, Sach, and Terence P. Speed. "Network inference using informative
priors." Proceedings of the National Academy of Sciences 105.38 (2008): 14313-
14318.
• [20] use a similar form of prior as [18], but additionally combine
multiple information sources via a linear weighting scheme.
• [20] Werhli, Adriano V., and Dirk Husmeier. "Reconstructing gene regulatory networks
with Bayesian networks by combining expression data with multiple sources of prior
knowledge." Statistical applications in genetics and molecular biology 6.1 (2007).
12. Prior Research
• [21] treat different information sources as statistically independent,
and consequently the overall prior is just the product over the
priors for the individual information sources.
• [21] Gao, Shouguo, and Xujing Wang. "Quantitative utilization of prior biological knowledge in the Bayesian
network modeling of gene expression data." BMC bioinformatics 12.1 (2011): 1.
13. We focus on…
• The focus of this paper is on the construction of
consensus priors from multiple, heterogeneous
knowledge sources.
14. Proposed Methods
• Latent Factor Model (LFM)
• LFM relies on the idea of a generative process for the
individual information sources and uses Bayesian
inference to estimate a consensus prior.
• Noisy-OR
• The second model integrates different information
sources via a Noisy-OR gate.
15. Materials and Methods
• 1.1 : Edge-wise Priors for Data Driven Network
Inference
• 1.2 : Latent Factor Model (LFM)
• 1.3 : Noisy-OR Model (NOM)
• 1.4 : Information Sources
16. Edge-wise Priors for Data Driven
Network Inference
• D : experimental data
• Φ : network graph (represented by an m×m
adjacency matrix) to infer from data.
• Bayes rule : P(Φ | D) ∝ P(D | Φ) P(Φ)
17. Edge-wise Priors for Data Driven
Network Inference
• P(Φ) is a prior. We assume P(Φ) can be
decomposed into edge-wise terms: P(Φ) = ∏i,j P(Φij)
• Φ is the matrix of prior edge confidences.
• Φij = 1 : high prior degree of belief in the
existence of the edge i→j
18. Our purpose
• Our purpose is to compile Φij in a consistent
manner from the n available resources.
• X(1), X(2), ..., X(n) : we have n edge confidence
matrices from the n information sources.
19. Latent Factor Model(LFM)
• LFM is based on the idea that the prior information
encoded in the matrices X(1), X(2), ..., X(n) all
originates from the true but unknown network Φ.
20. Latent Factor Model(LFM)
• We use this notion to conduct joint Bayesian
inference on Φ as well as the additional
parameters θ = (α, β) given X(1), X(2), ..., X(n).
• The idea is that we can identify Φ with the
posterior P(Φ, θ | X(1), X(2), ..., X(n)).
21. Latent Factor Model(LFM)
• The entries of each matrix X(k) can be assumed to
follow a beta distribution.
• (Details omitted: since the values of α and β can be
adjusted, the importance of each information source
can be determined independently.)
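A hedged sketch of this generative idea: each source's confidence for an edge is drawn from a beta distribution whose shape depends on whether the latent edge exists. The specific parameterization, Beta(α_k, 1) when the edge exists and Beta(1, β_k) when it does not, is an illustrative assumption, not the paper's exact model:

```python
import math

def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def lfm_loglik(phi_ij, x_ij, alpha, beta):
    """Log likelihood of the observed edge confidences x_ij (one value
    per information source) given the latent edge indicator phi_ij.

    If the edge exists (phi_ij == 1), source k is expected to report a
    high confidence, modeled as Beta(alpha[k], 1); otherwise a low
    confidence, modeled as Beta(1, beta[k]).
    """
    total = 0.0
    for k, x in enumerate(x_ij):
        if phi_ij == 1:
            total += beta_logpdf(x, alpha[k], 1.0)
        else:
            total += beta_logpdf(x, 1.0, beta[k])
    return total
```

Larger α_k and β_k make source k more "trusted": its confidences then pull the latent edge indicator more strongly in either direction.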
22. MCMC to get Φ and 𝜃 = (α, β)
• We employ an adaptive Markov Chain Monte Carlo
strategy to learn the latent variable Φ together
with the parameters θ = (α, β).
• In network space MCMC moves are edge insertion,
deletion and reversal.
• In parameter space α and β are adapted on log-scale
using a multivariate Gaussian transition kernel.
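The network-space moves can be illustrated roughly as follows; the acyclicity check and the Metropolis-Hastings acceptance step are omitted, and the function name is mine:

```python
import random

def propose_move(adj):
    """Propose a new adjacency matrix by one of the three MCMC
    structure moves: edge insertion, deletion, or reversal.
    (Sketch only: acceptance ratio and acyclicity are not handled.)"""
    m = len(adj)
    new = [row[:] for row in adj]
    i, j = random.sample(range(m), 2)   # a random ordered node pair, i != j
    if new[i][j] == 1:
        # an edge i->j exists: delete it or reverse it
        if random.random() < 0.5:
            new[i][j] = 0                 # deletion
        else:
            new[i][j], new[j][i] = 0, 1   # reversal
    else:
        new[i][j] = 1                     # insertion
    return new
```

Starting from an empty graph, the only possible move is an insertion, so exactly one edge appears.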
24. Noisy-OR model(NOM)
• The NOM represents a non-deterministic
disjunctive relation between an effect and its
possible causes, and has been extensively used in
artificial intelligence.
25. Noisy-OR model(NOM)
• The Noisy-OR principle is governed by two
hallmarks:
• First, each cause has a probability to produce the effect.
• Second, the probability of each cause being sufficient to
produce the effect is independent of the presence of
other causes.
26. Noisy-OR model(NOM)
• In our case X(1), X(2), ..., X(n) are interpreted as
causes and Φij as the effect. The link between the
two is given by Φij = 1 − ∏k (1 − X(k)ij).
• Φij = 1 : high prior degree of belief in the
existence of the edge i→j.
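A minimal sketch of the Noisy-OR combination of the confidence matrices (the function name is an assumption):

```python
import numpy as np

def noisy_or_prior(sources):
    """Combine n edge-confidence matrices X(1)..X(n) into one consensus
    prior via the Noisy-OR gate:
        Phi_ij = 1 - prod_k (1 - X(k)_ij)
    Phi_ij approaches 1 as soon as a single source is confident,
    because the product of the (1 - X) terms then approaches 0."""
    X = np.asarray(sources, dtype=float)   # shape (n, m, m)
    return 1.0 - np.prod(1.0 - X, axis=0)
```

For example, combining confidences 0.9 and 0.1 for the same edge gives 1 − (0.1)(0.9) = 0.91: the one confident source dominates.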
27. LFM vs NOM
• LFM
• A high level of agreement between information sources
is required in order to achieve high values in Φ
• NOM
• In consequence Φ𝑖𝑗 becomes close to 1, if the edge i→j
has a high confidence in at least one information source,
because then the product gets close to 0.
28. NOM.RNK
• In addition to NOM, we also experimented with a
variant based on relative ranks.
29. Information Sources
• We use the following information as prior
information
• GO annotation
• Pathway Databases
• KEGG, PathwayCommons
• Protein domain annotation
• InterPro
• Protein domain interaction
• DOMINE
30. GO annotation
• Interacting proteins often function in the same
biological process.
• So, two proteins acting in the same biological
process are more likely to interact than two
proteins involved in different processes.
• We used the default similarity measure for gene
products implemented in the R-package GOSim.
31. Pathway Databases
• KEGG
• R-package : KEGGgraph
• PathwayCommons database
• To compute a confidence value for each interaction
between a pair of genes/proteins we can work at the
level of this graph in different ways:
• Shortest path distance between the two entities
• Dijkstra’s algorithm
• R-package : RBGL
• In this paper, we use this one because we observed that the use of
the diffusion kernel in place of the shortest path measure did not
affect the results much.
• Diffusion kernels
• R-package : pathClass
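A rough sketch of the shortest-path idea, using plain breadth-first search on an unweighted graph instead of Dijkstra/RBGL; the mapping from distance to confidence (1/distance, 0 if unreachable) is an illustrative assumption, not the paper's exact transformation:

```python
from collections import deque

def shortest_path_confidence(edges, a, b):
    """Confidence for the pair (a, b) derived from their shortest-path
    distance in a pathway graph (treated as undirected here)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    # breadth-first search from a
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return 1.0 / dist[u] if dist[u] > 0 else 1.0
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return 0.0  # unreachable: no pathway evidence for the pair
```

On unit-weight edges BFS and Dijkstra give the same distances, which is why the simpler search suffices for the sketch.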
32. Protein domain annotation
• Protein domain annotation was compared on the
basis of a binary vector representation via the
cosine similarity.
• proteins with similar domains are more likely to act
in similar biological pathways.
• The confidence of an interaction between two
proteins can thus be seen as a function of the
similarity of the domain annotations of the proteins.
33. Protein domain annotation
• Protein domain annotation can be retrieved from
the InterPro database.
• A "1" in a component thus indicates that the
protein is annotated with the corresponding
domain; otherwise a "0" is filled in.
• The similarity between two binary vectors u, v
(domain signatures) is given by the cosine
similarity: cos(u, v) = (u · v) / (‖u‖ ‖v‖)
34. Protein domain interaction
• The relative frequency of interacting protein
domain pairs was taken as another confidence measure
for an edge a-b.
• Two proteins are more likely to interact if they contain
domains, which can potentially interact.
• The DOMINE database collates known and predicted
domain-domain interactions
• The confidence is H / (DA · DB), where H is the
number of hit pairs found in the DOMINE database
and DA and DB are the number of domains in
proteins A and B, respectively.
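A small sketch of this relative-frequency confidence; the helper name and data layout are assumptions:

```python
def domine_confidence(domains_a, domains_b, interacting_pairs):
    """Edge confidence for proteins A and B from domain-domain
    interactions: H / (D_A * D_B), where H counts the domain pairs of
    (A, B) found in the (symmetric) DOMINE interaction set."""
    hits = sum(1 for da in domains_a for db in domains_b
               if (da, db) in interacting_pairs
               or (db, da) in interacting_pairs)
    return hits / (len(domains_a) * len(domains_b))
```

For a protein with two domains and a partner with one, a single hit pair yields a confidence of 1 / (2 · 1) = 0.5.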
35. Results
• 2.1 : Correlation of Prior Edge Confidences with
True Biological Network
• Network Sampling
• Simulated Information Sources
• Weighting of Information Sources
• Real Information Sources
• 2.2 : Enhancement of Data Driven Network
Reconstruction Accuracy
• Simulated Data
• Application to Breast Cancer
• Application to Yeast Heat-Shock Network
39. Enhancement of Data Driven
Network Reconstruction Accuracy
• We next investigated to what extent our priors could
enhance the reconstruction performance of
Bayesian Networks learned from data.
• From the set of best-fitting network structures, the
one with the minimum BIC value was selected.
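For reference, a minimal BIC helper of the usual form k · ln(n) − 2 · ln(L); the paper's exact scoring details may differ:

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion (lower is better): penalizes
    model fit by the number of free parameters, scaled by sample size.
    The candidate structure with minimum BIC is selected."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood
```

Among structures with similar fit, the penalty term favors the sparser network, which matters at the small sample sizes typical of this setting.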
40. Network reconstruction for the
breast cancer data
• (Right) Without using any prior
• (Left) Using the NOM prior
• Black edges could be verified with established literature knowledge
• Grey edges could not be verified.
42. Yeast heat-shock response network obtained
via Bayesian network reconstruction
• (a) Network without any prior knowledge
• (b) The gold standard network from YEASTRACT database
• (c) Network reconstructed with prior knowledge (NOM).
43. Reconstruction performance of Yeast heat-shock
response network with Bayesian Networks and
different priors (NP = No Prior).
44. Conclusion
• We propose two methods:
• Latent Factor Model (LFM)
• Noisy-OR model (NOM)
• Result :
• Both of our models yielded priors which were
significantly closer to the true biological network than
competing methods.
• They could significantly enhance the reconstruction
performance of Bayesian Networks compared to a
situation without any prior.
45. Conclusion
• Our methods were also superior to purely using
STRING edge confidence scores as prior information.
• The proposed methods are robust to network size:
they can handle small, medium, and large networks.
• The LFM's computational cost is higher than the NOM's,
so the NOM should be preferred for large networks.
Editor's Notes
This specifically implies that direct correlations between edge confidences across matrices can be explained by this hidden dependency. In other words, W is a latent factor explaining the correlations between the X(k).