Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method, which can lead to significant efficiency gains when processing inference queries using the Variable Elimination algorithm. In particular, we address the problem of choosing a set of intermediate results to precompute and materialize, so as to maximize the expected efficiency gain over a given query workload. For the problem we consider, we provide an optimal polynomial-time algorithm and discuss alternative methods. We validate our technique using real-world Bayesian networks. Our experimental results confirm that a modest amount of materialization can lead to significant improvements in the running time of queries, with an average gain of 70% and gains of up to 99% for a uniform workload of queries. Moreover, in comparison with existing junction-tree methods that also rely on materialization, our approach achieves competitive efficiency during inference using significantly lighter materialization.
Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks
Cigdem Aslay (Aarhus University, Denmark), Martino Ciaperoni (Aalto University, Finland), Aristides Gionis (KTH Royal Institute of Technology, Sweden), Michael Mathioudakis (University of Helsinki, Finland)
Bayesian Networks
• Important inference tool for many real-world applications where complex probabilistic modeling and reasoning are required
• Revolutionized the field of Artificial Intelligence by shifting its focus from logic to probability
  – allowed machines to "think" and reason probabilistically
  – Turing Award to Judea Pearl in 2011
Applications in DBMS
BNs offer several advantages over other approximation methods, such as synopses, which suffer with high-dimensional data [Getoor et al., 2001].
• Selectivity estimation
  – estimate the size of a query result for optimizing the query execution plan and for query profiling
• Approximate query processing
  – avoid costly table scans by approximating the observed data distribution within some error range
• Predictive queries
  – infer the attributes of future data entries
Bayesian Networks
• A probabilistic graphical model that represents a set of random variables and their conditional (in-)dependencies via a DAG [Pearl, 1985]
  – each node represents a random variable
  – each edge represents the dependency of the child variable on the parent variable
  – each node is associated with a conditional probability table (CPT) quantifying the probability that the node takes a particular value conditioned on the values of its parents
Bayesian Networks
• Each variable in a BN is conditionally independent of all its non-descendants given the values of its parents, simplifying the joint probability distribution given by the chain rule:

  Pr(A, S, E, O, R, T) = Pr(T | O, R) Pr(O | E) Pr(R | E) Pr(E | A, S) Pr(A) Pr(S)
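To make the factorization concrete, here is a minimal Python sketch that evaluates the joint probability of a single assignment as the product of the six local CPT entries. The CPT values are made up for illustration; they are not the actual parameters of the survey network.

```python
# Hypothetical CPT entries, keyed by (value, parent values...).
P_A = {"young": 0.30}                       # Pr(A)
P_S = {"female": 0.40}                      # Pr(S)
P_E = {("uni", "young", "female"): 0.64}    # Pr(E | A, S)
P_O = {("emp", "uni"): 0.92}                # Pr(O | E)
P_R = {("big", "uni"): 0.70}                # Pr(R | E)
P_T = {("car", "emp", "big"): 0.58}         # Pr(T | O, R)

def joint(a, s, e, o, r, t):
    """Pr(A,S,E,O,R,T) as the product of the six local CPT entries."""
    return (P_T[(t, o, r)] * P_O[(o, e)] * P_R[(r, e)]
            * P_E[(e, a, s)] * P_A[a] * P_S[s])

print(joint("young", "female", "uni", "emp", "big", "car"))
```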
Querying a Bayesian Network
• Query: among young females who commute by car, what fraction live in big cities?

  Pr(R = big | A = young, S = female, T = car)
Querying a Bayesian Network
• Inference in Bayesian networks is NP-hard [Koller et al., 2009]
  – shown by reduction from 3-SAT
• Approximate inference is also NP-hard [Koller et al., 2009]
• Efficiency can thus be compromised: the worst-case time complexity is exponential
Querying a Bayesian Network
• Given a BN, w.l.o.g. we consider queries of the form

  q = Pr(Xq, Yq = yq)

  where Xq ⊆ X is a set of free variables and Yq ⊆ X is a set of bound variables with corresponding values yq
• The answer to such a query is a table indexed by the combinations of values of the variables in Xq
Querying a Bayesian Network
A query q can be naively answered by eliminating the set Zq of variables that do not appear in q:

  Pr(Xq, Yq = yq) = Σ_{Zq} Pr(Xq, Yq = yq, Zq)

• perform a join of all CPTs into a table H
• for each variable in Yq, select the entries of H that satisfy Yq = yq
• for each variable a ∈ Zq, sum over the groups of values of the remaining variables
• the resulting table is the answer to query q (see the sketch below)
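As a toy illustration of the three steps above (our own sketch, not the paper's code), the following Python snippet answers Pr(C, A = 1) on a hypothetical three-variable chain A → B → C with made-up binary CPTs:

```python
from itertools import product

vals = [0, 1]                                 # every variable is binary
P_A = {0: 0.6, 1: 0.4}                        # Pr(A)
P_B = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # Pr(B | A)
P_C = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}   # Pr(C | B)

# 1. join all CPTs into one table H over (A, B, C): O(r^n) entries
H = {(a, b, c): P_A[a] * P_B[(b, a)] * P_C[(c, b)]
     for a, b, c in product(vals, repeat=3)}

# 2. select the entries that satisfy the bound variable A = 1
H = {k: v for k, v in H.items() if k[0] == 1}

# 3. sum out the eliminated variable B, grouping by the free variable C
answer = {c: sum(v for (a, b, cc), v in H.items() if cc == c) for c in vals}
print(answer)   # a table indexed by the values of C: Pr(C, A = 1)
```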
Querying a Bayesian Network
• The running time of this naive approach is O(r^n) on a BN with n discrete variables taking r possible values each
• The variable elimination (VE) algorithm [Zhang et al., 1994] improves this to O(nr²) by taking advantage of the factorization of the joint probability Pr(Xq, Yq = yq, Zq)
  – given an elimination order on the variables, it processes one variable A at a time, performing natural joins over only the CPTs that contain A (see the sketch below)
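The following self-contained Python sketch implements textbook sum-product variable elimination over binary variables. The factor representation and helper names (multiply, sum_out, variable_elimination) are our own illustrative choices, not the paper's implementation:

```python
from itertools import product

DOMAIN = [0, 1]   # all variables binary, for brevity

def multiply(f, g):
    """Natural join of two factors (vars, table): multiply matching entries."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for assign in product(DOMAIN, repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        table[assign] = (ft[tuple(a[v] for v in fv)]
                         * gt[tuple(a[v] for v in gv)])
    return out_vars, table

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    fv, ft = f
    out_vars = tuple(v for v in fv if v != var)
    table = {}
    for assign, p in ft.items():
        key = tuple(x for v, x in zip(fv, assign) if v != var)
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

def variable_elimination(factors, order):
    """Eliminate each variable in `order`, joining only the factors
    that mention it (assumes every variable appears in some factor)."""
    for var in order:
        touched = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        joined = touched[0]
        for f in touched[1:]:
            joined = multiply(joined, f)
        factors = rest + [sum_out(joined, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# toy chain A -> B -> C with made-up CPTs; eliminate A, then B, to get Pr(C)
P_A = (("A",), {(0,): 0.6, (1,): 0.4})
P_B = (("B", "A"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})
P_C = (("C", "B"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5})
print(variable_elimination([P_A, P_B, P_C], ["A", "B"]))
```

At every step only the factors mentioning the current variable are joined, which is the source of VE's savings over the naive full join.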
Elimination Tree
• An elimination order gives rise to an elimination tree T
  – the leaf nodes of T correspond to the original CPTs
  – each internal node of T corresponds to an intermediate factor, created during the execution of the VE algorithm by the join of the factors that form this node's children in T (see the sketch below)
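A compact sketch (our own illustration) of how an order induces the tree: each CPT is reduced to its variable scope, and eliminating a variable adds an internal node whose children are the factors joined at that step.

```python
def elimination_tree(cpt_scopes, order):
    """Build the elimination tree symbolically. A node is (scope, children);
    leaves (the CPTs) have no children."""
    nodes = [(frozenset(s), []) for s in cpt_scopes]
    for var in order:
        touched = [n for n in nodes if var in n[0]]
        rest = [n for n in nodes if var not in n[0]]
        # joining the touched factors and summing out `var` yields the
        # scope of the new internal node (degenerate join if one factor)
        joined_scope = frozenset().union(*(n[0] for n in touched)) - {var}
        nodes = rest + [(joined_scope, touched)]
    return nodes   # roots of the resulting elimination tree/forest

def count_nodes(node):
    scope, children = node
    return 1 + sum(count_nodes(c) for c in children)

# CPT scopes of the survey network, order ⟨A, S, T, E, O, R⟩
scopes = [{"A"}, {"S"}, {"E", "A", "S"}, {"O", "E"}, {"R", "E"}, {"T", "O", "R"}]
roots = elimination_tree(scopes, ["A", "S", "T", "E", "O", "R"])
print(count_nodes(roots[0]))   # 12: six CPT leaves + one internal node per variable
```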
Elimination Tree
• consider the query q = Pr(T, A = young) and the order σ = ⟨A, S, T, E, O, R⟩; eliminating A first produces the factor

  ψA(S, E; A = young) = Pr(E | A = young, S) Pr(A = young)
Materialization of Factors
• The intermediate factors created during the execution of Variable Elimination can be costly to compute every time a query requests them
• The same intermediate tables can be used for the evaluation of many different queries
• What can we do?
  – precompute and materialize the "useful" tables that bring the largest computational benefit, i.e., those involved in the evaluation of many expensive queries
Materialization of Factors
• When is a materialized factor useful for a query q?
  – when it is one of the factors computed during the evaluation of q, and
  – when there is no other materialized factor that could be used in its place with greater cost savings
Materialization of Factors
Definition (Partial Cost)
The partial cost c(u) of a node u ∈ V in the elimination tree is the computational effort required to compute the corresponding factor given the factors of its children.

Definition (Total Cost)
The total cost b(u) of a node u ∈ V in the elimination tree is the total cost of computing the factor at node u, i.e.,

  b(u) = Σ_{x ∈ Tu} c(x),

where Tu is the subtree rooted at u and c(x) is the partial cost of node x (see the sketch below).
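A minimal sketch of the subtree sum that defines b(u), on a toy elimination tree with made-up partial costs:

```python
def total_cost(u, children, c):
    """b(u): sum of partial costs c(x) over the subtree rooted at u."""
    return c[u] + sum(total_cost(v, children, c) for v in children[u])

children = {0: [1, 2], 1: [], 2: [3], 3: []}   # toy elimination tree
c = {0: 5.0, 1: 1.0, 2: 2.0, 3: 1.5}           # hypothetical partial costs
print(total_cost(0, children, c))               # b(root) = 9.5
```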
Materialization of Factors
• for a given query q, let δq(u; R) = 1 if u is useful for q w.r.t. R, and δq(u; R) = 0 otherwise
• given a query workload, the total benefit of materializing R is:

Definition (Benefit)

  B(R) = Σ_q Pr(q) Σ_{u ∈ R} δq(u; R) b(u)
       = Σ_{u ∈ R} Pr(δq(u; R) = 1) b(u)
       = Σ_{u ∈ R} E[δq(u; R)] b(u)
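Since B(R) is an expectation over the query distribution, it can be estimated by sampling queries, as in the following sketch; `useful` is a toy stand-in for the predicate δq(u; R), and all names and numbers are illustrative rather than the paper's procedure:

```python
import random

def estimate_benefit(R, b, sample_query, useful, n=10_000):
    """Monte Carlo estimate of B(R) = E[sum over u in R of δ_q(u;R) b(u)]."""
    total = 0.0
    for _ in range(n):
        q = sample_query()
        total += sum(b[u] for u in R if useful(q, u, R))
    return total / n

# toy setup: three candidate nodes with total costs b(u), and the set of
# query variables each materialized factor can serve (hypothetical)
b = {0: 9.5, 1: 1.0, 2: 3.5}
serves = {0: {"A", "S", "E"}, 1: {"T"}, 2: {"O", "R"}}

def sample_query():
    return {random.choice("ASTEOR")}    # uniform single-variable query

def useful(q, u, R):
    return q <= serves[u]               # toy stand-in for δ_q(u; R)

print(estimate_benefit({0, 1, 2}, b, sample_query, useful))
```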
Materialization of Factors
Problem (1)
Given a Bayesian network N, an elimination tree T = (V, E), and a budget K, select a set of nodes R ⊆ V to materialize, whose total size is at most K, so as to maximize B(R).

Problem (2)
Given a Bayesian network N, an elimination tree T = (V, E), and an integer k, select at most k nodes R ⊆ V to materialize so as to maximize B(R).
Materialization Algorithms
• Optimal materialization via dynamic programming
• Given an elimination tree defined over n variables, with height h:
  – Problem (1) admits a pseudo-polynomial-time DP algorithm that runs in O(nhK²)
  – Problem (2) admits a polynomial-time DP algorithm that runs in O(nhk²)
• The benefit function is monotone and submodular
  – the greedy algorithm provides a (1 − 1/e)-approximation [Nemhauser et al., 1978; Sviridenko, 2004] (see the sketch below)
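For intuition, here is the generic greedy sketch for the cardinality version (Problem 2). The benefit oracle below is a toy coverage function standing in for B(R); the (1 − 1/e) guarantee holds for any monotone submodular benefit:

```python
def greedy(nodes, benefit, k):
    """Classic greedy for monotone submodular maximization:
    repeatedly add the node with the largest marginal gain."""
    R = set()
    for _ in range(min(k, len(nodes))):
        u = max((v for v in nodes if v not in R),
                key=lambda v: benefit(R | {v}) - benefit(R))
        R.add(u)
    return R

# toy benefit: number of distinct queries "covered" by the chosen nodes
covers = {0: {"q1", "q2"}, 1: {"q2", "q3"}, 2: {"q4"}, 3: {"q1"}}

def benefit(R):
    return len(set().union(*(covers[u] for u in R))) if R else 0

print(greedy(list(covers), benefit, k=2))   # picks {0, 1}, covering 3 queries
```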
Experiments: Datasets
Table: Statistics of Bayesian networks.
Network      nodes  edges  parameters  avg. degree
MILDEW          35     46       547K          2.63
PATHFINDER     109    195        98K          2.96
MUNIN#1        186    273        19K          2.94
ANDES          220    338       2.3K          3.03
DIABETES       413    602       461K          2.92
LINK           714   1125        20K          3.11
MUNIN#2       1003   1244        84K          2.94
MUNIN         1041   1397        98K          2.68
TPCH#1          17     17       1.5K          2.00
TPCH#2          31     31       7.4K          2.00
TPCH#3          38     39       355K          2.05
TPCH#4          35     37        27K          2.11
• 12 real-world Bayesian networks (from the bnlearn repository and [Tzoumas et al., 2013])
• All datasets and our implementation are publicly available: https://github.com/aslayci/qtm
Experiments: Elimination Order
• Finding the optimal elimination order is NP-hard [Koller et al., 2009]; the orders below are computed with standard heuristics (MF, WMF, MW)
Table: Statistics of elimination trees.
Tree              nodes  height  max. #children
MILDEW (MF)          70      17               3
PATHFINDER (MF)     218      12              54
MUNIN#1 (WMF)       372      23               7
ANDES (MF)          440      38               5
DIABETES (MF)       826      77               4
LINK (MF)          1428      56              15
MUNIN#2 (MF)       2006      23               8
MUNIN (WMF)        2082      24               8
TPCH#1 (MW)          34       8               3
TPCH#2 (MW)          62      11               5
TPCH#3 (MW)          76      13               5
TPCH#4 (MW)          70      11               4
Experiments: Algorithms
• Variable elimination with materialization (VE-k) [this work]
• Junction tree (JT) algorithm [Lauritzen et al., 1998]
  – tree calibrated by precomputing and materializing the joint probability distributions at its nodes
• Indexed junction tree (IND) [Kanagal and Deshpande, 2009]
  – hierarchical index built on the calibrated junction tree
Experiments: Setting
• Two workload schemes considered (see the sketch below):
  – uniform: each variable has equal probability of appearing in a random query
  – skewed: the probability that a variable appears in a random query is correlated with its position in the elimination order
• 250 random queries per workload scheme
• The partial cost c(u) of each node u in the tree is set to the computational cost of computing the corresponding factor from its children in the elimination tree
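A sketch of how the two schemes might be instantiated (an assumption on our part; the paper's exact query generator may differ):

```python
import random

VARIABLES = list("ASTEOR")    # variables of the running example

def uniform_query(size=2):
    """uniform scheme: every variable is equally likely to be queried."""
    return set(random.sample(VARIABLES, size))

def skewed_query(order, size=2):
    """skewed scheme: a variable's chance of being queried grows with its
    position in the elimination order (duplicates collapse in the set)."""
    weights = [order.index(v) + 1 for v in VARIABLES]
    return set(random.choices(VARIABLES, weights=weights, k=size))

print(uniform_query(), skewed_query(order=list("ASTEOR")))
```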
Experiments: Results
Table: Materialization phase statistics (disk space in MB, precomputation time in seconds).
              Disk Space (MB)          Time (seconds)
Network       VE-n     JT     IND      VE-n       JT       IND
MILDEW         1.7    373    1354         5    18360     18360
PATHFINDER       1     17      23         1      302       305
MUNIN#1        317     NA      NA       270       NA        NA
ANDES          4.1     70      78         2     3682      3686
DIABETES        15    945    3286         2    41228     41247
LINK           245   3735    3824       100    98533     98647
MUNIN#2          9    480     573         8    21348     21635
MUNIN           14   2866    2972        16   110342    110645
TPCH#1           1      1       1      0.01    0.306     0.322
TPCH#2           1      1       1      0.02    1.866     1.882
TPCH#3           1     NA      NA      0.02       NA        NA
TPCH#4           1    4.7     6.9      0.02      106       107
• VE-n achieves faster and lighter precomputation in the offline phase
Conclusion
• A small amount of materialization for Variable Elimination offers a significant advantage over junction tree algorithms
  – more efficient inference in the online phase
  – faster and lighter precomputation in the offline phase
• Many interesting avenues for future work
  – other Bayesian network inference algorithms
  – other machine learning models