Workload-aware Materialization for
Efficient Variable Elimination on
Bayesian Networks
Cigdem Aslay (Aarhus University, Denmark)
Martino Ciaperoni (Aalto University, Finland)
Aristides Gionis (KTH Royal Institute of Technology, Sweden)
Michael Mathioudakis (University of Helsinki, Finland)
1
Bayesian Networks
• An important inference tool for many real-world applications that require complex probabilistic modeling and reasoning
• Revolutionized the field of Artificial Intelligence by shifting its focus from logic to probability
– allowed machines to “think” and reason probabilistically
– Turing Award to Judea Pearl in 2011
2
Applications in DBMS
BNs offer several advantages over other approximation methods, such as synopses, that suffer on high-dimensional data [Getoor et al., 2001].
• Selectivity estimation
– estimate the size of a query result, to optimize the query execution plan and for query profiling
• Approximate query processing
– avoid costly table scans by approximating the observed data distribution within some error range
• Predictive queries
– infer the attributes of future data entries
3
Bayesian Networks
• A probabilistic graphical model that represents a set of random variables and their conditional (in-)dependencies via a DAG (Pearl, 1985)
– each node represents a random variable
– each edge represents the dependency of the child on the parent variable
– each node is associated with a conditional probability table (CPT) quantifying the probability that the node takes a particular value conditioned on the values of its parents
4
Bayesian Networks
• Each variable in a BN is conditionally independent of all its non-descendants given the values of its parents, which simplifies the joint probability distribution given by the chain rule:
Pr(A, S, E, O, R, T) = Pr(T | O, R) Pr(O | E) Pr(R | E) Pr(E | A, S) Pr(A) Pr(S).
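To make the saving from the factorization concrete, the following sketch counts table entries for the full joint distribution versus the six CPTs above. The domain sizes in `card` are assumptions made for illustration only; the slides do not list them.

```python
from math import prod

# Assumed domain sizes (hypothetical; not given in the slides).
card = {"A": 3, "S": 2, "E": 2, "O": 2, "R": 2, "T": 3}

# Parents of each variable, read off the factorization above.
parents = {
    "A": [], "S": [], "E": ["A", "S"],
    "O": ["E"], "R": ["E"], "T": ["O", "R"],
}

def cpt_entries(var):
    """Number of entries in the CPT Pr(var | parents(var))."""
    return card[var] * prod(card[p] for p in parents[var])

# One entry per joint assignment versus the sum of the CPT sizes.
full_joint_entries = prod(card.values())
factorized_entries = sum(cpt_entries(v) for v in card)
print(full_joint_entries, factorized_entries)
```

Under these assumed sizes the factorized representation is several times smaller than the full joint table, and the gap widens quickly as variables are added.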
5
Querying a Bayesian Network
• Query: among young females who commute by car, what fraction live in big cities?
Pr(R = big | A = young, S = female, T = car)
6
Querying a Bayesian Network
• Exact inference in Bayesian networks is NP-hard [Koller et al., 2009]
– reduction from 3-SAT
• Approximate inference is also NP-hard [Koller et al., 2009]
• Efficiency can be compromised by the worst-case exponential time complexity
7
Querying a Bayesian Network
• Given a BN, w.l.o.g. we consider queries of the form
$q = \Pr(X_q, Y_q = y_q)$
where $X_q \subseteq X$ is a set of free variables and $Y_q \subseteq X$ is a set of bound variables with corresponding values $y_q$
• The answer to such a query is a table indexed by the combinations of values of the variables in $X_q$
8
Querying a Bayesian Network
A query $q$ can be naively answered by eliminating the set $Z_q$ of variables that do not appear in $q$:
$$\Pr(X_q, Y_q = y_q) = \sum_{Z_q} \Pr(X_q, Y_q = y_q, Z_q).$$
• perform a join of all CPTs into a table $H$
• select the entries of $H$ that satisfy $Y_q = y_q$
• for each variable $a \in Z_q$, sum $a$ out, i.e., compute a sum over each group of values of the other variables
• the resulting table is the answer to query $q$
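The naive procedure above can be sketched on a toy network. The chain A -> B -> C, its binary domains, and all CPT numbers below are hypothetical and chosen only to keep the example small.

```python
from itertools import product

# Hypothetical chain network A -> B -> C over binary variables (values 0/1).
variables = ["A", "B", "C"]
p_a = {0: 0.6, 1: 0.4}                                         # Pr(A = a)
p_b_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # Pr(B = b | A = a), key (b, a)
p_c_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}   # Pr(C = c | B = b), key (c, b)

def joint(a, b, c):
    """One entry of the joined table H: the product of the matching CPT entries."""
    return p_a[a] * p_b_a[(b, a)] * p_c_b[(c, b)]

def naive_query(free, bound):
    """Return Pr(free, bound) as a table indexed by the values of the free variables."""
    answer = {}
    # Enumerate every row of H: exponential in the number of variables.
    for values in product([0, 1], repeat=len(variables)):
        row = dict(zip(variables, values))
        if any(row[v] != val for v, val in bound.items()):
            continue  # selection step: keep only rows with Y_q = y_q
        key = tuple(row[v] for v in free)
        # Summing over rows that share the same key eliminates the other variables.
        answer[key] = answer.get(key, 0.0) + joint(row["A"], row["B"], row["C"])
    return answer

# Pr(C, A = 0): C is free, A is bound, B is summed out.
result = naive_query(free=["C"], bound={"A": 0})
print(result)
```

The loop visits every joint assignment, which is what makes this approach impractical beyond a handful of variables.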
9
Querying a Bayesian Network
• The running time of this naive approach is $O(r^n)$ on a BN with $n$ discrete variables taking $r$ possible values each
• The variable elimination (VE) algorithm [Zhang et al., 1994] improves this to $O(n r^2)$ by taking advantage of the factorization of the joint probability $\Pr(X_q, Y_q = y_q, Z_q)$
– given an elimination order on the variables, process one variable $A$ at a time by performing natural joins over only the CPTs that contain $A$
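A minimal sketch of variable elimination over dictionary-based factors follows. It is an illustration, not the authors' implementation: the binary domains, the toy chain A -> B -> C, its CPT numbers, and the choice to apply the bound values before eliminating anything are all assumptions.

```python
from itertools import product

# A factor is a pair (vars, table): a tuple of variable names and a dict
# mapping value tuples (binary domains assumed) to numbers.

def multiply(f, g):
    """Natural join: multiply the entries that agree on the shared variables."""
    (f_vars, f_tab), (g_vars, g_tab) = f, g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out = {}
    for values in product([0, 1], repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        out[values] = (f_tab[tuple(row[v] for v in f_vars)]
                       * g_tab[tuple(row[v] for v in g_vars)])
    return out_vars, out

def sum_out(f, var):
    """Eliminate var by summing over its values."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    out = {}
    for values, p in f_tab.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return f_vars[:i] + f_vars[i + 1:], out

def restrict(f, var, value):
    """Keep the entries with var == value and drop var from the factor."""
    f_vars, f_tab = f
    if var not in f_vars:
        return f
    i = f_vars.index(var)
    out = {v[:i] + v[i + 1:]: p for v, p in f_tab.items() if v[i] == value}
    return f_vars[:i] + f_vars[i + 1:], out

def variable_elimination(factors, order, free, bound):
    for var, value in bound.items():          # apply Y_q = y_q first (a simplification)
        factors = [restrict(f, var, value) for f in factors]
    for var in order:
        if var in free or var in bound:
            continue
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        joined = touching[0]
        for f in touching[1:]:                # join only the factors that contain var
            joined = multiply(joined, f)
        factors = rest + [sum_out(joined, var)]
    result = factors[0]
    for f in factors[1:]:                     # combine whatever is left
        result = multiply(result, f)
    return result

# Hypothetical chain A -> B -> C with made-up CPT entries.
cpts = [
    (("A",), {(0,): 0.6, (1,): 0.4}),
    (("B", "A"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}),
    (("C", "B"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}),
]
answer_vars, answer = variable_elimination(cpts, ["A", "B", "C"], free=["C"], bound={"A": 0})
print(answer_vars, answer)
```

Each intermediate factor produced by `sum_out` is the kind of table the later slides propose to precompute and store.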
10
Elimination Tree
• An elimination order gives rise to an elimination tree $T$
– leaf nodes of $T$ correspond to the original CPTs
– each internal node of $T$ corresponds to an intermediate factor, created during the execution of the VE algorithm by the join of the factors that form this node’s children in $T$
11
Elimination Tree
• consider the query $q = \Pr(T, A = \text{young})$ and the order $\sigma = \langle A, S, T, E, O, R \rangle$
$\psi_A(S, E; A = \text{young}) = \Pr(E \mid A = \text{young}, S) \Pr(A = \text{young})$
12
Materialization of Factors
• The intermediate factors created during the execution of Variable Elimination can be costly to compute every time a query requests them
• The same intermediate tables can be used for the evaluation of many different queries
• What can we do?
• Precompute and materialize “useful” tables that bring the largest computational benefit, i.e., those that are involved in the evaluation of many expensive queries
13
Materialization of Factors
• When is a materialized factor useful for a query $q$?
– if it is one of the factors computed during the evaluation of $q$, and
– if there is no other materialized factor that could be used in its place with greater cost savings
14
Materialization of Factors
Definition (Partial-Cost)
The partial cost $c(u)$ of a node $u \in V$ in the elimination tree is the computational effort required to compute the corresponding factor given the factors of its children.
Definition (Total-Cost)
The total cost $b(u)$ of a node $u \in V$ in the elimination tree is the total cost of computing the factor at node $u$, i.e.,
$$b(u) = \sum_{x \in T_u} c(x),$$
where $c(x)$ is the partial cost of node $x$.
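The total cost can be computed with one recursive pass over the tree. The sketch below assumes that $T_u$ denotes the subtree rooted at $u$ (the slide does not spell this out); the tree shape, the node names, and the cost values are hypothetical.

```python
# Hypothetical elimination tree: node -> list of children (leaves stand for CPTs).
children = {
    "root": ["u1", "u2"],
    "u1": ["cpt_a", "cpt_b"],
    "u2": ["cpt_c"],
    "cpt_a": [], "cpt_b": [], "cpt_c": [],
}

# Hypothetical partial costs c(u); leaves are assumed to cost nothing
# because the CPTs are given, not computed.
partial_cost = {"root": 4, "u1": 8, "u2": 2, "cpt_a": 0, "cpt_b": 0, "cpt_c": 0}

def total_cost(u):
    """b(u): the partial cost of u plus the total costs of its children's subtrees."""
    return partial_cost[u] + sum(total_cost(x) for x in children[u])

print({u: total_cost(u) for u in children})
```

Because b(u) sums over the whole subtree, materializing a node close to the root saves more work per use than materializing one of its descendants.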
15
Materialization of Factors
• for a given $q$, let $\delta_q(u; R) = 1$ if $u$ is useful for $q$ w.r.t. $R$, and $0$ otherwise
• given a query workload, the total benefit of materializing $R$ is:
Definition (Benefit)
$$B(R) = \sum_{q} \Pr(q) \sum_{u \in R} \delta_q(u; R)\, b(u) = \sum_{u \in R} \Pr(\delta_q(u; R) = 1)\, b(u) = \sum_{u \in R} E[\delta_q(u; R)]\, b(u).$$
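The benefit can be evaluated directly from its first form when the workload is a finite list of queries with probabilities. In the sketch below the tree, the costs, the workload, and in particular the `usable` test are assumptions made for illustration: a stored factor is taken to be usable when the query mentions none of the variables eliminated inside its subtree, and a usable materialized ancestor is taken to supersede its descendants.

```python
# Hypothetical elimination tree given by parent pointers (None marks the root).
parent = {"r": None, "u": "r", "v": "r", "w": "u"}
# b[x]: total cost of node x; elim[x]: variables summed out inside the subtree of x.
b = {"r": 20.0, "u": 12.0, "v": 5.0, "w": 4.0}
elim = {"r": {"A", "B", "C", "D"}, "u": {"A", "B"}, "v": {"C"}, "w": {"A"}}

# Hypothetical workload: (probability, set of variables the query mentions).
workload = [(0.5, {"D"}), (0.3, {"B", "D"}), (0.2, {"C"})]

def usable(u, query_vars):
    """Assumed test: the query touches no variable eliminated below u."""
    return not (elim[u] & query_vars)

def ancestors(u):
    u = parent[u]
    while u is not None:
        yield u
        u = parent[u]

def delta(u, query_vars, R):
    """1 if u is useful for the query w.r.t. the materialized set R, else 0."""
    if u not in R or not usable(u, query_vars):
        return 0
    # A usable materialized ancestor covers u's subtree and saves more.
    if any(a in R and usable(a, query_vars) for a in ancestors(u)):
        return 0
    return 1

def benefit(R):
    """B(R) = sum_q Pr(q) * sum_{u in R} delta_q(u; R) * b(u)."""
    return sum(p * sum(delta(u, q, R) * b[u] for u in R) for p, q in workload)

print(benefit({"u", "v"}), benefit({"u", "w"}))
```

With these toy numbers, materializing the two siblings `u` and `v` is worth more than materializing `u` together with its own descendant `w`, since `w` only contributes for queries where `u` cannot be used.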
16
Materialization of Factors
Problem (1)
Given a Bayesian network N, an elimination tree T = (V, E), and a budget K, select a set of nodes R ⊆ V to materialize, whose total size is at most K, so as to maximize B(R).
Problem (2)
Given a Bayesian network N, an elimination tree T = (V, E), and an integer k, select at most k nodes R ⊆ V to materialize so as to maximize B(R).
17
Materialization Algorithms
• Optimal materialization via dynamic programming
• Given an elimination tree of height $h$, defined over $n$ variables:
– Problem (1) admits a pseudo-polynomial time DP algorithm, which runs in $O(n h K^2)$
– Problem (2) admits a polynomial time DP algorithm, which runs in $O(n h k^2)$
• The benefit function is monotone and submodular
– a greedy algorithm provides a $(1 - \frac{1}{e})$-approximation [Nemhauser et al., 1978, Sviridenko 2004]
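The greedy strategy for the cardinality-constrained case, Problem (2), can be sketched as follows. This is a generic greedy for a set function, not the authors' code; the toy tree, costs, workload, and the usability test inside `benefit` are hypothetical. The size-budget case, Problem (1), would need a cost-aware variant that is not shown here.

```python
# Hypothetical tree, total costs, eliminated variables, and workload.
parent = {"r": None, "u": "r", "v": "r", "w": "u"}
b = {"r": 20.0, "u": 12.0, "v": 5.0, "w": 4.0}
elim = {"r": {"A", "B", "C", "D"}, "u": {"A", "B"}, "v": {"C"}, "w": {"A"}}
workload = [(0.5, {"D"}), (0.3, {"B", "D"}), (0.2, {"C"})]

def usable(u, q):
    """Assumed test: the query mentions no variable eliminated below u."""
    return not (elim[u] & q)

def benefit(R):
    """Workload-weighted savings of materializing the node set R."""
    total = 0.0
    for p, q in workload:
        for u in R:
            a, covered = parent[u], False
            while a is not None:  # a usable materialized ancestor supersedes u
                covered = covered or (a in R and usable(a, q))
                a = parent[a]
            if usable(u, q) and not covered:
                total += p * b[u]
    return total

def greedy_select(candidates, benefit_fn, k):
    """Pick at most k nodes, each time adding the one with the largest marginal gain."""
    selected = set()
    for _ in range(k):
        base = benefit_fn(selected)
        best, best_gain = None, 0.0
        for u in sorted(candidates - selected):
            gain = benefit_fn(selected | {u}) - base
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            break  # no remaining node improves the benefit
        selected.add(best)
    return selected

chosen = greedy_select(set(parent), benefit, k=2)
print(sorted(chosen), benefit(chosen))
```

Each round re-evaluates the benefit for every remaining candidate, so the sketch favors clarity over speed; a practical version would cache marginal gains.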
18
Experiments: Datasets
Table: Statistics of Bayesian networks.
Network      nodes   edges   parameters   avg. degree
MILDEW          35      46        547 K          2.63
PATHFINDER     109     195         98 K          2.96
MUNIN#1        186     273         19 K          2.94
ANDES          220     338        2.3 K          3.03
DIABETES       413     602        461 K          2.92
LINK           714    1125         20 K          3.11
MUNIN#2       1003    1244         84 K          2.94
MUNIN         1041    1397         98 K          2.68
TPCH#1          17      17        1.5 K          2.00
TPCH#2          31      31        7.4 K          2.00
TPCH#3          38      39        355 K          2.05
TPCH#4          35      37         27 K          2.11
• 12 real-world Bayesian networks (from the bnlearn repository and [Tzoumas et al., 2013])
• All datasets and the implementation are publicly available:
https://github.com/aslayci/qtm
19
Experiments: Elimination Order
• Finding the optimal elimination order is NP-hard [Koller et al., 2009]
Table: Statistics of elimination trees.
Tree               nodes   height   max. # children
MILDEW (MF)           70       17                 3
PATHFINDER (MF)      218       12                54
MUNIN#1 (WMF)        372       23                 7
ANDES (MF)           440       38                 5
DIABETES (MF)        826       77                 4
LINK (MF)           1428       56                15
MUNIN#2 (MF)        2006       23                 8
MUNIN (WMF)         2082       24                 8
TPCH#1 (MW)           34        8                 3
TPCH#2 (MW)           62       11                 5
TPCH#3 (MW)           76       13                 5
TPCH#4 (MW)           70       11                 4
20
Experiments: Algorithms
• Variable elimination with materialization (VE-k) [this work]
• Junction tree (JT) algorithm [Lauritzen et al., 1998]
– tree calibrated by precomputing and materializing the joint probability distributions at its nodes
• Indexed junction tree (IND) [Kanagal and Deshpande, 2009]
– hierarchical index built on the calibrated junction tree
21
Experiments: Setting
• Two different workload schemes considered
– uniform: each variable has equal probability of appearing in a random query
– skewed: the probability that a variable appears in a random query is correlated with its position in the elimination order
• 250 random queries per workload scheme
• The partial cost $c(u)$ of each node $u$ in the tree is set to the computational cost of computing the corresponding factor from its children in the elimination tree
22
Experiments: Results
[Plots omitted. Panels: (a) MILDEW, (b) PATHFINDER, (c) MUNIN#1, (d) TPCH#3, (e) DIABETES, (f) LINK, (g) MUNIN#2, (h) MUNIN. Series: skewed, uniform. x-axis: k. y-axis: Cost Savings % (0 to 100).]
Figure: Cost savings for uniform and skewed workloads. x-axis: budget k.
y-axis: cost savings in query running time compared to no materialization.
• an average gain of 70%, reaching up to 99%, over a uniform workload of queries
23
Experiments: Results
[Plots omitted. Panels: (a) MILDEW, (b) PATHFINDER, (c) MUNIN#1, (d) TPCH#3, (e) DIABETES, (f) LINK, (g) MUNIN#2, (h) MUNIN. Series: skewed, uniform. x-axis: Algorithm (VE-1, VE-5, VE-n, JT, IND, with n = 35, 109, 186, 38, 413, 714, 1003, 1041 for panels (a) to (h)). y-axis: Total Costs (log scale).]
Figure: Comparison of total costs under uniform and skewed workloads for
different algorithms.
• more efficient inference in the online phase
24
Experiments: Results
Table: Materialization phase statistics.
               Disk Space (MB)            Time (seconds)
Network        VE-n     JT      IND       VE-n     JT        IND
MILDEW         1.7      373     1354      5        18360     18360
PATHFINDER     < 1      17      23        < 1      302       305
MUNIN#1        317      NA      NA        270      NA        NA
ANDES          4.1      70      78        2        3682      3686
DIABETES       15       945     3286      2        41228     41247
LINK           245      3735    3824      100      98533     98647
MUNIN#2        9        480     573       8        21348     21635
MUNIN          14       2866    2972      16       110342    110645
TPCH#1         < 1      < 1     < 1       0.01     0.306     0.322
TPCH#2         < 1      < 1     < 1       0.02     1.866     1.882
TPCH#3         < 1      NA      NA        0.02     NA        NA
TPCH#4         < 1      4.7     6.9       0.02     106       107
• faster and lighter precomputation in the offline phase
25
Conclusion
• A small level of materialization for Variable Elimination offers a significant advantage over junction tree algorithms
– more efficient inference in the online phase
– faster and lighter precomputation in the offline phase
• Many interesting avenues for future work
– other Bayesian network inference algorithms
– other Machine Learning models
26
Thank you!

Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Workload-aware materialization for efficient variable elimination on Bayesian networks

  • 1. Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks
      Cigdem Aslay (1), Martino Ciaperoni (2), Aristides Gionis (3), Michael Mathioudakis (4)
      (1) Aarhus University, Denmark; (2) Aalto University, Finland; (3) KTH Royal Institute of Technology, Sweden; (4) University of Helsinki, Finland
  • 2. Bayesian Networks
      ◦ An important inference tool for many real-world applications that require complex probabilistic modeling and reasoning
      ◦ Revolutionized the field of Artificial Intelligence by shifting its focus from logic to probability
          – allowed machines to “think” and reason probabilistically
      ◦ Turing Award to Judea Pearl in 2011
  • 3. Applications in DBMS
      BNs offer several advantages over other approximation methods, such as synopses, which degrade on high-dimensional data [Getoor et al., 2001].
      ◦ Selectivity estimation
          – estimate the size of a query result, for optimizing the query execution plan and for query profiling
      ◦ Approximate query processing
          – avoid costly table scans by approximating the observed data distribution within some error range
      ◦ Predictive queries
          – infer the attributes of future data entries
  • 4. Bayesian Networks
      ◦ A probabilistic graphical model that represents a set of random variables and their conditional (in-)dependencies via a DAG (Pearl, 1985)
          – each node represents a random variable
          – each edge represents the dependency of the child variable on the parent variable
          – each node is associated with a conditional probability table (CPT) that quantifies the probability that the node takes a particular value, conditioned on the values of its parents
  • 5. Bayesian Networks
      ◦ Each variable in a BN is conditionally independent of all its non-descendants given the values of its parents, which simplifies the joint probability distribution given by the chain rule:
        $\Pr(A, S, E, O, R, T) = \Pr(T \mid O, R)\,\Pr(O \mid E)\,\Pr(R \mid E)\,\Pr(E \mid A, S)\,\Pr(A)\,\Pr(S)$
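
The saving from this factorization can be made concrete by counting table entries. The sketch below is illustrative only: the parent sets are read off the factorization above, while the variable cardinalities are assumptions that do not appear on the slide.

```python
from math import prod

# Assumed number of values per variable (illustrative; not given on the slide).
card = {"A": 3, "S": 2, "E": 2, "O": 2, "R": 2, "T": 3}

# Parent sets, read off the factorization Pr(T|O,R) Pr(O|E) Pr(R|E) Pr(E|A,S) Pr(A) Pr(S).
parents = {"A": [], "S": [], "E": ["A", "S"], "O": ["E"], "R": ["E"], "T": ["O", "R"]}

def cpt_size(var):
    """Number of entries in the CPT of `var`: one per (value, parent values) combination."""
    return card[var] * prod(card[p] for p in parents[var])

joint_entries = prod(card.values())                 # full joint table
factored_entries = sum(cpt_size(v) for v in card)   # all CPTs together

print(joint_entries, factored_entries)
```

Under these assumed cardinalities the six CPTs hold far fewer entries than the full joint table, and the gap grows exponentially with the number of variables.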
  • 6. Querying a Bayesian Network
      ◦ Query: among young females who commute by car, what fraction live in big cities?
        Pr(R = big | A = young, S = female, T = car)
  • 7. Querying a Bayesian Network
      ◦ Inference in Bayesian networks is NP-hard [Koller et al., 2009]
          – by reduction from 3-SAT
      ◦ Approximate inference is also NP-hard [Koller et al., 2009]
      ◦ Query answering can therefore be slow in practice: the worst-case time complexity is exponential
  • 8. Querying a Bayesian Network
      ◦ Given a BN over a set of variables $X$, w.l.o.g. we consider queries of the form $q = \Pr(X_q, Y_q = y_q)$, where $X_q \subseteq X$ is a set of free variables and $Y_q \subseteq X$ is a set of bound variables with corresponding values $y_q$
      ◦ The answer to such a query is a table indexed by the combinations of values of the variables in $X_q$
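
A conditional query, such as the one about commuters on the earlier slide, reduces to this joint form by normalization. This is the standard definition of conditional probability, written here in the slide's notation; the summation index $x$ ranges over the value combinations of $X_q$ and is introduced only for this illustration.

```latex
\Pr(X_q \mid Y_q = y_q)
  = \frac{\Pr(X_q, Y_q = y_q)}{\sum_{x} \Pr(X_q = x, Y_q = y_q)}
```

So answering the joint query and dividing each entry of the resulting table by the table's total yields the conditional answer.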
  • 9. Querying a Bayesian Network
      A query q can be answered naively by eliminating the set $Z_q$ of variables that do not appear in q:
        $\Pr(X_q, Y_q = y_q) = \sum_{Z_q} \Pr(X_q, Y_q = y_q, Z_q)$
      ◦ perform a join of all CPTs into a single table H
      ◦ for each bound variable in $Y_q$, select the entries of H that satisfy $Y_q = y_q$
      ◦ for each variable in $Z_q$, sum it out by grouping on the values of the remaining variables
      ◦ the resulting table is the answer to query q
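
The join, select, and sum steps can be sketched on a toy network. Everything below is a hypothetical example: a binary chain A -> B -> C with made-up CPT values, and a helper name `naive_query` chosen for this illustration. It answers Pr(C, A = a) by building the full joint table first.

```python
from itertools import product

# Toy binary chain A -> B -> C; all probabilities are made up for illustration.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (c, b)

def naive_query(a_value):
    """Return Pr(C, A = a_value) as a dict {c: probability}."""
    # Step 1: join all CPTs into the full joint table H (2^3 entries here).
    H = {
        (a, b, c): p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]
        for a, b, c in product((0, 1), repeat=3)
    }
    # Step 2: select the rows with A = a_value.
    # Step 3: sum out B by grouping the selected rows on the value of C.
    answer = {0: 0.0, 1: 0.0}
    for (a, b, c), p in H.items():
        if a == a_value:
            answer[c] += p
    return answer

result = naive_query(1)
print(result)
```

The table H has one entry per joint assignment, which is why this approach stops scaling once the network has more than a handful of variables.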
  • 10. Querying a Bayesian Network
      ◦ The running time of this naive approach is $O(r^n)$ on a BN with n discrete variables taking r possible values each
      ◦ The variable elimination (VE) algorithm [Zhang et al., 1994] improves this to $O(nr^2)$ by taking advantage of the factorization of the joint probability $\Pr(X_q, Y_q = y_q, Z_q)$
          – given an elimination order on the variables, process one variable A at a time by performing natural joins over only the CPTs that contain A
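
A minimal sketch of variable elimination follows, assuming binary variables and representing a factor as a pair (variables, table). The function names and the toy chain A -> B -> C with made-up probabilities are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

def multiply(f, g):
    """Natural join of two factors; a factor is (variables, table keyed by value tuples)."""
    fv, ft = f
    gv, gt = g
    out_vars = tuple(dict.fromkeys(fv + gv))  # union of variables, order preserved
    table = {}
    for values in product((0, 1), repeat=len(out_vars)):  # binary variables assumed
        row = dict(zip(out_vars, values))
        table[values] = ft[tuple(row[v] for v in fv)] * gt[tuple(row[v] for v in gv)]
    return out_vars, table

def sum_out(f, var):
    """Eliminate `var` by summing over its values."""
    fv, ft = f
    table = {}
    for values, p in ft.items():
        key = tuple(x for v, x in zip(fv, values) if v != var)
        table[key] = table.get(key, 0.0) + p
    return tuple(v for v in fv if v != var), table

def restrict(f, var, value):
    """Keep only the rows with var = value and drop var from the factor."""
    fv, ft = f
    if var not in fv:
        return f
    i = fv.index(var)
    table = {k[:i] + k[i + 1:]: p for k, p in ft.items() if k[i] == value}
    return fv[:i] + fv[i + 1:], table

def variable_elimination(factors, order, evidence):
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        joined = touching[0]
        for f in touching[1:]:  # join only the factors that mention var
            joined = multiply(joined, f)
        factors = rest + [sum_out(joined, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Toy chain A -> B -> C with made-up numbers; query Pr(C, A = 1).
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_a = (("B", "A"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})
p_c_b = (("C", "B"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5})
answer_vars, answer = variable_elimination([p_a, p_b_a, p_c_b], order=["B"], evidence={"A": 1})
print(answer_vars, answer)
```

Each intermediate table here involves at most two variables, whereas the naive join would build a table over all three at once.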
  • 11. Elimination Tree
      ◦ An elimination order gives rise to an elimination tree T
          – leaf nodes of T correspond to the original CPTs
          – each internal node of T corresponds to an intermediate factor, created during the execution of the VE algorithm by joining the factors at this node's children in T
  • 12. Elimination Tree
      ◦ Consider the query $q = \Pr(T, A = \text{young})$ and the order $\sigma = \langle A, S, T, E, O, R \rangle$:
        $\psi_A(S, E; A = \text{young}) = \Pr(E \mid A = \text{young}, S)\,\Pr(A = \text{young})$
  • 13. Materialization of Factors
      ◦ The intermediate factors created during the execution of variable elimination can be costly to recompute every time a query needs them
      ◦ The same intermediate tables can be reused in the evaluation of many different queries
      ◦ What can we do?
          – precompute and materialize the “useful” tables that bring the largest computational benefit, i.e., those involved in the evaluation of many expensive queries
  • 14. Materialization of Factors
      ◦ When is a materialized factor useful for a query q?
          – if it is one of the factors computed during the evaluation of q, and
          – if there is no other materialized factor that could be used in its place with greater cost savings
  • 15. Materialization of Factors
      Definition (Partial cost). The partial cost c(u) of a node u ∈ V in the elimination tree is the computational effort required to compute the corresponding factor, given the factors of its children.
      Definition (Total cost). The total cost b(u) of a node u ∈ V in the elimination tree is the cost of computing the factor at node u from scratch, i.e.,
        $b(u) = \sum_{x \in T_u} c(x)$,
      where $T_u$ is the subtree rooted at u and c(x) is the partial cost of node x.
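
The total cost is a plain sum over a subtree, so it can be computed recursively. The sketch below uses a hypothetical tree and made-up partial costs; leaves stand for original CPTs and are given cost 0 here as an assumption.

```python
# Hypothetical elimination tree: node -> list of children.
children = {
    "root": ["u", "c"],
    "u": ["v", "b"],
    "v": ["a", "d"],
    "a": [], "b": [], "c": [], "d": [],  # leaves: original CPTs
}

# Made-up partial costs c(x); leaves need no computation in this toy setup.
partial_cost = {"root": 8, "u": 4, "v": 2, "a": 0, "b": 0, "c": 0, "d": 0}

def total_cost(u):
    """b(u): sum of the partial costs over the subtree rooted at u."""
    return partial_cost[u] + sum(total_cost(x) for x in children[u])

print({u: total_cost(u) for u in ("v", "u", "root")})
```

Materializing a node saves its whole total cost whenever the stored factor can be reused, since none of the work in its subtree has to be repeated.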
  • 16. Materialization of Factors
      ◦ For a given query q and a set R of materialized nodes, let $\delta_q(u; R) = 1$ if u is useful for q w.r.t. R, and 0 otherwise
      ◦ Given a query workload, the total benefit of materializing R is:
      Definition (Benefit).
        $B(R) = \sum_{q} \Pr(q) \sum_{u \in R} \delta_q(u; R)\, b(u) = \sum_{u \in R} \Pr(\delta_q(u; R) = 1)\, b(u) = \sum_{u \in R} \mathbb{E}[\delta_q(u; R)]\, b(u)$
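
The benefit definition can be evaluated directly on a small example. The sketch below is an illustration under stated assumptions: the tree, costs, and workload are made up, and the usefulness test is one reading of the informal condition from the earlier slide (a stored factor is reusable when the query mentions none of the variables eliminated in its subtree, and it is useful only when no materialized ancestor is also reusable).

```python
# Hypothetical elimination tree over internal nodes, each named after the variable it eliminates.
parent = {"A": "B", "B": "C", "D": "C", "C": None}
total_cost = {"A": 2, "B": 6, "D": 3, "C": 14}  # b(u), made-up values
subtree_vars = {                                 # variables eliminated in the subtree of each node
    "A": {"A"}, "B": {"A", "B"}, "D": {"D"}, "C": {"A", "B", "C", "D"},
}

# Made-up workload: (Pr(q), variables mentioned in q).
workload = [(0.5, {"D"}), (0.3, {"A"}), (0.2, {"C"})]

def reusable(u, query_vars):
    # Assumption: the factor at u can be reused if q mentions none of the variables eliminated below u.
    return subtree_vars[u].isdisjoint(query_vars)

def useful(u, query_vars, R):
    """delta_q(u; R): u is materialized, reusable, and no materialized ancestor is reusable."""
    if u not in R or not reusable(u, query_vars):
        return False
    a = parent[u]
    while a is not None:
        if a in R and reusable(a, query_vars):
            return False  # the ancestor saves more, so u is not the one used
        a = parent[a]
    return True

def benefit(R):
    """B(R) = sum_q Pr(q) * sum_{u in R} delta_q(u; R) * b(u)."""
    return sum(p * total_cost[u] for p, q in workload for u in R if useful(u, q, R))

print(benefit({"B", "D"}), benefit({"A", "B"}))
```

In this toy setup, materializing A in addition to B adds nothing, because every query that could reuse A can also reuse its ancestor B.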
  • 17. Materialization of Factors
      Problem (1). Given a Bayesian network N, an elimination tree T = (V, E), and a budget K, select a set of nodes R ⊆ V to materialize, whose total size is at most K, so as to maximize the benefit B(R).
      Problem (2). Given a Bayesian network N, an elimination tree T = (V, E), and an integer k, select at most k nodes R ⊆ V to materialize so as to maximize the benefit B(R).
  • 18. Materialization Algorithms
      ◦ Optimal materialization via dynamic programming (DP)
      ◦ Given an elimination tree defined over n variables, with height h:
          – Problem (1) admits a pseudo-polynomial-time DP algorithm, which runs in $O(nhK^2)$
          – Problem (2) admits a polynomial-time DP algorithm, which runs in $O(nhk^2)$
      ◦ The benefit function is monotone and submodular
          – a greedy algorithm provides a $(1 - \frac{1}{e})$-approximation [Nemhauser et al., 1978; Sviridenko, 2004]
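
The greedy strategy for the cardinality-constrained case (Problem 2) can be sketched generically: repeatedly add the node with the largest marginal gain. The code below is not the authors' implementation; `greedy_select` is a name chosen for this illustration, and the objective is a stand-in weighted-coverage function (also monotone and submodular) with made-up data, used in place of the benefit B(R).

```python
def greedy_select(candidates, k, value):
    """Pick at most k items, each time adding the one with the largest marginal gain."""
    chosen = set()
    for _ in range(k):
        base = value(chosen)
        best, best_gain = None, 0.0
        for c in sorted(candidates - chosen):  # sorted for deterministic tie-breaking
            gain = value(chosen | {c}) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining item improves the objective
            break
        chosen.add(best)
    return chosen

# Stand-in objective (assumption): each node "covers" some queries, each query has a weight.
covers = {"u1": {"q1", "q2"}, "u2": {"q2", "q3"}, "u3": {"q3"}}
weight = {"q1": 5.0, "q2": 1.0, "q3": 4.0}

def coverage(R):
    covered = set().union(*(covers[u] for u in R))
    return sum(weight[q] for q in covered)

selected = greedy_select(set(covers), 2, coverage)
print(selected, coverage(selected))
```

Each round evaluates the objective once per remaining candidate, so the sketch makes on the order of k times |V| objective evaluations; the dynamic-programming algorithms on the slide are the exact alternative.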
  • 19. Experiments: Datasets
      Table: Statistics of Bayesian networks.
      Network       Nodes   Edges   Parameters   Avg. degree
      MILDEW           35      46      547 K         2.63
      PATHFINDER      109     195       98 K         2.96
      MUNIN#1         186     273       19 K         2.94
      ANDES           220     338      2.3 K         3.03
      DIABETES        413     602      461 K         2.92
      LINK            714   1,125       20 K         3.11
      MUNIN#2       1,003   1,244       84 K         2.94
      MUNIN         1,041   1,397       98 K         2.68
      TPCH#1           17      17      1.5 K         2.00
      TPCH#2           31      31      7.4 K         2.00
      TPCH#3           38      39      355 K         2.05
      TPCH#4           35      37       27 K         2.11
      ◦ 12 real-world Bayesian networks (provided by the bnlearn repository and [Tzoumas et al., 2013])
      ◦ All datasets and the implementation are publicly available: https://github.com/aslayci/qtm
  • 20. Experiments: Elimination Order
      ◦ Finding the optimal elimination order is NP-hard [Koller et al., 2009]
      Table: Statistics of elimination trees.
      Tree               Nodes   Height   Max. # children
      MILDEW (MF)           70       17          3
      PATHFINDER (MF)      218       12         54
      MUNIN#1 (WMF)        372       23          7
      ANDES (MF)           440       38          5
      DIABETES (MF)        826       77          4
      LINK (MF)          1,428       56         15
      MUNIN#2 (MF)       2,006       23          8
      MUNIN (WMF)        2,082       24          8
      TPCH#1 (MW)           34        8          3
      TPCH#2 (MW)           62       11          5
      TPCH#3 (MW)           76       13          5
      TPCH#4 (MW)           70       11          4
  • 21. Experiments: Algorithms
      ◦ Variable elimination with materialization (VE-k) [this work]
      ◦ Junction tree (JT) algorithm [Lauritzen et al., 1998]
          – tree calibrated by precomputing and materializing the joint probability distributions at its nodes
      ◦ Indexed junction tree (IND) [Kanagal and Deshpande, 2009]
          – hierarchical index built on the calibrated junction tree
  • 22. Experiments: Setting
      ◦ Two workload schemes are considered
          – uniform: each variable has equal probability of appearing in a random query
          – skewed: the probability that a variable appears in a random query is correlated with its position in the elimination order
      ◦ 250 random queries per workload scheme
      ◦ The partial cost c(u) of each node u in the tree is set to the computational cost of computing the corresponding factor from its children in the elimination tree
  • 23. Experiments: Results
      [Figure: Cost savings for uniform and skewed workloads. Panels: (a) MILDEW, (b) PATHFINDER, (c) MUNIN#1, (d) TPCH#3, (e) DIABETES, (f) LINK, (g) MUNIN#2, (h) MUNIN. x-axis: budget k. y-axis: cost savings (%) in query running time compared to no materialization.]
      ◦ average cost savings of 70%, reaching up to 99%, over a uniform workload of queries
  • 24. Experiments: Results
      [Figure: Comparison of total costs under uniform and skewed workloads for different algorithms. Panels: (a) MILDEW, (b) PATHFINDER, (c) MUNIN#1, (d) TPCH#3, (e) DIABETES, (f) LINK, (g) MUNIN#2, (h) MUNIN. x-axis: algorithm (VE-1, VE-5, VE-k with k equal to the number of network nodes, JT, IND). y-axis: total costs (log scale).]
      ◦ more efficient inference in the online phase
  • 25. Experiments: Results
      Table: Materialization phase statistics.
                     Disk space (MB)            Time (seconds)
      Network       VE-n      JT     IND      VE-n        JT       IND
      MILDEW         1.7     373   1,354         5    18,360    18,360
      PATHFINDER       1      17      23         1       302       305
      MUNIN#1        317      NA      NA       270        NA        NA
      ANDES          4.1      70      78         2     3,682     3,686
      DIABETES        15     945   3,286         2    41,228    41,247
      LINK           245   3,735   3,824       100    98,533    98,647
      MUNIN#2          9     480     573         8    21,348    21,635
      MUNIN           14   2,866   2,972        16   110,342   110,645
      TPCH#1           1       1       1      0.01     0.306     0.322
      TPCH#2           1       1       1      0.02     1.866     1.882
      TPCH#3           1      NA      NA      0.02        NA        NA
      TPCH#4           1     4.7     6.9      0.02       106       107
      ◦ faster and lighter precomputation in the offline phase
  • 26. Conclusion
      ◦ A small amount of materialization for variable elimination offers a significant advantage over junction tree algorithms
          – more efficient inference in the online phase
          – faster and lighter precomputation in the offline phase
      ◦ Many interesting avenues for future work
          – other Bayesian network inference algorithms
          – other machine learning models