by Federico Cerutti; Lance Kaplan; Angelika Kimmig; Murat Sensoy
Paper accepted at AAAI 2019
We enable aProbLog, a probabilistic logic programming approach, to reason in the presence of uncertain probabilities represented as Beta-distributed random variables. We achieve the same performance as state-of-the-art algorithms for highly specified and engineered domains, while maintaining the flexibility offered by aProbLog in handling complex relational domains. Our motivation is that faithfully capturing the distribution of probabilities is necessary to compute an expected utility for effective decision making under uncertainty: unfortunately, these probability distributions can be highly uncertain due to sparse data. To understand and accurately manipulate such probability distributions we need a well-defined theoretical framework, provided by the Beta distribution, which specifies a distribution over all the possible values of a probability when its exact value is unknown.
2. Reasoning about objects: attributes and relations

• Reasoning about objects (attributes and relations): expressing statements like "there is a relation between smoking and asthma". E.g. Logic Programming.
• Probabilistic Reasoning: some attributes and relations in the real world are probabilistic. E.g. Probabilistic Logic Programming.
• Reasoning about Confidence in Probabilities: let's toss a coin 3 times and obtain 2 heads and 1 tail: is the coin fair? Let's toss the same coin 3000 times and obtain 2000 heads and 1000 tails: is the coin fair? E.g. Dempster-Shafer, possibility theory, imprecise probabilities… and our proposal.
3. Probabilistic Logic Programming
0.6::asthma(X) :- smokes(X).
smokes(bill).
Probability of a query:

P(q) := \sum_{\Lambda' \subseteq \Lambda,\ \Lambda' \models q} P_{\Lambda}(\Lambda') = \sum_{\Lambda' \subseteq \Lambda,\ \Lambda' \models q} \ \prod_{\lambda_i \in \Lambda'} p_i \cdot \prod_{\lambda_i \in \Lambda \setminus \Lambda'} (1 - p_i)
Bill suffers from asthma with probability 0.6 if he smokes
Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proceedings of ICLP-1995, 715–729.
De Raedt, L.; Kimmig, A.; and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of IJCAI-2007, 2462–2467.
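The query probability above can be sketched by brute-force enumeration of total choices. The fact name and the `entails_query` check below are a hypothetical hand-grounding of the smokers example, not ProbLog's API:

```python
from itertools import product

# Hypothetical grounding of the slide's program: one probabilistic
# fact for the rule instance 0.6::asthma(bill) :- smokes(bill).
facts = {"asthma_if_smokes_bill": 0.6}

def entails_query(world):
    # smokes(bill) is a certain fact, so asthma(bill) holds exactly
    # when the probabilistic rule instance is chosen in this world.
    return world["asthma_if_smokes_bill"]

# P(q) = sum, over subsets of facts whose induced world entails q,
# of the product of chosen/unchosen fact probabilities.
p_query = 0.0
for choices in product([True, False], repeat=len(facts)):
    world = dict(zip(facts, choices))
    weight = 1.0
    for fact, chosen in world.items():
        weight *= facts[fact] if chosen else 1.0 - facts[fact]
    if entails_query(world):
        p_query += weight

print(p_query)  # 0.6
```

This enumerates all 2^|facts| total choices, so it only illustrates the semantics; real ProbLog inference uses knowledge compilation instead.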
4. Where do the numbers come from?
# Smokes Asthma
1 T T
2 T T
3 T F
4 T T
5 T T
6 T F
7 T T
8 T F
9 T T
10 T F
π: the true (unknown) probability of asthma conditioned on smoking.

Let y be the number of occurrences of asthma over n patients when the patient smokes (y = 6).

From Bayes' theorem, we can estimate the posterior distribution of π given the data on the basis of a prior:

g(\pi \mid y) \propto g(\pi) \cdot f(y \mid \pi)

The conjugate prior of the binomial likelihood is the Beta distribution. If:

g(\pi; a, b) = \mathrm{Beta}(a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)} \, \pi^{a-1} (1 - \pi)^{b-1}

then: g(\pi \mid y) = \mathrm{Beta}(y + a,\ n - y + b)

If a = b = 1 (uniform prior), then g(\pi \mid y) = \mathrm{Beta}(y + 1,\ n - y + 1)

In the example, g(\pi \mid y = 6, n = 10) = \mathrm{Beta}(7, 5)
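The conjugate update above amounts to two additions; a minimal sketch for the slide's numbers (y = 6 asthma cases among n = 10 smokers, uniform prior):

```python
# Beta-binomial conjugate update: posterior = Beta(y + a, n - y + b).
a, b = 1, 1        # uniform prior Beta(1, 1)
n, y = 10, 6       # 6 asthma cases among 10 smokers

post_a, post_b = y + a, n - y + b          # Beta(7, 5)
mean = post_a / (post_a + post_b)          # posterior mean of pi
var = post_a * post_b / ((post_a + post_b) ** 2 * (post_a + post_b + 1))

print(post_a, post_b)   # 7 5
print(round(mean, 4))   # 0.5833
```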
6. Proposal: extend Probabilistic Logic Programming to manipulate Beta-distributed
random variables rather than probabilities
Advantage: enable reasoning both about the probabilities of things and the uncertainty
associated with our inferences
Technical Solution:
1. Derive addition and multiplication operators over Beta distributions returning a Beta distribution via moment matching, to be used within the algebraic ProbLog (aProbLog) proposal*
2. Extend aProbLog to include a conditioning operator
3. Derive a conditioning operator over Beta distributions returning a Beta distribution
via moment-matching
*Kimmig, A.; Van den Broeck, G.; and De Raedt, L. 2011. An algebraic Prolog for reasoning about possible worlds. In Proceedings of AAAI 2011, 209–214.
7. Step 0: aProbLog*
P(q) = \sum_{\Lambda' \subseteq \Lambda,\ \Lambda' \models q} \ \prod_{\lambda_i \in \Lambda'} p_i \cdot \prod_{\lambda_i \in \Lambda \setminus \Lambda'} (1 - p_i)

⇓

A(q) = \bigoplus_{I \in I(q)} \bigotimes_{i \in I} \delta(i)

Requirement: a commutative semiring \langle \mathcal{A}, \oplus, \otimes, e^{\oplus}, e^{\otimes} \rangle
*Kimmig, A.; Van den Broeck, G.; and De Raedt, L. 2011. An algebraic Prolog for reasoning about possible worlds. In Proceedings of AAAI 2011, 209–214.
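The semiring requirement can be made concrete with a small sketch: an aProbLog-style label is computed by ⊕-summing over the interpretations of the query and ⊗-multiplying literal labels. The `Semiring` class, `label` function, and toy labels below are illustrative, not aProbLog's actual interface:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Semiring:
    plus: Callable[[Any, Any], Any]   # ⊕
    times: Callable[[Any, Any], Any]  # ⊗
    zero: Any                         # e⊕, neutral element of ⊕
    one: Any                          # e⊗, neutral element of ⊗

# The probability semiring recovers ordinary ProbLog inference.
prob_semiring = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)

def label(interpretations, delta, sr):
    """A(q) = ⊕ over I in I(q) of (⊗ over i in I of delta(i))."""
    total = sr.zero
    for I in interpretations:
        w = sr.one
        for lit in I:
            w = sr.times(w, delta[lit])
        total = sr.plus(total, w)
    return total

# Toy query with two models: {f1} and {¬f1, f2}.
delta = {"f1": 0.6, "not_f1": 0.4, "f2": 0.5}
models = [["f1"], ["not_f1", "f2"]]
print(round(label(models, delta, prob_semiring), 6))  # 0.8
```

Swapping `prob_semiring` for a different ⟨A, ⊕, ⊗, e⊕, e⊗⟩ changes what the same evaluation computes, which is exactly the flexibility the paper exploits.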
8. Step 1: Addition and Multiplication Operators for Beta Variables
Given X and Y independent Beta-distributed random variables:

• the sum (⊕β) of X and Y is defined as the Beta-distributed random variable Z such that:

E[Z] = E[X + Y] = E[X] + E[Y]

and

Var(Z) = Var(X + Y) = Var(X) + Var(Y)

• the product (⊗β) of X and Y is defined as the Beta-distributed random variable Z such that:

E[Z] = E[XY] = E[X]\,E[Y]

and

Var(Z) = Var(XY) = Var(X)\,(E[Y])^2 + Var(Y)\,(E[X])^2 + Var(X)\,Var(Y)
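The moment-matching idea can be sketched as follows (the (α, β)-pair representation and helper names are mine, not the paper's code): compute the target mean and variance from the operator's definition, then invert them back to Beta parameters.

```python
def beta_moments(a, b):
    """Mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def moments_to_beta(mean, var):
    """Moment matching: the Beta(a, b) with the given mean and variance
    (requires var < mean * (1 - mean))."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

def beta_product(x, y):
    """⊗β: product of independent Beta variables, matched to a Beta."""
    mx, vx = beta_moments(*x)
    my, vy = beta_moments(*y)
    mean = mx * my
    var = vx * my**2 + vy * mx**2 + vx * vy
    return moments_to_beta(mean, var)

# ⊕β is analogous: add the means and the variances before matching.
a, b = beta_product((2, 2), (2, 2))
print(round(beta_moments(a, b)[0], 4))  # 0.25
```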
9. Step 2: Conditioning operator
A(q \mid E = e) = A(I(q \wedge E = e)) \oslash A(I(E = e))

(the label of q ∧ E = e given the label of E = e)
10. Step 3: Conditioning Operator for Beta Variables
Given X and Y Beta-distributed random variables, with Y = A(I(E = e)) = A(I(q \wedge E = e)) ⊕β A(I(\neg q \wedge E = e)) and X = A(I(q \wedge E = e)), the conditioning-division (⊘β) of X by Y is defined as the Beta-distributed random variable Z such that:

E[Z] = E\!\left[\frac{X}{Y}\right] = E[X]\,E\!\left[\frac{1}{Y}\right] \approx \frac{E[X]}{E[Y]}

and

Var(Z) \approx (E[Z])^2 (1 - E[Z])^2 \cdot \left( \frac{Var(X)}{(E[X])^2} + \frac{Var(Y) - Var(X)}{(E[Y] - E[X])^2} + \frac{2\,Var(X)}{E[X]\,(E[Y] - E[X])} \right)
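Under the same moment-matching scheme, the conditioning-division can be sketched as below. The helper functions and the example parameters are illustrative; the variance expression transcribes the slide's approximation:

```python
def beta_moments(a, b):
    """Mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def moments_to_beta(mean, var):
    """Moment matching back to Beta parameters."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

def beta_conditioning_division(x, y):
    """⊘β: E[Z] ≈ E[X]/E[Y]; Var(Z) uses the slide's approximation."""
    mx, vx = beta_moments(*x)
    my, vy = beta_moments(*y)
    mz = mx / my
    vz = mz**2 * (1 - mz)**2 * (
        vx / mx**2
        + (vy - vx) / (my - mx) ** 2
        + 2 * vx / (mx * (my - mx))
    )
    return moments_to_beta(mz, vz)

# X labels q ∧ E=e and Y labels E=e, so E[X] < E[Y] by construction.
a, b = beta_conditioning_division((3, 7), (6, 4))
print(round(a, 2), round(b, 2))  # 2.5 2.5
```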
11. Summary of the main contribution
Sβ: a new aProbLog parametrisation with our newly defined operators ⊕β, ⊗β, and ⊘β
14. p1::stress(X) :- person(X).
...

⇓ 100 random choices for pX, e.g. p1 = 0.3

[Figure: ten Beta distributions fitted from Nins = 10, Nins = 50, and Nins = 100 samples of p1 (sample sets #1–#10 in each panel).]
15. State of the art*
A Beta-distributed random variable X ∼ Beta(α, β) is equivalent to a subjective logic opinion.

SSL is the aProbLog parametrisation that uses the operators ⊕SL, ⊗SL, and ⊘SL.*
*Jøsang, A. 2016. Subjective Logic: A Formalism for Reasoning Under Uncertainty. Springer
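The equivalence can be sketched via Jøsang's bijection between Beta parameters and binomial opinions. The prior weight W = 2 and the function name below follow the book's conventions but are my assumptions, not code from the paper:

```python
W = 2.0  # non-informative prior weight in Jøsang's convention

def beta_to_opinion(alpha, beta, base_rate=0.5):
    """Map Beta(alpha, beta) to a binomial opinion (belief, disbelief,
    uncertainty, base_rate) via evidence counts r = alpha - W*base_rate
    and s = beta - W*(1 - base_rate)."""
    r = alpha - W * base_rate
    s = beta - W * (1 - base_rate)
    u = W / (r + s + W)                  # uncertainty shrinks with evidence
    return r / (r + s + W), s / (r + s + W), u, base_rate

b, d, u, a = beta_to_opinion(7, 5)  # the Beta(7, 5) posterior from before
print(round(b, 3), round(d, 3), round(u, 3))  # 0.5 0.333 0.167
```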
16.

Nins            Sβ       SSL
10   Actual     0.1014   0.1514
     Predicted  0.1727   0.1178
50   Actual     0.0620   0.1123
     Predicted  0.0926   0.0815
100  Actual     0.0641   0.1253
     Predicted  0.1150   0.0893
RMSE for the queried variables in the Friends & Smokers program.
Best results for the actual RMSE highlighted.
18. EXPERIMENT 2: Sβ is as good as state-of-the-art approaches on Bayesian network benchmarks
19. [Figure: three Bayesian networks, Net1, Net2, and Net3, over the nodes A, B, C, D, E, F, G, H, and L, each compiled to a logic program:]

pA::a.
pB1::b :- a.
pB2::b :- \+a.
...

where pA = P(A), pB1 = P(B|A), pB2 = P(B|¬A), …
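The network-to-program translation can be sketched for a binary node: one annotated clause per row of the node's CPT, following the slide's pattern. The function and the example probabilities below are illustrative:

```python
def cpt_to_clauses(node, parents, cpt):
    """Emit ProbLog-style annotated clauses for a binary node.
    cpt maps tuples of parent truth values to P(node=true | parents)."""
    clauses = []
    for assignment, p in cpt.items():
        body = ", ".join(
            name if value else r"\+" + name
            for name, value in zip(parents, assignment)
        )
        clauses.append(f"{p}::{node} :- {body}." if body else f"{p}::{node}.")
    return clauses

# Root node a and child b with parent a (probabilities are made up).
print("\n".join(cpt_to_clauses("a", [], {(): 0.3})))
print("\n".join(cpt_to_clauses("b", ["a"], {(True,): 0.7, (False,): 0.2})))
# 0.3::a.
# 0.7::b :- a.
# 0.2::b :- \+a.
```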
20. State of the art

SBN: Subjective Bayesian Network*
• Bayesian network where the conditionals are subjective opinions instead of dogmatic probabilities
• Builds on top of Pearl's message-passing inference method

GBT: Belief Networks†
• Based on Dempster-Shafer theory
• Forward and backward propagation enabled via the generalized Bayes theorem (GBT)

Credal: Credal Network‡
• Replaces single probability values with closed intervals representing the possible range of probability values
• Extends Pearl's message-passing inference method

*Kaplan, L., and Ivanovska, M. 2018. Efficient belief propagation in second-order Bayesian networks for singly-connected graphs. International Journal of Approximate Reasoning 93:132–152.
†Smets, P. 1993. Belief functions: The disjunctive rule of combination and the generalized Bayesian theorem. International Journal of Approximate Reasoning 9:1–35.
‡Zaffalon, M., and Fagiuoli, E. 1998. 2U: An exact interval propagation algorithm for polytrees with binary variables. Artificial Intelligence 106(1):77–107.
21.

     Nins            Sβ       SSL      SBN      GBT      Credal
Net1 10   Actual     0.1505   0.2078   0.1505   0.1530   0.1631
          Predicted  0.1994   0.1562   0.1470   0.0868   0.2009
     50   Actual     0.0555   0.0895   0.0555   0.0619   0.0553
          Predicted  0.0950   0.0579   0.0563   0.0261   0.0761
     100  Actual     0.0766   0.1182   0.0766   0.0795   0.0771
          Predicted  0.1280   0.0772   0.0763   0.0373   0.1028
Net2 10   Actual     0.1387   0.2089   0.1387   0.1416   0.1459
          Predicted  0.2031   0.1662   0.1391   0.1050   0.1849
     50   Actual     0.0537   0.0974   0.0537   0.0561   0.0528
          Predicted  0.1002   0.0671   0.0520   0.0342   0.0683
     100  Actual     0.0730   0.1229   0.0726   0.0752   0.0728
          Predicted  0.1380   0.0863   0.0725   0.0482   0.0949
Net3 …
RMSE for the queried variables in the various Bayesian networks (selection).
Best results for the actual RMSE highlighted.
22. [Figure: Actual Confidence versus Desired Confidence curves for Sβ, SSL, SBN, GBT, and Credal on Net1, with Nins = 10 and Nins = 50.]

Actual versus desired significance of bounds derived from the uncertainty for the various Bayesian networks (selection). Best is closest to the diagonal.
24. Reasoning about objects: attributes and relations (recap of the overview slide)
25. • We enabled aProbLog, a probabilistic logic programming approach, to reason in the presence of uncertain probabilities represented as Beta-distributed random variables
• The proposed operators outperform existing proposals for uncertain probabilities
• The proposed operators are as good as state-of-the-art approaches for uncertain probabilities in Bayesian networks, while being able to handle much more complex problems
27. • Provide a different characterisation of the variance in the conditioning operator
• Test the boundaries of our approximations to provide practitioners with pragmatic
assessments and assurances
• Introduce an expectation-maximisation (EM) algorithm for parameter learning