Based on the theory of meadows, an equational axiomatisation is given for probability functions on finite event spaces. Completeness of the axioms is stated, with some pointers to how that is shown. Then a simplified model of courtroom subjective probabilistic reasoning is provided in terms of a protocol with two participants: the trier of fact (TOF, the judge) and the moderator of evidence (MOE, the scientific witness). The idea is then outlined of performing a step of Bayesian reasoning by applying a transformation to the subjective probability function of TOF on the basis of different pieces of information obtained from MOE. The central role of the so-called Adams transformation is outlined. A simple protocol is considered in which MOE transfers to TOF first a likelihood ratio for a hypothesis H and a potential piece of evidence E, and thereupon the additional assertion that E holds true. As an alternative, a second protocol is considered in which MOE transfers two successive likelihoods (the quotient of which is the mentioned ratio), followed by the factuality of E. It is outlined how the Adams transformation describes the information processing at TOF's side in both protocols, and that the resulting probability distribution is the same in both cases. Finally it is indicated how the Adams transformation also provides the required update of subjective probability at MOE's side, so that both sides in the protocol may be assumed to comply with the demands of subjective probability.
1. Equational axioms for probability calculus and modelling of likelihood ratio transfer mediated reasoning
Jan Bergstra
Informatics Institute, Faculty of Science
University of Amsterdam
j.a.bergstra@uva.nl
ESTEC March 8, 2019
2. Commutative rings
(x + y) + z = x + (y + z) (1)
x + y = y + x (2)
x + 0 = x (3)
x + (−x) = 0 (4)
(x · y) · z = x · (y · z) (5)
x · y = y · x (6)
1 · x = x (7)
x · (y + z) = x · y + x · z (8)
3. Division by zero
Add a function symbol for inverse (x^{-1}) and have division (x/y) as a derived operator.
Now what about 0^{-1}? A survey of the 8 known options:
- 0^{-1} = 0 (material inverse, material division): meadows,
- 0^{-1} = 1 (inverse not involutive),
- 0^{-1} = 17 (ad hoc value),
- 0^{-1} = ⊥ (error value): common inverse,
- 0^{-1} = ∞ (unsigned infinity), with 0 · 0^{-1} = ⊥ (natural inverse): wheels,
- 0^{-1} = +∞ (positive signed infinity), with ∞ + (−∞) = ⊥: transrational numbers, transreal numbers,
- 0^{-1} ↑ (undefined, divergence): partial inverse,
- 0 · 0^{-1} = 1: formal multiplicative inverse.
4. Meadows: Md = CR + (9) + (10)
(x^{-1})^{-1} = x (9)
x · (x · x^{-1}) = x (10)

We find Md ⊢ 0^{-1} = 0: by (10), 0^{-1} = 0^{-1} · (0^{-1} · (0^{-1})^{-1}), which by (9) equals 0^{-1} · (0^{-1} · 0) = 0.
x/y = \frac{x}{y} = x ÷ y = x · y^{-1} (11)

Defining equations for the different operator symbols for division.
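As a quick illustration (not part of the axiomatic development), a minimal Python sketch of the material inverse and the derived division operator over the rationals; the names minv and mdiv are ad hoc:

from fractions import Fraction

def minv(x):
    """Material (meadow) inverse: total, with the convention 0^{-1} = 0."""
    return Fraction(0) if x == 0 else 1 / x

def mdiv(x, y):
    """Division as a derived operator: x / y = x * y^{-1}, cf. equation (11)."""
    return x * minv(y)

# Sanity-check the meadow axioms (9) and (10) on some sample rationals.
samples = [Fraction(n, d) for n, d in [(0, 1), (1, 1), (-3, 2), (17, 5)]]
for x in samples:
    assert minv(minv(x)) == x      # (9):  (x^{-1})^{-1} = x
    assert x * (x * minv(x)) == x  # (10): x * (x * x^{-1}) = x
assert minv(Fraction(0)) == 0      # Md |- 0^{-1} = 0
print("0/0 =", mdiv(Fraction(0), Fraction(0)))  # prints 0/0 = 0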
5. Signed meadows
Idea for the sign function: s(0) = 0, x > 0 → s(x) = 1, x < 0 → s(x) = −1.
Axioms (without ordering):
s(x · x^{-1}) = x · x^{-1} (12)
s(1 − x · x^{-1}) = 1 − x · x^{-1} (13)
s(−1) = −1 (14)
s(x^{-1}) = s(x) (15)
s(x · y) = s(x) · s(y) (16)
0^{s(x)−s(y)} · (s(x + y) − s(x)) = 0 (17)
|x| = s(x) · x
Sign: axioms for the sign operator and absolute value. Completeness result:
R_0(s) ⊨ t = r ⟺ Md + Sign ⊢ t = r
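A numerical sanity check of these axioms (again only illustrative, with ad hoc names): in a meadow 0^0 = 1 and 0^e = 0 for the other exponents that can arise in (17), so axiom (17) expresses that s(x + y) = s(x) whenever s(x) = s(y):

from fractions import Fraction

def minv(x):
    return Fraction(0) if x == 0 else 1 / x

def s(x):
    """Sign: s(0) = 0, s(x) = 1 for x > 0, s(x) = -1 for x < 0."""
    return Fraction((x > 0) - (x < 0))

def zpow(e):
    """0^e in a meadow: 0^0 = 1 (empty product), 0^e = 0 otherwise."""
    return Fraction(1) if e == 0 else Fraction(0)

samples = [Fraction(n) for n in (-3, -1, 0, 2, 7)]
assert s(Fraction(-1)) == -1                               # (14)
for x in samples:
    assert s(x * minv(x)) == x * minv(x)                   # (12)
    assert s(1 - x * minv(x)) == 1 - x * minv(x)           # (13)
    assert s(minv(x)) == s(x)                              # (15)
    assert abs(x) == s(x) * x                              # |x| = s(x) * x
    for y in samples:
        assert s(x * y) == s(x) * s(y)                     # (16)
        assert zpow(s(x) - s(y)) * (s(x + y) - s(x)) == 0  # (17)
print("sign axioms (12)-(17) hold on all samples")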
6. Event space
Events viewed as propositions about samples.
(x ∨ y) ∧ y = y (18)
(x ∧ y) ∨ y = y (19)
x ∧ (y ∨ z) = (y ∧ x) ∨ (z ∧ x) (20)
x ∨ (y ∧ z) = (y ∨ x) ∧ (z ∨ x) (21)
x ∧ ¬x = ⊥ (22)
x ∨ ¬x = ⊤ (23)
BA: a self-dual equational basis for Boolean algebras (Padmanabhan
1983)
7. Probability functions
P(⊤) = 1 (24)
P(⊥) = 0 (25)
P(x) = |P(x)| (26)
P(x ∨ y) = P(x) + P(y) − P(x ∧ y) (27)
P(x ∧ y) · P(y) · P(y)^{-1} = P(x ∧ y) (28)
P(x | y) = P(x ∧ y) · P(y)^{-1}
PFP: a version of Kolmogorov’s axioms for a probability function.
Completeness:
Md + Sign + BA + PFP proves all equations t = r which hold in any structure made from a Boolean algebra E and a probability function P : E → R_0.
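To make the intended models concrete, a small illustrative sketch (not from the slides; names are ad hoc): a finite event space of subsets of a weighted sample space, with the meadow inverse keeping conditional probability total. It checks axioms (24)-(28):

from fractions import Fraction
from itertools import chain, combinations

# Sample space with rational weights summing to 1; events are subsets of OMEGA.
OMEGA = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(event):
    return sum((OMEGA[w] for w in event), Fraction(0))

def minv(v):
    """Meadow inverse: 0^{-1} = 0 keeps conditional probability total."""
    return Fraction(0) if v == 0 else 1 / v

def cond(x, y):
    """P(x | y) = P(x ∧ y) · P(y)^{-1}; equals 0 when P(y) = 0."""
    return P(x & y) * minv(P(y))

TOP = frozenset(OMEGA)   # the event ⊤
BOT = frozenset()        # the event ⊥
EVENTS = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(OMEGA), k) for k in range(len(OMEGA) + 1))]

assert P(TOP) == 1 and P(BOT) == 0                       # (24), (25)
for x in EVENTS:
    assert P(x) == abs(P(x))                             # (26)
    for y in EVENTS:
        assert P(x | y) == P(x) + P(y) - P(x & y)        # (27): ∨ is set union here
        assert P(x & y) * P(y) * minv(P(y)) == P(x & y)  # (28)
print("finite model satisfies PFP; e.g. P(a | {a,b}) =", cond(frozenset("a"), frozenset("ab")))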
8. Formalizing Kolmogorov’s axioms & Bayes’ rule
Original presentation of Kolmogorov’s axioms: use set theory and real
numbers and define which P’s are probability functions.
Given this definition, Md + Sign + BA + PFP is a formalisation of it. In the completeness statement the definition is used and its correspondence with the formalisation is stated.
Bayes’ rule is derivable from Md + Sign + BA + PFP (without using P(x ∨ y) = P(x) + P(y) − P(x ∧ y)):

P(x | y) = P(y | x) · P(x) / P(y)

(With inverse instead of division: P(x | y) = P(y | x) · P(x) · P(y)^{-1}.)
9. Proof of Bayes’ rule (BR) from Md + Sign + BA + PFP
P(x | y) = P(x ∧ y) / P(y)
         = P(y ∧ x) / P(y)
         = P(y ∧ x) · P(x) · P(x)^{-1} / P(y)
         = (P(y ∧ x) / P(x)) · P(x) / P(y)
         = P(y | x) · P(x) / P(y)
In the presence of Md + BA + the definition of conditional probability, BR follows from equation (28) (P(x ∧ y) · P(y) · P(y)^{-1} = P(x ∧ y)). In fact this works both ways: BR implies equation (28).
10. Second form of Bayes’ rule
BR2, a second form of Bayes’ rule:

P(x | y) = P(y | x) · P(x) / (P(y | z) · P(z) + P(y | ¬z) · P(¬z))

BR2 is derivable from Md + Sign + BA + PFP and is equivalent to equation (27), P(x ∨ y) = P(x) + P(y) − P(x ∧ y). BR2 is stronger than BR.
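A brief numeric illustration under the same ad hoc encoding as before: the denominator of BR2 is just P(y) expanded over z and ¬z, so BR2 can be checked directly:

from fractions import Fraction

# Weighted sample space; events as frozensets, as in the earlier sketch.
OMEGA = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
def P(e): return sum((OMEGA[w] for w in e), Fraction(0))
def minv(v): return Fraction(0) if v == 0 else 1 / v
def cond(x, y): return P(x & y) * minv(P(y))

x, y, z = frozenset("ab"), frozenset("bc"), frozenset("a")
nz = frozenset(OMEGA) - z  # the event ¬z

lhs = cond(x, y)
rhs = cond(y, x) * P(x) * minv(cond(y, z) * P(z) + cond(y, nz) * P(nz))
assert lhs == rhs  # BR2
print("BR2 checks out:", lhs)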
11. PFP: Alternative axioms for a probability function
P(⊤) = 1
P(⊥) = 0
P(x) = |P(x)|
P(x | y) = P(x ∧ y) / P(y)
P(x | y) = P(y | x) · P(x) / (P(y | z) · P(z) + P(y | ¬z) · P(¬z))
12. Why make the inverse total? Four arguments!
1. Raising a run-time exception at division by 0 may create a system risk (if other exceptions are also raised in a real-time context). Proving software correctness (in advance) over a meadow prevents such exceptions from being raised.
2. Several software verification tools use a total version of division, because the (indeed, any) logic of partial functions is significantly more complicated than the logic of total functions.
3. Limitation to classical two-valued logic. See the next page.
4. Simplification of theoretical work: fewer cases to be distinguished, fewer (negative) conditions occur.
13. Why make the inverse total? Limitation to classical two-valued logic
It is a common idea that the following assertion Φ is valid:

Φ(x) ≡ x ≠ 0 → x/x = 1

The idea is that the condition prevents one from having to divide by zero, and that one is comfortable with ∀x.Φ(x). But how can that be? Substitution of 0 for x must be allowed and must turn Φ(x) into a valid assertion, so that also Φ(0) holds, i.e.

0 ≠ 0 → 0/0 = 1,

in other words: 0 = 0 ∨ 0/0 = 1. But for the latter to hold (in classical 2-valued logic) both parts of the disjunction must have a truth value. Thus we must know either 0/0 = 1 or ¬(0/0 = 1). However, when viewing 0/0 as undefined (or, even worse, as incorrectly typed) neither of these assertions is plausible.
14. Bayesian reasoning
CHALLENGE: understand courtroom Bayesian reasoning from first principles (that is, principles which are found in basic papers).
Conclusion: not at all easy. It is an oversimplification to say that judges should acquire the theoretical background, which consists of a few formulae and their application. The whole subject is deeply puzzling.
Principal agents:
TOF (trier of fact: the judge or a jury),
MOE (moderator of evidence: the expert witness),
a defendant, a prosecutor, several lawyers.
Here the focus is on TOF and MOE.
15. Subjective probability: a crash course
Exam question to person X: what is the probability P_bom that there are birds on the moon (not in a spacecraft)? Survey of answers by X with a corresponding assessment (VERY LOW < LOW < DEFECTIVE < ADEQUATE) of the “probability theory competence (ptc)” of X:
1. X replies that (s)he must visit the moon before answering the question (ptc VERY LOW, because X does not understand the concept of prior odds).
2. P_bom = 0: valid answer (ptc ADEQUATE).
3. P_bom = 10^{-5}: valid answer (ptc ADEQUATE).
4. P_bom > 0: X has not understood how to work with (subjective) probabilities, as these must be precise! (ptc DEFECTIVE).
5. 10^{-20} ≤ P_bom ≤ 10^{-10}: X has not understood how to work with (subjective) probabilities, no intervals! (ptc DEFECTIVE, though NFI experts may produce such intervals for likelihood ratios).
6. “I don’t know”: X has not understood the concept of probability at all, because precisely by assigning a value to P_bom, X may express his/her lack of knowledge (ptc LOW).
16. Credal state (partial belief state; most beliefs are missing)
H (hypothesis: e.g. the defendant is guilty of criminal action C).
E: some assertion about evidence of relevance for H.
H and E are propositions, also called events.
All agents at each moment maintain a proposition space with a probability function (credal state, subjective probability):
- for TOF: a proposition space (= event space) E_TOF with probability function P_TOF on E_TOF,
- for MOE: E_MOE with probability function P_MOE on E_MOE.
Proposition kinetics: the event space changes (for instance E is added to E_TOF, or is removed from E_TOF).
Conditioning: modification (update) of the probability function on the basis of newly acquired information:
- Bayes conditioning (for processing the information that “L is true”),
- Jeffrey conditioning (for processing the information that “P(L) = p”),
- single likelihood Adams conditioning (for processing a new value for a conditional probability, i.e. a likelihood),
- double likelihood Adams conditioning (for processing a new value for a likelihood ratio).
17. Probability function transformations: a survey
Bayes conditioning (without proposition kinetics). Suppose S_A = S_A(L, M, N) and P^0_A(M) = p > 0. Then P_A is obtained by Bayes conditioning if it satisfies the following equation:

P_A = P^0_A(• | M)

Jeffrey notation: for all X ∈ S_A, P_A(X) = P^0_A(X | M).

Bayes conditioning with proposition kinetics. Now the resulting credal state is (S_A(L, N), P_A): M has been removed from the proposition space.

Bayes conditioning on a non-primitive proposition. S_A = S_A(L, M, N). Φ is a closed propositional sentence making use of the primitives L, N, and M. P^0_A(Φ) = p > 0. Then P_A is obtained by Bayes conditioning on Φ if it satisfies:

P_A = P^0_A(• | Φ)

The proposition space is not modified.
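As an illustration of Bayes conditioning as an operation on probability functions, a minimal Python sketch (not from the slides; names such as prob and bayes are ad hoc). A probability function on S_A(L, M, N) is encoded as a dict over the eight atoms, each atom a triple of truth values:

from fractions import Fraction
from itertools import product

# Atoms of S_A(L, M, N): truth-value assignments (l, m, n).
ATOMS = list(product([True, False], repeat=3))

def example_prior():
    """An arbitrary prior P0_A as weights on atoms, summing to 1."""
    w = [Fraction(k + 1) for k in range(8)]
    return {a: wi / sum(w) for a, wi in zip(ATOMS, w)}

def prob(P, event):
    """P(event) for an event given as a predicate on atoms."""
    return sum((p for a, p in P.items() if event(a)), Fraction(0))

def bayes(P0, event):
    """Bayes conditioning: P_A = P0_A(. | event); requires P0_A(event) > 0."""
    pe = prob(P0, event)
    assert pe > 0
    return {a: (p / pe if event(a) else Fraction(0)) for a, p in P0.items()}

M = lambda a: a[1]          # the primitive proposition M
P0 = example_prior()
PA = bayes(P0, M)
assert prob(PA, M) == 1     # after conditioning on M, M is certain
print("P0(M) =", prob(P0, M), " ->  PA(M) =", prob(PA, M))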
18. Probability function transformations: a survey (continued)
Jeffrey conditioning. Let for example S_A = S_A(L, M, N). Suppose P^0_A(M) > 0. Then P_A is obtained by Jeffrey conditioning if for some p ∈ [0, 1] it satisfies the following equation:

P_A = p · P^0_A(• | M) + (1 − p) · P^0_A(• | ¬M)

Jeffrey conditioning involves no proposition kinetics.

Proposition space reduction. Consider S_A = S_A(L, M, N); one may wish to forget about, say, M. Proposition kinetics now leads to a reduced proposition space S_A(L, N) in which only the propositions generated by L and N are left.

Parametrized proposition space expansion. Let S_A = S_A(H). One may wish to expand S_A to a proposition space by introducing M to it in such a manner that a subsequent reduct brings one back to S_A. P_A(H) is left unchanged, and P_A(H ∧ M) and P_A(¬H ∧ M) are fixed with definite values serving as parameters for the transformation.
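A companion sketch for Jeffrey conditioning under the same ad hoc encoding; it verifies that after the update the probability of M equals the reported value p exactly:

from fractions import Fraction
from itertools import product

ATOMS = list(product([True, False], repeat=3))
M = lambda a: a[1]

def prob(P, event):
    return sum((p for a, p in P.items() if event(a)), Fraction(0))

def bayes(P0, event):
    pe = prob(P0, event)
    return {a: (p / pe if event(a) else Fraction(0)) for a, p in P0.items()}

def jeffrey(P0, event, p):
    """Jeffrey conditioning: p * P0(.|event) + (1 - p) * P0(.|not event)."""
    pos, neg = bayes(P0, event), bayes(P0, lambda a: not event(a))
    return {a: p * pos[a] + (1 - p) * neg[a] for a in P0}

w = [Fraction(k + 1) for k in range(8)]
P0 = {a: wi / sum(w) for a, wi in zip(ATOMS, w)}
p = Fraction(9, 10)
PA = jeffrey(P0, M, p)
assert prob(PA, M) == p  # the reported probability of M is adopted exactly
print("PA(M) =", prob(PA, M))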
19. Probability function transformations: a survey (continued)
Single likelihood Adams conditioning. Let 0 < l ≤ 1 be a rational number. Assume that H and E are among the generators of S_A. Single likelihood Adams conditioning leaves the proposition space unchanged and transforms the probability function P^0_A into

Q_l(•) = P^0_A(H ∧ E ∧ •) · l / P^0_A(E | H) + P^0_A(H ∧ ¬E ∧ •) · (1 − l) / P^0_A(¬E | H) + P^0_A(¬H ∧ •)

Double likelihood Adams conditioning. Let 0 < l, l' ≤ 1 be two rational numbers, with H and E among the generators of S_A. Double likelihood Adams conditioning leaves the proposition space S_A of A unchanged and transforms the probability function P^0_A into

Q_{l,l'}(•) = P^0_A(H ∧ E ∧ •) · l / P^0_A(E | H) + P^0_A(H ∧ ¬E ∧ •) · (1 − l) / P^0_A(¬E | H)
            + P^0_A(¬H ∧ E ∧ •) · l' / P^0_A(E | ¬H) + P^0_A(¬H ∧ ¬E ∧ •) · (1 − l') / P^0_A(¬E | ¬H)
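An illustrative sketch (ad hoc names; atoms of S_A(H, E) are pairs of truth values) of both Adams conditionings, writing P(¬E | H) as 1 − P(E | H). It checks the defining properties, i.e. that Q_{l,l'} sets the likelihoods to l and l' while preserving P(H), and also the remark made on the final slide: two successive single likelihood updates, one for H and one for ¬H, have the same effect as one double likelihood update:

from fractions import Fraction
from itertools import product

ATOMS = list(product([True, False], repeat=2))  # atoms of S_A(H, E) as pairs (h, e)
H = lambda a: a[0]
E = lambda a: a[1]

def prob(P, ev):
    return sum((p for a, p in P.items() if ev(a)), Fraction(0))

def cond(P, x, y):
    return prob(P, lambda a: x(a) and y(a)) / prob(P, y)

def adams_single(P0, hyp, ev, l):
    """Set the likelihood P(ev | hyp) to l; the part outside hyp and the
    conditional distributions given hyp-and-ev and hyp-and-not-ev are left
    untouched. Assumes 0 < P0(ev | hyp) < 1."""
    pe = cond(P0, ev, hyp)
    def factor(a):
        if not hyp(a):
            return Fraction(1)
        return l / pe if ev(a) else (1 - l) / (1 - pe)
    return {a: p * factor(a) for a, p in P0.items()}

def adams_double(P0, hyp, ev, l, lp):
    """The four-term formula Q_{l,l'}: set P(ev|hyp) = l and P(ev|not hyp) = lp."""
    nhyp = lambda a: not hyp(a)
    pe_h, pe_nh = cond(P0, ev, hyp), cond(P0, ev, nhyp)
    def factor(a):
        li, pe = (l, pe_h) if hyp(a) else (lp, pe_nh)
        return li / pe if ev(a) else (1 - li) / (1 - pe)
    return {a: p * factor(a) for a, p in P0.items()}

P0 = {a: Fraction(1, 4) for a in ATOMS}  # a uniform prior, for illustration
l, lp = Fraction(3, 5), Fraction(1, 5)
Q = adams_double(P0, H, E, l, lp)
assert cond(Q, E, H) == l and cond(Q, E, lambda a: not H(a)) == lp
assert prob(Q, H) == prob(P0, H)  # the prior P(H) is preserved
# Two successive single likelihood updates equal one double likelihood update:
assert Q == adams_single(adams_single(P0, H, E, l), lambda a: not H(a), E, lp)
print("Q(E|H) =", cond(Q, E, H), " Q(E|~H) =", cond(Q, E, lambda a: not H(a)))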
20. LRTMR protocol (likelihood ratio transfer mediated reasoning)
Often the term likelihood is used to denote a certain conditional probability. We write L_α for a likelihood and LR_α for a particular ratio of likelihoods, commonly referred to as a likelihood ratio:

L_α(X, Y) = P_α(X | Y) and LR_α(X, Y, ¬Y) = L_α(X, Y) / L_α(X, ¬Y)
It is now assumed that both E and H are among the generators of both proposition spaces S_TOF and S_MOE. Further, TOF and MOE have prior credal states (S_TOF, P_TOF) and (S_MOE, P_MOE). The reasoning protocol LRTMR involves the following steps:
- It is checked by MOE that 0 < P_MOE(H) < 1 and 0 < P_MOE(E) < 1; otherwise MOE raises an exception and the protocol aborts.
- MOE determines the value r of the likelihood ratio LR_MOE(E, H, ¬H) = L_MOE(E, H) / L_MOE(E, ¬H) = P_MOE(E | H) / P_MOE(E | ¬H) with respect to its probability function P_MOE.
21. LRTMR protocol (continued)
- MOE communicates to TOF the value r and a description of LR_MOE(E, H, ¬H), that is, a description of what propositions r is a likelihood ratio of.
- MOE communicates its newly acquired information to TOF: it now considers P_MOE(E) = 1, i.e. E being true, to be an adequate representation of the state of affairs. (Thus MOE has updated its probability function.)
- TOF trusts MOE to the extent that TOF prefers those of MOE’s quantitative values that MOE communicates over its own values for the same probabilities, likelihoods, and likelihood ratios.
- TOF takes all information into account and applies various conditioning operators to end up with its new (updated, posterior) belief function P_TOF.
- TOF becomes aware of having updated its beliefs, with the effect that P_TOF(H) = r · P^0_TOF(H) / (1 + (r − 1) · P^0_TOF(H)), where P^0_TOF is the prior. TOF checks whether a threshold is exceeded so that a sound judgement on H can be made.
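The two-message protocol end to end, simulated under purely illustrative assumptions (a concrete prior for TOF; an arbitrary split of r into likelihoods l and l' with l / l' = r; ad hoc names throughout). Double likelihood Adams conditioning followed by Bayes conditioning on E yields the closed form above, independently of how r is split:

from fractions import Fraction
from itertools import product

ATOMS = list(product([True, False], repeat=2))  # (h, e)
H = lambda a: a[0]
E = lambda a: a[1]

def prob(P, ev):
    return sum((p for a, p in P.items() if ev(a)), Fraction(0))

def cond(P, x, y):
    return prob(P, lambda a: x(a) and y(a)) / prob(P, y)

def adams_double(P0, l, lp):
    """Double likelihood Adams conditioning on (H, E): P(E|H) := l, P(E|~H) := lp."""
    new = {}
    for (h, e), p in P0.items():
        pe = cond(P0, E, lambda a: a[0] == h)   # old P(E | H) or P(E | ~H)
        li = l if h else lp
        new[(h, e)] = p * (li / pe if e else (1 - li) / (1 - pe))
    return new

def bayes_on_E(P0):
    """Bayes conditioning on the factuality of E."""
    pe = prob(P0, E)
    return {a: (p / pe if a[1] else Fraction(0)) for a, p in P0.items()}

# TOF's prior: P(H) = 1/10, with E independent of H (purely illustrative numbers).
p_h = Fraction(1, 10)
P0 = {(h, e): (p_h if h else 1 - p_h) * Fraction(1, 2) for (h, e) in ATOMS}

r = Fraction(4)                             # the likelihood ratio reported by MOE
closed_form = r * p_h / (1 + (r - 1) * p_h)
for l in (Fraction(2, 5), Fraction(4, 5)):  # two different splits with l / l' = r
    PA = bayes_on_E(adams_double(P0, l, l / r))
    assert prob(PA, H) == closed_form       # the split of r does not matter
print("posterior P_TOF(H) =", closed_form)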
22. Some conclusions
1. Upon receiving the value of the likelihood ratio r from MOE, TOF can (and must) update its probability function by means of double likelihood Adams conditioning. (DOGMA: agents must always and immediately take all new information into account by updating their credal states.)
2. Upon subsequently receiving the information that E is true (according to MOE), TOF applies Bayes conditioning (after the Adams conditioning).
3. MOE must transfer the likelihood ratio first. Only thereafter may MOE contemplate the truth of E. (If MOE first settles the truth of E, then the likelihood ratio equals 1 and the protocol collapses, or MOE fails to communicate its proper beliefs.)
4. MOE communicates the truth of E in a separate (second) message, after having updated its own probability function.
5. After the first message of MOE, TOF must apply Adams conditioning. This is missed by all accounts that I have read.
23. Further remarks
- MOE may transfer both likelihoods in separate successive messages. Then TOF can apply single likelihood Adams conditioning after each of the two messages, with the same effect as in the protocol.
- In principle TOF may receive likelihood ratios regarding different pieces of evidence E, E', E'', etc. But then E, E', E'' must be independent (this requires highly non-trivial bookkeeping by TOF).
- For TOF there is no way around subjective probability.
- It is not clear from the literature of forensic science whether MOE is supposed to think in terms of subjective probability as well. (Not a necessity, as TOF may freely turn MOE's "objective" probabilities into its own subjective probabilities, but opinions diverge.)
- If MOE must adhere to subjective probability then (i) single message reporting is not an option, and (ii) TOF must apply at least two successive updates of its probability function (even in the simplest case).