Omiros' talk on the Bernoulli factory problem

Simulating events of unknown probability by
reverse time martingales
(poprzez martyngaly z czasem odwr´conym)
o

Omiros Papaspiliopoulos
UPF, Barcelona

joint work with
Krzysztof Latuszynski Ioannis Kosmidis Gareth O. Roberts
(Warwick University)

Motivation

The Bernoulli factory - general results and previous approaches

Reverse time martingale approach to sampling

Application to the Bernoulli Factory problem

Generic description of the problem

Let p ∈ (0, 1) be unknown.

Given a black box that samples p−coins

Can we construct a black box that samples f (p) coins for known f ?
For example f (p) = min(1, 2p)

Some history

(see for example Peres, 1992)
von Neumann posed and solved the problem: f (p) = 1/2

Some history

(see for example Peres, 1992)
von Neumann posed and solved the problem: f (p) = 1/2
1. set n = 1;
2. sample Xn , Xn+1
3. if (Xn , Xn+1 ) = (0, 1) output 1 and STOP
4. if (Xn , Xn+1 ) = (1, 0) output 0 and STOP
5. set n := n + 2 and GOTO 2.
Lets check why this work

The Bernoulli Factory problem

for known f and unknown p, how to generate an f (p)−coin?

von Neumann: f (p) = 1/2

Asmussen conjectured f (p) = 2p, but it turned out diﬃcult

Exact simulation of diﬀusions as Bernoulli factory

This is the description of EA closest in spirit to Beskos and Roberts
(2005)

Simulate XT at time T > 0 from:

dXt = α(Xt ) dt + dWt , X0 = x ∈ R, t ∈ [0, T ] (1)

driven by the Brownian motion {Wt ; 0 ≤ t ≤ T }

Ω ≡ C ([0, T ], R), co-ordinate mappings Bt : Ω → R, t ∈ [0, T ], such
that for any t, Bt (ω) = ω(t) and the cylinder σ-algebra
C = σ({Bt ; 0 ≤ t ≤ T }).

W x = {Wtx ; 0 ≤ t ≤ T } the Brownian motion started at x ∈ R, and by
W x,u = {Wtx,u ; 0 ≤ t ≤ T } the Brownian bridge.

1. The drift function α is diﬀerentiable.
2. The function h(u) = exp{A(u) − (u − x)2 /2T }, u ∈ R, for
u
A(u) = 0 α(y )dy , is integrable.
3. The function (α2 + α )/2 is bounded below by > −∞, and above
by r + < ∞.

1 2
φ(u) = [(α + α )/2 − ] ∈ [0, 1] , (2)
r

Q be the probability measure induced by the solution X of (1) on (Ω, C),
W the corresponding probability measure for W x , and Z be the
probability measure deﬁned as the following simple change of measure
from W: dW/dZ(ω) ∝ exp{−A(BT )}. Note that a stochastic process
distributed according to Z has similar dynamics to the Brownian motion,
with the exception of the distribution of the marginal distribution at time
T (with density, say, h) which is biased according to A.

Q be the probability measure induced by the solution X of (1) on (Ω, C),
W the corresponding probability measure for W x , and Z be the
probability measure deﬁned as the following simple change of measure
from W: dW/dZ(ω) ∝ exp{−A(BT )}. Note that a stochastic process
distributed according to Z has similar dynamics to the Brownian motion,
with the exception of the distribution of the marginal distribution at time
T (with density, say, h) which is biased according to A. Then,

T
dQ
(ω) ∝ exp −rT T −1 φ(Bt )dt ≤1 Z − a.s. (3)
dZ 0

EA using Bernoulli factory

1. simulate u ∼ h
T
2. generate a Cs coin where s := e −rTJ , and J := 0
T −1 φ(Wtx,u )dt;
3. If Cs = 1 output u and STOP;
4. If Cs = 0 GOTO 1.

Exploiting the Markov property, we can assume from now on that rT < 1.

The challenging part of the algorithm is Step 2, since exact computation
of J is impossible due to the integration over a Brownian bridge path.

On the other hand, it is easy to generate J-coins:
x,u
CJ = I(ψ < φ(Wχ )), ψ ∼ U(0, 1) , χ ∼ U(0, T )

independent of the Brownian bridge W x,u and of each other.

Therefore, we deal with another instance of the problem studied in this
article: given p-coins how to generate f (p)-coins, where here f is the
exponential function.

(Note the interplay between unbiased estimation and exact simulation
here)

Keane and O’Brien - existence result

Keane and O’Brien (1994):

Let p ∈ P ⊆ (0, 1) → [0, 1]

then it is possible to simulate an f (p)−coin ⇐⇒
f is constant
f is continuous and for some n ∈ N and all p ∈ P satisﬁes
n
min f (p), 1 − f (p) ≥ min p, 1 − p

Note that the result rules out min{1, 2p}, but not min{1 − 2 , 2p}

Keane and O’Brien - existence result

Keane and O’Brien (1994):

Let p ∈ P ⊆ (0, 1) → [0, 1]

then it is possible to simulate an f (p)−coin ⇐⇒
f is constant
f is continuous and for some n ∈ N and all p ∈ P satisﬁes
n
min f (p), 1 − f (p) ≥ min p, 1 − p

however their proof is not constructive
Note that the result rules out min{1, 2p}, but not min{1 − 2 , 2p}

Nacu-Peres (2005) Theorem - Bernstein polynomial
approach
There exists an algorithm which simulates f ⇐⇒ there exist
polynomials
n n
n n
gn (x, y ) = a(n, k)x k y n−k , hn (x, y ) = b(n, k)x k y n−k
k k
k=0 k=0

with following properties

approach
polynomials
n n
n n
k k
k=0 k=0

0 ≤ a(n, k) ≤ b(n, k) ≤ 1

approach
polynomials
n n
n n
k k
k=0 k=0

0 ≤ a(n, k) ≤ b(n, k) ≤ 1
n n
k a(n, k) and k b(n, k) are integers

approach
polynomials
n n
n n
k k
k=0 k=0

0 ≤ a(n, k) ≤ b(n, k) ≤ 1
n n
limn→∞ gn (p, 1 − p) = f (p) = limn→∞ hn (p, 1 − p)

approach
polynomials
n n
n n
k k
k=0 k=0

0 ≤ a(n, k) ≤ b(n, k) ≤ 1
n n
for all m < n
(x+y )n−m gm (x, y ) gn (x, y ) and (x+y )n−m hm (x, y ) hn (x, y )

approach
polynomials
n n
n n
k k
k=0 k=0

0 ≤ a(n, k) ≤ b(n, k) ≤ 1
n n
for all m < n
Nacu & Peres provide coeﬃcients for f (p) = min{2p, 1 − 2ε}
explicitly.

approach
polynomials
n n
n n
k k
k=0 k=0

0 ≤ a(n, k) ≤ b(n, k) ≤ 1
n n
for all m < n
Nacu & Peres provide coeﬃcients for f (p) = min{2p, 1 − 2ε}
explicitly.
Given an algorithm for f (p) = min{2p, 1 − 2ε} Nacu & Peres
develop a calculus that collapses every real analytic g to nesting
the algorithm for f and simulating g .

too nice to be true?

at time n the N-P algorithm computes sets An and Bn - subsets
of all 01 strings of length n


n
the cardinalities of An and Bn are precisely k a(n, k) and
n
k b(n, k)


n
n
k b(n, k)
the upper polynomial approximation is converging slowly to f


n
n
k b(n, k)
length of 01 strings is 215 = 32768 and above, e.g.
225 = 16777216


n
n
k b(n, k)
225 = 16777216
25
one has to deal eﬃciently with the set of 22 strings, of length
225 each.


n
n
k b(n, k)
225 = 16777216
25
225 each.
we shall develop a reverse time martingale approach to the
problem


n
n
k b(n, k)
225 = 16777216
25
225 each.
we shall develop a reverse time martingale approach to the
problem
we will construct reverse time super- and submartingales that
perform a random walk on the Nacu-Peres polynomial coeﬃcients
a(n, k), b(n, k) and result in a black box that has algorithmic
cost linear in the number of original p−coins

Before giving the most general algorithm, let us think gradually how to
simulate events of unknown probability constructively

Algorithm 1 - randomization

Lemma: Sampling events of probability s ∈ [0, 1] is equivalent to
constructing an unbiased estimator of s taking values in [0, 1]
with probability 1.


with probability 1.
ˆ ˆ ˆ
Proof: Let S, s.t. ES = s and P(S ∈ [0, 1]) = 1 be the
ˆ
estimator. Then draw G0 ∼ U(0, 1), obtain S and deﬁne a coin
ˆ
Cs := I{G0 ≤ S}.

ˆ s ˆ s
P(Cs = 1) = E I(G0 ≤ S) = E E I(G0 ≤ ˆ) | S = ˆ ˆ
= ES = s.

The converse is straightforward since an s−coin is an unbiased
estimator of s with values in [0, 1].


with probability 1.
ˆ ˆ ˆ
Proof: Let S, s.t. ES = s and P(S ∈ [0, 1]) = 1 be the
ˆ
estimator. Then draw G0 ∼ U(0, 1), obtain S and deﬁne a coin
ˆ
Cs := I{G0 ≤ S}.

ˆ s ˆ s
P(Cs = 1) = E I(G0 ≤ S) = E E I(G0 ≤ ˆ) | S = ˆ ˆ
= ES = s.

The converse is straightforward since an s−coin is an unbiased
estimator of s with values in [0, 1].
Algorithm 1
1. simulate G0 ∼ U(0, 1);
2. ˆ
obtain S;
3. ˆ
if G0 ≤ S set Cs := 1, otherwise set Cs := 0;
4. output Cs .

Algorithm 2 - lower and upper monotone deterministic
bounds

let l1 , l2 , ... and u1 , u2 , ... be sequences of lower and upper
monmotone bounds for s converging to s, i.e.

li s and ui s.

bounds


li s and ui s.

Algorithm 2
1. simulate G0 ∼ U(0, 1); set n = 1;
2. compute ln and un ;
3. if G0 ≤ ln set Cs := 1;
4. if G0 > un set Cs := 0;
5. if ln < G0 ≤ un set n := n + 1 and GOTO 2;
6. output Cs .

bounds


li s and ui s.

Algorithm 2
1. simulate G0 ∼ U(0, 1); set n = 1;
2. compute ln and un ;
3. if G0 ≤ ln set Cs := 1;
4. if G0 > un set Cs := 0;
5. if ln < G0 ≤ un set n := n + 1 and GOTO 2;
6. output Cs .
Remark: P(N > n) = un − ln .

This is a practically useful technique, suggested for example in Devroye
(1986), and implemented for simulation from random measures in
P+Roberts (2008).

Algorithm 3 - monotone stochastic bounds

Ln ≤ Un (4)
Ln ∈ [0, 1] and Un ∈ [0, 1] (5)
Ln−1 ≤ Ln and Un−1 ≥ Un (6)
E Ln = ln s and E Un = un s. (7)

F0 = {∅, Ω}, Fn = σ{Ln , Un }, Fk,n = σ{Fk , Fk+1 , ...Fn } for k ≤ n.


Ln ≤ Un (4)
Ln ∈ [0, 1] and Un ∈ [0, 1] (5)
Ln−1 ≤ Ln and Un−1 ≥ Un (6)
E Ln = ln s and E Un = un s. (7)


Algorithm 3
1. simulate G0 ∼ U(0, 1); set n = 1;
2. obtain Ln and Un ; conditionally on F1,n−1
3. if G0 ≤ Ln set Cs := 1;
4. if G0 > Un set Cs := 0;
5. if Ln < G0 ≤ Un set n := n + 1 and GOTO 2;
6. output Cs .


Lemma
Assume (4), (5), (6) and (7). Then Algorithm 3 outputs a valid s−coin.
Moreover the probability that it needs N > n iterations equals un − ln .

Proof.
Probability that Algorithm 3 needs more then n iterations equals
E(Un − Ln ) = un − ln → 0 as n → ∞. And since 0 ≤ Un − Ln is a
decreasing sequence a.s., we also have Un − Ln → 0 a.s. So there exists a
ˆ
random variable S, such that for almost every realization of sequences
{Ln (ω)}n≥1 and {Un (ω)}n≥1 we have Ln (ω) ˆ
S(ω) and Un (ω) ˆ
S(ω).
ˆ
By (5) we have S ∈ [0, 1] a.s. Thus for a ﬁxed ω the algorithm outputs
ˆ ˆ
an S(ω)−coin a.s (by Algorithm 2). Clearly E Ln ≤ E S ≤ E Un and
ˆ
hence E S = s.

Algorithm 4 - reverse time martingales

Ln ≤ Un
Ln ∈ [0, 1] and Un ∈ [0, 1]
Ln−1 ≤ Ln and Un−1 ≥ Un
E Ln = ln s and E Un = un s.

The ﬁnal step is to weaken 3rd condition and let Ln be a reverse time
supermartingale and Un a reverse time submartingale with respect to
Fn,∞ . Precisely, assume that for every n = 1, 2, ... we have

E (Ln−1 | Fn,∞ ) = E (Ln−1 | Fn ) ≤ Ln a.s. and (8)
E (Un−1 | Fn,∞ ) = E (Un−1 | Fn ) ≥ Un a.s. (9)


Algorithm 4
˜
1. simulate G0 ∼ U(0, 1); set n = 1; set L0 ≡ L0 ≡ 0 and
˜
U0 ≡ U0 ≡ 1
2. obtain Ln and Un given F0,n−1 ,
3. compute L∗ = E (Ln−1 | Fn ) and Un = E (Un−1 | Fn ).
n
∗

4. compute

˜ ˜ Ln − L∗ ˜
n ˜
Ln = Ln−1 + ∗ Un−1 − Ln−1 (10)
Un − L∗n

˜ ˜ U ∗ − Un ˜ ˜
Un = Un−1 − n Un−1 − Ln−1 (11)
∗
Un − L∗ n

5. ˜
if G0 ≤ Ln set Cs := 1;
6. ˜
if G0 > Un set Cs := 0;
7. ˜ ˜
if Ln < G0 ≤ Un set n := n + 1 and GOTO 2;
8. output Cs .


Theorem
Assume (4), (5), (7), (8) and (9). Then Algorithm 4 outputs a valid
s−coin. Moreover the probability that it needs N > n iterations equals
un − ln .

˜ ˜
We show that L and U satisfy (4), (5), (7) and (6) and hence
Algorithm 4 is valid since Algorithm 3 was valid.
In fact, we have constructed a mean preserving transformation, in the
˜ ˜
sense E[Ln ] = E[Ln ], and E[Un ] = E[Un ]. Therefore, the proof is based
on establishing this property and appealing to Algorithm 3.

By construction L0 = 0, U0 = 1 a.s thus L∗ = 0, U0 = 1
0
∗

0
∗

˜ ˜
Therefore, L1 = L1 , and U1 = U1 (intuitive)

0
∗

˜ ˜
Thus,
˜ L2 − L∗
2
L2 = L1 + ∗ (U1 − L1 )
U2 − L∗
2

0
∗

˜ ˜
Thus,
˜ L2 − L∗
2
L2 = L1 + ∗ (U1 − L1 )
U2 − L∗
2
Take conditional expectation given F2
Therefore result holds for n = 1, 2, then use induction (following the
same approach)

Unbiased estimators

Theorem
Suppose that for an unknown value of interest s ∈ R, there exist a
constant M < ∞ and random sequences Ln and Un s.t.

P(Ln ≤ Un ) = 1 for every n = 1, 2, ...
P(Ln ∈ [−M, M]) = 1 and P(Un ∈ [−M, M]) = 1 for every n = 1, 2, ...
E Ln = ln s and E U n = un s
E (Ln−1 | Fn,∞ ) = E (Ln−1 | Fn ) ≤ Ln a.s. and
E (Un−1 | Fn,∞ ) = E (Un−1 | Fn ) ≥ Un a.s.

Then one can construct an unbiased estimator of s.

Proof.
After rescaling, one can use Algorithm 4 to sample events of probability
(M + s)/2M, which gives an unbiased estimator of (M + s)/2M and
consequently of s.

A version of the Nacu-Peres Theorem

An algorithm that simulates a function f on P ⊆ (0, 1) exists if and only
if for all n ≥ 1 there exist polynomials gn (p) and hn (p) of the form
n n
n n
gn (p) = a(n, k)p k (1−p)n−k and hn (p) = b(n, k)p k (1−p)n−k ,
k k
k=0 k=0

s.t.
(i) 0 ≤ a(n, k) ≤ b(n, k) ≤ 1,
(ii) limn→∞ gn (p) = f (p) = limn→∞ hn (p),
(iii) For all m < n, their coeﬃcients satisfy
k n−m m k n−m m
k−i i k−i i
a(n, k) ≥ n a(m, i), b(n, k) ≤ n b(m, i). (12)
i=0 k i=0 k


Proof: polynomials ⇒ algorithm.
Let X1 , X2 , . . . iid tosses of a p−coin.


Deﬁne {Ln , Un }n≥1 as follows:


n
if i=1 Xi = k, let Ln = a(n, k) and Un = b(n, k).


n
if i=1 Xi = k, let Ln = a(n, k) and Un = b(n, k).
In the rest of the proof we check that (4), (5), (7), (8) and (9) hold
for {Ln , Un }n≥1 with s = f (p). Thus executing Algorithm 4 with
{Ln , Un }n≥1 yields a valid f (p)−coin.
Clearly (4) and (5) hold due to (i). For (7) note that
E Ln = gn (p) f (p) and E Un = hn (p) f (p).


Proof - continued.
To obtain (8) and (9) deﬁne the sequence of random variables Hn
to be


Proof - continued.
to be
n
the number of heads in {X1 , . . . , Xn }, i.e. Hn = i=1 Xi


Proof - continued.
to be
n
and let Gn = σ(Hn ). Thus
Ln = a(n, Hn ) and Un = b(n, Hn ), hence Fn ⊆ Gn and it is
enough to check that E(Lm |Gn ) ≤ Ln and E(Um |Gn ) ≥ Un for m < n.


Proof - continued.
to be
n
and let Gn = σ(Hn ). Thus
Ln = a(n, Hn ) and Un = b(n, Hn ), hence Fn ⊆ Gn and it is
enough to check that E(Lm |Gn ) ≤ Ln and E(Um |Gn ) ≥ Un for m < n.
The distribution of Hm given Hn is hypergeometric and
Hn n−m m
Hn −i i
E(Lm |Gn ) = E(a(m, Hm )|Hn ) = n a(m, i) ≤ a(n, Hn ) = Ln .
i=0 Hn

Clearly the distribution of Hm given Hn is the same as the
distribution of Hm given {Hn , Hn+1 , . . . }. The argument for Un is
identical.

Practical issues for Bernoulli factory

Given a function f , ﬁnding polynomial envelopes satisfying properties
required is not easy. Section 3 of N-P provides explicit formulas for
polynomial envelopes of f (p) = min{2p, 1 − 2ε} that satisfy conditions,
precisely a(n, k) and b(n, k) satisfy (ii) and (iii) and one can easily
compute n0 = n0 (ε) s.t. for n ≥ n0 condition (i) also holds, which is
enough for the algorithm (however n0 is substantial, e.g. n0 (ε) = 32768
for ε = 0.1 and it increases as ε decreases).

The probability that Algorithm 4 needs N > n inputs equals
hn (p) − gn (p). The polynomials provided in N-P satisfy
hn (p) − gn (p) ≤ C ρn for p ∈ [0, 1/2 − 4ε] guaranteeing fast convergence,
and hn (p) − gn (p) ≤ Dn−1/2 elsewhere. Using similar techniques one can
establish polynomial envelopes s.t. hn (p) − gn (p) ≤ C ρn for
p ∈ [0, 1] (1/2 − (2 + c)ε, 1/2 − (2 − c)ε).

Moreover, we note that despite the fact that the techniques developed in
N-P for simulating a real analytic g exhibit exponentially decaying tails,
they are often not practical. Nesting k times the algorithm for
f (p) = min{2p, 1 − 2ε} is very inefficient. One needs at least n0 (ε)k of
original p−coins for a single output.

Nevertheless, both algorithms, i.e. the original N-P and our martingale
modification, use the same number of original p−coins for a single
f (p)−coin output with f (p) = min{2p, 1 − 2ε} and consequently also for
simulating any real analytic function using methodology of N-P. A
significant improvement in terms of p−coins can be achieved only if the
monotone super/sub-martingales can be constructed directly and used
along with Algorithm 3. This is discussed in the next subsection.

Bernoulli Factory for alternating series expansions

Proposition
Let f : [0, 1] → [0, 1] have an alternating series expansion
∞
f (p) = (−1)k ak p k with 1 ≥ a0 ≥ a1 ≥ . . .
k=0

Then an f (p)−coin can be simulated by Algorithm 3 and the probability
that it needs N > n iterations equals an p n .

Proof.
Let X1 , X2 , . . . be a sequence of p−coins and deﬁne

U0 := a0 L0 := 0,
n
Un−1 − an k=1 Xk if n is odd,
Ln :=
Ln−1 if n is even,
Un−1 if n is odd,
Un := n
Ln−1 + an k=1 Xk if n is even.

Clearly (4), (5), (6) and (7) are satisﬁed with s = f (p). Moreover,

un − ln = E Un − E Ln = an p n ≤ an .

Thus if an → 0, the algorithm converges for p ∈ [0, 1], otherwise for
p ∈ [0, 1).

Proof.
Let X1 , X2 , . . . be a sequence of p−coins and deﬁne

U0 := a0 L0 := 0,
n
Un−1 − an k=1 Xk if n is odd,
Ln :=
Ln−1 if n is even,
Un−1 if n is odd,
Un := n
Ln−1 + an k=1 Xk if n is even.

Clearly (4), (5), (6) and (7) are satisﬁed with s = f (p). Moreover,

un − ln = E Un − E Ln = an p n ≤ an .

Thus if an → 0, the algorithm converges for p ∈ [0, 1], otherwise for
p ∈ [0, 1).

The exponential function is precisely in this family

Some References

A. Beskos, G.O. Roberts. Exact simulation of diffusions. Ann. Appl.
Probab. 15(4): 2422–2444, 2005.
A. Beskos, O. Papaspiliopoulos, G.O. Roberts. Retrospective exact
simulation of diffusion sample paths with applications. Bernoulli 12:
1077–1098, 2006.
A. Beskos, O. Papaspiliopoulos, G.O. Roberts, and P. Fearnhead.
Exact and computationally efficient likelihood-based estimation for
discreetly observed diffusion processes (with discussion). Journal of
the Royal Statistical Society B, 68(3):333–382, 2006.
L. Devroye. Non-uniform random variable generation.
Springer-Verlag, New York, 1986.
M.S. Keane and G.L. O’Brien. A Bernoulli factory. ACM
Transactions on Modelling and Computer Simulation (TOMACS),
4(2):213–219, 1994.

P.E. Kloeden and E.Platen. Numerical solution of stochastic
diﬀerential equations, Springer-Verlag, 1995.
K. Latuszynski, I. Kosmidis, O. Papaspiliopoulos, G.O. Roberts.
Simulating events of unknown probabilities via reverse time
martingales. Random Structures and Algorithms, to appear.
S. Nacu and Y. Peres. Fast simulation of new coins from old.
Annals of Applied Probability, 15(1):93–115, 2005.
O. Papaspiliopoulos, G.O. Roberts. Retrospective Markov chain
Monte Carlo for Dirichlet process hierarchical models. Biometrika,
95:169–186, 2008.
Y. Peres. Iterating von Neumann’s procedure for extracting random
bits. Annals of Statistics, 20(1): 590–597, 1992.

Omiros' talk on the Bernoulli factory problem

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (11)

Similar to Omiros' talk on the Bernoulli factory problem

Similar to Omiros' talk on the Bernoulli factory problem (20)

More from BigMC

More from BigMC (10)

Recently uploaded

Recently uploaded (20)

Omiros' talk on the Bernoulli factory problem