This document discusses Bayesian inference and using Bayes' rule to update probabilities based on new evidence or data. It begins with an overview of probability, statistics, and Bayes' rule. It then explains how to use Bayes' rule to get point estimates by modeling the likelihood with a binomial distribution and using a beta distribution as the conjugate prior. Finally, it discusses how to generalize this approach to get confidence intervals by continuing to update the beta distribution as the posterior with new evidence, effectively tracking the probability over time as more data is observed.
“When the facts change, I change my mind. What do you do, sir?”
– John Maynard Keynes
3. Agenda
‣ Probability and statistics basics and Bayes’ rule
‣ How to use Bayes’ rule to get point estimates
‣ How to generalize Bayes’ rule with conjugate priors and get confidence intervals
‣ Appendix: Let’s “justify” why we are using the beta distribution as the conjugate prior for
the binomial distribution
5. What’s probability? What’s statistics? How are they different?
Probability refers to the study of a random
process in which all basic features of the random
process are known.
Our goal with probability is to discover other
deeper features of the random process.
E.g., it is a problem in probability to determine,
when presented with a die known to be fair,
how often it will land on an odd face over ten
consecutive throws.
Statistics refers to the study of a random
process in which some basic features of the
random process are unknown.
Our goal with statistics is to infer other hidden
features of the random process.
E.g., it is a problem in statistics to determine
when presented with a die which has landed on
odd faces ten consecutive times, whether one
should continue to believe it is fair.
7. What’s an outcome? What’s an event?
An outcome is a single thing that can happen.
E.g., when thinking about dice, an outcome is a
single face, such as the 3-face.
An event is a collection of desired outcomes.
E.g., the collection of all odd faces, less-than-four
faces, etc., are events.
8. What’s a probability?
The probability of an event A is defined by:
P(A) = (# desired outcomes for A) / (# total outcomes)
So to compute basic probabilities, we use our knowledge from combinatorics.
9. What’s a conditional probability?
Suppose we know that one event B has already happened or will happen (the condition), and we want
to know the probability of a different event A.
Then the conditional probability of A given B is defined by:
P(A | B) = (# desired outcomes for A and B) / (# desired outcomes for B)
10. E.g., what’s the conditional probability of getting an odd face for a fair 6-sided die
given that the face is less than 4?
P(odd face | less-than-four face) = (# faces both odd and less than four) / (# less-than-four faces) = 2/3
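The worked die example above can be checked by brute-force enumeration. A minimal Python sketch (the set names are my own):

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # outcomes of a fair 6-sided die

# Events as sets of outcomes
odd = {f for f in faces if f % 2 == 1}        # {1, 3, 5}
less_than_four = {f for f in faces if f < 4}  # {1, 2, 3}

# P(odd | less-than-four) = |odd and less-than-four| / |less-than-four|
p = Fraction(len(odd & less_than_four), len(less_than_four))
print(p)  # 2/3
```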
11. Another way to look at conditional probabilities…
! " # =
# desired outcomes for " and #
# desired outcomes for #
=
# desired outcomes for " and #
# total outcomes
# desired outcomes for #
# total outcomes
=
! " and #
! #
12. What’s Bayes’ rule?
Remember our definition of conditional probability:
P(A and B) = P(A | B) · P(B)
P(B and A) = P(B | A) · P(A)
Setting them equal to one another leads us to Bayes’ rule:
P(A | B) = P(B | A) · P(A) / P(B)
This is probably the most important simple formula in both probability and statistics.
14. What’s the probability that your firm will be breached?(*)
(*) loosely adapted from the disease screening problem
Suppose that you are looking to contract a cybersecurity team to pen test your company:
‣ 5% of the firms in your industry have been breached
‣ If a firm was breached and that firm also contracted that cybersecurity team, their pen testing was successful
90% of the time
‣ If a firm wasn’t breached and that firm also contracted that cybersecurity team, their pen testing was successful
only 20% of the time
You go ahead and contract this cybersecurity team and their pen testing against your firm is
successful. What’s the updated probability that your firm will be breached?
15. We are looking at the following conditional probability: P(breach | successful pen testing)
And we know that:
‣ P(breach) = .05
‣ So P(no breach) = .95
‣ P(successful pen testing | breach) = .9
‣ P(successful pen testing | no breach) = .2
16. Let’s apply Bayes’ rule to calculate P(breach | successful pen testing)
P(breach | successful pen testing)
= P(successful pen testing | breach) · P(breach) / P(successful pen testing)
We know all the terms appearing in the previous formula except P(successful pen testing).
17. We can calculate P(successful pen testing) by breaking it down
P(successful pen testing)
= P(successful pen testing and (breach or no breach))
= P((successful pen testing and breach) or (successful pen testing and no breach))
= P(successful pen testing and breach) + P(successful pen testing and no breach)
= P(successful pen testing | breach) · P(breach) + P(successful pen testing | no breach) · P(no breach) (*)
= .9 × .05 + .2 × .95 = .235
(*) P(A) = P(A | B) · P(B) + P(A | ¬B) · P(¬B) (law of total probability)
(since B ∪ ¬B = Ω and B ∩ ¬B = ∅)
18. We now have everything to calculate P(breach | successful pen testing)
P(breach | successful pen testing)
= P(successful pen testing | breach) · P(breach) / P(successful pen testing)
= (.9 × .05) / .235
= .19(*)
(*) The probability that we will be breached is only 19% even though the pen test was successful. This kind of result is unintuitive and is called the base rate
fallacy. It takes a lot of evidence to make an unlikely situation likely.
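The calculation on this slide can be reproduced in a few lines of Python (a sketch; the variable names are mine):

```python
p_breach = 0.05
p_no_breach = 1 - p_breach
p_success_given_breach = 0.9
p_success_given_no_breach = 0.2

# Law of total probability: P(successful pen testing)
p_success = (p_success_given_breach * p_breach
             + p_success_given_no_breach * p_no_breach)

# Bayes' rule: P(breach | successful pen testing)
posterior = p_success_given_breach * p_breach / p_success
print(round(p_success, 3), round(posterior, 2))  # 0.235 0.19
```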
19. There is some terminology that is often used to understand these relationships.
These ideas form the basis of Bayesian statistics.
P(breach) is called the prior probability.
‣ It is what we know before collecting evidence/data
P(successful pen testing | breach) is called the likelihood.
‣ It is the strength of the evidence/data we collected
P(breach | successful pen testing) is called the posterior probability.
‣ It is what we know after collecting evidence/data
20. Bayes’ rule in the context of Bayesian statistics
P(H | E) = P(E | H) · P(H) / P(E)
‣ P(H | E) is the posterior probability of the hypothesis “H” given the evidence “E”
‣ P(E | H) is the likelihood of the evidence “E” if the hypothesis “H” is true
‣ P(H) is the prior probability of the hypothesis “H”
‣ P(E) is the prior probability that the evidence “E” itself is true (and a normalizing constant)
This equation captures the essence of Bayesian inference: uncertainty (about your hypothesis “H”) can
be quantified (thanks to your evidence “E”)
21. Suppose that you do a second independent pen testing through another
cybersecurity firm
This cybersecurity team has the following track record:
‣ If a firm was breached and that firm also contracted that second cybersecurity team, their pen testing was
successful 80% of the time
‣ If a firm wasn’t breached and that firm also contracted that second cybersecurity team, their pen testing was
successful only 25% of the time
What’s the updated probability that your firm will be breached in the event the second pen test is
successful? If, on the other hand, your firm resisted that second pen test, what would that
probability become?
22. Suppose you do a second pen test and that pen test is also successful…
! " #$#% =
! #$#% " ' ! "
! #$#%
‣ ! #$#% " = ! #$ " ' ! #% " = .9 × .8 (because the events #$ and #% are assumed to be independents)
‣ ! " = .05
‣ ! #$#% = ! #$#% " ' ! " + ! #$#%
B" ' ! B"
‣ ! #$#%
B" = ! #$
B" ' ! #%
B" = .2 × .25
! " #$#% =
.9 × .8 × .05
.9 × .8 × .05 + .2 × .25 × .95
= .43
After two successful pen tests, the updated probability that your firm will be breached soared to 43%.
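The same number can be obtained by applying Bayes’ rule twice, using the posterior of the first test as the prior of the second. A sketch (the helper name `bayes_update` is my own):

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """One step of Bayes' rule: return P(H | evidence)."""
    numer = p_evidence_given_h * prior
    denom = numer + p_evidence_given_not_h * (1 - prior)
    return numer / denom

p = 0.05                        # prior P(breach)
p = bayes_update(p, 0.9, 0.2)   # first pen test successful
p = bayes_update(p, 0.8, 0.25)  # second pen test successful
print(round(p, 2))  # 0.43
```

Chaining the two updates gives the same posterior as conditioning on both tests at once, which is exactly the point of the next slides on updating beliefs incrementally.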
23. If, on the other hand, the second pen test wasn’t successful…
! " #$
%#& =
! #$
%#& " ( ! "
! #$#&
‣ ! #$
%#& " = ! #$ " ( ! %#& " = .9 × 1 − .8
‣ ! " = .05
‣ ! #$
%#& = ! #$
%#& " ( ! " + ! #$
%#&
2B ( ! %"
‣ ! #$
%#&
%" = ! #$
%" ( ! %#&
%" = .2 × 1 − .25
! " #$
%#& =
.9 × 1 − .8 × .05
.9 × 1 − .8 × .05 + .2 × 1 − .25 × .95
= .06
After an unsuccessful second pen test, the updated probability (1) drops back down, reflecting that, maybe, the first pen test’s result happened
by chance; (2) is still higher than the 5% prior, i.e., than if you hadn’t done any pen testing at all, reflecting that, maybe, the first pen test’s result did not happen by
chance.
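The unsuccessful branch works the same way, now conditioning on the second pen test being unsuccessful (so the per-test probabilities become 1 − .8 and 1 − .25). A sketch (helper name mine):

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """One step of Bayes' rule: return P(H | evidence)."""
    numer = p_evidence_given_h * prior
    denom = numer + p_evidence_given_not_h * (1 - prior)
    return numer / denom

p = 0.05                                # prior P(breach)
p = bayes_update(p, 0.9, 0.2)           # first pen test successful
p = bayes_update(p, 1 - 0.8, 1 - 0.25)  # second pen test unsuccessful
print(round(p, 2))  # 0.06
```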
25. What’s the probability that your firm will be breached?(*)
(*) loosely adapted from a Bayesian A/B testing example
In the previous section, we got a point estimate for the probability of being breached. To get there, we were
fortunate to have quite a lot of information.
Is there anything you can do with less information? What if you only know how often pen testing was
successful/unsuccessful?
Yes, there is! In this section, we’ll derive a statistical distribution for the probability of being breached and
derive a confidence interval(*).
(*) as a Bayesian confidence interval
26. Let’s model hypothesis H(1) and evidence E using statistical distributions
Let’s adapt Bayes’ rule in the context of statistical distributions:
P(H | E) [posterior] = P(E | H) [likelihood] · P(H) [prior] / P(E) [normalizing constant]
All that “mathiness” can be boiled down to the distilled version of Bayes’ rule:
Posterior ∝(2) Likelihood · Prior
(1) By “hypothesis H”, what we really mean here is that our hypothesis will follow a specific statistical distribution with parameter(s) θ
(2) a ∝ b means “a is proportional to b”
27. Let’s go back to our example…
and attach some tangible distributions to:
‣ the likelihood
‣ and the prior and posterior probabilities
28. Let’s start with the likelihood. What’s the probability distribution for the likelihood?
What are the possible outcomes of each pen test?
‣ A pen test is either successful or not; the outcome is binary
How then is our evidence/data distributed? I.e., what is the statistical distribution for the likelihood?
‣ The likelihood follows a binomial distribution
29. What’s a binomial distribution again?
The binomial distribution (parameters n and p)
is the discrete probability distribution of the
number of successes in a sequence of n
independent experiments, each asking a binary
question with a binary outcome: success
(with probability p) or failure (with probability
1 − p).
The pmf (probability mass function) of a
binomial distribution is:
P(X = k) = (n choose k) · p^k · (1 − p)^(n−k)
(*) notice the bell shape of the pmf for different values of n and p
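The pmf can be evaluated directly. A sketch comparing the closed form against `scipy.stats.binom` (scipy availability assumed; the parameter values are illustrative, not from the slides):

```python
import math
from scipy.stats import binom

n, p = 50, 0.1  # illustrative parameters

# Closed-form pmf: C(n, k) * p^k * (1 - p)^(n - k)
def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

for k in (0, 5, 10):
    assert abs(binom_pmf(k, n, p) - binom.pmf(k, n, p)) < 1e-12
```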
30. What about the probability distribution(s) for the prior and posterior probabilities?
Plugging the likelihood into the distilled version of Bayes’ rule, we get:
Posterior ∝ p^k · (1 − p)^(n−k) · Prior
What probability distribution should we pick for the prior and the posterior? For that, we need to consider the following:
‣ The equation above is a function of p (k and n are outputs of the evidence/data we just collected); therefore, the
posterior will have terms in p and 1 − p
‣ As we collect new evidence/data, the posterior we just calculated will become our new prior; therefore, we want the
prior and the posterior to have the same form, i.e., be from the same probability distribution (but not with the same
parameters)
To summarize, the prior and the posterior should be from the same probability distribution and have terms in p and 1 − p.
31. Enter the beta distribution…
The beta distribution (parameters α and β) is a
continuous probability distribution between 0
and 1 and can be used to estimate a population
proportion. We’ll use it here to estimate the
probability of being breached. (Remember that a
probability is a proportion of outcomes.)
The pdf (probability density function) of a
beta distribution is:
f(x) = x^(α−1) · (1 − x)^(β−1) / B(α, β)
(with B(α, β) = Γ(α) · Γ(β) / Γ(α + β))
(*) notice the bell shape of the pdf for different values of α > 1 and β > 1
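Likewise for the beta pdf: a sketch checking the closed form against `scipy.stats.beta` (scipy availability assumed; the parameters happen to be the posterior used later in the deck):

```python
import math
from scipy.stats import beta

a, b = 6, 46  # the posterior parameters used later in the deck

def beta_pdf(x, a, b):
    # B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x**(a - 1) * (1 - x)**(b - 1) / B

for x in (0.05, 0.1, 0.2):
    assert abs(beta_pdf(x, a, b) - beta.pdf(x, a, b)) < 1e-8
```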
32. Let’s put these probability distributions together(*)…
If the prior follows the beta distribution with parameters α and β, and the posterior follows the beta
distribution with parameters α′ and β′, then:
p^(α′−1) · (1 − p)^(β′−1) ∝ p^k · (1 − p)^(n−k) · p^(α−1) · (1 − p)^(β−1)
Therefore, by identifying the terms in p and 1 − p:
‣ α′ − 1 = k + α − 1 and β′ − 1 = n − k + β − 1
To summarize, as we collect more evidence/data, we will update our belief that we can be breached with:
α′ = α + k and β′ = β + n − k
(*)
Because the posterior of a binomial likelihood and a beta prior is also a beta distribution, the beta distribution is called a conjugate prior to the binomial
distribution.
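The conjugate update is just two additions. A sketch (the function name is mine):

```python
def update_beta(alpha, beta, k, n):
    """Conjugate update: binomial evidence (k successes out of n)
    turns a Beta(alpha, beta) prior into a Beta(alpha', beta') posterior."""
    return alpha + k, beta + (n - k)

# Uniform prior, then 5 successful pen tests out of 50
print(update_beta(1, 1, 5, 50))  # (6, 46)
```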
33. How do we start?
When we don’t have any information on pen
tests (k and n), every probability is equally likely, i.e., we
have a uniform continuous distribution between
0 and 1.
This is called an uninformative prior.
The beta distribution models this uniform
distribution with α = 1 and β = 1.
34. Let’s go back to our previous example and assume that out of n = 50 pen tests on
firms in your industry, k = 5 were successful. What’s a 90% confidence interval(*)
that your firm will be breached?
The posterior will be Beta(α, β) with α = 1 + k =
6 and β = 1 + n − k = 1 + 50 − 5 = 46
‣ lb = beta.ppf(.05, 6, 46) = .052
‣ ub = beta.ppf(.95, 6, 46) = .19
Our 90% confidence interval ranges from 5.2% to
19%.
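The interval on this slide comes from the beta quantile function. A sketch using `scipy.stats.beta.ppf` (scipy availability assumed):

```python
from scipy.stats import beta

a, b = 6, 46  # posterior Beta(6, 46) from k = 5, n = 50 and a uniform prior

lb = beta.ppf(0.05, a, b)  # 5th percentile
ub = beta.ppf(0.95, a, b)  # 95th percentile
print(f"90% interval: [{lb:.3f}, {ub:.3f}]")
```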
35. How often should we update our beliefs?
Is collecting pen testing data twice a year (with k1 successful pen tests out of n1 in the first half and k2
successful pen tests out of n2 in the second half) better than collecting pen testing data for the whole year (so
with k1 + k2 successful pen tests out of n1 + n2)?
‣ Starting with a prior with parameters α and β
‣ After the first half, we have α′ = α + k1 and β′ = β + n1 − k1
‣ After the second half, we have α″ = α′ + k2 = α + k1 + k2 and β″ = β′ + n2 − k2 = β + n1 + n2 − k1 − k2
We can easily verify that we would have derived the same values for α″ and β″ in the second scenario, so
there’s no penalty for either choice!
‣ Update your beliefs as often as you want or need to
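The equivalence can be checked mechanically. A sketch (the half-year counts k1, n1, k2, n2 are arbitrary illustrations, and `update_beta` is my own helper name):

```python
def update_beta(alpha, beta, k, n):
    """Beta(alpha, beta) prior + k successes out of n -> posterior parameters."""
    return alpha + k, beta + (n - k)

k1, n1, k2, n2 = 3, 25, 2, 25  # illustrative half-year pen-test counts

# Two half-year updates
a, b = update_beta(1, 1, k1, n1)
a, b = update_beta(a, b, k2, n2)

# One full-year update
a2, b2 = update_beta(1, 1, k1 + k2, n1 + n2)

assert (a, b) == (a2, b2)  # same posterior either way
```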
37. Let’s say you didn’t like to see the beta distribution just crashing the party like that.
Let’s do without the beta and the binomial distributions…
Let’s discretize our probability space into m probabilities
(e.g., evenly spaced as p_i = i/m, 0 ≤ i ≤ m − 1).
We are looking to model
P(p = p_i | n tests, k successful)
Before any test is run, every probability p_i is equally
probable:
P(p = p_i | 0 tests, 0 successful) = P(p = p_i) = 1/m
or
P(p = p_i | 0 tests, 0 successful) ∝ 1
38. Suppose that out of n pen tests, k were successful
The order of the tests doesn’t matter since every pen test is independent of the others.
Let’s order them with all k successful tests followed by all n − k unsuccessful ones.
39. What are the possible outcomes of each pen test?
A pen test is either successful or not; the outcome is binary and follows a Bernoulli
distribution:
P(test is successful) = p and P(test is unsuccessful) = 1 − p
40. Let’s incorporate the first pen test (which is successful per the ordering we chose)
Let’s apply the distilled version of Bayes’ rule:
Posterior ∝ Likelihood · Prior
P(p = p_i | 1 test, 1 successful) ∝ P(test is successful) [= p_i] · P(p = p_i | 0 tests, 0 successful) [∝ 1]
∝ p_i
41. Let’s apply the other k − 1 successful pen tests. We can see that the pmf is in the
form of terms in “p”
P(p = p_i | 2 tests, 2 successful) ∝ P(test is successful) [= p_i] · P(p = p_i | 1 test, 1 successful) [∝ p_i] ∝ p_i^2
P(p = p_i | 3 tests, 3 successful) ∝ P(test is successful) [= p_i] · P(p = p_i | 2 tests, 2 successful) [∝ p_i^2] ∝ p_i^3
…
P(p = p_i | k tests, k successful) ∝ P(test is successful) [= p_i] · P(p = p_i | k − 1 tests, k − 1 successful) [∝ p_i^(k−1)] ∝ p_i^k
42. Then the n − k unsuccessful pen tests. We can see that the pmf is in the form of
terms in “1 − p”
P(p = p_i | k + 1 tests, k successful) ∝ P(test is unsuccessful) [= 1 − p_i] · P(p = p_i | k tests, k successful) [∝ p_i^k] ∝ p_i^k · (1 − p_i)
P(p = p_i | k + 2 tests, k successful) ∝ P(test is unsuccessful) [= 1 − p_i] · P(p = p_i | k + 1 tests, k successful) [∝ p_i^k · (1 − p_i)] ∝ p_i^k · (1 − p_i)^2
…
P(p = p_i | n tests, k successful) ∝ P(test is unsuccessful) [= 1 − p_i] · P(p = p_i | n − 1 tests, k successful) [∝ p_i^k · (1 − p_i)^(n−k−1)] ∝ p_i^k · (1 − p_i)^(n−k)
This is, up to a normalizing constant, the pdf of the beta distribution with parameters k + 1 and n − k + 1: the beta distribution wasn’t crashing the party after all.
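The discretized derivation can be verified numerically: starting from a flat grid prior and multiplying in one Bernoulli likelihood per test reproduces, after normalization, the p^k · (1 − p)^(n−k) shape. A sketch (grid size m is my choice; grid points are offset to avoid the endpoints 0 and 1):

```python
m, n, k = 1000, 50, 5  # grid size and the deck's pen-test counts
grid = [(i + 0.5) / m for i in range(m)]

# Flat prior, then one Bernoulli update per test: k successes, then n - k failures
post = [1.0] * m
for test in range(n):
    like = (lambda p: p) if test < k else (lambda p: 1 - p)
    post = [w * like(p) for w, p in zip(post, grid)]
s = sum(post)
post = [w / s for w in post]

# The normalized result matches the discretized p^k (1-p)^(n-k) shape
ref = [p ** k * (1 - p) ** (n - k) for p in grid]
s = sum(ref)
ref = [w / s for w in ref]
assert max(abs(a - b) for a, b in zip(post, ref)) < 1e-9
```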
43. Suggested readings
‣ Chapter 8: “Reducing Uncertainty with Bayesian Methods” and Chapter 9: “Some
Powerful Methods Based on Bayes” (first half) [HTMAIC]
‣ Chapter 10: “Bayes: Adding to What You Know Now” [HTMA]
44. References
‣ How to Measure Anything in Cybersecurity Risk (by Hubbard and Seiersen) [HTMAIC]
‣ How to Measure Anything, Finding the Value of “Intangibles” in Business, Third Edition
(by Hubbard) [HTMA]