Lecture: Joint, Conditional and Marginal Probabilities
1. Joint, Conditional and Marginal Probabilities
Last Updated: 24 March 2015
Slideshare: http://www.slideshare.net/marinasantini1/mathematics-for-language-technology
Mathematics for Language Technology: http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2015
2. Acknowledgements
• Several slides borrowed from Prof. Joakim Nivre.
• Practical Activities by Prof. Joakim Nivre.
• Required Reading:
– E&G (2013): Ch. 5 (pp. 110-114)
– Compendium (4): 9.2, 9.3, 9.4
– E&G (2013): Ch. 5.2-5.3 (self-study)
• Recommended Reading:
– Sections 3-6 in Goldsmith J. (2007) Probability for Linguists. The University of Chicago, The Department of Linguistics:
• http://hum.uchicago.edu/~jagoldsm/Papers/probability.pdf
3. Outline
• Joint Probability
• Conditional Probability
• Multiplication Rule
• Marginal Probability
• Bayes' Law
• Independence
4. Linguistic Note
• Traditionally, the plural is dice, but the singular is die (i.e. 1 die, 2 dice).
• Modern lexicography says otherwise; see, e.g., MacMillan:
– http://www.macmillandictionary.com/dictionary/british/dice_1
5. Joint vs Conditional
In many situations where we want to make use of probabilities, there are dependencies between different variables or events. For this reason we need the notion of conditional probability, i.e. the probability of an event given some other event.

The conditional probability of A given B is defined as the probability of the intersection of A and B divided by the probability of B:

P(A | B) = P(A ∩ B) / P(B)

The probability of the intersection is referred to as the joint probability, because it is the probability that both A and B occur.

CONDITIONAL = NOT SYMMETRICAL: in general, P(A | B) ≠ P(B | A).
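As a quick numerical illustration of the definition, the following sketch uses made-up probabilities (they are not taken from the slides):

```python
# Conditional probability from the definition: P(A|B) = P(A ∩ B) / P(B).
# All numbers below are illustrative assumptions.

def conditional(p_joint, p_given):
    """Probability of one event given another: joint / marginal."""
    if p_given == 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given

p_a_and_b = 0.12  # assumed joint probability P(A ∩ B)
p_a = 0.40        # assumed marginal probability P(A)
p_b = 0.30        # assumed marginal probability P(B)

print(conditional(p_a_and_b, p_b))  # P(A|B) = 0.12 / 0.30 ≈ 0.4
print(conditional(p_a_and_b, p_a))  # P(B|A) = 0.12 / 0.40 ≈ 0.3 (not symmetrical)
```

Note that the two directions give different values, which is exactly the "not symmetrical" point above.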
6. Conditional
When we talk about the joint probability of A and B, we are considering the intersection of A and B, i.e. those outcomes that are both in A and in B. And we ask: how large is that set of events compared to the entire sample space?
7. Example: Bigrams
(Annotations from the slide's bigram figure:)
• 10⁻³ = 1/10³ = 1/1000 = one in a thousand
• one in a million
• joint probability = one in ten million
We apply the formula of conditional probability.
8. From the definition of conditional probability we can derive the Multiplication Rule
One way to compute the probability of A and B (i.e. the joint probability) is to take the probability of B by itself and multiply it by the probability of A given B. Another way to compute the joint probability of A and B is to start with the simple probability of A and multiply that by the probability of B given A:

P(A, B) = P(B) P(A | B) = P(A) P(B | A)
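The two factorizations can be sketched in a few lines; the probabilities here are illustrative assumptions, not from the slides:

```python
# Multiplication rule: P(A, B) = P(B) P(A|B) = P(A) P(B|A).
# Illustrative, assumed numbers.

p_b = 0.30          # assumed P(B)
p_a = 0.40          # assumed P(A)
p_a_given_b = 0.25  # assumed P(A|B)

p_joint = p_b * p_a_given_b  # first factorization: 0.30 * 0.25 = 0.075

# The second factorization must yield the same joint, which pins down P(B|A):
p_b_given_a = p_joint / p_a  # 0.075 / 0.40 = 0.1875

print(p_joint)
print(p_a * p_b_given_a)     # same value via the other factorization
```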
9. Quiz 1: only one answer is correct
Probability is the measure of the likelihood that an event will occur. The higher the probability of an event, the more certain we are that the event will occur.
10. Quiz 1: Solution
1. Smaller than 1 in a million — correct
[P(A, B) = P(B) P(A|B) = 0.0001 (1 in 10,000) × 0.000001 (1 in a million) = 10⁻¹⁰ < 0.000001; P is 1 in 10 billion.]
2. Greater than 1 in a million — incorrect
[Same computation: P(A, B) = 10⁻¹⁰, which is smaller than 0.000001.]
3. Impossible to tell — incorrect
[Given P(A | B) and P(B), we can derive P(A, B) exactly.]
11. Quiz 1: only one answer is correct
We apply the following multiplication rule: P(A,B) = P(B) P(A|B), since we know these elements:
P(B) (i.e. 1/10,000 = 0.0001); P(A|B) (i.e. 1/1,000,000 = 0.000001)
P(A,B) = P(B) P(A|B) = 0.0001 × 0.000001 = 0.0000000001 (= 1 in 10,000,000,000 = 1 in 10 billion)
Result: the intersection of A and B (i.e. people having BOTH a PhD in physics and winning a Nobel Prize) is 1 in 10 billion.
1: Is the probability of 1 in 10 billion smaller than 1 in a million? Yes! 0.0000000001 is smaller than 0.000001.
2: Is the probability of 1 in 10 billion greater than 1 in a million? No! 0.0000000001 is NOT greater than 0.000001.
3: Impossible to predict: INCORRECT! It is possible to compute the probability, because you have all the elements to apply the multiplication rule.
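The Quiz 1 arithmetic can be checked directly with the values given on the slide:

```python
# Checking the Quiz 1 arithmetic with the values from the slide.

p_b = 1 / 10_000             # P(B): 1 in 10,000
p_a_given_b = 1 / 1_000_000  # P(A|B): 1 in a million

p_joint = p_b * p_a_given_b  # multiplication rule: P(A, B) = P(B) P(A|B)

print(p_joint)                  # ≈ 1e-10, i.e. 1 in 10 billion
print(p_joint < 1 / 1_000_000)  # True: smaller than 1 in a million
```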
14. Introduction to the concept of Marginalization
Partition means: the events are disjoint, i.e. they do not have members in common. In other words: their intersection is empty, and their union is the entire sample space. This is a way to divide the sample space into non-overlapping events. (Pairwise comparison generally refers to any process of comparing entities in pairs.)

Given that we have such a partition B₁, …, Bₙ, and given that we are interested in another event A in the same sample space, we can compute the probability of A by summing up the joint probabilities of A with each member of the partition (this is the summation formula in the middle of the slide):

P(A) = Σᵢ P(A, Bᵢ)
15. … continued …
All this seems a very strange method, because we are computing something very simple, i.e. the probability of A, from something more complex involving summation, joint probabilities and conditional probabilities. But it is very useful in situations where we do not know the probability of A, but we do know the joint or the conditional probabilities of A with the members of a partition. Knowing the multiplication rule, we also know that the joint probability of A and Bᵢ can be expressed as the conditional probability of A given Bᵢ times the simple probability of Bᵢ:

Marginal probability: P(A) = Σᵢ P(A, Bᵢ)
Multiplication rule: P(A, Bᵢ) = P(A | Bᵢ) P(Bᵢ)
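Combining the two formulas gives the Law of Total Probability. The sketch below uses an assumed three-way partition with illustrative numbers (not from the slides):

```python
# Law of Total Probability over an assumed partition {B1, B2, B3}.
# All probabilities here are illustrative assumptions.

p_b = {"B1": 0.5, "B2": 0.3, "B3": 0.2}             # simple probs; sum to 1
p_a_given_b = {"B1": 0.10, "B2": 0.40, "B3": 0.25}  # assumed conditionals

# Multiplication rule gives each joint: P(A, Bi) = P(A|Bi) P(Bi)
joint = {b: p_a_given_b[b] * p_b[b] for b in p_b}

# Marginalization: P(A) = sum over i of P(A, Bi)
p_a = sum(joint.values())
print(p_a)  # 0.05 + 0.12 + 0.05 ≈ 0.22
```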
16. Joint, Marginal & Conditional Probabilities
What is important is to understand the relation between the joint, the marginal and the conditional probabilities, and the way we can derive them from each other. In particular, given that we know the joint probabilities of the events we are interested in, we can always derive the marginal and conditional probabilities from them, whereas the opposite does not hold (except under some special conditions). Note that the joint probabilities sum up to 1.

What if we want the simple probabilities? Once we have the joint probabilities and the simple probabilities, we can combine these to get the conditional probabilities.
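This derivation chain (joint → marginal → conditional) can be sketched with an assumed joint distribution over two variables; the numbers are illustrative, not from the slides:

```python
# Deriving marginals and conditionals from an assumed joint distribution
# over two variables X in {a, b} and Y in {c, d} (illustrative numbers).

joint = {
    ("a", "c"): 0.10, ("a", "d"): 0.30,
    ("b", "c"): 0.20, ("b", "d"): 0.40,
}  # the joint probabilities sum up to 1

# Marginals: sum the joint over the other variable.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Conditionals: joint divided by the relevant marginal.
p_x_given_c = {x: joint[(x, "c")] / p_y["c"] for x in p_x}

print(p_x)          # marginal of X: a -> 0.4, b -> 0.6
print(p_x_given_c)  # P(X | Y=c): a -> 1/3, b -> 2/3
```

Going the other way is not possible in general: the marginals alone do not determine the joint, which is the asymmetry the slide points out.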
18. Bayes' Law
Given events A and B in the sample space Ω, the conditional probability of A given B is equal to the simple probability of A times the inverse conditional probability, i.e. the probability of B given A, divided by the simple probability of B:

P(A | B) = P(A) P(B | A) / P(B)

We know, thanks to the multiplication/chain rule, that the joint probability can be replaced by the simple probability multiplied by the conditional probability.

Bayes' Law is a powerful tool that allows us to invert conditional probabilities. When we find ourselves in a situation where we need to know the probability of A given B, but our data gives us only the probability of B given A, we can invert the expression and get the probabilities that we need. (A little bit more on this next time.)
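A minimal sketch of this inversion, with made-up numbers (they are assumptions, not from the slides):

```python
# Bayes' Law with assumed numbers: recovering P(A|B) when the data only
# gives us P(B|A), P(A) and P(B).

def bayes(p_a, p_b_given_a, p_b):
    """P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

p_a = 0.2          # assumed simple probability P(A)
p_b_given_a = 0.6  # assumed inverse conditional P(B|A)
p_b = 0.3          # assumed simple probability P(B)

p_a_given_b = bayes(p_a, p_b_given_a, p_b)
print(p_a_given_b)  # 0.2 * 0.6 / 0.3 ≈ 0.4

# Both factorizations of the joint probability must agree:
assert abs(p_a_given_b * p_b - p_b_given_a * p_a) < 1e-12
```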
19. Independence
Two events A and B are independent if and only if the joint probability of A and B is equal to the simple probability of A multiplied by the simple probability of B:

P(A, B) = P(A) P(B)

This is equivalent to saying that the probability of A by itself is equal to the conditional probability of A given B, or vice versa that the simple probability of B is equal to the probability of B given A.

One way to think of this is to say that if two events are independent, knowing that one of them has occurred does not give us any new information about the other event, because the conditional probability is the same as the simple probability.
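Independence can be checked exactly on a classic two-dice example (this example is an addition, not from the slides): A = "the first die shows 6", B = "the second die shows 6".

```python
# Exact independence check with two fair dice, using rational arithmetic.

from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """Exact probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

A = lambda o: o[0] == 6
B = lambda o: o[1] == 6

p_a = prob(A)                            # 1/6
p_b = prob(B)                            # 1/6
p_joint = prob(lambda o: A(o) and B(o))  # 1/36

print(p_joint == p_a * p_b)  # True: the dice are independent
print(p_joint / p_b == p_a)  # True: P(A|B) = P(A), B gives no new information
```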
22. Quiz 2: Solutions (Joakim's original)
1. The probability is 0.1 — incorrect [We cannot compute P(A | B) from P(B | A) without additional information.]
2. The probability is 0.9 — incorrect [We cannot compute P(A | B) from P(B | A) without additional information.]
3. Nothing — correct [We cannot compute P(A | B) from P(B | A) without additional information.]
23. Quiz 2: Solutions
1. The probability is 0.1 — incorrect [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
2. The probability is 0.9 — incorrect [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
3. Nothing — correct [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
24. Break down
• P(Sym|Dis) = 0.9 → P(B|A) = 0.9
• P(Dis|Sym) = ? → P(A|B) = ?
• Bayes: P(A|B) = P(A) P(B|A) / P(B)
• P(A) = ?
• P(B) = ?
We need additional info, i.e. P(A) and P(B). Can we use marginalization / the Law of Total Probability to derive P(A) and P(B)?
Total number of individual outcomes.
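To make the break-down concrete, the sketch below completes it with hypothetical values: P(Dis) and P(Sym | no Dis) are NOT given on the slide, so the numbers here are assumptions for illustration only.

```python
# Completing the break-down with assumed values; only P(Sym|Dis) = 0.9
# comes from the slide, the other inputs are hypothetical.

p_dis = 0.01               # assumed prior P(A) = P(Dis)
p_sym_given_dis = 0.9      # P(B|A) = P(Sym|Dis), from the slide
p_sym_given_no_dis = 0.05  # assumed P(Sym|no Dis)

# Law of Total Probability over the partition {Dis, no Dis}:
p_sym = p_sym_given_dis * p_dis + p_sym_given_no_dis * (1 - p_dis)

# Bayes' Law: P(Dis|Sym) = P(Dis) P(Sym|Dis) / P(Sym)
p_dis_given_sym = p_dis * p_sym_given_dis / p_sym

print(round(p_sym, 4))            # 0.0585
print(round(p_dis_given_sym, 4))  # 0.1538
```

With these assumptions, even a 90% symptom rate among the diseased yields only about a 15% probability of disease given the symptom, because the prior P(Dis) is small.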