Joint, Conditional and Marginal Probabilities

Last Updated: 24 March 2015

Slideshare: http://www.slideshare.net/marinasantini1/mathematics-for-language-technology

Mathematics for Language Technology
http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/
Marina Santini
santinim@stp.lingfil.uu.se

Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden

Spring 2015
1
Acknowledgements
•  Several slides borrowed from Prof Joakim Nivre.
•  Practical Activities by Prof Joakim Nivre
•  Required Reading:
–  E&G (2013): Ch. 5 (pp. 110-114)
–  Compendium (4): 9.2, 9.3, 9.4
–  E&G (2013): Ch. 5.2-5.3 (self-study)
•  Recommended Reading:
–  Sections 3-6 in Goldsmith J. (2007) Probability for Linguists. The University of Chicago. The Department of Linguistics:
•  http://hum.uchicago.edu/~jagoldsm/Papers/probability.pdf
2
Outline
•  Joint Probability
•  Conditional Probability
•  Multiplication Rule
•  Marginal Probability
•  Bayes Law
•  Independence
3
Linguistic Note:
•  Traditionally, the plural is dice, but the singular is die. (i.e. 1 die, 2 dice.)
•  Modern lexicography says: e.g. MacMillan:
–  http://www.macmillandictionary.com/dictionary/british/dice_1
Joint vs Conditional
In many situations where we want to make use of probabilities, there are dependencies between different variables or events. For this reason we need the notion of conditional probability, i.e. the probability of an event given some other event.

The conditional probability of A given B is defined as the probability of the intersection of A and B divided by the probability of B:

    P(A|B) = P(A,B) / P(B)

The probability of the intersection is referred to as the joint probability because it is the probability that both A and B occur.

CONDITIONAL = NOT SYMMETRICAL
5
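As a small runnable illustration of the definition (not from the slides; the two-dice sample space and the events A and B are invented for this sketch):

    from fractions import Fraction

    # Sample space: all 36 ordered outcomes of rolling two dice.
    omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

    def prob(event):
        # Probability of an event (a set of outcomes) under a uniform distribution.
        return Fraction(len(event), len(omega))

    A = {o for o in omega if o[0] + o[1] == 8}  # A: the two dice sum to 8
    B = {o for o in omega if o[0] == 6}         # B: the first die shows 6

    joint = prob(A & B)        # P(A,B): the probability of the intersection
    print(joint)               # 1/36
    print(joint / prob(B))     # P(A|B) = P(A,B)/P(B) = 1/6
    print(joint / prob(A))     # P(B|A) = P(A,B)/P(A) = 1/5

P(A|B) = 1/6 while P(B|A) = 1/5, which is exactly the "not symmetrical" point above.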
  
Conditional
When we talk about the joint probability of A and B, then we are considering the intersection of A and B, i.e. those outcomes that are both in A and in B. And we ask: how large is that set of events compared to the entire sample space?
6
Example: Bigrams
10^-3 = 1/10^3 = 1/1000 = one in a thousand
one in one million
joint probability = one in 10 million
We apply the formula of conditional probability
7
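The worked figures of this slide are in the (missing) image, so the following is only a hedged sketch of applying the same formula to bigram relative frequencies; the corpus and the counts are invented for illustration, chosen to reproduce the slide's magnitudes (one in a thousand, one in 10 million):

    # Hypothetical corpus counts (illustrative only, not the slide's data).
    n_tokens = 10_000_000     # corpus size
    count_w1 = 10_000         # occurrences of word w1
    count_w1_w2 = 1           # occurrences of the bigram (w1, w2)

    p_w1 = count_w1 / n_tokens          # P(w1) = 0.001, one in a thousand
    p_joint = count_w1_w2 / n_tokens    # P(w1,w2) = 1e-07, one in 10 million
    p_w2_given_w1 = p_joint / p_w1      # P(w2|w1) = P(w1,w2)/P(w1) = 1e-04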
  
From the definition of conditional probability we can derive the Multiplication Rule
8
One way to compute the probability of A and B (i.e. the joint probability) is to take the probability of B by itself and multiply it with the probability of A given B.

Another way to compute the joint probability of A and B is to start with the simple probability of A and multiply that by the probability of B given A.
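Reusing omega, prob, A and B from the dice sketch further above, both variants can be checked with concrete numbers (a minimal sanity check, not part of the slides):

    p_A_given_B = prob(A & B) / prob(B)   # P(A|B) = 1/6
    p_B_given_A = prob(A & B) / prob(A)   # P(B|A) = 1/5

    # Variant 1: P(A,B) = P(B) P(A|B) = 1/6 * 1/6 = 1/36
    # Variant 2: P(A,B) = P(A) P(B|A) = 5/36 * 1/5 = 1/36
    assert prob(B) * p_A_given_B == prob(A) * p_B_given_A == prob(A & B)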
  
Quiz 1: only one answer is correct
9
Probability is the measure of the likeliness that an event will occur. The higher the probability of an event, the more certain we are that the event will occur.
Quiz 1: Solution
1. Smaller than 1 in a million — correct [P(A,B) = P(B) P(A|B) = 0.0001 (= 1 in 10 000) x 0.000001 (= 1 in a million) = 0.0000000001 < 0.000001; P is 1 in 10 billion]

2. Greater than 1 in a million — incorrect [same computation: P(A,B) = 0.0000000001 < 0.000001; P is 1 in 10 billion]

3. Impossible to tell — incorrect [Given P(A|B) and P(B), we can derive P(A,B) exactly.]
10
Quiz 1: only one answer is correct
11
We apply the following multiplication rule: P(A,B) = P(B) P(A|B), since we know these elements:
P(B) (i.e. 1/10 000 = 0.0001); P(A|B) (i.e. 1/1 000 000 = 0.000001)

P(A,B) = P(B) P(A|B) = 0.0001 * 0.000001 = 0.0000000001 (= 1/10 000 000 000, i.e. 1 in 10 billion)

Result: the intersection of A and B (i.e. people having BOTH a PhD in physics and winning a Nobel Prize) is 1 in 10 billion.

1: is the probability of 1 in 10 billion smaller than 1 in 1 million? Yes! 0.0000000001 is smaller than 0.000001.
2: is the probability of 1 in 10 billion greater than 1 in 1 million? NO! 0.0000000001 is NOT greater than 0.000001.
3: impossible to predict: INCORRECT! It is possible to compute the probability because you have all the elements to apply the multiplication rule.
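The arithmetic of the solution as a quick runnable check (the numbers are the slide's own):

    p_B = 1 / 10_000              # P(B) = 0.0001
    p_A_given_B = 1 / 1_000_000   # P(A|B) = 0.000001

    p_joint = p_B * p_A_given_B   # P(A,B) = P(B) P(A|B) = 1e-10, 1 in 10 billion
    print(p_joint < 1 / 1_000_000)  # True: smaller than 1 in a million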
  
Multiplication Rule
P(A,B) = P(B) P(A|B)   Variant 1
P(A,B) = P(A) P(B|A)   Variant 2
Marginalization
13
Introduction to the concept of Marginalization
14
Partition means: events are disjoint, i.e. they do not have members in common.

In other words: their intersection is empty; their union is the entire sample space.

This is a way to divide the sample space into non-overlapping events.

Pairwise comparison generally refers to any process of comparing entities in pairs…

Given that we have some partition and given that we are interested in another event A in the same sample space, then we can compute the probability of A by summing up the joint probabilities of A with each member of the partition (this is the summation formula in the middle: P(A) = Σ_i P(A,Bi)).

… continued…
15
All this seems a very strange method because we are computing something very simple, i.e. the probability of A, from something more complex involving summation, joint probabilities and conditional probabilities.

But this is something that is very useful in situations where we do not know the probability of A but we know the joint or the conditional probabilities of A with the members of a partition.

Knowing the multiplication rule, we also know that the joint probability of A and Bi can be expressed as the conditional probability of A given Bi times the simple probability of Bi:

    P(A) = Σ_i P(A,Bi)          (marginal probability)
         = Σ_i P(A|Bi) P(Bi)    (multiplication rule)
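A minimal sketch of this marginalization (the three-member partition {B1, B2, B3} and its numbers are invented for illustration):

    # A partition of the sample space: the P(Bi) sum to 1.
    p_B = [0.5, 0.3, 0.2]              # P(B1), P(B2), P(B3)
    p_A_given_B = [0.10, 0.40, 0.80]   # P(A|Bi) for each member of the partition

    # P(A) = sum_i P(A,Bi) = sum_i P(A|Bi) P(Bi)
    p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
    print(p_A)  # 0.33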
  
Joint, Marginal & Conditional Probabilities
16
What is important is to understand the relation between the joint, the marginal and the conditional probabilities, and the way we can derive them from each other. In particular, given that we know the joint probabilities of the events we are interested in, we can always derive the marginal and conditional probabilities from them, whereas the opposite does not hold (except under some special conditions).

The marginal probabilities sum up to 1.

What if we want the simple probabilities? Once we have the joint probabilities and the simple probabilities, we can combine these to get conditional probabilities.

Joint, Marginal & Conditional Probabilities
17
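The table on slide 17 is an image; as a hedged stand-in, a small invented joint table over two binary variables shows the derivations described above:

    # Invented joint probabilities P(X=x, Y=y); the four cells sum to 1.
    joint = {('x1', 'y1'): 0.30, ('x1', 'y2'): 0.20,
             ('x2', 'y1'): 0.10, ('x2', 'y2'): 0.40}

    # Marginals: sum the joint over the other variable (each marginal sums to 1).
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in ('x1', 'x2')}
    p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ('y1', 'y2')}

    # Conditionals: a joint cell divided by a marginal, e.g. P(Y=y1 | X=x1).
    print(p_x)                              # {'x1': 0.5, 'x2': 0.5}
    print(joint[('x1', 'y1')] / p_x['x1'])  # 0.6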
  
Bayes Law
18
Given events A and B in the sample space omega, the conditional probability of A given B is equal to the simple probability of A times the inverse conditional probability, i.e. the probability of B given A, divided by the simple probability of B:

    P(A|B) = P(A) P(B|A) / P(B)

We know thanks to the multiplication/chain rule that the joint probabilities can be replaced by the simple probability multiplied by the conditional probability.

Bayes Law is a powerful tool that allows us to invert conditional probability.
When we find ourselves in a situation where we need to know the probability of A given B, but our data gives us only the probability of B given A, we can invert the expression and get the probabilities that we need (a little bit more on this next time).
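A minimal sketch of inverting a conditional with Bayes Law (all numbers invented for illustration):

    def bayes(p_A, p_B_given_A, p_B):
        # P(A|B) = P(A) P(B|A) / P(B)
        return p_A * p_B_given_A / p_B

    # Suppose our data gives P(B|A) but we need P(A|B):
    print(bayes(p_A=0.01, p_B_given_A=0.9, p_B=0.05))  # 0.18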
  
Independence
19
Two events A and B are independent if and only if the joint probability of A and B is equal to the simple probability of A multiplied by the simple probability of B:

    P(A,B) = P(A) P(B)

This is equivalent to saying that the probability of A by itself is equal to the conditional probability of A given B, or vice versa that the simple probability of B is equal to the probability of B given A.

One way to think of this is to say that if two events are independent, knowing that one of them has occurred does not give us any new information about the other event, because the conditional probability is the same as the simple probability.
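A quick check of the definition, reusing omega, prob, A and B from the dice sketch further above (the events C and D are invented; "first die shows 6" and "second die shows 6" are independent, while A and B from before are not):

    C = {o for o in omega if o[0] == 6}   # C: first die shows 6
    D = {o for o in omega if o[1] == 6}   # D: second die shows 6

    # Independent: P(C,D) = P(C) P(D), equivalently P(C|D) = P(C).
    print(prob(C & D) == prob(C) * prob(D))   # True  (1/36 == 1/6 * 1/6)
    print(prob(A & B) == prob(A) * prob(B))   # False (1/36 != 5/36 * 1/6)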
  	
  
Independence
20
Quiz 2 (only one answer is correct)
21
Quiz 2: Solutions (Joakim's original)
1. The probability is 0.1 — incorrect [We cannot compute P(A|B) from P(B|A) without additional information.]
2. The probability is 0.9 — incorrect [We cannot compute P(A|B) from P(B|A) without additional information.]
3. Nothing — correct [We cannot compute P(A|B) from P(B|A) without additional information.]
22
Quiz 2: Solutions
1. The probability is 0.1 — incorrect [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
2. The probability is 0.9 — incorrect [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
3. Nothing — correct [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]
23
Break down
•  P(Sym|Dis) = 0.9 → P(B|A) = 0.9
•  P(Dis|Sym) = ? → P(A|B) = ?
•  Bayes:
•  P(A|B) = P(A) P(B|A) / P(B)
•  P(A) = ?
•  P(B) = ?
24
We need additional info, i.e. P(A) and P(B).
Can we use marginalization / the Law of Total Probability to derive P(A) and P(B)?
Total number of individual outcomes
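A hedged sketch of the route the slide suggests: obtain P(B) by marginalizing over A and not-A, then apply Bayes. P(A) (the disease prevalence) and P(B|not A) (the symptom rate without the disease) are invented values, since the slide only asks whether this derivation is possible:

    # Known from the quiz: P(Sym|Dis) = P(B|A) = 0.9.
    p_B_given_A = 0.9
    # Additional info we would need (invented for illustration):
    p_A = 0.001             # P(Dis): prevalence of the disease
    p_B_given_notA = 0.05   # P(Sym|not Dis): symptom rate without the disease

    # Law of Total Probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

    # Bayes: P(A|B) = P(A) P(B|A) / P(B)
    print(p_A * p_B_given_A / p_B)  # ~0.0177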
  
Practical Activity 2: Part-of-Speech Bigrams - Independence
25
See calculations overleaf
Practical Activity 1: Solution
26
The end
27
