SlideShare a Scribd company logo
1 of 55
Probability Review
Thursday Sep 13
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Sample space and Events
• W : Sample Space, result of an experiment
• If you toss a coin twice W = {HH,HT,TH,TT}
• Event: a subset of W
• First toss is head = {HH,HT}
• S: event space, a set of events:
• Closed under finite union and complements
• Entails other binary operation: union, diff, etc.
• Contains the empty event and W
Probability Measure
• Defined over (W,S) s.t.
• P(a) >= 0 for all a in S
• P(W) = 1
• If a, b are disjoint, then
• P(a U b) = p(a) + p(b)
• We can deduce other axioms from the above ones
• Ex: P(a U b) for non-disjoint event
P(a U b) = p(a) + p(b) – p(a ∩ b)
Visualization
• We can go on and define conditional
probability, using the above visualization
Conditional Probability
P(F|H) = Fraction of worlds in which H is true that also
have F true
)
(
)
(
)
|
(
H
p
H
F
p
h
f
p

=
Rule of total probability
A
B1
B2
B3
B4
B5
B6
B7
 )  )  )

= i
i B
A
P
B
P
A
p |
From Events to Random Variable
• Almost all the semester we will be dealing with RV
• Concise way of specifying attributes of outcomes
• Modeling students (Grade and Intelligence):
• W = all possible students
• What are events
• Grade_A = all students with grade A
• Grade_B = all students with grade B
• Intelligence_High = … with high intelligence
• Very cumbersome
• We need “functions” that maps from W to an
attribute space.
• P(G = A) = P({student ϵ W : G(student) = A})
Random Variables
W
High
low
A
B A+
I:Intelligence
G:Grade
P(I = high) = P( {all students whose intelligence is high})
Discrete Random Variables
• Random variables (RVs) which may take on
only a countable number of distinct values
– E.g. the total number of tails X you get if you flip
100 coins
• X is a RV with arity k if it can take on exactly
one value out of {x1, …, xk}
– E.g. the possible values that X can take on are 0, 1,
2, …, 100
Probability of Discrete RV
• Probability mass function (pmf): P(X = xi)
• Easy facts about pmf
 Σi P(X = xi) = 1
 P(X = xi∩X = xj) = 0 if i ≠ j
 P(X = xi U X = xj) = P(X = xi) + P(X = xj) if i ≠ j
 P(X = x1 U X = x2 U … U X = xk) = 1
Common Distributions
• Uniform X U[1, …, N]
 X takes values 1, 2, … N
 P(X = i) = 1/N
 E.g. picking balls of different colors from a box
• Binomial X Bin(n, p)
 X takes values 0, 1, …, n

 E.g. coin flips

p(X = i) =
n
i





pi
(1 p)ni
Continuous Random Variables
• Probability density function (pdf) instead of
probability mass function (pmf)
• A pdf is any function f(x) that describes the
probability density in terms of the input
variable x.
Probability of Continuous RV
• Properties of pdf


• Actual probability can be obtained by taking
the integral of pdf
 E.g. the probability of X being between 0 and 1 is

f (x)  0,x
f (x) =1



P(0  X 1) = f (x)dx
0
1

Cumulative Distribution Function
• FX(v) = P(X ≤ v)
• Discrete RVs
 FX(v) = Σvi P(X = vi)
• Continuous RVs


FX (v) = f (x)dx

v

d
dx
Fx (x) = f (x)
Common Distributions
• Normal X N(μ, σ2)

 E.g. the height of the entire population

f (x) =
1
 2
exp 
(x  )2
22






Multivariate Normal
• Generalization to higher dimensions of the
one-dimensional normal
f r
X
(xi,...,xd ) =
1
(2)d /2

1/2

exp 
1
2
r
x  
 )
T
1 r
x  
 )






.
Covariance matrix
Mean
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Joint Probability Distribution
• Random variables encodes attributes
• Not all possible combination of attributes are equally
likely
• Joint probability distributions quantify this
• P( X= x, Y= y) = P(x, y)
• Generalizes to N-RVs
•
•
 )
  =
=
=
x y
y
Y
x
X
P 1
,
 )
 =
x y
Y
X dxdy
y
x
f 1
,
,
Chain Rule
• Always true
• P(x, y, z) = p(x) p(y|x) p(z|x, y)
= p(z) p(y|z) p(x|y, z)
=…
Conditional Probability
 )
 )
 )
P X Y
P X Y
P Y
x y
x y
y
=  =
= = =
=
 )
)
(
)
,
(
|
y
p
y
x
p
y
x
P =
But we will always write it this way:
events
Marginalization
• We know p(X, Y), what is P(X=x)?
• We can use the low of total probability, why?
 )  )
 )  )


=
=
y
y
y
x
P
y
P
y
x
P
x
p
|
,
A
B1
B2
B3
B4
B5
B6
B7
Marginalization Cont.
• Another example
 )  )
 )  )


=
=
y
z
z
y
z
y
x
P
z
y
P
z
y
x
P
x
p
,
,
,
|
,
,
,
Bayes Rule
• We know that P(rain) = 0.5
• If we also know that the grass is wet, then
how this affects our belief about whether it
rains or not?

P rain | wet
 )=
P(rain)P(wet | rain)
P(wet)
P x | y
 )=
P(x)P(y | x)
P(y)
Bayes Rule cont.
• You can condition on more variables
 )
)
|
(
)
,
|
(
)
|
(
,
|
z
y
P
z
x
y
P
z
x
P
z
y
x
P =
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Independence
• X is independent of Y means that knowing Y
does not change our belief about X.
• P(X|Y=y) = P(X)
• P(X=x, Y=y) = P(X=x) P(Y=y)
• The above should hold for all x, y
• It is symmetric and written as X  Y
Independence
• X1, …, Xn are independent if and only if
• If X1, …, Xn are independent and identically
distributed we say they are iid (or that they
are a random sample) and we write

P(X1  A1,...,Xn  An ) = P Xi  Ai
 )
i=1
n

X1, …, Xn ∼ P
CI: Conditional Independence
• RV are rarely independent but we can still
leverage local structural properties like
Conditional Independence.
• X  Y | Z if once Z is observed, knowing the
value of Y does not change our belief about X
• P(rain  sprinkler’s on | cloudy)
• P(rain  sprinkler’s on | wet grass)
Conditional Independence
• P(X=x | Z=z, Y=y) = P(X=x | Z=z)
• P(Y=y | Z=z, X=x) = P(Y=y | Z=z)
• P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z)
We call these factors : very useful concept !!
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Mean and Variance
• Mean (Expectation):
– Discrete RVs:
– Continuous RVs:
 )
X
E
 =
 )  )
X P X
i
i i
v
E v v
= =

 )  )
X
E xf x dx


= 

E(g(X)) = g(vi)P(X = vi)
vi

E(g(X)) = g(x) f (x)dx



Mean and Variance
• Variance:
– Discrete RVs:
– Continuous RVs:
• Covariance:
 )  )  )
2
X P X
i
i i
v
V v v

=  =

 )  )  )
2
X
V x f x dx



= 


Var(X) = E((X  )2
)
Var(X) = E(X2
)  2
Cov(X,Y) = E((X  x )(Y  y )) = E(XY)  xy
Mean and Variance
• Correlation:

(X,Y) = Cov(X,Y)/xy

1 (X,Y) 1
Properties
• Mean
–
–
– If X and Y are independent,
• Variance
–
– If X and Y are independent,
 )  )  )
X Y X Y
E E E
 = 
 )  )
X X
E a aE
=
 )  )  )
XY X Y
E E E
= 
 )  )
2
X X
V a b a V
 =
 )
X Y (X) (Y)
V V V
 = 
Some more properties
• The conditional expectation of Y given X when
the value of X = x is:
• The Law of Total Expectation or Law of
Iterated Expectation:
 ) dy
x
y
p
y
x
X
Y
E )
|
(
*
| 
=
=
   =
=
= dx
x
p
x
X
Y
E
X
Y
E
E
Y
E X )
(
)
|
(
)
|
(
)
(
Some more properties
• The law of Total Variance:

Var(Y) =Var E(Y | X)
  E Var(Y | X)
 
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
The Big Picture
Model Data
Probability
Estimation/learning
Statistical Inference
• Given observations from a model
– What (conditional) independence assumptions
hold?
• Structure learning
– If you know the family of the model (ex,
multinomial), What are the value of the
parameters: MLE, Bayesian estimation.
• Parameter learning
Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
Monty Hall Problem
• You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
• You pick a door, say No. 1
• The host, who knows what's behind the doors, opens
another door, say No. 3, which has a goat.
• Do you want to pick door No. 2 instead?
Host must
reveal Goat B
Host must
reveal Goat A
Host reveals
Goat A
or
Host reveals
Goat B
Monty Hall Problem: Bayes Rule
• : the car is behind door i, i = 1, 2, 3
•
• : the host opens door j after you pick door i
•
i
C
ij
H
 ) 1 3
i
P C =
 )
0
0
1 2
1 ,
ij k
i j
j k
P H C
i k
i k j k
=

 =

= 
=

  

Monty Hall Problem: Bayes Rule cont.
• WLOG, i=1, j=3
•
•
 )
 )  )
 )
13 1 1
1 13
13
P H C P C
P C H
P H
=
 )  )
13 1 1
1 1 1
2 3 6
P H C P C =  =
•
•
Monty Hall Problem: Bayes Rule cont.
 )  )  )  )
 )  )  )  )
13 13 1 13 2 13 3
13 1 1 13 2 2
, , ,
1 1
1
6 3
1
2
P H P H C P H C P H C
P H C P C P H C P C
=  
= 
=  
=
 )
1 13
1 6 1
1 2 3
P C H = =
Monty Hall Problem: Bayes Rule cont.
 )
1 13
1 6 1
1 2 3
P C H = =


 You should switch!
 )  )
2 13 1 13
1 2
1
3 3
P C H P C H
=  = 
Information Theory
• P(X) encodes our uncertainty about X
• Some variables are more uncertain that others
• How can we quantify this intuition?
• Entropy: average number of bits required to encode X
P(X) P(Y)
X Y
 )
 )
 )
 )
 )

 
=
=






=
x
x
P x
P
x
P
x
P
x
P
x
p
E
X
H )
(
log
1
log
1
log
Information Theory cont.
• Entropy: average number of bits required to encode X
• We can define conditional entropy similarly
• i.e. once Y is known, we only need H(X,Y) – H(Y) bits
• We can also define chain rule for entropies (not surprising)
 )
 )
 )  )
Y
H
Y
X
H
y
x
p
E
Y
X
H P
P
P 
=






= ,
|
1
log
|
 )  )  )  )
Y
X
Z
H
X
Y
H
X
H
Z
Y
X
H P
P
P
P ,
|
|
,
, 

=
 )
 )
 )
 )
 )

 
=
=






=
x
x
P x
P
x
P
x
P
x
P
x
p
E
X
H )
(
log
1
log
1
log
Mutual Information: MI
• Remember independence?
• If XY then knowing Y won’t change our belief about X
• Mutual information can help quantify this! (not the only
way though)
• MI:
• “The amount of uncertainty in X which is removed by
knowing Y”
• Symmetric
• I(X;Y) = 0 iff, X and Y are independent!
 )  )  )
Y
X
H
X
H
Y
X
I P
P
P |
; 
=
 







=
y x y
p
x
p
y
x
p
y
x
p
Y
X
I
)
(
)
(
)
,
(
log
)
,
(
)
;
(
Chi Square Test for Independence
(Example)
Republican Democrat Independent Total
Male 200 150 50 400
Female 250 300 50 600
Total 450 450 100 1000
• State the hypotheses
H0: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent
• Choose significance level
Say, 0.05
Chi Square Test for Independence
• Analyze sample data
• Degrees of freedom =
|g|-1 * |v|-1 = (2-1) * (3-1) = 2
• Expected frequency count =
Eg,v = (ng * nv) / n
Em,r = (400 * 450) / 1000 = 180000/1000 = 180
Em,d= (400 * 450) / 1000 = 180000/1000 = 180
Em,i = (400 * 100) / 1000 = 40000/1000 = 40
Ef,r = (600 * 450) / 1000 = 270000/1000 = 270
Ef,d = (600 * 450) / 1000 = 270000/1000 = 270
Ef,i = (600 * 100) / 1000 = 60000/1000 = 60
Republican Democrat Independent Total
Male 200 150 50 400
Female 250 300 50 600
Total 450 450 100 1000
Chi Square Test for Independence
• Chi-square test statistic
• Χ2 = (200 - 180)2/180 + (150 - 180)2/180 + (50 - 40)2/40 +
(250 - 270)2/270 + (300 - 270)2/270 + (50 - 60)2/40
• Χ2 = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 +
100/60
• Χ2 = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2







 
= 
v
g
v
g
v
g
E
E
O
X
,
2
,
,
2
)
(
Republican Democrat Independent Total
Male 200 150 50 400
Female 250 300 50 600
Total 450 450 100 1000
Chi Square Test for Independence
• P-value
– Probability of observing a sample statistic as
extreme as the test statistic
– P(X2 ≥ 16.2) = 0.0003
• Since P-value (0.0003) is less than the
significance level (0.05), we cannot accept the
null hypothesis
• There is a relationship between gender and
voting preference
Acknowledgment
• Carlos Guestrin recitation slides:
http://www.cs.cmu.edu/~guestrin/Class/10708/recitations/r1/Probability_and_St
atistics_Review.ppt
• Andrew Moore Tutorial:
http://www.autonlab.org/tutorials/prob.html
• Monty hall problem:
http://en.wikipedia.org/wiki/Monty_Hall_problem
• http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html
• Chi-square test for independence
http://stattrek.com/chi-square-test/independence.aspx

More Related Content

Similar to Probability_Review.ppt for your knowledg

Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.pptakashok1v
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.pptssuserc7c104
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic tradingQuantInsti
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 
Fuzzy logic-introduction
Fuzzy logic-introductionFuzzy logic-introduction
Fuzzy logic-introductionWBUTTUTORIALS
 
Regression_1.pdf
Regression_1.pdfRegression_1.pdf
Regression_1.pdfAmir Saleh
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECsundarKanagaraj1
 
Recursive State Estimation AI for Robotics.pdf
Recursive State Estimation AI for Robotics.pdfRecursive State Estimation AI for Robotics.pdf
Recursive State Estimation AI for Robotics.pdff20220630
 
Predicates and Quantifiers
Predicates and QuantifiersPredicates and Quantifiers
Predicates and Quantifiersblaircomp2003
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
Fuzzy Logic.pptx
Fuzzy Logic.pptxFuzzy Logic.pptx
Fuzzy Logic.pptxImXaib
 

Similar to Probability_Review.ppt for your knowledg (20)

Lec12-Probability (1).ppt
Lec12-Probability (1).pptLec12-Probability (1).ppt
Lec12-Probability (1).ppt
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.ppt
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.ppt
 
Lec12-Probability.ppt
Lec12-Probability.pptLec12-Probability.ppt
Lec12-Probability.ppt
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic trading
 
pattern recognition
pattern recognition pattern recognition
pattern recognition
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Fuzzy logic-introduction
Fuzzy logic-introductionFuzzy logic-introduction
Fuzzy logic-introduction
 
8.5 GaussianBNs.pdf
8.5 GaussianBNs.pdf8.5 GaussianBNs.pdf
8.5 GaussianBNs.pdf
 
Lecture3
Lecture3Lecture3
Lecture3
 
Regression_1.pdf
Regression_1.pdfRegression_1.pdf
Regression_1.pdf
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
 
PredicateLogic (1).ppt
PredicateLogic (1).pptPredicateLogic (1).ppt
PredicateLogic (1).ppt
 
PredicateLogic.pptx
PredicateLogic.pptxPredicateLogic.pptx
PredicateLogic.pptx
 
Recursive State Estimation AI for Robotics.pdf
Recursive State Estimation AI for Robotics.pdfRecursive State Estimation AI for Robotics.pdf
Recursive State Estimation AI for Robotics.pdf
 
Predicates and Quantifiers
Predicates and QuantifiersPredicates and Quantifiers
Predicates and Quantifiers
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Ch02 fuzzyrelation
Ch02 fuzzyrelationCh02 fuzzyrelation
Ch02 fuzzyrelation
 
Fuzzy Logic.pptx
Fuzzy Logic.pptxFuzzy Logic.pptx
Fuzzy Logic.pptx
 

Recently uploaded

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 

Recently uploaded (20)

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 

Probability_Review.ppt for your knowledg

  • 2. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Mean and Variance • The big picture • Examples
  • 3. Sample space and Events • W : Sample Space, result of an experiment • If you toss a coin twice W = {HH,HT,TH,TT} • Event: a subset of W • First toss is head = {HH,HT} • S: event space, a set of events: • Closed under finite union and complements • Entails other binary operation: union, diff, etc. • Contains the empty event and W
  • 4. Probability Measure • Defined over (W,S) s.t. • P(a) >= 0 for all a in S • P(W) = 1 • If a, b are disjoint, then • P(a U b) = p(a) + p(b) • We can deduce other axioms from the above ones • Ex: P(a U b) for non-disjoint event P(a U b) = p(a) + p(b) – p(a ∩ b)
  • 5. Visualization • We can go on and define conditional probability, using the above visualization
  • 6. Conditional Probability P(F|H) = Fraction of worlds in which H is true that also have F true ) ( ) ( ) | ( H p H F p h f p  =
  • 7. Rule of total probability A B1 B2 B3 B4 B5 B6 B7  )  )  )  = i i B A P B P A p |
  • 8. From Events to Random Variable • Almost all the semester we will be dealing with RV • Concise way of specifying attributes of outcomes • Modeling students (Grade and Intelligence): • W = all possible students • What are events • Grade_A = all students with grade A • Grade_B = all students with grade B • Intelligence_High = … with high intelligence • Very cumbersome • We need “functions” that maps from W to an attribute space. • P(G = A) = P({student ϵ W : G(student) = A})
  • 9. Random Variables W High low A B A+ I:Intelligence G:Grade P(I = high) = P( {all students whose intelligence is high})
  • 10. Discrete Random Variables • Random variables (RVs) which may take on only a countable number of distinct values – E.g. the total number of tails X you get if you flip 100 coins • X is a RV with arity k if it can take on exactly one value out of {x1, …, xk} – E.g. the possible values that X can take on are 0, 1, 2, …, 100
  • 11. Probability of Discrete RV • Probability mass function (pmf): P(X = xi) • Easy facts about pmf  Σi P(X = xi) = 1  P(X = xi∩X = xj) = 0 if i ≠ j  P(X = xi U X = xj) = P(X = xi) + P(X = xj) if i ≠ j  P(X = x1 U X = x2 U … U X = xk) = 1
  • 12. Common Distributions • Uniform X U[1, …, N]  X takes values 1, 2, … N  P(X = i) = 1/N  E.g. picking balls of different colors from a box • Binomial X Bin(n, p)  X takes values 0, 1, …, n   E.g. coin flips  p(X = i) = n i      pi (1 p)ni
  • 13. Continuous Random Variables • Probability density function (pdf) instead of probability mass function (pmf) • A pdf is any function f(x) that describes the probability density in terms of the input variable x.
  • 14. Probability of Continuous RV • Properties of pdf   • Actual probability can be obtained by taking the integral of pdf  E.g. the probability of X being between 0 and 1 is  f (x)  0,x f (x) =1    P(0  X 1) = f (x)dx 0 1 
  • 15. Cumulative Distribution Function • FX(v) = P(X ≤ v) • Discrete RVs  FX(v) = Σvi P(X = vi) • Continuous RVs   FX (v) = f (x)dx  v  d dx Fx (x) = f (x)
  • 16. Common Distributions • Normal X N(μ, σ2)   E.g. the height of the entire population  f (x) = 1  2 exp  (x  )2 22      
  • 17. Multivariate Normal • Generalization to higher dimensions of the one-dimensional normal f r X (xi,...,xd ) = 1 (2)d /2  1/2  exp  1 2 r x    ) T 1 r x    )       . Covariance matrix Mean
  • 18. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Mean and Variance • The big picture • Examples
  • 19. Joint Probability Distribution • Random variables encodes attributes • Not all possible combination of attributes are equally likely • Joint probability distributions quantify this • P( X= x, Y= y) = P(x, y) • Generalizes to N-RVs • •  )   = = = x y y Y x X P 1 ,  )  = x y Y X dxdy y x f 1 , ,
  • 20. Chain Rule • Always true • P(x, y, z) = p(x) p(y|x) p(z|x, y) = p(z) p(y|z) p(x|y, z) =…
  • 21. Conditional Probability  )  )  ) P X Y P X Y P Y x y x y y =  = = = = =  ) ) ( ) , ( | y p y x p y x P = But we will always write it this way: events
  • 22. Marginalization • We know p(X, Y), what is P(X=x)? • We can use the low of total probability, why?  )  )  )  )   = = y y y x P y P y x P x p | , A B1 B2 B3 B4 B5 B6 B7
  • 23. Marginalization Cont. • Another example  )  )  )  )   = = y z z y z y x P z y P z y x P x p , , , | , , ,
  • 24. Bayes Rule • We know that P(rain) = 0.5 • If we also know that the grass is wet, then how this affects our belief about whether it rains or not?  P rain | wet  )= P(rain)P(wet | rain) P(wet) P x | y  )= P(x)P(y | x) P(y)
  • 25. Bayes Rule cont. • You can condition on more variables  ) ) | ( ) , | ( ) | ( , | z y P z x y P z x P z y x P =
  • 26. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Mean and Variance • The big picture • Examples
  • 27. Independence • X is independent of Y means that knowing Y does not change our belief about X. • P(X|Y=y) = P(X) • P(X=x, Y=y) = P(X=x) P(Y=y) • The above should hold for all x, y • It is symmetric and written as X  Y
  • 28. Independence • X1, …, Xn are independent if and only if • If X1, …, Xn are independent and identically distributed we say they are iid (or that they are a random sample) and we write  P(X1  A1,...,Xn  An ) = P Xi  Ai  ) i=1 n  X1, …, Xn ∼ P
  • 29. CI: Conditional Independence • RV are rarely independent but we can still leverage local structural properties like Conditional Independence. • X  Y | Z if once Z is observed, knowing the value of Y does not change our belief about X • P(rain  sprinkler’s on | cloudy) • P(rain  sprinkler’s on | wet grass)
  • 30. Conditional Independence • P(X=x | Z=z, Y=y) = P(X=x | Z=z) • P(Y=y | Z=z, X=x) = P(Y=y | Z=z) • P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z) We call these factors : very useful concept !!
  • 31. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Mean and Variance • The big picture • Examples
  • 32. Mean and Variance • Mean (Expectation): – Discrete RVs: – Continuous RVs:  ) X E  =  )  ) X P X i i i v E v v = =   )  ) X E xf x dx   =   E(g(X)) = g(vi)P(X = vi) vi  E(g(X)) = g(x) f (x)dx   
  • 33. Mean and Variance • Variance: – Discrete RVs: – Continuous RVs: • Covariance:  )  )  ) 2 X P X i i i v V v v  =  =   )  )  ) 2 X V x f x dx    =    Var(X) = E((X  )2 ) Var(X) = E(X2 )  2 Cov(X,Y) = E((X  x )(Y  y )) = E(XY)  xy
  • 34. Mean and Variance • Correlation:  (X,Y) = Cov(X,Y)/xy  1 (X,Y) 1
  • 35. Properties • Mean – – – If X and Y are independent, • Variance – – If X and Y are independent,  )  )  ) X Y X Y E E E  =   )  ) X X E a aE =  )  )  ) XY X Y E E E =   )  ) 2 X X V a b a V  =  ) X Y (X) (Y) V V V  = 
  • 36. Some more properties • The conditional expectation of Y given X when the value of X = x is: • The Law of Total Expectation or Law of Iterated Expectation:  ) dy x y p y x X Y E ) | ( * |  = =    = = = dx x p x X Y E X Y E E Y E X ) ( ) | ( ) | ( ) (
  • 37. Some more properties • The law of Total Variance:  Var(Y) =Var E(Y | X)   E Var(Y | X)  
  • 38. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Mean and Variance • The big picture • Examples
  • 39. The Big Picture Model Data Probability Estimation/learning
  • 40. Statistical Inference • Given observations from a model – What (conditional) independence assumptions hold? • Structure learning – If you know the family of the model (ex, multinomial), What are the value of the parameters: MLE, Bayesian estimation. • Parameter learning
  • 41. Probability Review • Events and Event spaces • Random variables • Joint probability distributions • Marginalization, conditioning, chain rule, Bayes Rule, law of total probability, etc. • Structural properties • Independence, conditional independence • Mean and Variance • The big picture • Examples
  • 42. Monty Hall Problem • You're given the choice of three doors: Behind one door is a car; behind the others, goats. • You pick a door, say No. 1 • The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. • Do you want to pick door No. 2 instead?
  • 43. Host must reveal Goat B Host must reveal Goat A Host reveals Goat A or Host reveals Goat B
  • 44. Monty Hall Problem: Bayes Rule • : the car is behind door i, i = 1, 2, 3 • • : the host opens door j after you pick door i • i C ij H  ) 1 3 i P C =  ) 0 0 1 2 1 , ij k i j j k P H C i k i k j k =   =  =  =     
  • 45. Monty Hall Problem: Bayes Rule cont. • WLOG, i=1, j=3 • •  )  )  )  ) 13 1 1 1 13 13 P H C P C P C H P H =  )  ) 13 1 1 1 1 1 2 3 6 P H C P C =  =
  • 46. • • Monty Hall Problem: Bayes Rule cont.  )  )  )  )  )  )  )  ) 13 13 1 13 2 13 3 13 1 1 13 2 2 , , , 1 1 1 6 3 1 2 P H P H C P H C P H C P H C P C P H C P C =   =  =   =  ) 1 13 1 6 1 1 2 3 P C H = =
  • 47. Monty Hall Problem: Bayes Rule cont.  ) 1 13 1 6 1 1 2 3 P C H = =    You should switch!  )  ) 2 13 1 13 1 2 1 3 3 P C H P C H =  = 
  • 48. Information Theory • P(X) encodes our uncertainty about X • Some variables are more uncertain that others • How can we quantify this intuition? • Entropy: average number of bits required to encode X P(X) P(Y) X Y  )  )  )  )  )    = =       = x x P x P x P x P x P x p E X H ) ( log 1 log 1 log
  • 49. Information Theory cont. • Entropy: average number of bits required to encode X • We can define conditional entropy similarly • i.e. once Y is known, we only need H(X,Y) – H(Y) bits • We can also define chain rule for entropies (not surprising)  )  )  )  ) Y H Y X H y x p E Y X H P P P  =       = , | 1 log |  )  )  )  ) Y X Z H X Y H X H Z Y X H P P P P , | | , ,   =  )  )  )  )  )    = =       = x x P x P x P x P x P x p E X H ) ( log 1 log 1 log
  • 50. Mutual Information: MI • Remember independence? • If XY then knowing Y won’t change our belief about X • Mutual information can help quantify this! (not the only way though) • MI: • “The amount of uncertainty in X which is removed by knowing Y” • Symmetric • I(X;Y) = 0 iff, X and Y are independent!  )  )  ) Y X H X H Y X I P P P | ;  =          = y x y p x p y x p y x p Y X I ) ( ) ( ) , ( log ) , ( ) ; (
  • 51. Chi Square Test for Independence (Example) Republican Democrat Independent Total Male 200 150 50 400 Female 250 300 50 600 Total 450 450 100 1000 • State the hypotheses H0: Gender and voting preferences are independent. Ha: Gender and voting preferences are not independent • Choose significance level Say, 0.05
  • 52. Chi Square Test for Independence • Analyze sample data • Degrees of freedom = |g|-1 * |v|-1 = (2-1) * (3-1) = 2 • Expected frequency count = Eg,v = (ng * nv) / n Em,r = (400 * 450) / 1000 = 180000/1000 = 180 Em,d= (400 * 450) / 1000 = 180000/1000 = 180 Em,i = (400 * 100) / 1000 = 40000/1000 = 40 Ef,r = (600 * 450) / 1000 = 270000/1000 = 270 Ef,d = (600 * 450) / 1000 = 270000/1000 = 270 Ef,i = (600 * 100) / 1000 = 60000/1000 = 60 Republican Democrat Independent Total Male 200 150 50 400 Female 250 300 50 600 Total 450 450 100 1000
  • 53. Chi Square Test for Independence • Chi-square test statistic • Χ2 = (200 - 180)2/180 + (150 - 180)2/180 + (50 - 40)2/40 + (250 - 270)2/270 + (300 - 270)2/270 + (50 - 60)2/40 • Χ2 = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60 • Χ2 = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2          =  v g v g v g E E O X , 2 , , 2 ) ( Republican Democrat Independent Total Male 200 150 50 400 Female 250 300 50 600 Total 450 450 100 1000
  • 54. Chi Square Test for Independence • P-value – Probability of observing a sample statistic as extreme as the test statistic – P(X2 ≥ 16.2) = 0.0003 • Since P-value (0.0003) is less than the significance level (0.05), we cannot accept the null hypothesis • There is a relationship between gender and voting preference
  • 55. Acknowledgment • Carlos Guestrin recitation slides: http://www.cs.cmu.edu/~guestrin/Class/10708/recitations/r1/Probability_and_St atistics_Review.ppt • Andrew Moore Tutorial: http://www.autonlab.org/tutorials/prob.html • Monty hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem • http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html • Chi-square test for independence http://stattrek.com/chi-square-test/independence.aspx