1. The document covers probability axioms and rules including the additive rule, conditional probability, independence, and Bayes' rule. It also defines discrete and continuous random variables and their probability distributions.
2. Important discrete distributions discussed include the Bernoulli distribution for a binary outcome experiment and the binomial distribution for repeated Bernoulli trials.
3. Techniques for counting permutations, combinations, and sequences of events are presented to handle probability problems involving counting.
2. Axioms of Probability
A probability measure P is defined on S by assigning, for each event E, a number P[E] with the following properties:
1. P[E] ≥ 0, for each E.
2. P[S] = 1.
3. If Ei ∩ Ej = ∅ for all i ≠ j, then
P[E1 ∪ E2 ∪ …] = P[E1] + P[E2] + …
3. Finite uniform probability space
Many examples fall into this category:
1. Finite number of outcomes.
2. All outcomes are equally likely.
3. P[E] = n(E)/n(S) = n(E)/N = (no. of outcomes in E)/(total no. of outcomes)
where n(A) = no. of elements of A.
Note:
To handle problems in this case we have to be able to count: count n(E) and n(S).
5. Basic Rule of counting
Suppose we carry out k operations in sequence. Let
n1 = the number of ways the first operation can be performed
ni = the number of ways the ith operation can be performed once the first (i − 1) operations have been completed, i = 2, 3, …, k
Then N = n1 n2 ⋯ nk = the number of ways the k operations can be performed in sequence (a small sketch follows).
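A minimal Python sketch of this rule, with hypothetical counts for the k operations (the numbers are illustrative, not from the slides):

# Basic rule of counting: k = 3 operations performed in sequence.
n_ops = [4, 3, 2]   # hypothetical: n1 = 4, n2 = 3, n3 = 2 ways per operation

N = 1
for n_i in n_ops:
    N *= n_i        # multiply the counts together
print(N)            # 4 * 3 * 2 = 24 ways to perform the sequence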
6. Basic Counting Formulae
1. Permutations: How many ways can you order n objects?
n!
2. Permutations of size k (< n): How many ways can you choose k objects from n objects in a specific order?
nPk = n!/(n − k)! = n(n − 1)⋯(n − k + 1)
7. 3. Combinations of size k (≤ n): A combination of size k chosen from n objects is a subset of size k where the order of selection is irrelevant. How many ways can you choose a combination of size k from n objects (order of selection is irrelevant)?
nCk = (n choose k) = n!/(k!(n − k)!) = n(n − 1)⋯(n − k + 1)/(k(k − 1)⋯2·1)
8. Important Notes
1. In combinations ordering is irrelevant. Different orderings result in the same combination.
2. In permutations order is relevant. Different orderings result in different permutations.
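Python's standard library implements these counting formulae directly; a quick check (assuming Python 3.8+, with hypothetical n and k):

import math

n, k = 10, 3                  # hypothetical values
print(math.factorial(n))      # n!   -- orderings of all n objects
print(math.perm(n, k))        # nPk = n!/(n-k)! = 720 ordered selections
print(math.comb(n, k))        # nCk = n!/(k!(n-k)!) = 120 subsets

# Each combination of size k corresponds to k! different permutations:
assert math.perm(n, k) == math.comb(n, k) * math.factorial(k)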
10. The additive rule
P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
and if A ∩ B = ∅ (so P[A ∩ B] = 0),
P[A ∪ B] = P[A] + P[B]
11. The additive rule for more than two events
P[A1 ∪ A2 ∪ … ∪ An] = Σi P[Ai] − Σ_{i<j} P[Ai ∩ Aj] + Σ_{i<j<k} P[Ai ∩ Aj ∩ Ak] − … + (−1)^{n+1} P[A1 ∩ A2 ∩ … ∩ An]
and if Ai ∩ Aj = ∅ for all i ≠ j, then
P[A1 ∪ A2 ∪ … ∪ An] = Σ_{i=1}^{n} P[Ai]
12. The Rule for complements
For any event E:
P[Ē] = 1 − P[E]
14. The conditional probability of A given B is defined to be:
P[A|B] = P[A ∩ B]/P[B] if P[B] ≠ 0
15. The multiplicative rule of probability
P[A ∩ B] = P[A] P[B|A] if P[A] ≠ 0
         = P[B] P[A|B] if P[B] ≠ 0
and
P[A ∩ B] = P[A] P[B]
if A and B are independent.
This is the definition of independence.
16. The multiplicative rule for more than two events
P[A1 ∩ A2 ∩ … ∩ An] = P[A1] P[A2|A1] P[A3|A1 ∩ A2] ⋯ P[An|A1 ∩ A2 ∩ … ∩ An−1]
18. Definition:
The set of k events A1, A2, …, Ak are called mutually independent if:
P[Ai1 ∩ Ai2 ∩ … ∩ Aim] = P[Ai1] P[Ai2] ⋯ P[Aim]
for every subset {i1, i2, …, im} of {1, 2, …, k}.
i.e. for k = 3, A1, A2, A3 are mutually independent if:
P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3],
P[A2 ∩ A3] = P[A2] P[A3], and
P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3]
19. Definition:
The set of k events A1, A2, …, Ak are called pairwise independent if:
P[Ai ∩ Aj] = P[Ai] P[Aj] for all i ≠ j.
i.e. for k = 3, A1, A2, A3 are pairwise independent if:
P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3],
P[A2 ∩ A3] = P[A2] P[A3].
It is not necessarily true that P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3].
20. Bayes Rule for probability
P[A|B] = P[A] P[B|A] / (P[A] P[B|A] + P[Ā] P[B|Ā])
21. A generalization of Bayes Rule
Let A1, A2, …, Ak denote a set of events such that
S = A1 ∪ A2 ∪ … ∪ Ak and Ai ∩ Aj = ∅
for all i ≠ j (a partition of S). Then
P[Ai|B] = P[Ai] P[B|Ai] / (P[A1] P[B|A1] + … + P[Ak] P[B|Ak])
(a numerical sketch follows).
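A short Python sketch of this generalized Bayes rule; the priors and likelihoods below are hypothetical numbers, not from the slides:

# Partition A1, A2, A3 of S with assumed priors P[Ai] and likelihoods P[B|Ai]
prior = [0.5, 0.3, 0.2]           # P[Ai]; must sum to 1
likelihood = [0.01, 0.05, 0.10]   # P[B|Ai] (hypothetical values)

denom = sum(p * l for p, l in zip(prior, likelihood))   # P[B], law of total probability
posterior = [p * l / denom for p, l in zip(prior, likelihood)]  # P[Ai|B]
print(posterior, sum(posterior))  # posterior probabilities sum to 1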
23. A random variable, X, is a numerical quantity whose value is determined by a random experiment.
24. Definition – The probability function, p(x), of a random variable, X.
For any random variable, X, and any real number, x, we define
p(x) = P[X = x]
where {X = x} = the set of all outcomes (an event) with X = x.
For continuous random variables p(x) = 0 for all values of x.
25. Definition – The cumulative distribution function, F(x), of a random variable, X.
For any random variable, X, and any real number, x, we define
F(x) = P[X ≤ x]
where {X ≤ x} = the set of all outcomes (an event) with X ≤ x.
26. Discrete Random Variables
For a discrete random variable X the probability distribution is described by the probability function p(x), which has the following properties:
1. 0 ≤ p(x) ≤ 1
2. Σ_x p(x) = Σ_i p(xi) = 1
3. P[a ≤ X ≤ b] = Σ_{a ≤ x ≤ b} p(x)
28. Continuous random variables
For a continuous random variable X the probability distribution is described by the probability density function f(x), which has the following properties:
1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
3. P[a ≤ X ≤ b] = ∫_a^b f(x) dx
29. Graph: Continuous Random Variable
[Figure: a probability density function f(x); the total area under the curve is ∫_{−∞}^{∞} f(x) dx = 1, and P[a ≤ X ≤ b] = ∫_a^b f(x) dx is the area between a and b.]
30. The distribution function F(x)
This is defined for any random variable, X.
F(x) = P[X ≤ x]
Properties
1. F(-∞) = 0 and F(∞) = 1.
2. F(x) is non-decreasing (i. e. if x1 < x2 then
F(x1) ≤ F(x2) )
3. F(b) – F(a) = P[a < X ≤ b].
31. 4. p(x) = P[X = x] = F(x) − F(x⁻)
5. If p(x) = 0 for all x (i.e. X is continuous) then F(x) is continuous.
Here F(x⁻) = lim_{u↑x} F(u).
32. 6. For Discrete Random Variables
F(x) is a non-decreasing step function with
F(x) = P[X ≤ x] = Σ_{u ≤ x} p(u)
with a jump of p(x) = F(x) − F(x⁻) at x, and F(−∞) = 0 and F(∞) = 1.
[Figure: a step function F(x) rising from 0 to 1 over −1 ≤ x ≤ 4, with jumps of height p(x).]
33. 7. For Continuous Random Variables
F(x) is a non-decreasing continuous function with
F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(u) du
and f(x) = F′(x), F(−∞) = 0 and F(∞) = 1.
[Figure: F(x) rising from 0 to 1, with slope f(x).]
To find the probability density function, f(x), one first finds F(x), then f(x) = F′(x).
36. Suppose that we have an experiment that has two outcomes:
1. Success (S)
2. Failure (F)
These terms are used in reliability testing.
Suppose that p is the probability of success (S) and q = 1 − p is the probability of failure (F).
This experiment is sometimes called a Bernoulli Trial.
Let X = 0 if the outcome is F, and X = 1 if the outcome is S. Then
p(x) = P[X = x] = q if x = 0, p if x = 1
37. The probability distribution with probability function
p(x) = P[X = x] = q if x = 0, p if x = 1
is called the Bernoulli distribution.
[Figure: bar chart with a bar of height q = 1 − p at x = 0 and a bar of height p at x = 1.]
39. We observe a Bernoulli trial (S,F) n times. Let X denote the number of successes in the n trials. Then X has a binomial distribution, i.e.
p(x) = P[X = x] = (n choose x) p^x q^{n−x}, x = 0, 1, 2, …, n
where
1. p = the probability of success (S), and
2. q = 1 − p = the probability of failure (F)
(a sketch of this probability function follows).
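A minimal Python sketch of the binomial probability function, with hypothetical n and p:

from math import comb

def binom_pmf(x, n, p):
    """P[X = x] = (n choose x) p^x q^(n-x), x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3                                     # hypothetical values
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
assert abs(sum(pmf) - 1.0) < 1e-12                 # probabilities sum to 1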
40. The Poisson distribution
• Suppose events are occurring randomly and uniformly in time.
• Let X be the number of events occurring in a fixed period of time. Then X will have a Poisson distribution with parameter λ:
p(x) = (λ^x / x!) e^{−λ}, x = 0, 1, 2, 3, 4, …
41. The Geometric distribution
Suppose a Bernoulli trial (S,F) is repeated until a success occurs.
Let X = the trial on which the first success (S) occurs.
The probability function of X is:
p(x) = P[X = x] = (1 − p)^{x−1} p = p q^{x−1}, x = 1, 2, …
42. The Negative Binomial distribution
Suppose a Bernoulli trial (S,F) is repeated until k successes occur.
Let X = the trial on which the kth success (S) occurs.
The probability function of X is:
p(x) = P[X = x] = (x − 1 choose k − 1) p^k q^{x−k}, x = k, k + 1, k + 2, …
43. The Hypergeometric distribution
Suppose we have a population containing N objects. Suppose the elements of the population are partitioned into two groups. Let a = the number of elements in group A and let b = the number of elements in the other group (group B). Note N = a + b.
Now suppose that n elements are selected from the population at random. Let X denote the number of selected elements that come from group A.
The probability distribution of X is
p(x) = P[X = x] = (a choose x)(b choose n − x) / (N choose n)
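The hypergeometric probability function can be computed directly from this formula; a short Python sketch with hypothetical group sizes:

from math import comb

def hypergeom_pmf(x, a, b, n):
    """P[X = x]: x of the n draws come from group A (a elements), n-x from group B (b elements)."""
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)

a, b, n = 5, 15, 6   # hypothetical: N = a + b = 20 objects, n = 6 drawn
total = sum(hypergeom_pmf(x, a, b, n) for x in range(max(0, n - b), min(a, n) + 1))
assert abs(total - 1.0) < 1e-12   # probabilities over the support sum to 1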
45. Continuous random variables
For a continuous random variable X the probability distribution is described by the probability density function f(x), which has the following properties:
1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
3. P[a ≤ X ≤ b] = ∫_a^b f(x) dx
46. Graph: Continuous Random Variable
[Figure: a probability density function f(x) with total area 1; P[a ≤ X ≤ b] = ∫_a^b f(x) dx is the area between a and b.]
47. Continuous Distributions
The Uniform distribution from a to b:
f(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise
[Figure: three rectangular densities of height 1/(b − a) on the interval from a to b.]
54. The Gamma distribution
Let the continuous random variable X have density function:
f(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx} if x ≥ 0, and 0 if x < 0
Then X is said to have a Gamma distribution with parameters α and λ.
55. Graph: The gamma distribution
[Figure: gamma densities for (α = 2, λ = 0.9), (α = 2, λ = 0.6) and (α = 3, λ = 0.6).]
56. Comments
1. The set of gamma distributions is a family of distributions (parameterized by α and λ).
2. Contained within this family are other distributions:
a. The Exponential distribution – in the case α = 1, the gamma distribution becomes the exponential distribution with parameter λ. The exponential distribution arises if we are measuring the lifetime, X, of an object that does not age. It is also used as a distribution for waiting times between events occurring uniformly in time.
b. The Chi-square distribution – in the case α = n/2 and λ = ½, the gamma distribution becomes the chi-square (χ²) distribution with n degrees of freedom. Later we will see that a sum of squares of independent standard normal variates has a chi-square distribution, with degrees of freedom = the number of independent terms in the sum of squares.
(These two special cases are checked numerically in the sketch below.)
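A numerical check of both special cases, assuming numpy and scipy are available (note scipy parameterizes the gamma by shape a = α and scale = 1/λ):

import numpy as np
from scipy import stats

lam, n = 0.6, 5                       # hypothetical λ and degrees of freedom
x = np.linspace(0.1, 10, 50)

# α = 1: gamma density equals the exponential density with parameter λ
assert np.allclose(stats.gamma.pdf(x, a=1, scale=1/lam),
                   stats.expon.pdf(x, scale=1/lam))

# α = n/2, λ = 1/2 (scale = 2): gamma density equals the chi-square density
assert np.allclose(stats.gamma.pdf(x, a=n/2, scale=2),
                   stats.chi2.pdf(x, df=n))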
58. Let X denote a discrete random variable with probability function p(x) (probability density function f(x) if X is continuous); then the expected value of X, E(X), is defined to be:
E(X) = Σ_x x p(x) = Σ_i xi p(xi)
and if X is continuous with probability density function f(x):
E(X) = ∫_{−∞}^{∞} x f(x) dx
59. Expectation of functions
Let X denote a discrete random variable with probability function p(x); then the expected value of g(X), E[g(X)], is defined to be:
E[g(X)] = Σ_x g(x) p(x)
and if X is continuous with probability density function f(x):
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
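A small sketch of both definitions in Python (hypothetical examples: X ~ Bernoulli(p) for the discrete case and X ~ Uniform(0,1) for the continuous case; scipy assumed available for the integral):

# Discrete: E[g(X)] = sum of g(x) p(x); here g(x) = (x - p)^2 for Bernoulli X
p = 0.3
E_g_discrete = sum(g_x * p_x for g_x, p_x in [((0 - p)**2, 1 - p),
                                              ((1 - p)**2, p)])
print(E_g_discrete)          # E[(X - p)^2] = pq for a Bernoulli variable

# Continuous: E[g(X)] = integral of g(x) f(x) dx; g(x) = x^2, f(x) = 1 on (0,1)
from scipy import integrate
E_g_cont, _ = integrate.quad(lambda x: x**2 * 1.0, 0, 1)
print(E_g_cont)              # 1/3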
61. The kth moment of X:
μk = E[X^k] = Σ_x x^k p(x) if X is discrete
            = ∫_{−∞}^{∞} x^k f(x) dx if X is continuous
• The first moment of X, μ = μ1 = E(X), is the center of gravity of the distribution of X.
• The higher moments give different information regarding the distribution of X.
62. The kth central moment of X:
μk⁰ = E[(X − μ)^k] = Σ_x (x − μ)^k p(x) if X is discrete
                   = ∫_{−∞}^{∞} (x − μ)^k f(x) dx if X is continuous
64. Definition
Let X denote a random variable. Then the moment generating function of X, mX(t), is defined by:
mX(t) = E[e^{tX}] = Σ_x e^{tx} p(x) if X is discrete
                  = ∫_{−∞}^{∞} e^{tx} f(x) dx if X is continuous
65. Properties
1. mX(0) = 1
2. mX^{(k)}(0) = μk = E[X^k], where mX^{(k)}(0) = the kth derivative of mX(t) at t = 0. Equivalently,
mX(t) = 1 + μ1 t + μ2 t²/2! + μ3 t³/3! + … + μk t^k/k! + …
3. μk = E[X^k] = Σ_x x^k p(x) if X is discrete
               = ∫_{−∞}^{∞} x^k f(x) dx if X is continuous
66. 4. Let X be a random variable with moment generating function mX(t). Let Y = bX + a. Then
mY(t) = m_{bX+a}(t) = E(e^{(bX+a)t}) = e^{at} E(e^{X(bt)}) = e^{at} mX(bt)
5. Let X and Y be two independent random variables with moment generating functions mX(t) and mY(t). Then
m_{X+Y}(t) = E(e^{(X+Y)t}) = E(e^{Xt} e^{Yt}) = E(e^{Xt}) E(e^{Yt}) = mX(t) mY(t)
67. 6. Let X and Y be two random variables with moment generating functions mX(t) and mY(t) and distribution functions FX(x) and FY(y) respectively. If mX(t) = mY(t), then FX(x) = FY(x).
This ensures that the distribution of a random variable can be identified by its moment generating function.
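Property 2 can be checked symbolically; a sketch using sympy (assumed available), differentiating the Poisson m.g.f. from the tables below at t = 0:

import sympy as sp

t, lam = sp.symbols('t lam')
m = sp.exp(lam * (sp.exp(t) - 1))          # Poisson mgf (see the m.g.f. table below)
mu1 = sp.diff(m, t, 1).subs(t, 0)          # first moment: E[X] = lam
mu2 = sp.diff(m, t, 2).subs(t, 0)          # second moment: E[X^2] = lam + lam^2
print(sp.simplify(mu1))                    # mean lam
print(sp.simplify(mu2 - mu1**2))           # variance lam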
68. M.G.F.'s – Continuous distributions
Continuous Uniform: mX(t) = (e^{bt} − e^{at}) / ((b − a)t)
Exponential: mX(t) = λ/(λ − t), for t < λ
Gamma: mX(t) = (λ/(λ − t))^α, for t < λ
χ², n d.f.: mX(t) = (1/(1 − 2t))^{n/2}, for t < 1/2
Normal: mX(t) = e^{tμ + (1/2)t²σ²}
69. M.G.F.'s – Discrete distributions
Discrete Uniform: mX(t) = (e^t/N)(e^{tN} − 1)/(e^t − 1)
Bernoulli: mX(t) = q + pe^t
Binomial: mX(t) = (q + pe^t)^N
Geometric: mX(t) = pe^t/(1 − qe^t)
Negative Binomial: mX(t) = (pe^t/(1 − qe^t))^k
Poisson: mX(t) = e^{λ(e^t − 1)}
70. Note:
The distribution of a random variable X can be described by:
1. The probability function p(x) if X is discrete; the probability density function f(x) if X is continuous.
2. The distribution function:
F(x) = Σ_{u ≤ x} p(u) if X is discrete
     = ∫_{−∞}^{x} f(u) du if X is continuous
3. The moment generating function:
mX(t) = E[e^{tX}] = Σ_x e^{tx} p(x) if X is discrete
                  = ∫_{−∞}^{∞} e^{tx} f(x) dx if X is continuous
71. Summary of Discrete Distributions
Discrete Uniform: p(x) = 1/N, x = 1, 2, …, N; mean (N+1)/2; variance (N²−1)/12; mX(t) = (e^t/N)(e^{tN} − 1)/(e^t − 1)
Bernoulli: p(x) = p if x = 1, q if x = 0; mean p; variance pq; mX(t) = q + pe^t
Binomial: p(x) = (N choose x) p^x q^{N−x}, x = 0, 1, …, N; mean Np; variance Npq; mX(t) = (q + pe^t)^N
Geometric: p(x) = p q^{x−1}, x = 1, 2, …; mean 1/p; variance q/p²; mX(t) = pe^t/(1 − qe^t)
Negative Binomial: p(x) = (x−1 choose k−1) p^k q^{x−k}, x = k, k+1, …; mean k/p; variance kq/p²; mX(t) = (pe^t/(1 − qe^t))^k
Poisson: p(x) = (λ^x/x!) e^{−λ}, x = 0, 1, 2, …; mean λ; variance λ; mX(t) = e^{λ(e^t − 1)}
Hypergeometric: p(x) = (A choose x)(N−A choose n−x)/(N choose n); mean n(A/N); variance n(A/N)(1 − A/N)(N−n)/(N−1); mX(t) not useful
72. Summary of Continuous Distributions
Uniform: f(x) = 1/(b − a), a ≤ x ≤ b; 0 otherwise; mean (a+b)/2; variance (b−a)²/12; mX(t) = (e^{bt} − e^{at})/((b − a)t)
Exponential: f(x) = λe^{−λx}, x ≥ 0; 0, x < 0; mean 1/λ; variance 1/λ²; mX(t) = λ/(λ − t), for t < λ
Gamma: f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, x ≥ 0; 0, x < 0; mean α/λ; variance α/λ²; mX(t) = (λ/(λ − t))^α, for t < λ
χ², n d.f.: f(x) = ((1/2)^{n/2}/Γ(n/2)) x^{n/2−1} e^{−x/2}, x ≥ 0; 0, x < 0; mean n; variance 2n; mX(t) = (1/(1 − 2t))^{n/2}, for t < 1/2
Normal: f(x) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)}; mean μ; variance σ²; mX(t) = e^{tμ + (1/2)t²σ²}
Weibull: f(x) = αβ x^{β−1} e^{−αx^β}, x ≥ 0; 0, x < 0; mean Γ(1/β + 1)/α^{1/β}; variance [Γ(2/β + 1) − Γ(1/β + 1)²]/α^{2/β}; mX(t) not available
75. The joint probability function:
p(x,y) = P[X = x, Y = y]
1. 0 ≤ p(x,y) ≤ 1
2. Σ_x Σ_y p(x,y) = 1
3. P[(X, Y) ∈ A] = Σ_{(x,y) ∈ A} p(x,y)
77. Definition: Two random variables are said to have joint probability density function f(x,y) if
1. 0 ≤ f(x,y)
2. ∫∫ f(x,y) dx dy = 1
3. P[(X, Y) ∈ A] = ∫∫_A f(x,y) dx dy
79. Marginal Distributions (Discrete case):
Let X and Y denote two random variables with joint probability function p(x,y); then
the marginal probability function of X is pX(x) = Σ_y p(x,y)
the marginal probability function of Y is pY(y) = Σ_x p(x,y)
80. Marginal Distributions (Continuous case):
Let X and Y denote two random variables with joint probability density function f(x,y); then
the marginal density of X is fX(x) = ∫ f(x,y) dy
the marginal density of Y is fY(y) = ∫ f(x,y) dx
81. Conditional Distributions (Discrete Case):
Let X and Y denote two random variables with joint probability function p(x,y) and marginal probability functions pX(x), pY(y); then
the conditional probability function of Y given X = x is p_{Y|X}(y|x) = p(x,y)/pX(x)
the conditional probability function of X given Y = y is p_{X|Y}(x|y) = p(x,y)/pY(y)
82. Conditional Distributions (Continuous Case):
Let X and Y denote two random variables with joint probability density function f(x,y) and marginal densities fX(x), fY(y); then
the conditional density of Y given X = x is f_{Y|X}(y|x) = f(x,y)/fX(x)
the conditional density of X given Y = y is f_{X|Y}(x|y) = f(x,y)/fY(y)
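These definitions are easy to verify on a small discrete example; a Python sketch with a hypothetical 2×2 joint probability table (numpy assumed available):

import numpy as np

p = np.array([[0.10, 0.20],
              [0.30, 0.40]])      # hypothetical p(x, y); entries sum to 1
p_X = p.sum(axis=1)               # marginal pX(x) = sum over y
p_Y = p.sum(axis=0)               # marginal pY(y) = sum over x
p_Y_given_X = p / p_X[:, None]    # conditional p(y|x) = p(x, y)/pX(x)
print(p_X, p_Y)
print(p_Y_given_X.sum(axis=1))    # each conditional distribution sums to 1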
84. Let
Q(x1, x2) = (1/(1 − ρ²)) [((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²]
and
f(x1, x2) = (1/(2πσ1σ2√(1 − ρ²))) e^{−Q(x1, x2)/2}
This distribution is called the bivariate Normal distribution.
The parameters are μ1, μ2, σ1, σ2 and ρ.
86. Marginal distributions
1. The marginal distribution of x1 is Normal with mean μ1 and standard deviation σ1.
2. The marginal distribution of x2 is Normal with mean μ2 and standard deviation σ2.
87. Conditional distributions
1. The conditional distribution of x1 given x2 is Normal with:
mean μ_{1|2} = μ1 + ρ(σ1/σ2)(x2 − μ2)
and standard deviation σ_{1|2} = σ1√(1 − ρ²)
2. The conditional distribution of x2 given x1 is Normal with:
mean μ_{2|1} = μ2 + ρ(σ2/σ1)(x1 − μ1)
and standard deviation σ_{2|1} = σ2√(1 − ρ²)
89. Definition: Two random variables X and Y are defined to be independent if
p(x,y) = pX(x) pY(y) if X and Y are discrete
f(x,y) = fX(x) fY(y) if X and Y are continuous
91. Definition
Let X1, X2, …, Xn denote n discrete random variables; then p(x1, x2, …, xn) is the joint probability function of X1, X2, …, Xn if
1. 0 ≤ p(x1, …, xn) ≤ 1
2. Σ_{x1} ⋯ Σ_{xn} p(x1, …, xn) = 1
3. P[(X1, …, Xn) ∈ A] = Σ_{(x1, …, xn) ∈ A} p(x1, …, xn)
92. Definition
Let X1, X2, …, Xk denote k continuous random variables; then f(x1, x2, …, xk) is the joint density function of X1, X2, …, Xk if
1. f(x1, …, xk) ≥ 0
2. ∫⋯∫ f(x1, …, xk) dx1 ⋯ dxk = 1
3. P[(X1, …, Xk) ∈ A] = ∫⋯∫_A f(x1, …, xk) dx1 ⋯ dxk
93. The Multinomial distribution
Suppose that we observe an experiment that has k
possible outcomes {O1, O2, …, Ok } independently n
times.
Let p1, p2, …, pk denote the probabilities of O1, O2, …, Ok respectively.
Let Xi denote the number of times that outcome Oi
occurs in the n repetitions of the experiment.
94. The joint probability function of X1, X2, …, Xk is
p(x1, x2, …, xk) = (n!/(x1! x2! ⋯ xk!)) p1^{x1} p2^{x2} ⋯ pk^{xk} = (n choose x1 x2 … xk) p1^{x1} p2^{x2} ⋯ pk^{xk}
This distribution is called the Multinomial distribution.
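A direct Python sketch of the multinomial probability function, with a hypothetical k = 3 example:

from math import factorial

def multinomial_pmf(xs, ps, n):
    """p(x1,...,xk) = n!/(x1!...xk!) * p1^x1 ... pk^xk, with x1+...+xk = n."""
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)            # multinomial coefficient stays an integer
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2], 4))   # hypothetical counts and probabilities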
95. The Multivariate Normal distribution
Recall the univariate normal distribution:
f(x) = (1/(√(2π)σ)) e^{−(1/2)((x − μ)/σ)²}
and the bivariate normal distribution:
f(x,y) = (1/(2πσxσy√(1 − ρ²))) e^{−(1/(2(1 − ρ²)))[((x − μx)/σx)² − 2ρ((x − μx)/σx)((y − μy)/σy) + ((y − μy)/σy)²]}
96. The k-variate Normal distribution
f(x1, …, xk) = f(x) = (1/((2π)^{k/2} |Σ|^{1/2})) e^{−(1/2)(x − μ)′Σ⁻¹(x − μ)}
where
x = (x1, x2, …, xk)′, μ = (μ1, μ2, …, μk)′
and Σ is the k × k covariance matrix
Σ = [σ11 σ12 … σ1k; σ12 σ22 … σ2k; …; σ1k σ2k … σkk]
98. Definition
Let X1, X2, …, Xq, Xq+1, …, Xk denote k discrete random variables with joint probability function p(x1, x2, …, xq, xq+1, …, xk); then the marginal joint probability function of X1, X2, …, Xq is
p_{12…q}(x1, …, xq) = Σ_{xq+1} ⋯ Σ_{xk} p(x1, …, xk)
99. Definition
Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the marginal joint density function of X1, X2, …, Xq is
f_{12…q}(x1, …, xq) = ∫⋯∫ f(x1, …, xk) dxq+1 ⋯ dxk
101. Definition
Let X1, X2, …, Xq, Xq+1, …, Xk denote k discrete random variables with joint probability function p(x1, x2, …, xq, xq+1, …, xk); then the conditional joint probability function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is
p_{1…q|q+1…k}(x1, …, xq | xq+1, …, xk) = p(x1, …, xk) / p_{q+1…k}(xq+1, …, xk)
102. Definition
Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the conditional joint density function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is
f_{1…q|q+1…k}(x1, …, xq | xq+1, …, xk) = f(x1, …, xk) / f_{q+1…k}(xq+1, …, xk)
103. Definition – Independence of sets of vectors
Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the variables X1, X2, …, Xq are independent of Xq+1, …, Xk if
f(x1, …, xk) = f_{1…q}(x1, …, xq) f_{q+1…k}(xq+1, …, xk)
A similar definition holds for discrete random variables.
104. Definition – Mutual Independence
Let X1, X2, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xk); then the variables X1, X2, …, Xk are called mutually independent if
f(x1, …, xk) = f1(x1) f2(x2) ⋯ fk(xk)
A similar definition holds for discrete random variables.
106. Definition
Let X1, X2, …, Xn denote n jointly distributed random variables with joint density function f(x1, x2, …, xn); then
E[g(X1, …, Xn)] = ∫⋯∫ g(x1, …, xn) f(x1, …, xn) dx1 ⋯ dxn
108.
1. E[Xi] = ∫⋯∫ xi f(x1, …, xn) dx1 ⋯ dxn = ∫ xi fi(xi) dxi
Thus you can calculate E[Xi] either from the joint distribution of X1, …, Xn or from the marginal distribution of Xi.
2. (The Linearity property)
E[a1X1 + ⋯ + anXn + b] = a1E[X1] + ⋯ + anE[Xn] + b
109.
3. (The Multiplicative property) Suppose X1, …, Xq are independent of Xq+1, …, Xk; then
E[g(X1, …, Xq) h(Xq+1, …, Xk)] = E[g(X1, …, Xq)] E[h(Xq+1, …, Xk)]
In the simple case when k = 2:
E[XY] = E[X] E[Y] if X and Y are independent.
110. Some Rules for Variance
Var(X) = σX² = E[(X − μX)²] = E[X²] − μX²
111. Tchebychev's inequality
P[|X − μ| < kσ] ≥ 1 − 1/k²
Ex:
P[|X − μ| < 2σ] ≥ 3/4
P[|X − μ| < 3σ] ≥ 8/9
P[|X − μ| < 4σ] ≥ 15/16
(a simulation check follows).
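A simulation sketch of these bounds (numpy assumed available; the hypothetical example uses exponential data with μ = σ = 1, a distribution far from normal):

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # mean 1, standard deviation 1
for k in (2, 3, 4):
    frac = np.mean(np.abs(x - 1.0) < k * 1.0)  # observed P[|X - mu| < k*sigma]
    print(k, frac, ">=", 1 - 1/k**2)           # observed fraction beats the bound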
112.
1. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
where Cov(X, Y) = E[(X − μX)(Y − μY)]
Note: If X and Y are independent, then Cov(X, Y) = 0
and Var(X + Y) = Var(X) + Var(Y).
113. The correlation coefficient ρXY:
ρXY = Cov(X, Y)/(σX σY) = Cov(X, Y)/√(Var(X) Var(Y))
Properties:
1. If X and Y are independent, then ρXY = 0.
2. −1 ≤ ρXY ≤ 1, and |ρXY| = 1 if there exist a and b such that
P[Y = bX + a] = 1
where ρXY = +1 if b > 0 and ρXY = −1 if b < 0.
114. Some other properties of variance
2. Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
3. Var(a1X1 + ⋯ + anXn) = a1²Var(X1) + ⋯ + an²Var(Xn)
+ 2a1a2 Cov(X1, X2) + ⋯ + 2a1an Cov(X1, Xn)
+ 2a2a3 Cov(X2, X3) + ⋯ + 2a2an Cov(X2, Xn)
+ ⋯ + 2an−1an Cov(Xn−1, Xn)
= Σi ai²Var(Xi) + 2 Σ_{i<j} aiaj Cov(Xi, Xj)
115. 4. Variance: Multiplicative Rule for independent random variables
Suppose that X and Y are independent random variables; then:
Var(XY) = Var(X) Var(Y) + μY² Var(X) + μX² Var(Y)
116. Mean and Variance of averages
Let X1, …, Xn be n mutually independent random variables each having mean μ and standard deviation σ (variance σ²), and let
X̄ = (1/n) Σ_{i=1}^{n} Xi
Then E[X̄] = μX̄ = μ
and Var(X̄) = σX̄² = σ²/n.
117. The Law of Large Numbers
Let X1, …, Xn be n mutually independent random variables each having mean μ, and let
X̄ = (1/n) Σ_{i=1}^{n} Xi
Then for any δ > 0 (no matter how small):
P[|X̄ − μ| < δ] = P[μ − δ < X̄ < μ + δ] → 1 as n → ∞
(a simulation sketch follows).
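A simulation sketch of the Law of Large Numbers (numpy assumed available; the hypothetical example uses fair-die rolls, for which μ = 3.5):

import numpy as np

rng = np.random.default_rng(1)
for n in (10, 1_000, 100_000):
    xbar = rng.integers(1, 7, size=n).mean()   # sample mean of n die rolls
    print(n, xbar)                             # xbar concentrates around 3.5 as n grows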
119. Definition
Let X1, X2, …, Xq, Xq+1, …, Xk denote k continuous random variables with joint probability density function f(x1, x2, …, xq, xq+1, …, xk); then the conditional joint density function of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is
f_{1…q|q+1…k}(x1, …, xq | xq+1, …, xk) = f(x1, …, xk) / f_{q+1…k}(xq+1, …, xk)
120. Definition
Let U = h(X1, X2, …, Xq, Xq+1, …, Xk); then the conditional expectation of U given Xq+1 = xq+1, …, Xk = xk is
E[U | xq+1, …, xk] = ∫⋯∫ h(x1, …, xk) f_{1…q|q+1…k}(x1, …, xq | xq+1, …, xk) dx1 ⋯ dxq
Note this will be a function of xq+1, …, xk.
121. A very useful rule
Let (x1, x2, …, xq, y1, y2, …, ym) = (x, y) denote q + m random variables, and let
U = g(x1, …, xq, y1, …, ym) = g(x, y)
Then
E[U] = E_y[E[U | y]]
and
Var(U) = E_y[Var(U | y)] + Var_y(E[U | y])
(checked by simulation in the sketch below).
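A simulation sketch of this rule (numpy assumed available) for a hypothetical hierarchy: Y ~ Poisson(5), and given Y = y, U ~ Normal(y, 1); then E[U] = E[Y] = 5 and Var(U) = E[Var(U|Y)] + Var(E[U|Y]) = 1 + 5 = 6:

import numpy as np

rng = np.random.default_rng(2)
y = rng.poisson(5, size=1_000_000)     # outer variable Y
u = rng.normal(loc=y, scale=1.0)       # U | Y = y ~ Normal(y, 1)
print(u.mean())                        # ~ E_y[E[U|y]] = 5
print(u.var())                         # ~ E_y[Var(U|y)] + Var_y(E[U|y]) = 6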