Hypergeometric Distribution Mean and Variance

Mean and Variance of the Hypergeometric Distribution
Al Lehnen Madison Area Technical College 11/30/2011
In a drawing of $n$ distinguishable objects without replacement from a set of $N$ ($n < N$)
distinguishable objects, $a$ of which have characteristic A ($a < N$), the probability that
exactly $x$ objects in the draw of $n$ have the characteristic A is given by the number of
different ways the $x$ objects can be chosen from the $a$ available, times the number of
different ways the $n-x$ objects in the draw which don't have A can be chosen from the
$N-a$ available, divided by the number of different ways $n$ distinguishable objects can be
chosen from a set of $N$. The resulting probability distribution for the random variable $x$ is
called the hypergeometric distribution. In symbols,
$$
f(x) = \frac{\dbinom{a}{x}\dbinom{N-a}{n-x}}{\dbinom{N}{n}}.
$$
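As a quick illustration of the formula for $f(x)$, the following minimal sketch evaluates the probabilities exactly with rational arithmetic. The helper names `binom` and `hypergeom_pmf` and the example values $N = 10$, $a = 4$, $n = 3$ are assumptions chosen for illustration; they do not come from the original note. The helper `binom` returns zero whenever $j$ or $k-j$ is negative, which is the convention this note uses for binomial coefficients.

```python
from fractions import Fraction
from math import comb


def binom(k: int, j: int) -> int:
    """Binomial coefficient, taken to be zero when j or k - j is negative."""
    if j < 0 or k - j < 0:
        return 0
    return comb(k, j)


def hypergeom_pmf(x: int, N: int, a: int, n: int) -> Fraction:
    """f(x) = C(a, x) * C(N - a, n - x) / C(N, n), as an exact fraction."""
    return Fraction(binom(a, x) * binom(N - a, n - x), binom(N, n))


# Hypothetical example: N = 10 objects, a = 4 with characteristic A, n = 3 drawn.
probs = [hypergeom_pmf(x, 10, 4, 3) for x in range(4)]
for x, p in enumerate(probs):
    print(x, p)
print("total:", sum(probs))
```

Using `Fraction` rather than floating point keeps every probability exact, so the total can be compared with 1 without rounding concerns.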
The binomial coefficient
$$
\binom{k}{j} = \frac{k!}{j!\,(k-j)!}
$$
is defined to be zero if either $j$ or $k-j$ is negative, so that the probability of the
null event of drawing more objects than those available is zero. To prove that
$$
\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} \frac{\dbinom{a}{x}\dbinom{N-a}{n-x}}{\dbinom{N}{n}} = 1,
$$
consider the factorization $(B+C)^{N} = (B+C)^{a}(B+C)^{N-a}$. From the binomial theorem,
$$
\begin{aligned}
(B+C)^{a}(B+C)^{N-a}
&= \left[\sum_{j=0}^{a} \binom{a}{j} B^{a-j} C^{j}\right]
   \left[\sum_{l=0}^{N-a} \binom{N-a}{l} B^{N-a-l} C^{l}\right] \\
&= \sum_{j=0}^{a} \sum_{l=0}^{N-a} \binom{a}{j}\binom{N-a}{l} B^{N-l-j} C^{l+j}.
\end{aligned}
$$
Now apply the diagonal rearrangement suggested by the figure: set $l = n - j$, with the
intercept $n$ running from 0 to $N$ and $j$ running from 0 to $a$. This generates more than the
$(a+1)(N-a+1)$ terms in the above sum. However, all of the new terms generated vanish,
since they have either $l < 0$ or $l > N-a$, and the binomial coefficient $\binom{N-a}{l}$ is zero in both cases.
$$
(B+C)^{a}(B+C)^{N-a} = \sum_{n=0}^{N} \sum_{j=0}^{a} \binom{a}{j}\binom{N-a}{n-j} B^{N-n} C^{n}
$$
Now, for $n > a$, extending the sum over $j$ up to $n$ would only add terms which are zero,
because of the $\binom{a}{j}$ factor. Similarly, if $n < a$, the terms in the sum over $j$ from
$j = n+1$ to $j = a$ are all zero due to the $\binom{N-a}{n-j}$ factor. Thus,
$$
(B+C)^{a}(B+C)^{N-a}
= \sum_{n=0}^{N} \sum_{j=0}^{a} \binom{a}{j}\binom{N-a}{n-j} B^{N-n} C^{n}
= \sum_{n=0}^{N} \sum_{j=0}^{n} \binom{a}{j}\binom{N-a}{n-j} B^{N-n} C^{n}.
$$
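The change of the upper limit on $j$ from $a$ to $n$ can be checked numerically. The sketch below is only an illustration under assumed names (`binom`, `inner_sum`) and an arbitrarily chosen small range of parameters; it relies on the convention that a binomial coefficient with a negative lower or "remaining" argument is zero.

```python
from math import comb


def binom(k: int, j: int) -> int:
    """Binomial coefficient, taken to be zero when j or k - j is negative."""
    if j < 0 or k - j < 0:
        return 0
    return comb(k, j)


def inner_sum(N: int, a: int, n: int, upper: int) -> int:
    """Sum over j = 0..upper of C(a, j) * C(N - a, n - j)."""
    return sum(binom(a, j) * binom(N - a, n - j) for j in range(upper + 1))


# For every small case, summing j up to a or up to n gives the same value,
# because the extra (or dropped) terms contain a vanishing binomial coefficient.
for N in range(1, 9):
    for a in range(N + 1):
        for n in range(N + 1):
            assert inner_sum(N, a, n, a) == inner_sum(N, a, n, n)
print("upper limits a and n agree for all tested cases")
```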

But from a second use of the binomial theorem,
$$
(B+C)^{a}(B+C)^{N-a}
= \sum_{n=0}^{N} \sum_{j=0}^{n} \binom{a}{j}\binom{N-a}{n-j} B^{N-n} C^{n}
= (B+C)^{N}
= \sum_{n=0}^{N} \binom{N}{n} B^{N-n} C^{n}.
$$
The only way the two sums can be equal for all values of $B$ and $C$ is for
$$
\sum_{j=0}^{n} \binom{a}{j}\binom{N-a}{n-j} = \binom{N}{n}. \tag{1}
$$
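Equation (1) can be spot-checked by brute force. The following is a minimal sketch, not a proof; the function name `eq1_lhs` and the tested parameter range are assumptions made for illustration.

```python
from math import comb


def eq1_lhs(N: int, a: int, n: int) -> int:
    """Left side of Equation (1): sum over j = 0..n of C(a, j) * C(N - a, n - j).

    math.comb(k, j) already returns 0 when j > k, and no argument here is
    negative because 0 <= j <= n and 0 <= a <= N.
    """
    return sum(comb(a, j) * comb(N - a, n - j) for j in range(n + 1))


# Compare both sides of Equation (1) over a small grid of parameters.
for N in range(1, 12):
    for a in range(N + 1):
        for n in range(N + 1):
            assert eq1_lhs(N, a, n) == comb(N, n)
print("Equation (1) holds for all tested N, a, n")
```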
This in turn implies that the hypergeometric probabilities do indeed form a valid
probability distribution, i.e.
$$
\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} \frac{\dbinom{a}{x}\dbinom{N-a}{n-x}}{\dbinom{N}{n}} = 1.
$$
The mean or expected value of the hypergeometric random variable is given by
$$
\mu_x = \langle x \rangle = \sum_{x=0}^{n} x\, f(x)
= \binom{N}{n}^{-1} \sum_{x=0}^{n} x \binom{a}{x}\binom{N-a}{n-x}.
$$
Now, using Equation (1),

$$
\begin{aligned}
\sum_{x=0}^{n} x \binom{a}{x}\binom{N-a}{n-x}
&= \sum_{x=1}^{n} \frac{x\, a!}{x!\,(a-x)!} \binom{N-a}{n-x}
 = \sum_{x=1}^{n} \frac{a\,(a-1)!}{(x-1)!\,\bigl[(a-1)-(x-1)\bigr]!}
   \binom{(N-1)-(a-1)}{(n-1)-(x-1)} \\
&= a \sum_{x=0}^{n-1} \frac{(a-1)!}{x!\,\bigl[(a-1)-x\bigr]!}
   \binom{(N-1)-(a-1)}{(n-1)-x}
 = a \sum_{x=0}^{n-1} \binom{a-1}{x}\binom{(N-1)-(a-1)}{(n-1)-x} \\
&= a \binom{N-1}{n-1}.
\end{aligned}
$$
This gives that
$$
\mu_x = \langle x \rangle = \sum_{x=0}^{n} x\, f(x)
= \binom{N}{n}^{-1} a \binom{N-1}{n-1}
= a\,\frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!}
= \frac{an}{N}.
$$
Using the notation of the binomial distribution, $p = \dfrac{a}{N}$, we see that the expected value
of $x$ is the same for both drawing without replacement (the hypergeometric distribution)
and with replacement (the binomial distribution).
$$
\mu_x = \langle x \rangle = \frac{na}{N} = np \tag{2}
$$
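Equation (2) can be confirmed by summing $x f(x)$ directly. This is a minimal sketch with exact rational arithmetic; the function name `hypergeom_mean` and the tested parameter range are illustrative assumptions, not part of the original note.

```python
from fractions import Fraction
from math import comb


def hypergeom_mean(N: int, a: int, n: int) -> Fraction:
    """Mean computed directly as the sum over x of x * f(x)."""
    total = comb(N, n)
    return sum(
        Fraction(x * comb(a, x) * comb(N - a, n - x), total)
        for x in range(n + 1)
    )


# The direct sum should equal n * a / N (Equation (2)) in every tested case.
for N in range(1, 12):
    for a in range(N + 1):
        for n in range(N + 1):
            assert hypergeom_mean(N, a, n) == Fraction(n * a, N)

# Hypothetical example: N = 10, a = 4, n = 3, so p = 2/5 and n * p = 6/5.
print(hypergeom_mean(10, 4, 3))
```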
The variance of the hypergeometric distribution can be computed from the generic
formula that
$$
\sigma_x^2 = \bigl\langle \left[ x - \langle x \rangle \right]^2 \bigr\rangle
= \langle x^2 \rangle - \langle x \rangle^2 .
$$
Again from Equation (1),
$$
\begin{aligned}
\sum_{x=0}^{n} x(x-1) \binom{a}{x}\binom{N-a}{n-x}
&= \sum_{x=2}^{n} \frac{x(x-1)\, a!}{x!\,(a-x)!} \binom{N-a}{n-x}
 = \sum_{x=2}^{n} \frac{a(a-1)\,(a-2)!}{(x-2)!\,\bigl[(a-2)-(x-2)\bigr]!}
   \binom{(N-2)-(a-2)}{(n-2)-(x-2)} \\
&= a(a-1) \sum_{x=0}^{n-2} \frac{(a-2)!}{x!\,\bigl[(a-2)-x\bigr]!}
   \binom{(N-2)-(a-2)}{(n-2)-x}
 = a(a-1) \sum_{x=0}^{n-2} \binom{a-2}{x}\binom{(N-2)-(a-2)}{(n-2)-x} \\
&= a(a-1) \binom{N-2}{n-2}.
\end{aligned}
$$
So,
$$
\begin{aligned}
\langle x(x-1) \rangle
&= \binom{N}{n}^{-1} \sum_{x=0}^{n} x(x-1) \binom{a}{x}\binom{N-a}{n-x}
 = \binom{N}{n}^{-1} a(a-1) \binom{N-2}{n-2} \\
&= a(a-1)\,\frac{(N-2)!}{(n-2)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!}
 = \frac{a(a-1)\,n(n-1)}{N(N-1)}
\end{aligned}
$$
and
$$
\langle x^2 \rangle = \langle x(x-1) \rangle + \langle x \rangle
= \frac{a(a-1)\,n(n-1)}{N(N-1)} + \frac{an}{N}
= \frac{an}{N}\left[ \frac{(a-1)(n-1)}{N-1} + 1 \right].
$$
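The two moment formulas above can be checked against direct sums over $f(x)$. The sketch below is illustrative only; the function names (`pmf`, `factorial_moment`, `second_moment`) and the tested range are assumptions, and $N \ge 2$ is required so that the denominator $N(N-1)$ is nonzero.

```python
from fractions import Fraction
from math import comb


def pmf(x: int, N: int, a: int, n: int) -> Fraction:
    """Hypergeometric probability f(x) as an exact fraction."""
    return Fraction(comb(a, x) * comb(N - a, n - x), comb(N, n))


def factorial_moment(N: int, a: int, n: int) -> Fraction:
    """<x(x-1)> computed directly from f(x)."""
    return sum(x * (x - 1) * pmf(x, N, a, n) for x in range(n + 1))


def second_moment(N: int, a: int, n: int) -> Fraction:
    """<x^2> computed directly from f(x)."""
    return sum(x * x * pmf(x, N, a, n) for x in range(n + 1))


for N in range(2, 11):  # N >= 2 keeps N * (N - 1) nonzero
    for a in range(N + 1):
        for n in range(N + 1):
            closed_fm = Fraction(a * (a - 1) * n * (n - 1), N * (N - 1))
            closed_sm = Fraction(a * n, N) * (Fraction((a - 1) * (n - 1), N - 1) + 1)
            assert factorial_moment(N, a, n) == closed_fm
            assert second_moment(N, a, n) == closed_sm
print("moment formulas agree with direct sums")
```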

Thus,
$$
\begin{aligned}
\sigma_x^2 &= \langle x^2 \rangle - \langle x \rangle^2
 = \frac{an}{N}\left[ \frac{(a-1)(n-1)}{N-1} + 1 \right] - \left(\frac{an}{N}\right)^2
 = \frac{an}{N}\left[ \frac{N(a-1)(n-1) + N(N-1) - an(N-1)}{N(N-1)} \right] \\
&= \frac{an}{N}\left[ \frac{Nan - Na - Nn + N + N^2 - N - Nan + an}{N(N-1)} \right]
 = \frac{an}{N}\left[ \frac{N^2 - Na - Nn + an}{N(N-1)} \right] \\
&= \frac{an}{N}\left[ \frac{N(N-a) - n(N-a)}{N(N-1)} \right]
 = \frac{an}{N}\left[ \frac{(N-a)(N-n)}{N(N-1)} \right]
 = \frac{an}{N}\left( \frac{N-a}{N} \right)\left( \frac{N-n}{N-1} \right) \\
&= \frac{an}{N}\left( 1 - \frac{a}{N} \right)\left( \frac{N-n}{N-1} \right)
 = np(1-p)\left( \frac{N-n}{N-1} \right)
\end{aligned}
$$
The last factor $\left(\dfrac{N-n}{N-1}\right)$ is called the "finite population correction" and is the reason that
the variance of the binomial distribution, $np(1-p)$, differs from that of the hypergeometric
distribution. For $N$ large compared to the sample size $n$, the two distributions are
essentially identical.
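To see the finite population correction at work, the following minimal sketch compares the exact hypergeometric variance with the binomial variance $np(1-p)$ as $N$ grows with $p$ held fixed. The function names and the parameter choices ($n = 5$, $p = 0.4$, and the listed values of $N$) are assumptions picked for illustration only.

```python
from fractions import Fraction
from math import comb


def hypergeom_variance(N: int, a: int, n: int) -> Fraction:
    """Exact variance <x^2> - <x>^2 computed directly from f(x)."""
    total = comb(N, n)
    f = [Fraction(comb(a, x) * comb(N - a, n - x), total) for x in range(n + 1)]
    mean = sum(x * fx for x, fx in enumerate(f))
    mean_sq = sum(x * x * fx for x, fx in enumerate(f))
    return mean_sq - mean**2


def closed_form_variance(N: int, a: int, n: int) -> Fraction:
    """n p (1 - p) (N - n) / (N - 1) with p = a / N."""
    p = Fraction(a, N)
    return n * p * (1 - p) * Fraction(N - n, N - 1)


n = 5  # hypothetical sample size
for N in (10, 100, 1000, 10000):
    a = 2 * N // 5  # keeps p = a / N = 0.4 for these values of N
    exact = hypergeom_variance(N, a, n)
    assert exact == closed_form_variance(N, a, n)
    binomial = n * Fraction(a, N) * (1 - Fraction(a, N))
    # The ratio exact / binomial is the finite population correction.
    print(N, float(exact), float(binomial), float(exact / binomial))
```

The last printed column equals $(N-n)/(N-1)$, which approaches 1 as $N$ increases for fixed $n$.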
