4. Outline
Concept of a random variable
Binomial distribution
Poisson distribution
5. Concept of a Random Variable
Suppose we consider any measurable
characteristic of a population, such as the
household size of all houses in a city. Because this
characteristic can take different values, we refer
to it as a variable.
If we were to select one household at random from
this population, the value of this variable is
determined. Because the value is determined
through random sampling, we call it a random
variable.
The “random” in a random variable comes from the random sampling process.
6. Discrete vs. Continuous RV
A discrete random variable is a random variable
that can take on only a finite or at most a
countably infinite number of values.
The total number of heads that turn up when flipping a coin
three times: [0, 1, 2, 3]
The number of accidents that occur in Redlands per day
A continuous random variable is a random
variable that can take on a continuum of values
Commute distance / annual rainfall or temperature
For example, GPA. A course GPA is a discrete RV: the
random variable can take on only a finite set of values.
But the average GPA? It is continuous.
7. Probability Function
A table, graph, or mathematical function that describes the
potential values of a random variable X and their
corresponding probabilities is a probability function.
It describes the frequency distribution of the variable.
8. Probability Mass Function
The probability distribution of a discrete
random variable is specified by a probability
mass function or the frequency function.
We use uppercase letter X to denote random
variable and lowercase x to denote a specific value.
P(X = x_i) = P(x_i)  and  ∑_{i=1}^{k} P(x_i) = 1
The sum of P(x_i) over all x_i should always be 1.
9. Probability Mass Function
[Figure: bar chart of the PMF over x = 0, 1, 2, 3]
Example
Flipping a coin three times. Define X to be the total
number of heads that turns up
P(X=0) = 1/8 P(X=1) = 3/8
P(X=2) = 3/8 P(X=3) = 1/8
Flip the coin three times. Each flip is either heads OR tails, so the number of possible outcomes is 2^3 = 8:
{HHH, HHT, HTH, THH, HTT, TTH, THT, TTT}
AND is multiplication; OR is addition.
Head AND Head AND Head:
∴ 1/2 × 1/2 × 1/2 = 1/2^3 = 1/8
10. Probability on an Interval
What is the probability that the total number of heads turns up is less
than 2 when flipping a coin three times?
P(X&lt;2) = ?  x=0 OR x=1;  P = 0.5
P(1&lt;X&lt;3) = ?  x=2;  P = 0.375
P(1≤X&lt;3) = ?  x=1 OR x=2;  P = 0.75
P(1≤X≤3) = ?  x=1 OR x=2 OR x=3;  P = 0.875
[1,3] = P(1≤X≤3)
(1,3) = P(1&lt;X&lt;3)
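The coin-flip PMF and these interval probabilities can be checked by brute-force enumeration. A minimal Python sketch (standard library only; the variable names are illustrative):

```python
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three flips
outcomes = list(product("HT", repeat=3))
pmf = {x: 0.0 for x in range(4)}
for o in outcomes:
    pmf[o.count("H")] += 1 / len(outcomes)   # each outcome has probability 1/8

print(pmf)                       # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(pmf[0] + pmf[1])           # P(X < 2)       = 0.5
print(pmf[2])                    # P(1 < X < 3)   = 0.375
print(pmf[1] + pmf[2])           # P(1 <= X < 3)  = 0.75
print(pmf[1] + pmf[2] + pmf[3])  # P(1 <= X <= 3) = 0.875
```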
11. Expected Value of a Discrete RV
Expected value of a discrete RV is the average value it takes.
E(X) = ∑_i x_i · P(x_i)
x_i  P(x_i)  x_i·P(x_i)
0 0.125 0
1 0.375 0.375
2 0.375 0.75
3 0.125 0.375
E(X) = 1.5
This is the probability of different numbers of heads turning up. Remember IDW? The sum of the weights is also 1.
Flipping a coin three times; X is defined as the total number of heads that turn up.
This is the expected value of the total number of heads that turn up when the coin is tossed three times, which is the sum of these 4 items.
This is the number of heads that turn up when the coin is tossed 3 times: either 0, 1, 2, or 3.
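The table above can be reproduced in a few lines. A sketch of the E(X) = ∑ x_i·P(x_i) computation (names are illustrative):

```python
# PMF of X = number of heads in three flips (the table above)
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# E(X) = sum of x_i * P(x_i); the probabilities act as weights that sum to 1
expected = sum(x * p for x, p in pmf.items())
print(expected)   # 1.5
```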
12. Binomial Distribution
Bernoulli Trial
Each trial results in one of two possible outcomes
(“success”/“failure” or “head”/”tail”)
The probability of success is constant and equal to p
on each trial (the probability of failure is 1-p)
Binomial Distribution
The process of interest consists of n independent
Bernoulli trials with the probability of success in each
trial as p
The total number of successes, X, is a binomial
random variable with parameters n and p.
Suppose that n independent experiments
are performed, where n is a fixed number,
and each experiment results in a success
with probability p and a failure with probability 1 − p.
The total number of successes, X, is a binomial
random variable with parameters n and p.
13. What is the probability that one head would turn up when a fair
coin is tossed three times?
1st 2nd 3rd
H T T  0.5 × 0.5 × 0.5 = 0.5^1 × 0.5^2 = 0.125
T H T  0.5 × 0.5 × 0.5 = 0.5^1 × 0.5^2 = 0.125
T T H  0.5 × 0.5 × 0.5 = 0.5^1 × 0.5^2 = 0.125
P(X=1) = 3 × 0.5^1 × 0.5^2 = 0.375
C(n, r) = C(3, 1) = n! / (r!(n − r)!) = 3
AND is multiplication;
OR is addition.
HTT,
OR THT,
OR TTH;
∴0.125 +0.125 +0.125 =0.375
14. What is the probability that two “2”s would turn up if a die is rolled four
times?
A B C D
1 1 0 0  1/6 × 1/6 × 5/6 × 5/6 = (1/6)^2 × (5/6)^2
1 0 1 0  1/6 × 5/6 × 1/6 × 5/6 = (1/6)^2 × (5/6)^2
1 0 0 1  1/6 × 5/6 × 5/6 × 1/6 = (1/6)^2 × (5/6)^2
0 1 1 0  5/6 × 1/6 × 1/6 × 5/6 = (1/6)^2 × (5/6)^2
0 1 0 1  5/6 × 1/6 × 5/6 × 1/6 = (1/6)^2 × (5/6)^2
0 0 1 1  5/6 × 5/6 × 1/6 × 1/6 = (1/6)^2 × (5/6)^2
P(X=2) = 6 × (1/6)^2 × (5/6)^2 = 0.116
Anything other than “2”
The four trials/rolls
“2” (Success): 1
Not “2” (Failure): 0
C(n, r) = C(4, 2) = n! / (r!(n − r)!) = 6
15. Frequency Function
Part I: any particular sequence of x
successes occurs with probability p^x (1 − p)^(n−x)
(multiplication law)
Part II: there are C(n, x) ways to assign x
successes to n trials
P(X = x) = C(n, x) · p^x · (1 − p)^(n−x)
AND is multiplication;
OR is addition.
Among the n trials, the number of ways for the situation to happen: C(n, x)
AND probability of something successfully happening x times: p^x
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x)
…which is the Probability Function, see p.7
Probability Function “describes the frequency distribution of the variable”
C(n, x) = n! / (x!(n − x)!)
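The frequency function above translates directly into code. A hedged sketch using Python's `math.comb`; the helper name `binomial_pmf` is my own:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binomial_pmf(1, 3, 0.5))            # 0.375: one head in three coin flips
print(round(binomial_pmf(2, 4, 1/6), 3))  # 0.116: two "2"s in four die rolls
```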
16. Binomial -> Poisson
Consider this situation…
Suppose you are a transportation planner, and
you are concerned about the safety of a particular
intersection. During the last 60 days, there were 3
accidents, each occurring on a separate day. You
are asked to estimate the probability that 2 accidents will
occur during the next 30 days.
Among the n trials, the number of ways for the situation to happen: C(n, x) = C(30, 2)
AND probability of something successfully happening x times: p^x = (3/60)^2 = 0.05^2
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x) = (57/60)^(30−2) = 0.95^28
17. Binomial -> Poisson (cont.)
Solution – the binomial distribution
If we define observing traffic accidents per day
as a Bernoulli trial, the number of days in which
an accident occurs is a binomial random variable.
However, it is possible to have more than one
accident per day. So we can take a half day as the
analysis unit.
Among the n trials, the number of ways for the situation to happen: C(n, x) = C(30, 2)
AND probability of something successfully happening x times: p^x = (3/60)^2 = 0.05^2
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x) = (57/60)^(30−2) = 0.95^28
n = 30, p = 3/60 = 0.05, 1 − p = 0.95
P(X = 2) = C(30, 2) × 0.05^2 × 0.95^28 = 0.2586
18. Binomial -> Poisson
Solution – the binomial distribution
X is defined as the number of half days in which
one accident occurs.
Again, the choice of time unit is artificial. We
can continue to divide the day into smaller time
periods.
n = 60, p = 3/120 = 0.025, 1 − p = 0.975
P(X = 2) = C(60, 2) × 0.025^2 × 0.975^58 = 0.2548
Among the n trials, the number of ways for the situation to happen: C(n, x) = C(60, 2)
AND probability of something successfully happening x times: p^x = (3/120)^2 = 0.025^2
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x) = (117/120)^(60−2) = 0.975^58
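The limiting behavior sketched on these slides can be checked numerically: holding np = 1.5 fixed while n grows, the binomial probability of 2 successes settles toward the Poisson value. A small illustrative script (names are mine):

```python
from math import comb, exp, factorial

lam = 1.5   # 3 accidents per 60 days -> 1.5 expected per 30 days

# Binomial P(X = 2) with ever-finer time units: n grows, p shrinks, np stays 1.5
approx = {}
for n in (30, 60, 240, 960):
    p = lam / n
    approx[n] = comb(n, 2) * p**2 * (1 - p)**(n - 2)
    print(n, round(approx[n], 4))   # 30 -> 0.2586, 60 -> 0.2548, ...

# Poisson limit with the same lambda
poisson = lam**2 * exp(-lam) / factorial(2)
print(round(poisson, 4))            # 0.251
```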
19. Binomial -> Poisson
The Poisson distribution can be defined as the
limiting case of the binomial distribution:
Poisson distribution can be used as the
approximation of Binomial distribution for large n
and small p
n → ∞, p → 0, with np → λ (a constant)
20. Poisson Distribution
The process of interest consists of events that occur
repeatedly and randomly within a certain time period or
space
Traffic accidents in Redlands / Tornados in Columbus (Ohio)
Events are independent of past or future occurrences
The occurrence of an event has a constant mean rate
or density (the underlying process governing the
phenomenon must be invariant)
The random variable of interest, X, is the number of
events occurring within a given unit of time, area,
volume, etc.
The Poisson distribution is sometimes known as the
Law of Small Numbers, because it describes the
behavior of events that are rare
The probability that an event will occur within a
given unit must be the same for all units (i.e. the
underlying process governing the phenomenon must
be invariant)
The number of events occurring per unit must be
independent of the number of events occurring in
other units (no interactions)
The mean or expected number of events per unit
(λ) is found from past experience (observations)
The “counts” of events
21. Poisson Distribution (cont.)
Frequency function
The mean or expected number of events (λ) is found
by past experience (observations)
where e = 2.71828 (the base of the natural logarithm)
λ = the mean or expected value
(for the given unit, an expected value, not what actually happened)
(the mean or expected number of events per unit)
x = 0, 1, 2, …  (# of occurrences)
P(X = x) = λ^x · e^(−λ) / x!
Number of trials: n
Probability of success: p
np = λ
22. Poisson Distribution (cont.)
λ affects the skewness: the larger the λ, the
more symmetrical the distribution becomes.
Notice that the Poisson distribution is for
relatively rare incidents, such as
accidents and cancer. If the frequency is
relatively high, we should use the normal
distribution.
23. Example 1
Three(3) accidents were observed in last 60 days. Find
the probability of observing x accidents in the next 30
days
Solution:
1. Random variable X: the # of accidents that occur during
the 30-day period
2. The mean number of accidents during the 30-day
period is constant and equal to λ = 3/2 = 1.5
3. Find the probability of observing x accidents during the
30-day period. That is, find the value of P(X = x)
P(X = x) = λ^x · e^(−λ) / x!
λ = (3/60) × 30 = 3/2 = 1.5
[3 accidents per 60 days]
When the time period is 30 days, the mean number of
accidents would be 1.5 (accidents).
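Assuming the Poisson frequency function above, Example 1 can be finished numerically; the helper name `poisson_pmf` is my own:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 3 / 60 * 30   # 1.5 expected accidents in 30 days
for x in range(4):
    print(x, round(poisson_pmf(x, lam), 4))   # P(X = 2) ≈ 0.251
```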
25. Example 2
A disease occurs randomly in space, with one(1)
incident every 16 square kilometers. What is the
probability of finding four(4) incidents in a 30
square kilometer area?
26. Example 2
Solution:
1. Random variable X: the # of incidents in a 30 square
kilometer area
2. The mean number of incidents in a 30 square
kilometer area equals λ = 30/16 = 1.875
3. P(X = 4) = 1.875^4 · e^(−1.875) / 4! = 0.079
One(1) every 16, so λ, the mean number of incidents in a 30 km^2
area, is 30/16 = 1.875 (incidents)
27. Binomial vs. Poisson
If a mean or average probability of an event happening per unit
time/per page/per mile cycled etc., is given, and you are asked to
calculate a probability of n events happening in a given
time/number of pages/number of miles cycled, then the Poisson
Distribution is used. You do not know the number of trials.
If, on the other hand, an exact probability of an event happening is
given, or implied, in the question, and you are asked to calculate
the probability of this event happening k times out of n, then the
Binomial Distribution must be used. You know the number of
trials.
http://personal.maths.surrey.ac.uk/st/J.Deane/Teach/se202/poiss_bin.html
Expected value = 𝜆
Variance = 𝜆
Expected value = 𝑛𝑝
Variance = 𝑛𝑝(1 − 𝑝)
28. Binomial vs. Poisson
The Binomial and Poisson distributions are similar, but they are different. Also, the fact that they are both
discrete does not mean that they are the same. The Geometric distribution and one form of the Uniform
distribution are also discrete, but they are very different from both the Binomial and Poisson distributions.
The difference between the two is that while both measure the number of certain random events (or
"successes") within a certain frame, the Binomial is based on a fixed, discrete number of trials, while the Poisson
is based on a continuous frame (time or space). That is, with a Binomial distribution you have a certain number, n, of "attempts,"
each of which has probability of success p. With a Poisson distribution, you essentially have infinite
attempts, with infinitesimal chance of success. That is, given a Binomial distribution with some n,p, if
you let n→∞ and p→0 in such a way that np→λ, then that distribution approaches a Poisson distribution
with parameter λ.
Because of this limiting effect, Poisson distributions are used to model occurrences of events that could
happen a very large number of times but happen rarely. That is, they are used in situations that would
be more properly represented by a Binomial distribution with a very large n and small p, especially when the
exact values of n and p are unknown. (Historically, the number of wrongful criminal convictions in a country)
29. Exercise
A typist makes on average 2 mistakes per page. What is the probability of a
particular page having no errors on it? P
A computer crashes once every 2 days on average. What is the probability
of there being 2 crashes in one week? P
Components are packed in boxes of 20. The probability of a component
being defective is 0.1. What is the probability of a box containing 2 defective
components? B
ICs are packaged in boxes of 10. The probability of an IC being faulty is 2%.
What is the probability of a box containing 2 faulty ICs? B
The mean number of faults in a new house is 8. What is the probability of
buying a new house with exactly 1 fault? P
A box contains a large number of washers; there are twice as many steel
washers as brass ones. Four washers are selected at random from the box.
What is the probability that 3 are brass? B
http://personal.maths.surrey.ac.uk/st/J.Deane/Teach/se202/poiss_bin.html
n=20, p=0.1, x=2
n=10, p=0.02, x=2
n=4, p=1/3, x=3
P(X = x) = C(n, x) · p^x · (1 − p)^(n−x)
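One way to sanity-check the P/B labels above is to compute each answer with the two frequency functions (helper names are mine; the washer problem assumes P(brass) = 1/3 since there are twice as many steel washers):

```python
from math import comb, exp, factorial

def poisson(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def binom(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(poisson(0, 2), 3))       # typist: lam = 2 errors/page, P(0 errors)
print(round(poisson(2, 3.5), 3))     # crashes: lam = 3.5 per week, P(2 crashes)
print(round(binom(2, 20, 0.1), 3))   # defective components
print(round(binom(2, 10, 0.02), 4))  # faulty ICs
print(round(poisson(1, 8), 4))       # house faults
print(round(binom(3, 4, 1/3), 3))    # brass washers, assuming P(brass) = 1/3
```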
30. Suppose 30 events are randomly distributed among 35
equally sized grid cells, how many of the grid cells are
expected to have one event?
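One way to approach this question, assuming the events follow a Poisson process with λ = 30/35 events per cell, is to multiply the per-cell probability P(X = 1) by the number of cells:

```python
from math import exp

lam = 30 / 35                     # expected events per cell
cells_one = 35 * lam * exp(-lam)  # 35 cells * Poisson P(X = 1)
print(round(cells_one, 1))        # roughly 12.7 cells
```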
33. Continuous Random Variable
A continuous random variable is the
random variable that can take on a
continuum of values
Travel distance / the magnitude of a flood
For a continuous random variable, the role
of frequency function is taken by a density
function, f(x)
34. Probability Density Function
Probability distribution of a continuous random variable
is expressed by its probability density function (PDF), f (x),
which has the following properties
1) f(x) ≥ 0
2) f is piecewise continuous
3) ∫_{−∞}^{∞} f(x) dx = 1
f(x) is often represented by a graph or an equation
“The total area under
the curve”
= “The Sum of all
possibility”
= 1
∫_(lower limit)^(upper limit) (the function) dx
∫_{−∞}^{∞} f(x) dx
P(X = x) = 0
The probability of any one particular value is zero, because
probability is tied to the area under the curve, and the width of
a single value is so thin that it approaches zero; hence the area,
and the probability, do as well.
Therefore, P(X = x) = 0
35. Probability on an Interval
If X is a random variable with density function f, then
for any a < b, the probability that X falls in the interval
(a, b) is the area under the density function between
a and b:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
(a is the lower limit and b the upper limit of the integral of the density function)
36. Uniform Distribution
Uniform distribution
The process of interest consists of equally likely
outcomes
Probability density function
f(x) = 1/(b − a) for a ≤ x ≤ b;  0 otherwise
37. Uniform Distribution (cont.)
Probability on an interval
P(c ≤ X ≤ d) = F(d) − F(c) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a)
38. Uniform Distribution - Example
The annual mean temperature is uniformly distributed
between 10ºC and 18ºC. Find the probability that the
annual mean temperature falls in between 12ºC and
15ºC.
a = 10, b = 18, c = 12, d = 15
What is the probability that annual mean temperature is
greater than 15 ºC? What is the probability that annual
temperature is less than 13 ºC?
X ~ U(10, 18)
P(12 ≤ X ≤ 15) = (d − c)/(b − a) = (15 − 12)/(18 − 10) = 3/8
The probability is related only to the length of the interval,
not to its location, provided the interval lies between a and b.
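The example generalizes to a small helper (the name is my own; the interval is clipped to [a, b]):

```python
def uniform_interval(c, d, a, b):
    """P(c <= X <= d) for X ~ U(a, b), clipping the interval to [a, b]."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0) / (b - a)

print(uniform_interval(12, 15, 10, 18))  # 3/8 = 0.375
print(uniform_interval(15, 18, 10, 18))  # P(X > 15) = 0.375
print(uniform_interval(10, 13, 10, 18))  # P(X < 13) = 0.375
```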
39. Normal Distribution
It was proposed by Carl Friedrich Gauss as a
model for measurement errors.
It is also called Gaussian distribution.
The most common and important probability
distribution
Most naturally occurring variables are distributed
normally (e.g. heights, weights, annual temperature
variations, test scores, IQ scores, etc.)
Foundation in probability and statistics
A normal distribution can also be produced by tracking the
errors made in repeated measurements of the same thing;
Carl Friedrich Gauss was a 19th century astronomer who
found that the distribution of the repeated errors of
determining the position of the same star formed a normal
(or Gaussian) distribution.
40. Normal Distribution (cont.)
Probability density function
The normal distribution is a continuous distribution
that is symmetric and bell-shaped.
X ~ N(μ, σ^2)
f(x) = (1/(σ√(2π))) · e^(−(x−μ)^2 / (2σ^2))
μ – population mean
σ^2 – population variance
σ – population standard deviation
42. Properties of Normal Distribution
Symmetry: values below μ are just as likely as values above
μ.
Center: f(x) has maximum value for x = μ, so values close to μ
are the most likely to occur.
Dispersion: the density is “wider” for large σ compared to
small values of σ (for fixed μ), so the larger σ the more likely
are observations far from μ.
43. Normal Distribution (cont.)
Probability on an interval
The areas under normal curves can be obtained from standard
normal tables. Therefore, it is necessary to standardize normal
distributions.
P(a ≤ X ≤ b) = ∫_a^b (1/(σ√(2π))) · e^(−(x−μ)^2 / (2σ^2)) dx
44. Standard Normal Distribution
Standard normal distribution
The special case of the normal distribution which has μ = 0 and σ^2 = 1:
Z ~ N(0, 1)
f(z) = (1/√(2π)) · e^(−z^2 / 2)
46.
We would expect the interval μ ± 1.96σ to contain approximately
95% of the observations. This corresponds to a commonly used
rule of thumb that roughly 95% of the observations are within
[μ − 2σ, μ + 2σ]. Similar computations are made for other percentages.
47. The standardization is achieved by converting the
data into z-scores
Example 1
population mean and variance are known
The annual precipitation X ~ N(80, 40^2); what is the z-score of x = 150 mm?
Standardization of Normal Distributions
z = (x_i − μ)/σ = (150 − 80)/40 = 1.75
“z-score”: in other words, how many s.d. x deviates from the mean.
By referring to the Standard Normal Table, we can calculate the
probability of x lying above, below, or between certain values.
Z ~ N(0, 1^2)
X ~ N(μ, σ^2)
z = (x_i − μ)/σ
48. Standardization of normal distributions
Example 2
Sample mean and variance
are known
Step I:
Sample mean
𝑥 = 59.7
Sample standard deviation
𝑠 = 12.97
Step II:
Month T (°F) Z-score
J 39.53 -1.56
F 46.36 -1.03
M 46.42 -1.02
A 60.32 0.05
M 66.34 0.51
J 75.49 1.22
J 75.39 1.21
A 77.29 1.36
S 68.64 0.69
O 57.57 -0.16
N 54.88 -0.37
D 48.2 -0.89
z_i = (x_i − x̄)/s
𝑥=Sample mean; 𝑠 = 𝑆𝑎𝑚𝑝𝑙𝑒 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛;
𝜇 = 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑚𝑒𝑎𝑛; 𝜎 = 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛;
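Steps I and II above can be reproduced with Python's `statistics` module; a sketch using the monthly temperatures from the table:

```python
from statistics import mean, stdev

temps = [39.53, 46.36, 46.42, 60.32, 66.34, 75.49,
         75.39, 77.29, 68.64, 57.57, 54.88, 48.2]

x_bar, s = mean(temps), stdev(temps)   # sample mean and sample s.d.
z = [(t - x_bar) / s for t in temps]   # z_i = (x_i - x_bar) / s

print(round(x_bar, 1), round(s, 2))    # ≈ 59.7 and ≈ 12.97
print([round(v, 2) for v in z])        # January ≈ -1.56
```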
49. Calculating Probabilities from a Normal Distribution
Consider this problem
Suppose that the annual precipitation was normally
distributed with mean 80 mm per year and standard
deviation 40 mm. What is the probability that the
annual precipitation is greater than 150 mm?
Solution:
1. calculate the z score(s)
z = (x − μ)/σ = (150 − 80)/40 = 1.75
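The z-score, and the tail probability it implies, can be computed without a printed table by expressing the normal CDF through `math.erf` (the helper name is mine):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function (standard library only)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

z = (150 - 80) / 40
print(z)                                      # 1.75
print(round(1 - normal_cdf(150, 80, 40), 4))  # P(X > 150) ≈ 0.0401
```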
56. Random variable X follows a normal
distribution with mean μ and variance σ^2.
Find the interval that contains the middle
95% of the data.
58. Are data normally distributed?
Compare the observed histogram to a normal curve
that has the sample mean and sample standard deviation.
59. Normal Q-Q plot
A normal quantile-
quantile plot compares
the sample quantiles
to those of the normal
distribution. If data are
N(μ,σ2) distributed, the
points in the QQ-plot
should be scattered
around the straight
line.
Straight line in Q-Q plot = normal distribution
62. Estimation
Estimate population parameters based on
sample data
Two types of estimate
Point estimate
Interval estimate
63. Point Estimate (the first type of estimation)
Mean – population parameter: μ (“true mean”); point estimate: x̄ (“sample mean”)
x̄ = (1/n) ∑_{i=1}^{n} x_i
Standard deviation – population parameter: σ (“population s.d.”); point estimate: s (“sample s.d.”)
s = √( ∑_{i=1}^{n} (x_i − x̄)^2 / (n − 1) )
64. Sampling Error
Sampling error is the difference between the value of a
population characteristic and the value of that
characteristic inferred from a sample.
Example: consider the population characteristic of the
average selling price of homes in Redlands in 2009. If
every house is examined, the average selling price is
$200,000. If only 25 homes per month are sampled, the
average selling price of the 300 homes is $230,000.
The sampling error is $200,000-$230,000 = -$30,000
Sampling Error cannot
be removed.
65. Interval Estimate
It is very unlikely that sample point estimates will
exactly equal the true population parameters due to
uncertainty in probability sampling.
To determine how good our point estimates are, we
could extend a point estimate to an interval within
which the population parameter lies.
The second type of estimation
66. Confidence Level and Interval
In probabilistic terms, we like to attach some measure
of certainty (confidence) to our interval estimates.
What does the 90% confidence level mean?
The chance that the interval estimate containing the true
mean is 90%.
In other words, the probability that the true mean falls in the
interval is 90%.
This interval is called 90% confidence interval.
So, let’s say we repeated the sampling process many times,
so that there are many sets of samples. The sample mean
from each sample set is different, and hence they form an interval.
So, does the true population mean (μ) fall within the interval? Maybe.
The wider the interval, the higher the chance that it does.
Technically, if the interval is −∞ ≤ x ≤ ∞, from negative infinity
to infinity, the chance is always 100%, and the confidence level
would always be 100%... but that interval would be meaningless.
Therefore, statistically we apply a tolerance level, where the
probability is less than 100% but high enough (90%, 95%,
99%, etc.) that the interval is useful.
And here it is: the concept of the confidence
interval and the confidence level.
67. How to obtain a confidence interval?
If we know the relationship between the sample mean and the
true mean, we can link them together.
The sampling distribution or probability distribution of the
sample mean reveals the relationship.
68. Sampling Distribution of Sample Mean
The sampling distribution of the sample mean can be
developed by taking all possible or many samples of size n
from a population, calculating the value of the mean for each
sample, and drawing the distribution of these values.
69. Sampling Distribution of Sample Mean
When the sampling process is repeated many times,
we could get many different samples, which give
different sample means.
http://www.ltcconline.net/greenl/java/Statistics/clt/cltsimulation.html
The larger the sample size, the closer the sample mean is to
the true mean. Hence, the larger the sample size, the smaller
the variance of the sample means, which means this
frequency plot would be narrower.
70. Central Limit Theorem (CLT)
Let X₁, X₂, X₃… Xₙ be a random sample of size n drawn from
a population with mean μ and standard deviation σ.
Then for a large n, the sampling distribution of X̄ is
approximately normally distributed with mean μ and
standard deviation σ/√n.
In the special case where X is normal, the distribution of X̄ is
exactly normal regardless of sample size.
The standard deviation of the sample mean, σ/√n, is also
called the standard error.
X̄: sample mean
μ: population mean
σ/√n: standard deviation of sample means (= standard error)
σ: standard deviation of the population
s: standard deviation of a set of samples
The mean of the frequency distribution of sample means is
theoretically the same as the population mean (μ). The standard
deviation of the frequency distribution of sample means is σ/√n.
X̄ ~ N(μ, σ^2/n): the frequency distribution of the sample mean
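The theorem can be illustrated by simulation: draw many samples from a non-normal population and look at the spread of their means. A sketch (the population choice and seed are arbitrary):

```python
import random
from statistics import mean, stdev

random.seed(1)
n, reps = 36, 2000

# A deliberately non-normal population: U(0, 12), so mu = 6 and sigma = 12/sqrt(12) ≈ 3.464
sample_means = [mean(random.uniform(0, 12) for _ in range(n)) for _ in range(reps)]

print(round(mean(sample_means), 2))   # close to the population mean 6
print(round(stdev(sample_means), 2))  # close to sigma/sqrt(n) ≈ 3.464/6 ≈ 0.58
```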
71. CLT (cont.)
The central limit theorem applies only to the sample
mean, not to other sample statistics.
Generally, a sample size n ≥ 30 can be regarded
as sufficiently large, so that the sampling distribution
of sample means is approximately normal.
The sample size is inversely related to the standard
error.
The central limit theorem is used to estimate the
sample mean, and the sample mean only.
The statement of sample size n ≥ 30 is obtained by
comparing the t-table and the normal distribution,
which we will talk about in later slides.
72. Notation for Confidence Level and Interval
Confidence level
Denoted by (1 – α) × 100%
Usually α = 0.05, could also be 0.10 or 0.01
Thus, the likelihood that we are wrong is α (also
called the significance level)
The interval in which the true mean lies with
(1 – α) × 100% confidence is
called the (1 – α) × 100% confidence interval.
95% Confidence
level
5% likelihood of
being wrong
90% Confidence level
10% likelihood of being
wrong
99% Confidence level
1% likelihood of being
wrong
“Significance level”
the likelihood that
we are wrong
73. Basic Steps
Step 1: Standardize 𝑋
Step 2: find the z score
Step 3: calculate margin of error
Step 4: obtain the final CI (Confidence Interval)
Interpretation
74. Suppose the sample size n is sufficiently large (n ≥
30); according to the central limit theorem, the
frequency distribution of the sample mean is normal
with mean μ and standard deviation σ/√n.
Step 1: Standardize X̄
X̄ ~ N(μ, σ^2/n)
Z = (X̄ − μ) / (σ/√n) ~ N(0, 1)
75. Step 2: Find The Z Score
75
The three z-scores 1.65, 1.96, 2.58 are associated with three
confidence levels 1 – α (=0.90, 0.95, 0.99),
where α is 0.10, 0.05, and 0.01 respectively.
76. z_(α/2) is a z score or z value that corresponds to
a tail area of α/2.
[Figure: standard normal curve with the tail area α/2 to the right of z_(α/2)]
77. Step 3: Calculate Margin Of Error
The range of values above and below the sample
statistic with a specified confidence.
Put differently
P(X̄ − ME ≤ μ ≤ X̄ + ME) = 1 − α
ME = z_(α/2) · σ/√n
(ME: “Margin of Error”)
of Error”
78. Step 4: Obtain The Final CI(confidence Interval)
Add/subtract the margin of error from the
sample mean to get the CI.
CI = [X̄ − z_(α/2)·σ/√n, X̄ + z_(α/2)·σ/√n]
If α = 0.05, then CI = [X̄ − 1.96·σ/√n, X̄ + 1.96·σ/√n]
79. Interpretation
When α = 0.05, we can say that
“I am 95% confident that the mean of the population is
somewhere between x̄ − 1.96·σ/√n and x̄ + 1.96·σ/√n.”
The true population mean μ should, 95% of the time,
lie within ±1.96·σ/√n of the sample mean.
95% of all confidence intervals that can be
constructed will contain the unknown true mean.
80.
After repeated sampling of 100 times, how many of the confidence
intervals would you expect to contain the true mean?
81. What influence CI?
Sample variance ↑, range of CI ↑
larger sample variability , higher uncertainty
wider CI
Sample size ↑, range of CI↓
Larger sample size n, more information
narrower CI
Confidence level ↑, range of CI ↑
Higher confidence level,
higher uncertainty to be accounted
wider CI
82. Some issues
How about a small sample size (n &lt; 30)?
Use the t-distribution, provided the sample is drawn from a
normally distributed population
How about the population standard deviation σ is
unknown?
Use sample standard deviation s to approximate
population standard deviation σ
t-distribution, providing the population is normal
83. t-Distribution
When the sample size is not sufficiently large, the
frequency distribution of sample means has what is
known as the t distribution (or Student’s t
distribution)
The t-distribution also copes with the uncertainty that results
from estimating the standard deviation from a sample,
when the population standard deviation is
unknown
The overall shape of the probability density function of
the t-distribution resembles the bell shape of a
standard normal distribution, except that it is a bit
lower and wider.
t-distribution depends on a new parameter – degree of
freedom (df =n -1)
Student's t-distribution to
cope with uncertainty
resulting from estimating
the standard deviation from
a sample, whereas if the
population standard
deviation were known, a
normal distribution would be
used.
85. Using t distribution to construct CI
Population standard deviation is unknown
Population standard deviation is known but
sample size is small
85
σ unknown: [X̄ − t_(α/2, n−1)·s/√n, X̄ + t_(α/2, n−1)·s/√n]
σ known but n small: [X̄ − t_(α/2, n−1)·σ/√n, X̄ + t_(α/2, n−1)·σ/√n]
86. Example question
A local bank needs information concerning the savings account
balances of its customers. A random sample of 15 accounts
was checked. The mean balance was $686 with a standard
deviation of $256. Which of the following is the
95% confidence interval for the true mean?
The correct answer is: [686 − 2.15 × 256/√15, 686 + 2.15 × 256/√15]
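A sketch of the computation; the critical value t_(0.025, 14) ≈ 2.145 comes from a t table (the slide rounds it to 2.15):

```python
from math import sqrt

x_bar, s, n = 686, 256, 15
t_crit = 2.145        # t_(0.025, 14) from a t table; the slide rounds to 2.15

me = t_crit * s / sqrt(n)
lo, hi = x_bar - me, x_bar + me
print(round(lo, 1), round(hi, 1))   # roughly [544.2, 827.8]
```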
88. Outline
What is hypothesis testing?
Errors in hypothesis testing
One-sample z-test
One-sample t-test
89. Consider this situation
A consumer advocacy group collects a random sample
of n = 100 light bulbs from a manufacturer, and observes a
sample mean of 987 hours. Assume the standard deviation
of all light bulbs is 40 hours. Estimate on average how
many hours of light the light bulbs can provide.
α = 0.05, z₀.₀₂₅ = 1.96
ME = z₀.₀₂₅ · σ/√n = 1.96 × 40/10 = 7.84
Confidence Interval: [979.16, 994.84]
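The margin of error and interval can be reproduced in a few lines (names are illustrative):

```python
from math import sqrt

x_bar, sigma, n = 987, 40, 100
z = 1.96                          # z value for 95% confidence

me = z * sigma / sqrt(n)          # 1.96 * 40 / 10 = 7.84
lo, hi = x_bar - me, x_bar + me
print(round(me, 2), round(lo, 2), round(hi, 2))   # 7.84 979.16 994.84
```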
We are 95% confident that the mean lifetime of light bulbs
manufactured by the manufacturer falls between
979.16 and 994.84 hours.
90. Consider a related situation
A consumer advocacy group thinks a manufacturer of light bulbs is mistaken
in their claim that their bulbs on average provide 1000 hours of light. They
believe the light bulbs are defective. To test this, they collect a random
sample of n = 100 light bulbs and observe a sample mean of 987 hours.
Assume standard deviation of all light bulbs is 40 hours.
We cannot say that, because the sample mean of
one set of samples is 987 hours, the claim
of the manufacturer is false.
But with statistics and probability, we can
argue that what the manufacturer claims is
false – because 95% of the time, the mean
lifetime of light bulbs manufactured by the
manufacturer would fall between 979.16
and 994.84 hours.
1000 hours falls outside of the 95%
range. In other words, the claim of
the manufacturer is, in at least 95% of
cases, wrong.
91. If we assume the manufacturer’s claim is true, we
would expect the average lifetime of the samples
to be close to 1000 h.
But how close is close? Is the sample mean of
987 close enough to the presumed value of 1000?
We need to quantify the closeness, or difference,
between the sample mean and the presumed
mean.
To do so, we may compare 987 to a threshold that is
deemed “close enough”
Common sense is important
92. 95% of the time the sample mean will range between
992.16 and 1007.84 (1000 ± 1.96 × 40/√100). So we can take
roughly 992 and 1008 as the thresholds for “close enough”.
However, there is a small chance (&lt;5%) that the sample
mean falls outside the range of 992 to 1008. So we
could be wrong if we conclude the true mean is not
1000. This 5% is called the significance level
X̄ ~ N(1000, 40^2/100)
[Figure: sampling distribution of X̄ centered at 1000, with 987, 992, and 1008 marked]
P(X̄ ≤ 992) = 0.025
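The thresholds and the standardized test statistic can be computed directly (names are illustrative); note z = (987 − 1000)/4 = −3.25 falls well beyond −1.96:

```python
from math import sqrt

mu0, sigma, n = 1000, 40, 100
se = sigma / sqrt(n)              # standard error = 4

# "close enough" thresholds at the 5% significance level
lo, hi = mu0 - 1.96 * se, mu0 + 1.96 * se
print(round(lo, 2), round(hi, 2))   # 992.16 1007.84

z = (987 - mu0) / se              # standardized test statistic
print(z)                          # -3.25, beyond -1.96, so reject H0
```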
Significant: the tails.
Something you do
not expect.
Confident: the center.
The range that you’re
confident.
For hypothesis testing, we start with the claim.
Hence, the mean is set to 1000 hours. Next, we
throw the margin of error into the graph.
By setting the mean at 1000, we can calculate the
probability of getting a sample mean as small as 987.
93. Different problems will have different thresholds. Can we
obtain a standardized threshold that applies to all problems?
Yes, we would use the z score as a generic measure for
“closeness”.
Now let us take a look at the basic steps.
93
94. Step 1: state a null hypothesis
Step 2: state alternative hypothesis
Step 3: choose a significance level
Step 4: calculate test statistic
Step 5: find critical value and region of rejection
Step 6: make a decision
94
Basic Steps of Hypothesis Testing
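The six steps above can be sketched as a small helper. This is an illustrative left-sided z-test using the light-bulb numbers, not a general-purpose routine (the function name is my own):

```python
from statistics import NormalDist

def left_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """H0: mu = mu0 vs HA: mu < mu0 (Steps 1-3 fixed by the caller)."""
    z_test = (xbar - mu0) / (sigma / n ** 0.5)  # Step 4: test statistic
    z_crit = NormalDist().inv_cdf(alpha)        # Step 5: critical value (negative)
    reject = z_test < z_crit                    # Step 6: decision
    return z_test, z_crit, reject

# Light-bulb example: x̄ = 987, μ0 = 1000, σ = 40, n = 100, α = 0.05
z_test, z_crit, reject = left_sided_z_test(987, 1000, 40, 100)
print(z_test, round(z_crit, 2), reject)  # -3.25 -1.64 True
```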
95. Step 1: state a null hypothesis
H0: μ = 1000
Note: the null hypothesis states that this large random
sample is drawn from a population that has a
mean of 1000.
If the null hypothesis is true, we can then conclude
that the sample mean approximately follows a
normal distribution: X̄ ~ N(1000, 40²/100)
95
A consumer advocacy group thinks
a manufacturer of light bulbs is
mistaken in their claim that their
bulbs on average provide 1000
hours of light. They believe the light
bulbs are defective. To test this,
they collect a random sample of n =
100 light bulbs and observe a
sample mean of 987 hours.
Assume the standard deviation of all
light bulbs is 40 hours.
96. Step 2: state alternative hypothesis
Alternative hypothesis
Two-sided hypothesis testing (test if the lifetime
differs from 1000 hours of light)
HA: μ ≠ 1000
One-sided hypothesis testing (test if the lightbulbs
provide less than 1000 hours of light)
HA: μ < 1000
96
μ: the population parameter
The null hypothesis is the reverse of what the
experimenter believes
97. Step 3: choose a significance level
α = 0.1, 0.05, or 0.01
Example: α = 0.05
The corresponding two-sided critical z values would be 1.65, 1.96,
and 2.58, respectively.
97
A result said to be significant at the 5% level means
the result would be unexpected if the
null hypothesis were true.
98. Step 4: calculate test statistic
If H0 is true, the CLT gives X̄ ~ N(1000, 40²/100),
provided a large random sample is drawn from a population that
has a mean of μ0 and a standard deviation of σ.
Test statistic (z score):
z_test = (x̄ − μ0)/(σ/√n) = (987 − 1000)/(40/√100) = −3.25
98
99. Step 5: find critical value and region of rejection
Two-sided
HA: μ ≠ 1000
Example: critical values ±z_α/2 = ±1.96
One-sided
HA: μ < 1000 (μ < μ0)
Example: critical value −z_α = −1.65
99
100. Step 6: make a decision
Two-sided: reject the null hypothesis if
z_test > z_α/2 or z_test < −z_α/2
One-sided: reject the null hypothesis if
z_test > z_α for HA: μ > μ0, or
z_test < −z_α for HA: μ < μ0
100
101. Step 6: make a decision
Example: z_test = −3.25 and −z_α = −1.65, so z_test < −z_α.
Therefore, we can reject the null hypothesis. The lifetime of the
lightbulbs is significantly less than 1000 hours at α = 0.05.
101
102. What is hypothesis testing?
Now let us take a deeper look at hypothesis testing.
Hypothesis:
A proposition whose truth or falsity is capable of being
tested.
102
103. Errors in Hypothesis Testing
Type I Error
"False Positive"
Rejecting a true null hypothesis
The likelihood of making a Type I error is denoted by α, referred to as the
significance level
Type II Error
"False Negative"
Accepting a false null hypothesis
The likelihood of making a Type II error is denoted by β
103
104. Errors in Hypothesis Testing (cont.)
104
We want to control Type I
error more than Type II.
In most cases, a Type I error
would lead to more severe
consequences.
Take a trial as an example.
The null hypothesis (H0) is
an assumption of
innocence.
In a Type I error:
H0 is true: the person is
innocent.
H0 is rejected: the
person is deemed guilty.
This creates the following
consequences:
(1) an innocent person is
deemed guilty;
(2) the real criminal is still
out there, free from the
system.
Take the same trial as an example.
In a Type II error:
H0 is false: the person is guilty.
H0 is accepted: the person is
deemed innocent.
This creates the following
consequence:
(1) the criminal is deemed
innocent and released.
Type I error, also known as
False Positive
Type II error, also known as
False Negative
105. Controlling Type I Error
It is almost always impossible to simultaneously minimize the
probabilities of both types of errors.
Classical hypothesis testing adopts the strategy of controlling α.
By making α small, we have only a small probability of being wrong when
we reject the null hypothesis.
If we have evidence to reject the null hypothesis, we can be confident in our
analysis.
The null hypothesis should be something we want to reject, rather than
something we want to confirm.
105
106. One-sample t-test
Population standard deviation σ is unknown
Sample size is small
Test statistic:
T = (X̄ − μ0)/(S/√n)
When H0 is true (the sample is drawn from the specified
population that has a mean of μ0), the T random variable
follows a Student's t-distribution with df = n − 1.
106
T: t-value
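A minimal sketch of computing the one-sample t statistic in Python; the data values below are made up purely for illustration:

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """T = (x̄ − μ0) / (S/√n), with df = n − 1."""
    n = len(data)
    xbar, s = mean(data), stdev(data)  # stdev uses the n-1 denominator
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical small sample of lifetimes, testing H0: μ = 1000
t, df = one_sample_t([995, 1012, 980, 1001, 998], 1000)
print(round(t, 3), df)  # -0.541 4
```

The resulting t would be compared against a t-distribution critical value with df = 4 rather than a z value.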
107. Limitations of classic hypothesis testing
The specific significance level must be selected a
priori, and the choice is often arbitrary and lacks a
theoretical basis.
The final decision regarding the null and
alternative hypotheses is binary:
H0 is rejected or not rejected.
A more flexible method is needed.
What is the exact significance level associated with
the test statistic?
107
108. p-value
The probability of getting a test statistic value as
extreme as, or more extreme than, the one observed
by chance, if the null hypothesis H0 is true.
If the null hypothesis is rejected, the p-value is the
probability of making a Type I error.
The smaller the p-value, the more convincing the
evidence for rejecting the null hypothesis.
108
Type I error: rejecting a true null hypothesis
Typically, we can reject the null hypothesis when the p-value is less than 10% (a loose standard);
5% is the common standard for spatial analysis.
109. Determining p-value
Use the calculated z or t test statistic to determine the p-value.
The p-value corresponds to the shaded area under the
standard normal (or t) curve.
109
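For a z statistic, the shaded area can be computed with the standard normal CDF. A sketch using Python's standard library (the z value −3.25 comes from the earlier light-bulb test):

```python
from statistics import NormalDist

z_test = -3.25
p_one_sided = NormalDist().cdf(z_test)             # left-tail area, ≈ 0.00058
p_two_sided = 2 * NormalDist().cdf(-abs(z_test))   # both tails, ≈ 0.00115
print(round(p_one_sided, 5), round(p_two_sided, 5))
```

Both values are far below 0.01, so the null hypothesis would also be rejected at α = 0.01.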
110. Light Bulbs Example
What is the p-value of the lightbulb test? Would
you reject your null hypothesis at the α = 0.01
significance level?
110
111. But when to reject the Null Hypothesis?
111
When the test statistic falls within the
shaded rejection region, in other
words, when the p-value is smaller
than the significance level, we can
reject the null hypothesis.
Let's say the p-value is 0.006. We can
reject the null hypothesis at all of the
p < 0.1, p < 0.05, and p < 0.01 levels.
Let's say the p-value is 0.02. We can
reject the null hypothesis at the
p < 0.1 and p < 0.05 levels, but not at
the p < 0.01 level.
Let's say the p-value is 0.06. We can
reject the null hypothesis only at the
p < 0.1 level, not at the p < 0.05 and
p < 0.01 levels.
Let's say the p-value is 0.2. We
cannot reject the null hypothesis at
any of the three levels (p < 0.1,
p < 0.05, and p < 0.01).
113. 113
Null hypothesis (what you want to reject)
No difference / no change / equal to "="
Alternative hypothesis (the research interest that you want to accept)
Two-sided: different / not equal "≠"
Left-sided: smaller than / less than "<"
Right-sided: larger than / more than ">"
No sample statistics are included in either H0 or HA (H1)