4. Outline
Concept of a random variable
Binomial distribution
Poisson distribution
5. Concept of a Random Variable
Suppose we consider any measurable
characteristic of a population, such as the
household size of all houses in a city. Because this
characteristic can take different values, we refer
to it as a variable.
If we were to select one household at random from
this population, the value of this variable is
determined. Because the value is determined
through random sampling, we call it a random
variable.
The “random” in a random variable comes from the random sampling process.
6. Discrete vs. Continuous RV
A discrete random variable is a random variable
that can take on only a finite or at most a
countably infinite number of values.
The total number of heads that turn up when flipping a coin
three times: [0, 1, 2, 3]
The number of accidents that occur in Redlands per day
A continuous random variable is a random
variable that can take on a continuum of values
Commute distance / annual rainfall or temperature
For example, GPA. A course GPA is a discrete RV: the
random variable can take on only a finite set of values.
But the average GPA? It is continuous.
7. Probability Function
A table, graph, or mathematical function that describes the
potential values of a random variable X and their
corresponding probabilities is a probability function.
It describes the frequency distribution of the variable.
8. Probability Mass Function
The probability distribution of a discrete
random variable is specified by a probability
mass function or the frequency function.
We use uppercase letter X to denote random
variable and lowercase x to denote a specific value.
P(X = x_i) = P(x_i)  and  ∑_{i=1}^{k} P(x_i) = 1
The sum of P(x_i) over all x_i should always be 1.
9. Probability Mass Function
[Figure: bar chart of the PMF over x = 0, 1, 2, 3]
Example
Flipping a coin three times. Define X to be the total
number of heads that turns up
P(X=0) = 1/8 P(X=1) = 3/8
P(X=2) = 3/8 P(X=3) = 1/8
Flip the coin three times. Each flip is either heads OR tails, so the number of possible outcomes is 2^3 = 8:
{HHH, HHT, HTH, THH, HTT, TTH, THT, TTT}
AND is multiplication; OR is addition.
Head AND Head AND Head:
∴ 1/2 × 1/2 × 1/2 = 1/2^3 = 1/8
10. Probability on an Interval
What is the probability that the total number of heads turns up is less
than 2 when flipping a coin three times?
P(X&lt;2) = ?  x=0 OR x=1;  P = 0.5
P(1&lt;X&lt;3) = ?  x=2;  P = 0.375
P(1≤X&lt;3) = ?  x=1 OR x=2;  P = 0.75
P(1≤X≤3) = ?  x=1 OR x=2 OR x=3;  P = 0.875
[1,3] = P(1≤X≤3)
(1,3) = P(1&lt;X&lt;3)
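The coin-flip PMF and these interval probabilities can be checked by brute-force enumeration. A minimal Python sketch (standard library only; the variable names are illustrative):

```python
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three flips
outcomes = list(product("HT", repeat=3))
pmf = {x: 0.0 for x in range(4)}
for o in outcomes:
    pmf[o.count("H")] += 1 / len(outcomes)   # each outcome has probability 1/8

print(pmf)                       # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(pmf[0] + pmf[1])           # P(X < 2)       = 0.5
print(pmf[2])                    # P(1 < X < 3)   = 0.375
print(pmf[1] + pmf[2])           # P(1 <= X < 3)  = 0.75
print(pmf[1] + pmf[2] + pmf[3])  # P(1 <= X <= 3) = 0.875
```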
11. Expected Value of a Discrete RV
Expected value of a discrete RV is the average value it takes.
E(X) = ∑_i x_i · P(x_i)
x_i  P(x_i)  x_i·P(x_i)
0 0.125 0
1 0.375 0.375
2 0.375 0.75
3 0.125 0.375
E(X) = 1.5
This is the probability of different numbers of heads turning up. Remember IDW? The sum of the weights is also 1.
Flipping a coin three times; X is defined as the total number of heads that turn up.
This is the expected value of the total number of heads that turn up when the coin is tossed three times, which is the sum of these 4 items.
This is the number of heads that turn up when the coin is tossed 3 times: either 0, 1, 2, or 3.
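The table above can be reproduced in a few lines. A sketch of the E(X) = ∑ x_i·P(x_i) computation (names are illustrative):

```python
# PMF of X = number of heads in three flips (the table above)
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# E(X) = sum of x_i * P(x_i); the probabilities act as weights that sum to 1
expected = sum(x * p for x, p in pmf.items())
print(expected)   # 1.5
```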
12. Binomial Distribution
Bernoulli Trial
Each trial results in one of two possible outcomes
(“success”/“failure” or “head”/”tail”)
The probability of success is constant and equal to p
on each trial (the probability of failure is 1-p)
Binomial Distribution
The process of interest consists of n independent
Bernoulli trials with the probability of success in each
trial as p
The total number of successes, X, is a binomial
random variable with parameters n and p.
Suppose that n independent experiments
are performed, where n is a fixed number,
and each experiment results in a success
with probability p and a failure with probability 1 − p.
The total number of successes, X, is a binomial
random variable with parameters n and p.
13. What is the probability that one head would turn up when a fair
coin is tossed three times?
1st 2nd 3rd
H T T  0.5 × 0.5 × 0.5 = 0.5^1 × 0.5^2 = 0.125
T H T  0.5 × 0.5 × 0.5 = 0.5^1 × 0.5^2 = 0.125
T T H  0.5 × 0.5 × 0.5 = 0.5^1 × 0.5^2 = 0.125
P(X=1) = 3 × 0.5^1 × 0.5^2 = 0.375
C(n, r) = C(3, 1) = n! / (r!(n − r)!) = 3
AND is multiplication;
OR is addition.
HTT,
OR THT,
OR TTH;
∴0.125 +0.125 +0.125 =0.375
14. What is the probability that two “2”s would turn up if a die is rolled four
times?
A B C D
1 1 0 0  1/6 × 1/6 × 5/6 × 5/6 = (1/6)^2 × (5/6)^2
1 0 1 0  1/6 × 5/6 × 1/6 × 5/6 = (1/6)^2 × (5/6)^2
1 0 0 1  1/6 × 5/6 × 5/6 × 1/6 = (1/6)^2 × (5/6)^2
0 1 1 0  5/6 × 1/6 × 1/6 × 5/6 = (1/6)^2 × (5/6)^2
0 1 0 1  5/6 × 1/6 × 5/6 × 1/6 = (1/6)^2 × (5/6)^2
0 0 1 1  5/6 × 5/6 × 1/6 × 1/6 = (1/6)^2 × (5/6)^2
P(X=2) = 6 × (1/6)^2 × (5/6)^2 = 0.116
Anything other than “2”
The four trials/rolls
“2” (Success): 1
Not “2” (Failure): 0
C(n, r) = C(4, 2) = n! / (r!(n − r)!) = 6
15. Frequency Function
Part I: any particular sequence of x
successes occurs with probability p^x (1 − p)^(n−x)
(multiplication law)
Part II: there are C(n, x) ways to assign x
successes to n trials
P(X = x) = C(n, x) · p^x · (1 − p)^(n−x)
AND is multiplication;
OR is addition.
Among the n trials, the number of ways for the situation to happen: C(n, x)
AND probability of something successfully happening x times: p^x
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x)
…which is the Probability Function, see p.7
Probability Function “describes the frequency distribution of the variable”
C(n, x) = n! / (x!(n − x)!)
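The frequency function above translates directly into code. A hedged sketch using Python's `math.comb`; the helper name `binomial_pmf` is my own:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binomial_pmf(1, 3, 0.5))            # 0.375: one head in three coin flips
print(round(binomial_pmf(2, 4, 1/6), 3))  # 0.116: two "2"s in four die rolls
```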
16. Binomial -> Poisson
Consider this situation…
Suppose you are a transportation planner, and
you are concerned about the safety of a particular
intersection. During the last 60 days, there were 3
accidents, each occurring on a separate day. You
are asked to estimate the probability that 2 accidents will
occur during the next 30 days.
Among the n trials, the number of ways for the situation to happen: C(n, x) = C(30, 2)
AND probability of something successfully happening x times: p^x = (3/60)^2 = 0.05^2
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x) = (57/60)^(30−2) = 0.95^28
17. Binomial -> Poisson (cont.)
Solution – the binomial distribution
If we define observing traffic accidents per day
as a Bernoulli trial, the number of days in which
an accident occurs is a binomial random variable.
However, it is possible to have more than one
accident per day. So we can take a half day as the
analysis unit.
Among the n trials, the number of ways for the situation to happen: C(n, x) = C(30, 2)
AND probability of something successfully happening x times: p^x = (3/60)^2 = 0.05^2
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x) = (57/60)^(30−2) = 0.95^28
n = 30, p = 3/60 = 0.05, 1 − p = 0.95
P(X = 2) = C(30, 2) × 0.05^2 × 0.95^28 = 0.2586
18. Binomial -> Poisson
Solution – the binomial distribution
X is defined as the number of half days in which
one accident occurs.
Again, the choice of time unit is artificial. We
can continue to divide the day into smaller time
periods.
n = 60, p = 3/120 = 0.025, 1 − p = 0.975
P(X = 2) = C(60, 2) × 0.025^2 × 0.975^58 = 0.2548
Among the n trials, the number of ways for the situation to happen: C(n, x) = C(60, 2)
AND probability of something successfully happening x times: p^x = (3/120)^2 = 0.025^2
AND probability of something NOT successfully happening in the REST of the trials: (1 − p)^(n−x) = (117/120)^(60−2) = 0.975^58
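The limiting behavior sketched on these slides can be checked numerically: holding np = 1.5 fixed while n grows, the binomial probability of 2 successes settles toward the Poisson value. A small illustrative script (names are mine):

```python
from math import comb, exp, factorial

lam = 1.5   # 3 accidents per 60 days -> 1.5 expected per 30 days

# Binomial P(X = 2) with ever-finer time units: n grows, p shrinks, np stays 1.5
approx = {}
for n in (30, 60, 240, 960):
    p = lam / n
    approx[n] = comb(n, 2) * p**2 * (1 - p)**(n - 2)
    print(n, round(approx[n], 4))   # 30 -> 0.2586, 60 -> 0.2548, ...

# Poisson limit with the same lambda
poisson = lam**2 * exp(-lam) / factorial(2)
print(round(poisson, 4))            # 0.251
```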
19. Binomial -> Poisson
The Poisson distribution can be defined as the
limiting case of the binomial distribution:
Poisson distribution can be used as the
approximation of Binomial distribution for large n
and small p
n → ∞, p → 0, with np → λ (a constant)
20. Poisson Distribution
The process of interest consists of events that occur
repeatedly and randomly within a certain time period or
space
Traffic accidents in Redlands / Tornados in Columbus (Ohio)
Events are independent of past or future occurrences
The occurrence of an event has a constant mean rate
or density (the underlying process governing the
phenomenon must be invariant)
The random variable of interest, X, is the number of
events occurring within a given unit of time, area,
volume, etc.
The Poisson distribution is sometimes known as the
Law of Small Numbers, because it describes the
behavior of events that are rare
The probability that an event will occur within a
given unit must be the same for all units (i.e. the
underlying process governing the phenomenon must
be invariant)
The number of events occurring per unit must be
independent of the number of events occurring in
other units (no interactions)
The mean or expected number of events per unit
(λ) is found from past experience (observations)
The “counts” of events
21. Poisson Distribution (cont.)
Frequency function
The mean or expected number of events (λ) is found
by past experience (observations)
where e = 2.71828 (the base of the natural logarithm)
λ = the mean or expected value
(for the given unit, an expected value, not what actually happened)
(the mean or expected number of events per unit)
x = 0, 1, 2, …  (# of occurrences)
P(X = x) = λ^x · e^(−λ) / x!
Number of trials: n
Probability of success: p
np = λ
22. Poisson Distribution (cont.)
λ affects the skewness: the larger the λ, the
more symmetrical the distribution becomes.
Notice that the Poisson distribution is for
relatively rare incidents, such as
accidents and cancer. If the frequency is
relatively high, we should use the normal
distribution.
23. Example 1
Three(3) accidents were observed in last 60 days. Find
the probability of observing x accidents in the next 30
days
Solution:
1. Random variable X: the # of accidents that occur during
the 30-day period
2. The mean number of accidents during the 30-day
period is constant and equal to λ = 3/2 = 1.5
3. Find the probability of observing x accidents during the
30-day period. That is, find the value of P(X = x)
P(X = x) = λ^x · e^(−λ) / x!
λ = (3/60) × 30 = 3/2 = 1.5
[3 accidents per 60 days]
When the time period is 30 days, the mean number of
accidents would be 1.5 (accidents).
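Assuming the Poisson frequency function above, Example 1 can be finished numerically; the helper name `poisson_pmf` is my own:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 3 / 60 * 30   # 1.5 expected accidents in 30 days
for x in range(4):
    print(x, round(poisson_pmf(x, lam), 4))   # P(X = 2) ≈ 0.251
```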
25. Example 2
A disease occurs randomly in space, with one(1)
incident every 16 square kilometers. What is the
probability of finding four(4) incidents in a 30
square kilometer area?
26. Example 2
Solution:
1. Random variable X: the # of incidents in a 30 square
kilometer area
2. The mean number of incidents in a 30 square
kilometer area equals λ = 30/16 = 1.875
3. P(X = 4) = 1.875^4 · e^(−1.875) / 4! = 0.079
One(1) every 16, so λ, the mean number of incidents in a 30 km^2
area, is 30/16 = 1.875 (incidents)
27. Binomial vs. Poisson
If a mean or average probability of an event happening per unit
time/per page/per mile cycled etc., is given, and you are asked to
calculate a probability of n events happening in a given
time/number of pages/number of miles cycled, then the Poisson
Distribution is used. You do not know the number of trials.
If, on the other hand, an exact probability of an event happening is
given, or implied, in the question, and you are asked to calculate
the probability of this event happening k times out of n, then the
Binomial Distribution must be used. You know the number of
trials.
http://personal.maths.surrey.ac.uk/st/J.Deane/Teach/se202/poiss_bin.html
Expected value = 𝜆
Variance = 𝜆
Expected value = 𝑛𝑝
Variance = 𝑛𝑝(1 − 𝑝)
28. Binomial vs. Poisson
The Binomial and Poisson distributions are similar, but they are different. Also, the fact that they are both
discrete does not mean that they are the same. The Geometric distribution and one form of the Uniform
distribution are also discrete, but they are very different from both the Binomial and Poisson distributions.
The difference between the two is that while both measure the number of certain random events (or
"successes") within a certain frame, the Binomial is based on a fixed, discrete number of trials, while the Poisson
is based on a continuous frame (time or space). That is, with a Binomial distribution you have a certain number, n, of "attempts,"
each of which has probability of success p. With a Poisson distribution, you essentially have infinite
attempts, with infinitesimal chance of success. That is, given a Binomial distribution with some n,p, if
you let n→∞ and p→0 in such a way that np→λ, then that distribution approaches a Poisson distribution
with parameter λ.
Because of this limiting effect, Poisson distributions are used to model occurrences of events that could
happen a very large number of times but happen rarely. That is, they are used in situations that would
be more properly represented by a Binomial distribution with a very large n and small p, especially when the
exact values of n and p are unknown. (Historically, the number of wrongful criminal convictions in a country)
29. Exercise
A typist makes on average 2 mistakes per page. What is the probability of a
particular page having no errors on it? P
A computer crashes once every 2 days on average. What is the probability
of there being 2 crashes in one week? P
Components are packed in boxes of 20. The probability of a component
being defective is 0.1. What is the probability of a box containing 2 defective
components? B
ICs are packaged in boxes of 10. The probability of an IC being faulty is 2%.
What is the probability of a box containing 2 faulty ICs? B
The mean number of faults in a new house is 8. What is the probability of
buying a new house with exactly 1 fault? P
A box contains a large number of washers; there are twice as many steel
washers as brass ones. Four washers are selected at random from the box.
What is the probability that 3 are brass? B
http://personal.maths.surrey.ac.uk/st/J.Deane/Teach/se202/poiss_bin.html
n=20, p=0.1, x=2
n=10, p=0.02, x=2
n=4, p=1/3, x=3
P(X = x) = C(n, x) · p^x · (1 − p)^(n−x)
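One way to sanity-check the P/B labels above is to compute each answer with the two frequency functions (helper names are mine; the washer problem assumes P(brass) = 1/3 since there are twice as many steel washers):

```python
from math import comb, exp, factorial

def poisson(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def binom(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(poisson(0, 2), 3))       # typist: lam = 2 errors/page, P(0 errors)
print(round(poisson(2, 3.5), 3))     # crashes: lam = 3.5 per week, P(2 crashes)
print(round(binom(2, 20, 0.1), 3))   # defective components
print(round(binom(2, 10, 0.02), 4))  # faulty ICs
print(round(poisson(1, 8), 4))       # house faults
print(round(binom(3, 4, 1/3), 3))    # brass washers, assuming P(brass) = 1/3
```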
30. Suppose 30 events are randomly distributed among 35
equally sized grid cells, how many of the grid cells are
expected to have one event?
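One way to approach this question, assuming the events follow a Poisson process with λ = 30/35 events per cell, is to multiply the per-cell probability P(X = 1) by the number of cells:

```python
from math import exp

lam = 30 / 35                     # expected events per cell
cells_one = 35 * lam * exp(-lam)  # 35 cells * Poisson P(X = 1)
print(round(cells_one, 1))        # roughly 12.7 cells
```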
33. Continuous Random Variable
A continuous random variable is the
random variable that can take on a
continuum of values
Travel distance / the magnitude of a flood
For a continuous random variable, the role
of frequency function is taken by a density
function, f(x)
34. Probability Density Function
Probability distribution of a continuous random variable
is expressed by its probability density function (PDF), f (x),
which has the following properties
1) f(x) ≥ 0
2) f is piecewise continuous
3) ∫_{−∞}^{∞} f(x) dx = 1
f(x) is often represented by a graph or an equation
“The total area under
the curve”
= “The Sum of all
possibility”
= 1
∫_(lower limit)^(upper limit) (the function) dx
∫_{−∞}^{∞} f(x) dx
P(X = x) = 0
The probability of any one particular value is zero, because
probability is tied to the area under the curve, and the width of
a single value is so thin that it approaches zero; hence the area,
and the probability, do as well.
Therefore, P(X = x) = 0
35. Probability on an Interval
If X is a random variable with density function f, then
for any a < b, the probability that X falls in the interval
(a, b) is the area under the density function between
a and b:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
(a is the lower limit and b the upper limit of the integral of the density function)
36. Uniform Distribution
Uniform distribution
The process of interest consists of equally likely
outcomes
Probability density function
f(x) = 1/(b − a) for a ≤ x ≤ b;  0 otherwise
37. Uniform Distribution (cont.)
Probability on an interval
P(c ≤ X ≤ d) = F(d) − F(c) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a)
38. Uniform Distribution - Example
The annual mean temperature is uniformly distributed
between 10ºC and 18ºC. Find the probability that the
annual mean temperature falls in between 12ºC and
15ºC.
a = 10, b = 18, c = 12, d = 15
What is the probability that annual mean temperature is
greater than 15 ºC? What is the probability that annual
temperature is less than 13 ºC?
X ~ U(10, 18)
P(12 ≤ X ≤ 15) = (d − c)/(b − a) = (15 − 12)/(18 − 10) = 3/8
The probability is related only to the length of the interval,
not to its location, provided the interval lies between a and b.
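The example generalizes to a small helper (the name is my own; the interval is clipped to [a, b]):

```python
def uniform_interval(c, d, a, b):
    """P(c <= X <= d) for X ~ U(a, b), clipping the interval to [a, b]."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0) / (b - a)

print(uniform_interval(12, 15, 10, 18))  # 3/8 = 0.375
print(uniform_interval(15, 18, 10, 18))  # P(X > 15) = 0.375
print(uniform_interval(10, 13, 10, 18))  # P(X < 13) = 0.375
```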
39. Normal Distribution
It was proposed by Carl Friedrich Gauss as a
model for measurement errors.
It is also called Gaussian distribution.
The most common and important probability
distribution
Most naturally occurring variables are distributed
normally (e.g. heights, weights, annual temperature
variations, test scores, IQ scores, etc.)
Foundation in probability and statistics
A normal distribution can also be produced by tracking the
errors made in repeated measurements of the same thing;
Carl Friedrich Gauss was a 19th century astronomer who
found that the distribution of the repeated errors of
determining the position of the same star formed a normal
(or Gaussian) distribution.
40. Normal Distribution (cont.)
Probability density function
The normal distribution is a continuous distribution
that is symmetric and bell-shaped.
X ~ N(μ, σ^2)
f(x) = (1/(σ√(2π))) · e^(−(x−μ)^2 / (2σ^2))
μ – population mean
σ^2 – population variance
σ – population standard deviation
42. Properties of Normal Distribution
Symmetry: values below μ are just as likely as values above
μ.
Center: f(x) has maximum value for x = μ, so values close to μ
are the most likely to occur.
Dispersion: the density is “wider” for large σ compared to
small values of σ (for fixed μ), so the larger σ the more likely
are observations far from μ.
43. Normal Distribution (cont.)
Probability on an interval
The areas under normal curves can be obtained from standard
normal tables. Therefore, it is necessary to standardize normal
distributions.
P(a ≤ X ≤ b) = ∫_a^b (1/(σ√(2π))) · e^(−(x−μ)^2 / (2σ^2)) dx
44. Standard Normal Distribution
Standard normal distribution
The special case of the normal distribution which has μ = 0 and σ^2 = 1:
Z ~ N(0, 1)
f(z) = (1/√(2π)) · e^(−z^2 / 2)
46.
We would expect the interval μ ± 1.96σ to contain approximately
95% of the observations. This corresponds to a commonly used
rule of thumb that roughly 95% of the observations are within
[μ − 2σ, μ + 2σ]. Similar computations are made for other percentages.
47. The standardization is achieved by converting the
data into z-scores
Example 1
population mean and variance are known
The annual precipitation X ~ N(80, 40^2); what is the z-score of x = 150 mm?
Standardization of Normal Distributions
z = (x_i − μ)/σ = (150 − 80)/40 = 1.75
“z-score”: in other words, how many s.d. x deviates from the mean.
By referring to the Standard Normal Table, we can calculate the
probability of x lying above, below, or between certain values.
Z ~ N(0, 1^2)
X ~ N(μ, σ^2)
z = (x_i − μ)/σ
48. Standardization of normal distributions
Example 2
Sample mean and variance
are known
Step I:
Sample mean
𝑥 = 59.7
Sample standard deviation
𝑠 = 12.97
Step II:
Month T (°F) Z-score
J 39.53 -1.56
F 46.36 -1.03
M 46.42 -1.02
A 60.32 0.05
M 66.34 0.51
J 75.49 1.22
J 75.39 1.21
A 77.29 1.36
S 68.64 0.69
O 57.57 -0.16
N 54.88 -0.37
D 48.2 -0.89
z_i = (x_i − x̄)/s
𝑥=Sample mean; 𝑠 = 𝑆𝑎𝑚𝑝𝑙𝑒 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛;
𝜇 = 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑚𝑒𝑎𝑛; 𝜎 = 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛;
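Steps I and II above can be reproduced with Python's `statistics` module; a sketch using the monthly temperatures from the table:

```python
from statistics import mean, stdev

temps = [39.53, 46.36, 46.42, 60.32, 66.34, 75.49,
         75.39, 77.29, 68.64, 57.57, 54.88, 48.2]

x_bar, s = mean(temps), stdev(temps)   # sample mean and sample s.d.
z = [(t - x_bar) / s for t in temps]   # z_i = (x_i - x_bar) / s

print(round(x_bar, 1), round(s, 2))    # ≈ 59.7 and ≈ 12.97
print([round(v, 2) for v in z])        # January ≈ -1.56
```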
49. Calculating Probabilities from a Normal Distribution
Consider this problem
Suppose that the annual precipitation was normally
distributed with mean 80 mm per year and standard
deviation 40 mm. What is the probability that the
annual precipitation is greater than 150 mm?
Solution:
1. calculate the z score(s)
z = (x − μ)/σ = (150 − 80)/40 = 1.75
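The z-score, and the tail probability it implies, can be computed without a printed table by expressing the normal CDF through `math.erf` (the helper name is mine):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function (standard library only)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

z = (150 - 80) / 40
print(z)                                      # 1.75
print(round(1 - normal_cdf(150, 80, 40), 4))  # P(X > 150) ≈ 0.0401
```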
56. Random variable X follows a normal
distribution with mean μ and variance σ^2.
Find the interval that contains the middle
95% of the data.
58. Are data normally distributed?
Compare the observed histogram to a normal curve
that has the sample mean and sample standard deviation.
59. Normal Q-Q plot
A normal quantile-
quantile plot compares
the sample quantiles
to those of the normal
distribution. If data are
N(μ,σ2) distributed, the
points in the QQ-plot
should be scattered
around the straight
line.
Straight line in Q-Q plot = normal distribution
62. Estimation
Estimate population parameters based on
sample data
Two types of estimate
Point estimate
Interval estimate
63. Point Estimate (the first type of estimation)
Mean – population parameter: μ (“true mean”); point estimate: x̄ (“sample mean”)
x̄ = (1/n) ∑_{i=1}^{n} x_i
Standard deviation – population parameter: σ (“population s.d.”); point estimate: s (“sample s.d.”)
s = √( ∑_{i=1}^{n} (x_i − x̄)^2 / (n − 1) )
64. Sampling Error
Sampling error is the difference between the value of a
population characteristic and the value of that
characteristic inferred from a sample.
Example: consider the population characteristic of the
average selling price of homes in Redlands in 2009. If
every house is examined, the average selling price is
$200,000. If only 25 homes per month are sampled, the
average selling price of the 300 homes is $230,000.
The sampling error is $200,000-$230,000 = -$30,000
Sampling Error cannot
be removed.
65. Interval Estimate
It is very unlikely that sample point estimates will
exactly equal the true population parameters due to
uncertainty in probability sampling.
To determine how good our point estimates are, we
could extend a point estimate to an interval within
which the population parameter lies.
The second type of estimation
66. Confidence Level and Interval
In probabilistic terms, we like to attach some measure
of certainty (confidence) to our interval estimates.
What does the 90% confidence level mean?
The chance that the interval estimate containing the true
mean is 90%.
In other words, the probability that the true mean falls in the
interval is 90%.
This interval is called 90% confidence interval.
So, let’s say we repeated the sampling process many times,
so that there are many sets of samples. The sample mean
from each sample set is different, and hence they form an interval.
So, does the true population mean (μ) fall within the interval? Maybe.
The wider the interval, the higher the chance that it does.
Technically, if the interval is −∞ ≤ x ≤ ∞, from negative infinity
to infinity, the chance is always 100%, and the confidence level
would always be 100%... but that interval would be meaningless.
Therefore, statistically we apply a tolerance level, where the
probability is less than 100% but high enough (90%, 95%,
99%, etc.) that the interval is useful.
And here it is: the concept of the confidence
interval and the confidence level.
67. How to obtain a confidence interval?
If we know the relationship between the sample mean and the
true mean, we can link them together.
The sampling distribution or probability distribution of the
sample mean reveals the relationship.
68. Sampling Distribution of Sample Mean
The sampling distribution of the sample mean can be
developed by taking all possible or many samples of size n
from a population, calculating the value of the mean for each
sample, and drawing the distribution of these values.
69. Sampling Distribution of Sample Mean
When the sampling process is repeated many times,
we could get many different samples, which give
different sample means.
http://www.ltcconline.net/greenl/java/Statistics/clt/cltsimulation.html
The larger the sample size, the closer the sample mean is to
the true mean. Hence, the larger the sample size, the smaller
the variance of the sample means, which means this
frequency plot would be narrower.
70. Central Limit Theorem (CLT)
Let X₁, X₂, X₃… Xₙ be a random sample of size n drawn from
a population with mean μ and standard deviation σ.
Then for a large n, the sampling distribution of X̄ is
approximately normally distributed with mean μ and
standard deviation σ/√n.
In the special case where X is normal, the distribution of X̄ is
exactly normal regardless of sample size.
The standard deviation of the sample mean, σ/√n, is also
called the standard error.
X̄: sample mean
μ: population mean
σ/√n: standard deviation of sample means (= standard error)
σ: standard deviation of the population
s: standard deviation of a set of samples
The mean of the frequency distribution of sample means is
theoretically the same as the population mean (μ). The standard
deviation of the frequency distribution of sample means is σ/√n.
X̄ ~ N(μ, σ^2/n): the frequency distribution of the sample mean
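The theorem can be illustrated by simulation: draw many samples from a non-normal population and look at the spread of their means. A sketch (the population choice and seed are arbitrary):

```python
import random
from statistics import mean, stdev

random.seed(1)
n, reps = 36, 2000

# A deliberately non-normal population: U(0, 12), so mu = 6 and sigma = 12/sqrt(12) ≈ 3.464
sample_means = [mean(random.uniform(0, 12) for _ in range(n)) for _ in range(reps)]

print(round(mean(sample_means), 2))   # close to the population mean 6
print(round(stdev(sample_means), 2))  # close to sigma/sqrt(n) ≈ 3.464/6 ≈ 0.58
```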
71. CLT (cont.)
The central limit theorem applies only to the sample
mean, not to other sample statistics.
Generally, a sample size n ≥ 30 can be regarded
as sufficiently large, so that the sampling distribution
of sample means is approximately normal.
The sample size is inversely related to the standard
error.
The central limit theorem is used to estimate the
sample mean, and the sample mean only.
The statement of sample size n ≥ 30 is obtained by
comparing the t-table and the normal distribution,
which we will talk about in later slides.
72. Notation for Confidence Level and Interval
Confidence level
Denoted by (1 – α) × 100%
Usually α = 0.05, could also be 0.10 or 0.01
Thus, the likelihood that we are wrong is α (also
called the significance level)
The interval in which the true mean lies with
(1 – α) × 100% confidence is
called the (1 – α) × 100% confidence interval.
95% Confidence
level
5% likelihood of
being wrong
90% Confidence level
10% likelihood of being
wrong
99% Confidence level
1% likelihood of being
wrong
“Significance level”
the likelihood that
we are wrong
73. Basic Steps
Step 1: Standardize 𝑋
Step 2: find the z score
Step 3: calculate margin of error
Step 4: obtain the final CI (Confidence Interval)
Interpretation
74. Suppose the sample size n is sufficiently large (n ≥
30); according to the central limit theorem, the
frequency distribution of the sample mean is normal
with mean μ and standard deviation σ/√n.
Step 1: Standardize X̄
X̄ ~ N(μ, σ^2/n)
Z = (X̄ − μ) / (σ/√n) ~ N(0, 1)
75. Step 2: Find The Z Score
75
The three z-scores 1.65, 1.96, 2.58 are associated with three
confidence levels 1 – α (=0.90, 0.95, 0.99),
where α is 0.10, 0.05, and 0.01 respectively.
76. z_(α/2) is a z score or z value that corresponds to
a tail area of α/2.
[Figure: standard normal curve with the tail area α/2 to the right of z_(α/2)]
77. Step 3: Calculate Margin Of Error
The range of values above and below the sample
statistic with a specified confidence.
Put differently
P(X̄ − ME ≤ μ ≤ X̄ + ME) = 1 − α
ME = z_(α/2) · σ/√n
(ME: “Margin of Error”)
of Error”
78. Step 4: Obtain The Final CI(confidence Interval)
Add/subtract the margin of error from the
sample mean to get the CI.
CI = [X̄ − z_(α/2)·σ/√n, X̄ + z_(α/2)·σ/√n]
If α = 0.05, then CI = [X̄ − 1.96·σ/√n, X̄ + 1.96·σ/√n]
79. Interpretation
When α = 0.05, we can say that
“I am 95% confident that the mean of the population is
somewhere between x̄ − 1.96·σ/√n and x̄ + 1.96·σ/√n.”
The true population mean μ should, 95% of the time,
lie within ±1.96·σ/√n of the sample mean.
95% of all confidence intervals that can be
constructed will contain the unknown true mean.
80.
After repeated sampling of 100 times, how many of the confidence
intervals would you expect to contain the true mean?
81. What influence CI?
Sample variance ↑, range of CI ↑
larger sample variability , higher uncertainty
wider CI
Sample size ↑, range of CI↓
Larger sample size n, more information
narrower CI
Confidence level ↑, range of CI ↑
Higher confidence level,
higher uncertainty to be accounted
wider CI
82. Some issues
How about a small sample size (n &lt; 30)?
Use the t-distribution, provided the sample is drawn from a
normally distributed population
How about the population standard deviation σ is
unknown?
Use sample standard deviation s to approximate
population standard deviation σ
t-distribution, providing the population is normal
83. t-Distribution
When the sample size is not sufficiently large, the
frequency distribution of sample means has what is
known as the t distribution (or Student’s t
distribution)
The t-distribution also copes with the uncertainty that results
from estimating the standard deviation from a sample,
when the population standard deviation is
unknown
The overall shape of the probability density function of
the t-distribution resembles the bell shape of a
standard normal distribution, except that it is a bit
lower and wider.
t-distribution depends on a new parameter – degree of
freedom (df =n -1)
Student's t-distribution to
cope with uncertainty
resulting from estimating
the standard deviation from
a sample, whereas if the
population standard
deviation were known, a
normal distribution would be
used.
85. Using t distribution to construct CI
Population standard deviation is unknown
Population standard deviation is known but
sample size is small
85
σ unknown: [X̄ − t_(α/2, n−1)·s/√n, X̄ + t_(α/2, n−1)·s/√n]
σ known but n small: [X̄ − t_(α/2, n−1)·σ/√n, X̄ + t_(α/2, n−1)·σ/√n]
86. Example question
A local bank needs information concerning the savings account
balances of its customers. A random sample of 15 accounts
was checked. The mean balance was $686 with a standard
deviation of $256. Which of the following is the
95% confidence interval for the true mean?
The correct answer is: [686 − 2.15 × 256/√15, 686 + 2.15 × 256/√15]
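A sketch of the computation; the critical value t_(0.025, 14) ≈ 2.145 comes from a t table (the slide rounds it to 2.15):

```python
from math import sqrt

x_bar, s, n = 686, 256, 15
t_crit = 2.145        # t_(0.025, 14) from a t table; the slide rounds to 2.15

me = t_crit * s / sqrt(n)
lo, hi = x_bar - me, x_bar + me
print(round(lo, 1), round(hi, 1))   # roughly [544.2, 827.8]
```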
88. Outline
What is hypothesis testing?
Errors in hypothesis testing
One-sample z-test
One-sample t-test
89. Consider this situation
A consumer advocacy group collects a random sample
of n = 100 light bulbs from a manufacturer, and observes a
sample mean of 987 hours. Assume the standard deviation
of all light bulbs is 40 hours. Estimate on average how
many hours of light the light bulbs can provide.
α = 0.05, z₀.₀₂₅ = 1.96
ME = z₀.₀₂₅ · σ/√n = 1.96 × 40/10 = 7.84
Confidence Interval: [979.16, 994.84]
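The margin of error and interval can be reproduced in a few lines (names are illustrative):

```python
from math import sqrt

x_bar, sigma, n = 987, 40, 100
z = 1.96                          # z value for 95% confidence

me = z * sigma / sqrt(n)          # 1.96 * 40 / 10 = 7.84
lo, hi = x_bar - me, x_bar + me
print(round(me, 2), round(lo, 2), round(hi, 2))   # 7.84 979.16 994.84
```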
We are 95% confident that the mean lifetime of light bulbs
manufactured by the manufacturer falls between
979.16 and 994.84 hours.
90. Consider a related situation
A consumer advocacy group thinks a manufacturer of light bulbs is mistaken
in their claim that their bulbs on average provide 1000 hours of light. They
believe the light bulbs are defective. To test this, they collect a random
sample of n = 100 light bulbs and observe a sample mean of 987 hours.
Assume standard deviation of all light bulbs is 40 hours.
We cannot say that, because the sample mean of
one set of samples is 987 hours, the claim
of the manufacturer is false.
But with statistics and probability, we can
argue that what the manufacturer claims is
false – because 95% of the time, the mean
lifetime of light bulbs manufactured by the
manufacturer would fall between 979.16
and 994.84 hours.
1000 hours falls outside of the 95%
range. In other words, the claim of
the manufacturer is, in at least 95% of
cases, wrong.
91. If we assume the manufacturer’s claim is true, we
would expect the average lifetime of the samples
to be close to 1000 h.
But how close is close? Is the sample mean of
987 close enough to the presumed value of 1000?
We need to quantify the closeness, or difference,
between the sample mean and the presumed
mean.
To do so, we may compare 987 to a threshold that is
deemed “close enough”
Common sense is important
92. 95% of the time the sample mean will range between
992.16 and 1007.84 (1000 ± 1.96 × 40/√100). So we can take
roughly 992 and 1008 as the thresholds for “close enough”.
However, there is a small chance (&lt;5%) that the sample
mean falls outside the range of 992 to 1008. So we
could be wrong if we conclude the true mean is not
1000. This 5% is called the significance level
X̄ ~ N(1000, 40^2/100)
[Figure: sampling distribution of X̄ centered at 1000, with 987, 992, and 1008 marked]
P(X̄ ≤ 992) = 0.025
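The thresholds and the standardized test statistic can be computed directly (names are illustrative); note z = (987 − 1000)/4 = −3.25 falls well beyond −1.96:

```python
from math import sqrt

mu0, sigma, n = 1000, 40, 100
se = sigma / sqrt(n)              # standard error = 4

# "close enough" thresholds at the 5% significance level
lo, hi = mu0 - 1.96 * se, mu0 + 1.96 * se
print(round(lo, 2), round(hi, 2))   # 992.16 1007.84

z = (987 - mu0) / se              # standardized test statistic
print(z)                          # -3.25, beyond -1.96, so reject H0
```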
Significant: the tails.
Something you do
not expect.
Confident: the center.
The range that you’re
confident.
For hypothesis testing, we start with the claim.
Hence, the mean is set to 1000 hours. Next, we
throw the margin of error into the graph.
By setting the mean at 1000, we can calculate the
probability of getting a sample mean as small as 987.
93. Different problems will have different thresholds. Can we
obtain a standardized threshold that applies to all problems?
Yes, we would use the z score as a generic measure for
“closeness”.
Now let us take a look at the basic steps.
93
94. Step 1: state a null hypothesis
Step 2: state alternative hypothesis
Step 3: choose a significance level
Step 4: calculate test statistic
Step 5: find critical value and region of rejection
Step 6: make a decision
94
Basic Steps of Hypothesis Testing
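The six steps above can be sketched as a small helper. This is an illustrative left-sided z-test using the light-bulb numbers, not a general-purpose routine (the function name is my own):

```python
from statistics import NormalDist

def left_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """H0: mu = mu0 vs HA: mu < mu0 (Steps 1-3 fixed by the caller)."""
    z_test = (xbar - mu0) / (sigma / n ** 0.5)  # Step 4: test statistic
    z_crit = NormalDist().inv_cdf(alpha)        # Step 5: critical value (negative)
    reject = z_test < z_crit                    # Step 6: decision
    return z_test, z_crit, reject

# Light-bulb example: x̄ = 987, μ0 = 1000, σ = 40, n = 100, α = 0.05
z_test, z_crit, reject = left_sided_z_test(987, 1000, 40, 100)
print(z_test, round(z_crit, 2), reject)  # -3.25 -1.64 True
```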
95. Step 1: state a null hypothesis
H0: μ = 1000
Note: the null hypothesis states that this large random
sample is drawn from a population that has a
mean of 1000.
If the null hypothesis is true, we can then conclude
that the sample mean approximately follows a
normal distribution: X̄ ~ N(1000, 40²/100)
95
A consumer advocacy group thinks
a manufacturer of light bulbs is
mistaken in their claim that their
bulbs on average provide 1000
hours of light. They believe the light
bulbs are defective. To test this,
they collect a random sample of n =
100 light bulbs and observe a
sample mean of 987 hours.
Assume the standard deviation of all
light bulbs is 40 hours.
96. Step 2: state alternative hypothesis
Alternative hypothesis
Two-sided hypothesis testing (test if the lifetime
differs from 1000 hours of light)
HA: μ ≠ 1000
One-sided hypothesis testing (test if the lightbulbs
provide less than 1000 hours of light)
HA: μ < 1000
96
μ: the population parameter
The null hypothesis is the reverse of what the
experimenter believes
97. Step 3: choose a significance level
α = 0.1, 0.05, or 0.01
Example: α = 0.05
The corresponding two-sided critical z values would be 1.65, 1.96,
and 2.58, respectively.
97
A result said to be significant at the 5% level means
the result would be unexpected if the
null hypothesis were true.
98. Step 4: calculate test statistic
If H0 is true, the CLT gives X̄ ~ N(1000, 40²/100),
provided a large random sample is drawn from a population that
has a mean of μ0 and a standard deviation of σ.
Test statistic (z score):
z_test = (x̄ − μ0)/(σ/√n) = (987 − 1000)/(40/√100) = −3.25
98
99. Step 5: find critical value and region of rejection
Two-sided
HA: μ ≠ 1000
Example: critical values ±z_α/2 = ±1.96
One-sided
HA: μ < 1000 (μ < μ0)
Example: critical value −z_α = −1.65
99
100. Step 6: make a decision
Two-sided: reject the null hypothesis if
z_test > z_α/2 or z_test < −z_α/2
One-sided: reject the null hypothesis if
z_test > z_α for HA: μ > μ0, or
z_test < −z_α for HA: μ < μ0
100
101. Step 6: make a decision
Example: z_test = −3.25 and −z_α = −1.65, so z_test < −z_α.
Therefore, we can reject the null hypothesis. The lifetime of the
lightbulbs is significantly less than 1000 hours at α = 0.05.
101
102. What is hypothesis testing?
Now let us take a deeper look at hypothesis testing.
Hypothesis:
A proposition whose truth or falsity is capable of being
tested.
102
103. Errors in Hypothesis Testing
Type I Error
"False Positive"
Rejecting a true null hypothesis
The likelihood of making a Type I error is denoted by α, referred to as the
significance level
Type II Error
"False Negative"
Accepting a false null hypothesis
The likelihood of making a Type II error is denoted by β
103
104. Errors in Hypothesis Testing (cont.)
104
We want to control Type I
error more than Type II.
In most cases, a Type I error
would lead to more severe
consequences.
Take a trial as an example.
The null hypothesis (H0) is
an assumption of
innocence.
In a Type I error:
H0 is true: the person is
innocent.
H0 is rejected: the
person is deemed guilty.
This creates the following
consequences:
(1) an innocent person is
deemed guilty;
(2) the real criminal is still
out there, free from the
system.
Take the same trial as an example.
In a Type II error:
H0 is false: the person is guilty.
H0 is accepted: the person is
deemed innocent.
This creates the following
consequence:
(1) the criminal is deemed
innocent and released.
Type I error, also known as
False Positive
Type II error, also known as
False Negative
105. Controlling Type I Error
It is almost always impossible to simultaneously minimize the
probabilities of both types of errors.
Classical hypothesis testing adopts the strategy of controlling α.
By making α small, we have only a small probability of being wrong when
we reject the null hypothesis.
If we have evidence to reject the null hypothesis, we can be confident in our
analysis.
The null hypothesis should be something we want to reject, rather than
something we want to confirm.
105
106. One-sample t-test
Population standard deviation σ is unknown
Sample size is small
Test statistic:
T = (X̄ − μ0)/(S/√n)
When H0 is true (the sample is drawn from the specified
population that has a mean of μ0), the T random variable
follows a Student's t-distribution with df = n − 1.
106
T: t-value
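A minimal sketch of computing the one-sample t statistic in Python; the data values below are made up purely for illustration:

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """T = (x̄ − μ0) / (S/√n), with df = n − 1."""
    n = len(data)
    xbar, s = mean(data), stdev(data)  # stdev uses the n-1 denominator
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical small sample of lifetimes, testing H0: μ = 1000
t, df = one_sample_t([995, 1012, 980, 1001, 998], 1000)
print(round(t, 3), df)  # -0.541 4
```

The resulting t would be compared against a t-distribution critical value with df = 4 rather than a z value.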
107. Limitations of classic hypothesis testing
The specific significance level must be selected a
priori, and the choice is often arbitrary and lacks a
theoretical basis.
The final decision regarding the null and
alternative hypotheses is binary:
H0 is rejected or not rejected.
A more flexible method is needed.
What is the exact significance level associated with
the test statistic?
107
108. p-value
The probability of getting a test statistic value as
extreme as, or more extreme than, the one observed
by chance, if the null hypothesis H0 is true.
If the null hypothesis is rejected, the p-value is the
probability of making a Type I error.
The smaller the p-value, the more convincing the
evidence for rejecting the null hypothesis.
108
Type I error: rejecting a true null hypothesis
Typically, we can reject the null hypothesis when the p-value is less than 10% (a loose standard);
5% is the common standard for spatial analysis.
109. Determining p-value
Use the calculated z or t test statistic to determine the p-value.
The p-value corresponds to the shaded area under the
standard normal (or t) curve.
109
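For a z statistic, the shaded area can be computed with the standard normal CDF. A sketch using Python's standard library (the z value −3.25 comes from the earlier light-bulb test):

```python
from statistics import NormalDist

z_test = -3.25
p_one_sided = NormalDist().cdf(z_test)             # left-tail area, ≈ 0.00058
p_two_sided = 2 * NormalDist().cdf(-abs(z_test))   # both tails, ≈ 0.00115
print(round(p_one_sided, 5), round(p_two_sided, 5))
```

Both values are far below 0.01, so the null hypothesis would also be rejected at α = 0.01.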
110. Light Bulbs Example
What is the p-value of the lightbulb test? Would
you reject your null hypothesis at the α = 0.01
significance level?
110
111. But when to reject the Null Hypothesis?
111
When the test statistic falls within the
shaded rejection region, in other
words, when the p-value is smaller
than the significance level, we can
reject the null hypothesis.
Let's say the p-value is 0.006. We can
reject the null hypothesis at all of the
p < 0.1, p < 0.05, and p < 0.01 levels.
Let's say the p-value is 0.02. We can
reject the null hypothesis at the
p < 0.1 and p < 0.05 levels, but not at
the p < 0.01 level.
Let's say the p-value is 0.06. We can
reject the null hypothesis only at the
p < 0.1 level, not at the p < 0.05 and
p < 0.01 levels.
Let's say the p-value is 0.2. We
cannot reject the null hypothesis at
any of the three levels (p < 0.1,
p < 0.05, and p < 0.01).
113. 113
Null hypothesis (what you want to reject)
No difference / no change / equal to "="
Alternative hypothesis (the research interest that you want to accept)
Two-sided: different / not equal "≠"
Left-sided: smaller than / less than "<"
Right-sided: larger than / more than ">"
No sample statistics are included in either H0 or HA (H1)