Statistics (Recap)
Finance & Management Students
Farzad Javidanrad
October 2013
University of Nottingham-Business School
Probability
• Some Preliminary Concepts:
 Random: Something that happens (occurs) by chance.
 Population: The set of all possible outcomes of a random experiment,
or a collection of all members of a specific group under study. This
collection forms a space from which all possible samples can be
derived. For that reason it is sometimes called the sample space.
 Sample: Any subset of population (sample space).
In tossing a die:
A random event is the appearance of any particular face of the die.
The population (sample space) is the set {1, 2, 3, 4, 5, 6}.
A sample is any subset of the set above, such as {3} or {2, 4, 6}.
Probability
• Two events are mutually exclusive if they cannot happen together:
the occurrence of one of them prevents the occurrence of the other.
For example, if the baby is a boy it cannot be a girl, and vice versa.
• Two events are independent if the occurrence of one of them has no
effect on the chance of occurrence of the other. For example, the
result of rolling a die has no impact on the outcome of flipping a
coin. But in the experiment of taking two cards consecutively from a
set of 52 cards (if the cards can be chosen equally likely) the chance
of getting the second card is affected by the result of the first card.
• Two events are exhaustive if together they include all possible
outcomes. For example, in rolling a die, the events of getting an odd
number and getting an even number are exhaustive.
Probability
• If event 𝑨 can happen in 𝒎 different ways out of 𝒏 equally likely
ways, the probability of event 𝑨 can be shown as its relative
frequency; i.e. :
$P(A) = \frac{m}{n} = \frac{\text{No. of ways that event } A \text{ can occur}}{\text{Total number of equally likely possible outcomes}}$
[Venn diagram: sample space U containing event A and its complement A′]
U: sample space (population)
A: an event (sample)
A′: the event mutually exclusive with A
A & A′ are collectively exhaustive
Probability
• As 0 ≤ 𝑚 ≤ 𝑛 it can be concluded that
$0 \le \frac{m}{n} \le 1 \quad\text{or}\quad 0 \le P(A) \le 1$
• 𝑃 𝐴 = 0 means that event 𝐴 cannot happen and 𝑃 𝐴 = 1
means that the event will happen with certainty.
• With the definition of 𝐴′ as an event of “non-occurrence” of event
𝐴, we can find that:
$P(A') = \frac{n-m}{n} = 1 - \frac{m}{n} = 1 - P(A)$
Or
$P(A) + P(A') = 1$
Probability of Multiple Events
• If 𝑨 and 𝑩 are not mutually exclusive events, the probability that at
least one of them happens (𝑨 or 𝑩) can be calculated as follows:
𝑷 𝑨 ∪ 𝑩 = 𝑷 𝑨 + 𝑷 𝑩 − 𝑷(𝑨 ∩ 𝑩)
[Venn diagram: overlapping events A and B; $P(A \cup B)$ is "A or B" and the overlap $P(A \cap B)$ is "A and B"]
Probability of Multiple Events
In case we are dealing with three events:
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$
[Venn diagram: three overlapping events A, B and C; the triple overlap is $P(A \cap B \cap C)$]
Probability of Multiple Events
• Considering 𝑷 𝑨 ∪ 𝑩 = 𝑷 𝑨 + 𝑷 𝑩 − 𝑷(𝑨 ∩ 𝑩) we can have the
following situations:
1. If 𝑨 and 𝑩 are mutually exclusive events, then :
𝑷 𝑨 ∩ 𝑩 = 𝟎
2. If 𝑨 and 𝑩 are two independent events, then:
𝑷 𝑨 ∩ 𝑩 = 𝑷(𝑨) × 𝑷(𝑩)
3. If 𝑨 and 𝑩 are dependent events, then:
$P(A \cap B) = P(A) \times P(B \mid A) = P(B) \times P(A \mid B)$
Where $P(A \mid B)$ and $P(B \mid A)$ are conditional probabilities; $P(A \mid B)$ means the
probability of event $A$ given that event $B$ has already happened.
Probability of Multiple Events
o The probability of picking at random a Heart or a Queen on a single
experiment from a card deck of 52 is:
$P(H \cup Q) = P(H) + P(Q) - P(H \cap Q) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$
o The probability of getting a 1 or a 4 on a single toss of a fair die is:
$P(1 \cup 4) = P(1) + P(4) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$
As they cannot happen together they are mutually exclusive events
and 𝑃 1 ∩ 4 = 0.
o The probability of having two heads in the experiment of tossing
two fair coins is: (two independent events)
$P(H \cap H) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
Probability of Multiple Events
o The probability of picking two aces without returning the first card
to the deck of 52 playing cards, which involves a conditional
probability, is:
$P(1\text{st ace} \cap 2\text{nd ace}) = P(1\text{st ace}) \times P(2\text{nd ace} \mid 1\text{st ace})$
Or, written more compactly:
$P(A_1 \cap A_2) = P(A_1) \times P(A_2 \mid A_1) = \frac{4}{52} \times \frac{3}{51} = \frac{1}{221}$
• If two events $A$ and $B$ are independent from each other then:
$P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
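Results like the two-aces calculation are easy to check by simulation. Below is a minimal Python sketch (not part of the original slides; the names deck and trials are illustrative) that estimates $P(A_1 \cap A_2)$ by repeatedly drawing two cards without replacement:

```python
import random

# Monte Carlo check of P(1st ace and 2nd ace) = 4/52 * 3/51 = 1/221 ~ 0.00452
rng = random.Random(42)
deck = ["ace"] * 4 + ["other"] * 48      # a 52-card deck with 4 aces
trials = 500_000
hits = sum(1 for _ in range(trials)
           if all(c == "ace" for c in rng.sample(deck, 2)))
print(hits / trials, 1 / 221)            # the two values should be close
```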
Random Variable & Probability Distribution
Some Basic Concepts:
• Variable: A letter (symbol) which represents the elements of a
specific set.
• Random Variable: A variable whose values appear randomly,
according to a probability distribution.
• Probability Distribution: A rule (function) that assigns a probability
to each value of a random variable.
• Variables (including random variables) are divided into two general
categories:
1) Discrete Variables, and
2) Continuous Variables
Random Variable & Probability Distribution
• A discrete variable is a variable whose elements (values) can be put
in one-to-one correspondence with the set of natural numbers or a
subset of it. So it is possible to order and count its elements
(values). The number of elements can be finite or infinite.
• For a discrete variable it is not possible to define a neighbourhood,
however small, around any value in its domain. There is a jump from
one value to the next.
• If the elements of the domain of a variable can be put in
correspondence with the set of real numbers or a subset of it, the
variable is called continuous. It is not possible to order and count the
elements of a continuous variable. A variable is continuous if a
neighbourhood, however small, can be defined around any value in its domain.
Random Variable & Probability Distribution
• Probability Distribution: A rule (function) that associates a
probability either to all possible elements of a random variable (RV)
individually or a set of them in an interval.*
• For a discrete RV this rule associates a probability to each possible
individual outcome. For example, the probability distributions for the
number of Heads ($x$) when flipping a fair coin: (Note: $\sum P_i = 1$)
In one trial {H, T}:
x:     0    1
P(x):  0.5  0.5
In two trials {HH, HT, TH, TT}:
x:     0     1    2
P(x):  0.25  0.5  0.25
o The probability distribution for the change in the price of a share in the stock market in one day:
x = price change:  +1   0    −1
P(x):              0.6  0.1  0.3
Probability Distributions (Continuous)
• The probability that a continuous random variable takes exactly
one of the values in its domain is zero, because the number of all
possible outcomes $n$ is infinite and $\frac{m}{n} = \frac{m}{\infty} \to 0$.
• For the above reason, the probability for a continuous random
variable needs to be calculated over an interval.
• The probability distribution of a continuous random variable is
often called a probability density function (PDF), or simply a
probability function; it is usually shown by $f(x)$ and has the
following properties:
I. $f(x) \ge 0$ (similar to $P(x) \ge 0$ for a discrete RV*)
II. $\int_{-\infty}^{+\infty} f(x)\, dx = 1$ (similar to $\sum P(x) = 1$ for a discrete RV)
III. $\int_{a}^{b} f(x)\, dx = P(a \le x \le b) = F(b) - F(a)$ (the probability
given to the set of values in an interval $[a, b]$)**
Probability Distributions (Continuous)
• where $F(x)$ is the integral of the PDF $f(x)$ and is called the
Cumulative Distribution Function (CDF); for any real value of $x$ it is
defined as:
𝐹(𝑥) ≡ 𝑃(𝑋 ≤ 𝑥)
The CDF shows the area under the PDF $f(x)$ from $-\infty$ to $x$. For a
discrete random variable, the CDF is the sum of all probabilities up
to and including the value $x$.
Adopted from http://beyondbitsandatomsblog.stanford.edu/spring2010/tag/embodied-artifacts/
[Figure: graphs of the PDF $f(x)$ and the CDF $F(x) \equiv P(X \le x)$]
Some Characteristics of Probability Distributions
• Expected Value (Probabilistic Mean Value): It is one of the most
important measures which shows the central tendency of the
distribution. It is the weighted average of all possible values of
random variable 𝒙 and it is shown by 𝑬(𝒙).
• For a discrete RV (with $n$ possible outcomes):
$E(x) = x_1 P(x_1) + x_2 P(x_2) + \dots + x_n P(x_n) = \sum_{i=1}^{n} x_i P(x_i)$
• For a continuous RV:
$E(x) = \int_{-\infty}^{+\infty} x\, f(x)\, dx$
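As an illustration, the expected value of the share-price distribution shown earlier (x = +1, 0, −1 with probabilities 0.6, 0.1, 0.3) follows directly from the discrete formula; a minimal Python sketch:

```python
# E(x) = sum of x_i * P(x_i), using the share-price distribution above
values = [1, 0, -1]
probs  = [0.6, 0.1, 0.3]
E_x = sum(x * p for x, p in zip(values, probs))
print(E_x)  # 0.3: the expected daily price change
```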
Some Characteristics of Probability Distributions
• Properties of $E(x)$:
i. If $c$ is a constant then $E(c) = c$.
ii. If $a$ and $b$ are constants then $E(ax + b) = aE(x) + b$.
iii. If $a_1, \dots, a_n$ are constants then
$E(a_1 x_1 + \dots + a_n x_n) = a_1 E(x_1) + \dots + a_n E(x_n)$, or
$E\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i E(x_i)$
iv. If $x$ and $y$ are independent random variables then
$E(xy) = E(x) \cdot E(y)$
Some Characteristics of Probability Distributions
v. If $g(x)$ is a function of random variable $x$ then
$E[g(x)] = \sum g(x)\, P(x)$ (for a discrete RV)
$E[g(x)] = \int g(x)\, f(x)\, dx$ (for a continuous RV)
• Variance: To measure how the random variable $x$ is dispersed around
its expected value, variance can help. If we denote $E(x) = \mu$, then
$var(x) = \sigma^2 = E[(x - E(x))^2] = E[(x - \mu)^2]$
$= E[x^2 - 2x\mu + \mu^2]$
$= E(x^2) - 2\mu E(x) + \mu^2$
$= E(x^2) - \mu^2$
Some Characteristics of Probability Distributions
$var(x) = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$ (for a discrete RV)
$var(x) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx$ (for a continuous RV)
• Properties of Variance:
i. If $c$ is a constant then $var(c) = 0$.
ii. If $a$ and $b$ are constants then $var(ax + b) = a^2\, var(x)$.
iii. If $x$ and $y$ are independent random variables then
$var(x \pm y) = var(x) + var(y)$ (this can be extended to more variables)
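Continuing the same share-price example, a short sketch computing var(x) both from the definition and from the shortcut $E(x^2) - \mu^2$ derived above:

```python
values = [1, 0, -1]
probs  = [0.6, 0.1, 0.3]
mu = sum(x * p for x, p in zip(values, probs))                      # E(x) = 0.3
var_def      = sum((x - mu)**2 * p for x, p in zip(values, probs))  # definition
var_shortcut = sum(x * x * p for x, p in zip(values, probs)) - mu**2
print(var_def, var_shortcut)  # both 0.81, confirming var(x) = E(x^2) - mu^2
# property ii above: var(2x + 5) = 2^2 * var(x) = 3.24 (the constant drops out)
```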
• Some of the well-known probability distributions are:
• The Binomial Distribution:
1. The probability of the occurrence of an event is $p$ and does not change.
2. The experiment is repeated for 𝒏 times.
3. The probability that out of 𝒏 times, the event appears 𝒙 times is:
$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
The mean value and standard deviation of the binomial distribution are:
$\mu = \sum_{i=0}^{n} x_i P(x_i) = np$ and $\sigma = \sqrt{\sum_{i=0}^{n} (x_i - \mu)^2 P(x_i)} = \sqrt{np(1-p)}$
So, to show that the probability distribution of the random variable $X$
is binomial we can write: $X \sim Bi(np,\ np(1-p))$
Probability Distributions (Discrete RV)
• A gambler thinks his chance of getting a 1 when rolling a die is high. What
is the chance of getting four 1s out of six rolls of a fair die?
The probability of getting a 1 in an individual trial is $\frac{1}{6}$ and it
remains the same in all 6 trials. So,
$P(x = 4) = \frac{6!}{4!\,2!}\left(\frac{1}{6}\right)^4\left(\frac{5}{6}\right)^2 = \frac{375}{46656} \approx 0.008 \approx 0.8\%$
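This value is easy to verify with the formula or a library routine; a minimal sketch (assuming SciPy is available):

```python
from math import comb
from scipy.stats import binom

n, p = 6, 1/6
print(comb(6, 4) * (1/6)**4 * (5/6)**2)   # 0.00804 = 375/46656, by the formula
print(binom.pmf(4, n, p))                 # same value from the library
print(binom.mean(n, p), binom.std(n, p))  # np = 1.0, sqrt(np(1-p)) ~ 0.913
```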
• The Poisson Distribution:
1. It is used to calculate the probability of a given number of occurrences
of a desired event (number of successes) in a specific period of time.
2. The average number of desired events (number of successes) per unit of
time remains constant.
• So, the probability of having $x$ successes is calculated by:
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
Where $\lambda$ is the average number of successes in the specific period of time and
$e \approx 2.718$.
• The mean value and standard deviation of the Poisson distribution are:
$\mu = \sum x_i P(x_i) = \lambda$ and $\sigma = \sqrt{\sum (x_i - \mu)^2 P(x_i)} = \sqrt{\lambda}$
So, to show that the probability distribution of the random variable 𝑋 is
Poisson we can write: 𝑿~Poi(𝝀, 𝝀).
o The emergency section in a hospital receives 2 calls per half an hour (4
calls in an hour). The probability of getting just 2 calls in a randomly
chosen hour in a random day is:
$P(x = 2) = \frac{4^2 e^{-4}}{2!} = 0.146 \approx 15\%$
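The hospital-call figure can be reproduced either from the formula or with a library call; a short sketch (assumes SciPy):

```python
from math import exp, factorial
from scipy.stats import poisson

lam, x = 4, 2
print(lam**x * exp(-lam) / factorial(x))  # 0.1465, by the formula
print(poisson.pmf(x, lam))                # same value from the library
```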
The Normal Distribution (Continuous RV)
• The Normal Distribution: It is the best-known probability
distribution and describes many random variables encountered in
practice. The probability density function (PDF) of the normal
distribution is:
1. Symmetrical around its mean value (𝝁).
2. Bell-shaped, with two tails approaching the horizontal axis
asymptotically as we move further away from the mean.
Adopted from http://www.pdnotebook.com/2010/06/statistical-tolerance-analysis-root-sum-square/
The Normal Distribution (Continuous RV)
3. The probability density function (PDF) of normal distribution
can be represented by:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (-\infty < x < +\infty)$
Where 𝝁 and 𝝈 are mean and standard deviation respectively.
$\mu = \int_{-\infty}^{+\infty} x\, f(x)\, dx$ and $\sigma = \sqrt{\int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx}$
So, $X \sim N(\mu, \sigma^2)$.
• A linear combination of independent normally distributed random
variables is itself normally distributed, that is,
If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ and if $Z = aX + bY$ then
$Z \sim N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + b^2\sigma_2^2)$
• This can be extended to more than two random variables.
The Normal Distribution (Continuous RV)
• Recalling the last property of the PDF ($\int_a^b f(x)\,dx = P(a \le x \le b)$), it is
difficult to calculate probabilities using the above PDF for different
values of $\mu$ and $\sigma$. The solution to this problem is to transform the
normal variable $x$ into the standardised normal variable (or simply,
standard normal variable) $z$, by:
$z = \frac{x - \mu}{\sigma}$
whose parameters ($\mu$ and $\sigma^2$) do not depend on the parameters of the
original normally distributed variable, because we always have
$E(z) = 0$ and $var(z) = 1$ (why?)
• The probability distribution for the standard normal variable is defined as:
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \qquad Z \sim N(0, 1)$
[Figure: standardising $X \sim N(\mu, \sigma^2)$ into $Z \sim N(0, 1)$. Adopted and amended from http://www.mathsisfun.com/data/standard-normal-distribution.html]
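Besides applying the properties of E and var from the earlier slides, one can see $E(z) = 0$ and $var(z) = 1$ numerically; a minimal simulation sketch (assumes NumPy; the parameter values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 50, 8                     # any illustrative parameter values
x = rng.normal(mu, sigma, 1_000_000)
z = (x - mu) / sigma                  # the standardising transformation
print(z.mean(), z.var())              # ~0 and ~1, whatever mu and sigma are
```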
The Standard Normal Distribution
• Properties of the standard normal distribution curve:
1. It is symmetrical around the y-axis.
2. The area under the curve can be split into two equal areas, that is:
$\int_{-\infty}^{0} f(z)\, dz = \int_{0}^{+\infty} f(z)\, dz = 0.5$
• To find the area under the curve below $z_1 = 1.26$, using the z-table (next slide), we have:
$P(z \le z_1 = 1.26) = \int_{-\infty}^{0} f(z)\, dz + \int_{0}^{z_1} f(z)\, dz = 0.5 + 0.3962 = 0.8962 \approx 90\%$
[Figure: standard normal curve $f(z)$ split into equal 50% halves; the area 0.5 + 0.3962 is shaded up to $z_1 = 1.26$]
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Working with the Z-Table
• To find the probability
$P(0.89 < z < 1.5) = \int_{0}^{z_2} f(z)\, dz - \int_{0}^{z_1} f(z)\, dz = F(1.5) - F(0.89) = 0.4332 - 0.3133 = 0.1199 \approx 12\%$
as both values are positive.
• To find the probability in the negative area we
need to find the equivalent area in the positive side:
$P(-1.32 < z < -1.25) = P(1.25 < z < 1.32) = F(1.32) - F(1.25) = 0.4066 - 0.3944 = 0.0122 \approx 1\%$
[Figure: shaded area between 0.89 and 1.5 under the standard normal curve]
Working with the Z-Table
• To find $P(z < -2.15)$ we can write:
$\int_{-\infty}^{-2.15} f\, dz = \int_{-\infty}^{0} f\, dz - \int_{-2.15}^{0} f\, dz = 0.5 - 0.4842 = 0.0158 \approx 2\%$
where, by symmetry, $\int_{-2.15}^{0} f\, dz = \int_{0}^{2.15} f\, dz = 0.4842$.
• And finally, to find $P(z \ge 1.93)$, we have:
$\int_{1.93}^{+\infty} f\, dz = \int_{0}^{+\infty} f\, dz - \int_{0}^{1.93} f\, dz = 0.5 - 0.4732 = 0.0268$
[Figures: left-tail area below $-2.15$ and right-tail area above $1.93$]
An Example
o If the income of employees in a big company is normally distributed
with $\mu = £20000$ and $\sigma = £4000$, what is the probability that a
randomly picked employee has an income
a) above £22000, b) between £16000 and £24000?
a) We need to transform $x$ to $z$ first:
$P(x > 22000) = P\left(\frac{x - 20000}{4000} > \frac{22000 - 20000}{4000}\right) = P(z > 0.5) = 0.5 - 0.1915 = 0.3085 \approx 31\%$
b) $P(16000 < x < 24000) = P\left(\frac{16000 - 20000}{4000} < \frac{x - 20000}{4000} < \frac{24000 - 20000}{4000}\right)$
$= P(-1 < z < 1) = 0.3413 + 0.3413 = 0.6826 \approx 68\%$
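The same two probabilities follow directly from the normal CDF, avoiding the table; a minimal sketch (assumes SciPy):

```python
from scipy.stats import norm

mu, sigma = 20_000, 4_000
print(norm.sf(22_000, mu, sigma))   # P(x > 22000) ~ 0.3085
print(norm.cdf(24_000, mu, sigma) - norm.cdf(16_000, mu, sigma))  # ~0.6827
```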
The χ² (Chi-Squared) Distribution
• The χ² (Chi-Squared) Distribution:
Let $Z_1, Z_2, \dots, Z_k$ be $k$ independent standardised normally distributed
random variables; then the sum of their squares
$X = \sum_{i=1}^{k} Z_i^2$
has a Chi-Squared distribution with degrees of freedom equal to the
number of random variables ($df = k$). So, $X \sim \chi^2_k$.
The mean value and standard deviation of a RV with a Chi-Squared
distribution are $k$ and $\sqrt{2k}$ respectively. So we can write:
$X \sim \chi^2_k(k, 2k)$
Probability Density Function (PDF) of the χ² Distribution
[Figure adopted from http://2012books.lardbucket.org/books/beginning-statistics/s15-chi-square-tests-and-f-tests.html; table adopted from http://www.docstoc.com/docs/80811492/chi--square-table]
From the χ² table: $P(\chi^2 \ge 32 \mid df = 16) = 0.01$, or $\chi^2_{0.01,\,16} = 32$
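The tabulated value can be checked with the inverse survival function; a short sketch (assumes SciPy):

```python
from scipy.stats import chi2

df = 16
print(chi2.isf(0.01, df))           # ~32.0: the critical value chi2_{0.01,16}
print(chi2.sf(32, df))              # ~0.01: the upper-tail probability at 32
print(chi2.mean(df), chi2.std(df))  # mean k = 16, sd sqrt(2k) ~ 5.66
```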
The t-Distribution
• If $Z \sim N(0, 1)$ and $X \sim \chi^2_k$, and the two random variables
$Z$ and $X$ are independent, then the random variable
$t = \frac{Z}{\sqrt{X/k}} = \frac{Z\sqrt{k}}{\sqrt{X}}$
follows Student's t-distribution (the t-distribution) with $k$ degrees of
freedom. For a sample of size $n$ we have $df = k = n - 1$.
• The mean value and standard deviation of this distribution are
$\mu = \begin{cases} 0 & n > 2 \\ \text{undefined} & n = 1, 2 \end{cases}$ and $\sigma = \begin{cases} \sqrt{\frac{n-1}{n-3}} & n > 3 \\ \infty & n = 3 \\ \text{undefined} & n = 1, 2 \end{cases}$
The t-Distribution
• The t-distribution, like the standard normal distribution, is a bell-
shaped and symmetrical distribution with zero mean (n > 2), but it is
flatter. As the degrees of freedom increase (i.e. as $n$ increases) it
approaches the standard normal distribution, and for $n \ge 30$ their
behaviours are similar.
• From the table (next slide):
$P(t \ge 1.706 \mid df = 26) = 0.05 \approx 5\%$, or $t_{0.05,\,26} = 1.706$
[Figure: t-distribution with the 5% right tail beyond $t = 1.706$ shaded. Adopted from http://education-portal.com/academy/lesson/what-is-a-t-test-procedure-interpretation-examples.html#lesson]
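A sketch checking the tabulated value and the convergence to the standard normal (assumes SciPy):

```python
from scipy.stats import t, norm

print(t.isf(0.05, 26))    # ~1.706, matching t_{0.05,26} in the table
print(t.isf(0.05, 1000))  # ~1.646: with large df the t critical value...
print(norm.isf(0.05))     # ~1.645: ...approaches the standard normal one
```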
df 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
1 1.376 1.963 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.578
2 1.061 1.386 1.886 2.920 4.303 6.965 9.925 14.089 22.328 31.600
3 0.978 1.250 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.924
4 0.941 1.190 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5 0.920 1.156 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869
6 0.906 1.134 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 0.896 1.119 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 0.889 1.108 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 0.883 1.100 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 0.879 1.093 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 0.876 1.088 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 0.873 1.083 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 0.870 1.079 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.768
24 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689
28 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.660
30 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
31 0.853 1.054 1.309 1.696 2.040 2.453 2.744 3.022 3.375 3.633
32 0.853 1.054 1.309 1.694 2.037 2.449 2.738 3.015 3.365 3.622
33 0.853 1.053 1.308 1.692 2.035 2.445 2.733 3.008 3.356 3.611
34 0.852 1.052 1.307 1.691 2.032 2.441 2.728 3.002 3.348 3.601
35 0.852 1.052 1.306 1.690 2.030 2.438 2.724 2.996 3.340 3.591
36 0.852 1.052 1.306 1.688 2.028 2.434 2.719 2.990 3.333 3.582
37 0.851 1.051 1.305 1.687 2.026 2.431 2.715 2.985 3.326 3.574
38 0.851 1.051 1.304 1.686 2.024 2.429 2.712 2.980 3.319 3.566
39 0.851 1.050 1.304 1.685 2.023 2.426 2.708 2.976 3.313 3.558
40 0.851 1.050 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
50 0.849 1.047 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496
60 0.848 1.045 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
80 0.846 1.043 1.292 1.664 1.990 2.374 2.639 2.887 3.195 3.416
100 0.845 1.042 1.290 1.660 1.984 2.364 2.626 2.871 3.174 3.390
150 0.844 1.040 1.287 1.655 1.976 2.351 2.609 2.849 3.145 3.357
Infinity 0.842 1.036 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.290
The F Distribution
• If $Z_1 \sim \chi^2_{k_1}$ and $Z_2 \sim \chi^2_{k_2}$, and $Z_1$ and $Z_2$ are independent, then the
random variable
$F = \frac{Z_1 / k_1}{Z_2 / k_2}$
follows the F distribution with $k_1$ and $k_2$ degrees of freedom, i.e.:
$F \sim F_{k_1, k_2}$ or $F \sim F(k_1, k_2)$
• This distribution is skewed to the right, like the Chi-Squared
distribution, but as $k_1$ and $k_2$ increase ($n \to \infty$) it approaches the
normal distribution.
[Figure: F distribution density. Adopted from http://www.vosesoftware.com/ModelRiskHelp/index.htm#Distributions/Continuous_distributions/F_distribution.htm]
The F Distribution
• The mean and standard deviation of the F distribution are:
$\mu = \frac{k_2}{k_2 - 2}$ for $k_2 > 2$, and $\sigma = \frac{k_2}{k_2 - 2}\sqrt{\frac{2(k_1 + k_2 - 2)}{k_1 (k_2 - 4)}}$ for $k_2 > 4$
• Relation of the t and Chi-Squared distributions to the F distribution:
• For a random variable $X \sim t_k$ it can be shown that $X^2 \sim F_{1,k}$. This can
also be written as $t_k^2 = F_{1,k}$ (see the sketch below).
• If $k_2$ is large enough, then $k_1 \cdot F_{k_1, k_2} \sim \chi^2_{k_1}$ approximately.
[F tables for α = 0.25, 0.10, 0.05, 0.025 and 0.01. All adopted from http://www.stat.purdue.edu/~yuzhu/stat514s05/tables.html]
Statistical Inference (Estimation)
• Statistical inference, or statistical induction, is one of the most
important aspects of decision making. It refers to the process of
drawing a conclusion about the unknown parameters of the
population from a sample of randomly chosen data.
• So, the idea is that a sample of randomly chosen data provides the
best information about the parameters of the population, and it can
be considered representative of the population when its size is
reasonably (appropriately) large.
• The first step in statistical inference (induction) is estimation, which
is the process of finding an estimate or approximation of the
population parameters (such as the mean value and standard deviation)
using the data in the sample.
Statistical Inference (Estimation)
• The value of $\bar{X}$ (the sample mean) in a randomly chosen and
appropriately large sample is a good estimator of the population
mean $\mu$. The value of $s^2$ (the sample variance) is also a good
estimator of the population variance $\sigma^2$.
• Before taking any sample from the population (when the sample is not
yet realised or observed) we can talk about the probability distribution
of a hypothetical sample. The probability distribution of a random
variable $x$ in a hypothetical sample follows the probability
distribution of the population, even if the sampling process is
repeated many times.
• But the probability distribution of the sample mean $\bar{X}$ in repeated
sampling does not necessarily follow the probability distribution of
its population.
Central Limit Theorem
• Central Limit Theorem:
Imagine a random variable $X$ with any probability distribution, defined
in a population with mean $\mu$ and variance $\sigma^2$, so $X \sim i.i.d.(\mu, \sigma^2)$
($i.i.d.$ ≡ Independent & Identically Distributed RVs). If we take
$n$ independent samples $X_1, X_2, \dots, X_n$ and for each sample we
calculate the mean values $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_n$ (see figure below):
[Figure: repeated samples $X_1, X_2, \dots, X_n$, each yielding a sample mean $\bar{X}_i$]
Central Limit Theorem
As the sample size $n$ increases infinitely, the random variable $\bar{X}$
has a normal distribution (regardless of the population distribution)
and we have
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ when $n \to +\infty$
And in the standard form:
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$
o Taking a sample of 36 elements from a population with mean 20 and
standard deviation 12, what is the probability that the sample mean
falls between 18 and 24?
$P(18 < \bar{x} < 24) = P\left(-1 < \frac{\bar{x} - 20}{12/\sqrt{36}} < 2\right) = 0.3413 + 0.4772 \approx 82\%$
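A simulation sketch of the CLT for this example: sample means of n = 36 draws from a deliberately non-normal population with μ = 20 and σ = 12 (the shifted exponential is purely an illustrative choice; assumes NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# shifted exponential: mean 8 + 12 = 20, standard deviation 12, heavily skewed
samples = rng.exponential(scale=12, size=(200_000, 36)) + 8
means = samples.mean(axis=1)                      # one sample mean per row
print(np.mean((means > 18) & (means < 24)))       # ~0.82 by simulation
print(norm.cdf(24, 20, 2) - norm.cdf(18, 20, 2))  # 0.8186 from the CLT
```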
Estimation
• In previous slides we introduced some of the most important
probability distributions for discrete & continuous random variables.
• In many cases we know the nature of the probability distribution of
a random variable defined in a population, but have no idea about
its parameters, such as the mean value and/or standard deviation.
• Point Estimation:
• To estimate the unknown parameters of a probability distribution of
a random variable we can either have a point estimation or an
interval estimation using an estimator.
• The estimator is a function of the sample values $x_1, x_2, \dots, x_n$ and is
often called a statistic. If $\hat{\theta}$ represents that estimator, we have:
$\hat{\theta} = f(x_1, x_2, \dots, x_n)$
Estimation
• $\hat{\theta}$ is said to be an unbiased estimator of the true $\theta$ (the population
parameter) if $E(\hat{\theta}) = \theta$, because the bias itself is defined as
$Bias = E(\hat{\theta}) - \theta$
o For example, the sample mean $\bar{X}$ is a point and unbiased estimator
of the unknown parameter $\mu$ (the population mean):
$\hat{\theta} = \bar{X} = f(x_1, x_2, \dots, x_n) = \frac{1}{n}(x_1 + x_2 + \dots + x_n)$
It is unbiased because $E(\bar{X}) = \mu$.
• The sample variance in the form $s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$ is a point but
biased estimator of the population variance $\sigma^2$ in a small sample:
$E(s^2) = \sigma^2\left(1 - \frac{1}{n}\right) \ne \sigma^2$
But it is a consistent estimator, because it approaches $\sigma^2$ when the
sample size $n$ increases indefinitely ($n \to \infty$).
• With Bessel's correction (changing $n$ to $n - 1$) we can define
another sample variance which is unbiased even for a small sample
size (see the sketch below):
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
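The bias and its removal by Bessel's correction are easy to see numerically; a minimal sketch (assumes NumPy, whose ddof argument chooses the divisor n − ddof):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0, 2, size=(200_000, 5))  # sigma^2 = 4, small samples (n = 5)
print(np.var(samples, axis=1, ddof=0).mean())  # ~3.2 = 4*(1 - 1/5): biased
print(np.var(samples, axis=1, ddof=1).mean())  # ~4.0: unbiased (divisor n - 1)
```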
• The main methods for finding point estimators are the least-squares
method and the maximum likelihood method; of these, the first will
be discussed later.
Interval Estimation
• Interval Estimation:
• Interval estimation, in contrast, provides an interval or a range of
possible estimates at a specific level of probability, called the
level of confidence, within which the true value of the population
parameter may lie.
• If $\hat{\theta}_1$ and $\hat{\theta}_2$ are respectively the lowest and highest estimates of $\theta$,
the probability that $\theta$ is covered by the interval $(\hat{\theta}_1, \hat{\theta}_2)$ is:
$\Pr(\hat{\theta}_1 \le \theta \le \hat{\theta}_2) = 1 - \alpha \qquad (0 < \alpha < 1)$
Where $1 - \alpha$ is the level of confidence and $\alpha$ itself is called the level of
significance. The interval $(\hat{\theta}_1, \hat{\theta}_2)$ is called a confidence interval.
Interval Estimation
 How to find $\hat{\theta}_1$ and $\hat{\theta}_2$?
In order to find the lower and upper limits of a confidence interval we need
prior knowledge about the nature of the distribution of the random variable
in the population.
 If the random variable $x$ is normally distributed in the population and the
population standard deviation ($\sigma$) is known, the 95% confidence interval for
the unknown population mean ($\mu$) can be constructed by finding the
symmetric z-values associated with the 95% area under the standard normal
curve:
$1 - \alpha = 95\% \rightarrow \alpha = 5\% \rightarrow \frac{\alpha}{2} = 2.5\%$
So, $\pm Z_{0.025} = \pm 1.96$
We know that $Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, so:
$P\left(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\right) = 95\%$
[Figure: standard normal curve with central area $1 - \alpha$ and tails of $\frac{\alpha}{2} = 0.025$ beyond $\pm Z_{\alpha/2}$. Adopted & altered from http://upload.wikimedia.org/wikipedia/en/b/bf/NormalDist1.96.png]
Interval Estimation
• So we can write:
$P(\bar{x} - 1.96\,\sigma_{\bar{x}} \le \mu \le \bar{x} + 1.96\,\sigma_{\bar{x}}) = 0.95$
Or
$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
Therefore, the interval $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ represents a 95%
confidence interval ($CI_{95\%}$) for the unknown value of $\mu$.
It means that in repeated random sampling (say, 100 times) we
expect 95 out of 100 such intervals to cover the unknown value of
the population mean $\mu$.
[Figure: many sample confidence intervals $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ around $\mu$. Adopted and altered from http://forums.anarchy-online.com/showthread.php?t=604728]
Interval Estimation for population Proportion
 A confidence interval can also be constructed for the population
proportion (see the graph below), where $X \sim Bi(np,\ np(1-p))$:
[Figure: repeated samples from the population (with mean $\mu$ and variance $\sigma^2$) give sample proportions $\hat{p}_1, \hat{p}_2, \dots, \hat{p}_n$; each $\hat{p}$ represents a sample proportion. In repeated random sampling $\hat{p}$ has its own probability distribution, with mean value and variance:]
$\mu_{\hat{p}} = E(\hat{p}) = p = \frac{\mu}{n}$
$\sigma^2_{\hat{p}} = var(\hat{p}) = \frac{\sigma^2}{n^2} = \frac{p(1 - p)}{n}$
Interval Estimation for population Proportion
• The 90% confidence interval for the population proportion $p$, when
the sample size is bigger than 30 ($n > 30$) and there is no information
about the population variance, is constructed as follows:
$Z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}}, \qquad P(-Z_{\alpha/2} \le Z \le +Z_{\alpha/2}) = 1 - \alpha$
$P\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) = 0.9$
So, the confidence interval can simply be written as:
$CI_{90\%} = \hat{p} \mp 1.645\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
[Figure: standard normal curve with central area 90% and tails $\frac{\alpha}{2} = 0.05$ at $\pm Z_{\alpha/2} = \pm 1.645$. Adopted and altered from http://www.stat.wmich.edu/s216/book/node83.html]
Obviously, if we had knowledge about the population variance we
would be able to estimate the population proportion $p$ directly. Why?
Examples
o Imagine the weight of people in a society is distributed normally. A
random sample of 25 with sample mean 72 kg is taken from this
society. If the standard deviation of the population is 6 kg, find the
a) 90%, b) 95% and c) 99% confidence intervals for the unknown
population mean.
a) $1 - \alpha = 0.9 \rightarrow \frac{\alpha}{2} = 0.05 \rightarrow Z_{\alpha/2} = 1.645$
So, $CI_{90\%} = 72 \pm 1.645 \times \frac{6}{\sqrt{25}} = (70.03,\ 73.97)$
b) $1 - \alpha = 0.95 \rightarrow \frac{\alpha}{2} = 0.025 \rightarrow Z_{\alpha/2} = 1.96$
So, $CI_{95\%} = 72 \pm 1.96 \times \frac{6}{\sqrt{25}} = (69.65,\ 74.35)$
c) $1 - \alpha = 0.99 \rightarrow \frac{\alpha}{2} = 0.005 \rightarrow Z_{\alpha/2} = 2.58$
So, $CI_{99\%} = 72 \pm 2.58 \times \frac{6}{\sqrt{25}} = (68.9,\ 75.1)$
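The three intervals can be reproduced in a few lines; a sketch (assumes SciPy; the tiny differences at 99% come from using 2.5758 instead of the rounded 2.58):

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 72, 6, 25
for conf in (0.90, 0.95, 0.99):
    z = norm.isf((1 - conf) / 2)            # 1.645, 1.960, 2.576
    half = z * sigma / sqrt(n)
    print(f"CI{conf:.0%}: ({xbar - half:.2f}, {xbar + half:.2f})")
# CI90%: (70.03, 73.97)  CI95%: (69.65, 74.35)  CI99%: (68.91, 75.09)
```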
Examples
o Samples from one of the production lines in a factory suggest
that 10% of products are defective. If a difference of 1% between
the sample and population proportions is acceptable, what sample
size do we need to construct a 95% confidence interval for the
population proportion? What if the acceptable gap between the
sample & population proportions increased to 3%?
$1 - \alpha = 0.95 \rightarrow \frac{\alpha}{2} = 0.025 \rightarrow Z_{\alpha/2} = 1.96$
$Z_{\alpha/2} = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}} \rightarrow 1.96 = \frac{0.01}{\sqrt{\frac{0.1 \times 0.9}{n}}} \rightarrow n = (196 \times 0.3)^2 \approx 3458$
If the gap increases to 3%, then:
$1.96 = \frac{0.03}{\sqrt{\frac{0.1 \times 0.9}{n}}} \rightarrow n = (196 \times 0.1)^2 \approx 385$
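The same arithmetic as a short sketch (pure Python; the helper name sample_size is illustrative, and the result is rounded up to the next whole unit):

```python
import math

def sample_size(z, p_hat, margin):
    # smallest n with z * sqrt(p(1-p)/n) <= the acceptable margin
    return math.ceil((z / margin) ** 2 * p_hat * (1 - p_hat))

print(sample_size(1.96, 0.10, 0.01))  # 3458
print(sample_size(1.96, 0.10, 0.03))  # 385
```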
Interval Estimation (Using t-distribution)
• If the population standard deviation 𝝈 is unknown and we use
sample standard deviation 𝒔 instead, and the size of the sample is
less than 30 (𝒏 < 𝟑𝟎) then the random variable
$\frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t_{n-1}$
has a t-distribution with $df = n - 1$.
This means a confidence interval for the population mean 𝝁 will be in
the form of:
$CI_{(1-\alpha)} = \left(\bar{x} - t_{\frac{\alpha}{2},\,n-1}\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\frac{\alpha}{2},\,n-1}\frac{s}{\sqrt{n}}\right)$
[Figure: t-distribution with central area $(1 - \alpha)$ between $-t_{\frac{\alpha}{2},\,n-1}$ and $+t_{\frac{\alpha}{2},\,n-1}$ and tails of $\frac{\alpha}{2}$ each. Adopted and altered from http://cnx.org/content/m46278/latest/?collection=col11521/latest]
Interval Estimation
• The following flowchart can help in choosing between the Z and t
distributions when an interval estimate is constructed for $\mu$ in
the population.
[Flowchart: choosing between the Z and t distributions; when neither applies, use nonparametric methods. Adopted from http://www.expertsmind.com/questions/flow-chart-for-confidence-interval-30112489.aspx]
Interval Estimation
• Here is a list of confidence intervals for various population parameters.
[Table adopted from http://www.bls-stats.org/uploads/1/7/6/7/1767713/250709.image0.jpg]
Hypothesis Testing
• Hypothesis testing is one of the important aspects of statistical inference.
The main idea is to find out whether some claims/statements (in the form of
hypotheses) about population parameters can be statistically rejected by
the evidence from the sample, using a test statistic (a function of the sample).
• Claims are made in the form of a null hypothesis ($H_0$) against an
alternative hypothesis ($H_1$), and they can only be rejected, never
proven. These two hypotheses should be mutually exclusive and
collectively exhaustive. For example:
𝐻0: 𝜇 = 0.8 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝐻1: 𝜇 ≠ 0.8
𝐻0: 𝜇 ≥ 2.1 𝑎𝑔𝑎𝑖𝑛𝑠𝑡 𝐻1: 𝜇 < 2.1
$H_0: \sigma^2 \le 0.4$ against $H_1: \sigma^2 > 0.4$
 Always remember that the equality sign comes with 𝐻0.
• If the value of the test statistic lies in the rejection area(s) the null
hypothesis must be rejected, otherwise the sample does not
provide sufficient evidence to reject the null hypothesis.
Hypothesis Testing
• Assuming we know the distribution of the random variable in the
population, and that there is statistical independence between different
random variables, in hypothesis testing we need to follow these
steps:
1. Stating the relevant null & alternative hypotheses. The form of the null
hypothesis (being =, ≥ or ≤ something) indicates how many rejection
regions we will have: for the = sign we have two regions, and for the
others just one region (depending on the difference between the value of
the estimator and the claimed value of the population parameter, the
rejection area could be on the right or the left of the distribution curve).
𝐻0: 𝜇 = 0.5
𝐻1: 𝜇 ≠ 0.5
𝐻0: 𝜇 ≥ 0.5 (𝑜𝑟 𝜇 ≤ 0.5)
𝐻1: 𝜇 < 0.5 (𝑜𝑟 𝜇 > 0.5)
[Graphs adopted from http://www.soc.napier.ac.uk/~cs181/Modules/CM/Statistics/Statistics%203.html]
Hypothesis Testing
2. Identifying the level of significance of the test ($\alpha$), usually taken to
be 5% or 1% depending on the nature of the test and the goals of the
researcher. When $\alpha$ is known, together with prior knowledge about the
sample distribution, the critical region(s) (or rejection area(s)) can be
identified.
[Figure: two critical values for the standard normal distribution associated with the significance levels $\alpha = 5\%$ ($Z_\alpha = 1.65$) and $\alpha = 1\%$ ($Z_\alpha = 2.33$). Adopted from http://www.psychstat.missouristate.edu/introbook/sbk26.htm]
Hypothesis Testing
3. Constructing a test statistic (a function based on the sample distribution &
sample size). This function is used to decide whether or not to reject $H_0$.
[Table: a list of some of the test statistics for testing different hypotheses. Adopted from http://www.bls-stats.org/uploads/1/7/6/7/1767713/250714.image0.jpg]
Hypothesis Testing
4. Taking a random sample from the population and calculating the value of
the test statistic. If the value is in the rejection area, the null hypothesis $H_0$
will be rejected in favour of the alternative $H_1$ at the predetermined
significance level $\alpha$; otherwise the sample does not provide sufficient
evidence to reject $H_0$ (this does not mean that we accept $H_0$).
[Figure adopted from http://www.onekobo.com/Articles/Statistics/03-Hypotheses/Stats3%20-%2010%20-%20Rejection%20Region.htm]
Critical values: $-Z_\alpha$ or $-t_{\alpha,\,df}$ for a left-tail test; $+Z_\alpha$ or $+t_{\alpha,\,df}$ for a right-tail test; $\pm Z_{\alpha/2}$ or $\pm t_{\alpha/2,\,df}$ for a two-tail test.
Example
o A chocolate factory claims that its new tin of cocoa powder contains at
least 500 gr of the powder. A standards-checking agency takes a random
sample of $n = 25$ tins and finds that the sample mean weight of the tins
is $\bar{X} = 520$ gr and the sample standard deviation is $s = 75$ gr. If we
assume the weight of cocoa powder in the tins has a normal distribution,
does the sample provide enough evidence to reject the claim at the 95%
level of confidence?
1. $H_0: \mu \ge 500$ against $H_1: \mu < 500$ (so it is a one-tail, left-tail test)
2. Level of significance $\alpha = 5\% \rightarrow t_{\alpha,\,n-1} = t_{0.05,\,24} = 1.711$ (we use the
t-distribution because $n < 30$ and we have no prior knowledge of the
population standard deviation)
3. The value of the test statistic is: $t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{520 - 500}{75 / \sqrt{25}} = 1.33$
4. The rejection region of this left-tail test is $t < -1.711$. As $1.33 > -1.711$,
the test statistic is not in the rejection area, so the claim cannot be
rejected at the 5% level of significance.
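The cocoa-tin test as a short sketch (assumes SciPy; note the left-tail critical value is the negative of the tabulated one):

```python
from math import sqrt
from scipy.stats import t

xbar, mu0, s, n = 520, 500, 75, 25
t_stat = (xbar - mu0) / (s / sqrt(n))   # 1.33
t_crit = -t.isf(0.05, n - 1)            # -1.711: left-tail critical value
print(t_stat, t_crit, t_stat < t_crit)  # 1.33, -1.711, False -> cannot reject H0
```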
Type I & Type II Errors
• Two types of errors can occur in hypothesis testing:
A. Type I error; when based on our sample we reject a true null
hypothesis.
B. Type II error; when based on our sample we cannot reject a false
null hypothesis.
• By reducing the level of significance $\alpha$ we can reduce the
probability of making a type I error (why?); however, at the same
time, we increase the probability of making a type II error.
• What would happen to type I and type II errors if we increase the
sample size? (Hint: look at the confidence intervals)
[Table adopted from http://whatilearned.wikia.com/wiki/Hypothesis_Testing?file=Type_I_and_Type_II_Error_Table.jpg]
Type I & Type II Errors
• The following graph shows how a change of the critical line (critical
value) changes the probability of making type I and type II errors:
$P(\text{Type I error}) = \alpha$ and $P(\text{Type II error}) = \beta$
[Figure adopted from http://www.weibull.com/hotwire/issue88/relbasics88.htm]
The Power Of a Test:
The power of a test is
the probability that the
test will correctly reject
the null hypothesis. It is
the probability of not
committing type II
error. The power is
equal to 𝟏 − 𝜷 which
means by reducing 𝜷
the power of the test
will increase.
The P-Value
• It is not unusual to reject $H_0$ at some level of significance, for
example $\alpha = 5\%$, but to be unable to reject it at some other
level, e.g. $\alpha = 1\%$. The dependence of the final decision on the
value of $\alpha$ is the weak point of the classical approach.
• In the new approach, we try to find the p-value, which is the lowest
significance level at which $H_0$ can be rejected. If the level of
significance is set at 5% and the lowest significance level at
which $H_0$ can be rejected (the p-value) is 2%, then the null hypothesis
should be rejected; i.e.
$p\text{-value} < \alpha \Rightarrow \text{Reject } H_0$
 To understand this concept better let’s look at an example:
• Suppose we believe that the mean life expectancy of the people in
a city is 75 years ($H_0: \mu = 75$). But our observation shows a sample
mean of 76 years for a sample of size 100 with a sample standard
deviation of 4 years.
The P-Value
• The Z-score (test statistic) can be calculated as follows:
$Z = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{76 - 75}{4 / \sqrt{100}} = 2.5$
• At the 5% level of significance the critical Z-value is 1.96, so we must
reject $H_0$. But we should not have had this result (or should not
have had those observations in our random sample) in the first place
if our assumption about the population mean $\mu$ was correct.
• The p-value is the probability of having this type of result or one
even more extreme (i.e. a Z-score bigger than 2.5), given that the null
hypothesis is correct:
$P(Z \ge 2.5 \mid \mu = 75) = p\text{-value} \approx 0.006$ (it means that in 1000 samples this type of
result can theoretically happen about 6 times; yet it has happened in our first
random sampling).
[Figure: standard normal curve with the tail area $P(Z \ge 2.5) \approx 0.006$ shaded beyond $Z = 2.5$. From http://faculty.elgin.edu/dkernler/statistics/ch10/10-2.html]
The P-Value
• As we cannot deny what we have observed and obtained from the
sample, eventually we need to change our belief about the
population mean and reject our assumption about it.
• The smaller the p-value, the stronger the evidence against $H_0$.
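Computing the p-value of the life-expectancy example directly; a minimal sketch (assumes SciPy; the one-sided tail is used, as on the slide):

```python
from math import sqrt
from scipy.stats import norm

z = (76 - 75) / (4 / sqrt(100))    # 2.5
p_value = norm.sf(z)               # P(Z >= 2.5) ~ 0.0062
print(p_value, p_value < 0.05)     # reject H0 at the 5% significance level
```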

More Related Content

What's hot

Regression analysis
Regression analysisRegression analysis
Regression analysisSohag Babu
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)Harsh Upadhyay
 
Basic concept of probability
Basic concept of probabilityBasic concept of probability
Basic concept of probabilityIkhlas Rahman
 
Eigen values and eigen vectors engineering
Eigen values and eigen vectors engineeringEigen values and eigen vectors engineering
Eigen values and eigen vectors engineeringshubham211
 
Geometric Distribution
Geometric DistributionGeometric Distribution
Geometric DistributionRatul Basak
 
Geometric distributions
Geometric distributionsGeometric distributions
Geometric distributionsUlster BOCES
 
Probability distribution 2
Probability distribution 2Probability distribution 2
Probability distribution 2Nilanjan Bhaumik
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression AnalysisASAD ALI
 
Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis Vijay Hemmadi
 
Probability And Probability Distributions
Probability And Probability Distributions Probability And Probability Distributions
Probability And Probability Distributions Sahil Nagpal
 
2.2 laws of probability (1)
2.2 laws of probability (1)2.2 laws of probability (1)
2.2 laws of probability (1)gracie
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysisnadiazaheer
 
Hypergeometric distribution
Hypergeometric distributionHypergeometric distribution
Hypergeometric distributionmohammad nouman
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regressionKen Plummer
 

What's hot (20)

Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Discrete probability distributions
Discrete probability distributionsDiscrete probability distributions
Discrete probability distributions
 
Ch05 4
Ch05 4Ch05 4
Ch05 4
 
Basic concept of probability
Basic concept of probabilityBasic concept of probability
Basic concept of probability
 
Eigen values and eigen vectors engineering
Eigen values and eigen vectors engineeringEigen values and eigen vectors engineering
Eigen values and eigen vectors engineering
 
Geometric Distribution
Geometric DistributionGeometric Distribution
Geometric Distribution
 
Geometric distributions
Geometric distributionsGeometric distributions
Geometric distributions
 
Probability distribution 2
Probability distribution 2Probability distribution 2
Probability distribution 2
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
FPDE presentation
FPDE presentationFPDE presentation
FPDE presentation
 
Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis Introduction to probability distributions-Statistics and probability analysis
Introduction to probability distributions-Statistics and probability analysis
 
The Standard Normal Distribution
The Standard Normal DistributionThe Standard Normal Distribution
The Standard Normal Distribution
 
Probability And Probability Distributions
Probability And Probability Distributions Probability And Probability Distributions
Probability And Probability Distributions
 
2.2 laws of probability (1)
2.2 laws of probability (1)2.2 laws of probability (1)
2.2 laws of probability (1)
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Ring
RingRing
Ring
 
Hypergeometric distribution
Hypergeometric distributionHypergeometric distribution
Hypergeometric distribution
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 

Similar to Statistics (recap)

Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- IIIAkhila Prabhakaran
 
2 Review of Statistics. 2 Review of Statistics.
2 Review of Statistics. 2 Review of Statistics.2 Review of Statistics. 2 Review of Statistics.
2 Review of Statistics. 2 Review of Statistics.WeihanKhor2
 
Statistical Description of Turbulent Flow
Statistical Description of Turbulent FlowStatistical Description of Turbulent Flow
Statistical Description of Turbulent FlowKhusro Kamaluddin
 
Stat 2153 Stochastic Process and Markov chain
Stat 2153 Stochastic Process and Markov chainStat 2153 Stochastic Process and Markov chain
Stat 2153 Stochastic Process and Markov chainKhulna University
 
Probability & Information theory
Probability & Information theoryProbability & Information theory
Probability & Information theory성재 최
 
Probability Proficiency.pptx
Probability Proficiency.pptxProbability Proficiency.pptx
Probability Proficiency.pptxHarshGupta137011
 
RSS probability theory
RSS probability theoryRSS probability theory
RSS probability theoryKaimrc_Rss_Jd
 
Random Variable.pptx
Random Variable.pptxRandom Variable.pptx
Random Variable.pptxSoumyaPanja2
 
Different types of distributions
Different types of distributionsDifferent types of distributions
Different types of distributionsRajaKrishnan M
 
PG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability DistributionPG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability DistributionAashish Patel
 
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTION
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTIONPROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTION
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTIONJournal For Research
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and DistributionEugene Yan Ziyou
 
Random variables and probability distributions Random Va.docx
Random variables and probability distributions Random Va.docxRandom variables and probability distributions Random Va.docx
Random variables and probability distributions Random Va.docxcatheryncouper
 

Similar to Statistics (recap) (20)

1. Probability.pdf
1. Probability.pdf1. Probability.pdf
1. Probability.pdf
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 
Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- III
 
2 Review of Statistics. 2 Review of Statistics.
2 Review of Statistics. 2 Review of Statistics.2 Review of Statistics. 2 Review of Statistics.
2 Review of Statistics. 2 Review of Statistics.
 
Statistical Description of Turbulent Flow
Statistical Description of Turbulent FlowStatistical Description of Turbulent Flow
Statistical Description of Turbulent Flow
 
Unit 2 Probability
Unit 2 ProbabilityUnit 2 Probability
Unit 2 Probability
 
Machine learning session2
Machine learning   session2Machine learning   session2
Machine learning session2
 
Crv
CrvCrv
Crv
 
Probability
ProbabilityProbability
Probability
 
Stat 2153 Stochastic Process and Markov chain
Stat 2153 Stochastic Process and Markov chainStat 2153 Stochastic Process and Markov chain
Stat 2153 Stochastic Process and Markov chain
 
Probability & Information theory
Probability & Information theoryProbability & Information theory
Probability & Information theory
 
5. Probability.pdf
5. Probability.pdf5. Probability.pdf
5. Probability.pdf
 
Probability Proficiency.pptx
Probability Proficiency.pptxProbability Proficiency.pptx
Probability Proficiency.pptx
 
RSS probability theory
RSS probability theoryRSS probability theory
RSS probability theory
 
Random Variable.pptx
Random Variable.pptxRandom Variable.pptx
Random Variable.pptx
 
Different types of distributions
Different types of distributionsDifferent types of distributions
Different types of distributions
 
PG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability DistributionPG STAT 531 Lecture 5 Probability Distribution
PG STAT 531 Lecture 5 Probability Distribution
 
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTION
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTIONPROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTION
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTION
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
 
Random variables and probability distributions Random Va.docx
Random variables and probability distributions Random Va.docxRandom variables and probability distributions Random Va.docx
Random variables and probability distributions Random Va.docx
 

More from Farzad Javidanrad

More from Farzad Javidanrad (12)

Lecture 5
Lecture 5Lecture 5
Lecture 5
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Specific topics in optimisation
Specific topics in optimisationSpecific topics in optimisation
Specific topics in optimisation
 
Matrix algebra
Matrix algebraMatrix algebra
Matrix algebra
 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysis
 
Integral calculus
Integral calculusIntegral calculus
Integral calculus
 
Basic calculus (ii) recap
Basic calculus (ii) recapBasic calculus (ii) recap
Basic calculus (ii) recap
 
Basic calculus (i)
Basic calculus (i)Basic calculus (i)
Basic calculus (i)
 
The Dynamic of Business Cycle in Kalecki’s Theory: Duality in the Nature of I...
The Dynamic of Business Cycle in Kalecki’s Theory: Duality in the Nature of I...The Dynamic of Business Cycle in Kalecki’s Theory: Duality in the Nature of I...
The Dynamic of Business Cycle in Kalecki’s Theory: Duality in the Nature of I...
 
Introductory Finance for Economics (Lecture 10)
Introductory Finance for Economics (Lecture 10)Introductory Finance for Economics (Lecture 10)
Introductory Finance for Economics (Lecture 10)
 

Recently uploaded

Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
Statistics (recap)

• 9. Probability of Multiple Events
o The probability of getting two heads when two fair coins are tossed (two independent events) is:
𝑃(𝐻 ∩ 𝐻) = 1/2 × 1/2 = 1/4
• 10. Probability of Multiple Events
o The probability of picking two aces, without returning the first card to the deck of 52 playing cards (a conditional probability), is:
𝑃(1st ace ∩ 2nd ace) = 𝑃(1st ace) × 𝑃(2nd ace | 1st ace)
Or, written more compactly:
𝑃(𝐴₁ ∩ 𝐴₂) = 𝑃(𝐴₁) × 𝑃(𝐴₂ | 𝐴₁) = 4/52 × 3/51 = 1/221
• If two events 𝐴 and 𝐵 are independent from each other, then:
𝑃(𝐴 | 𝐵) = 𝑃(𝐴) and 𝑃(𝐵 | 𝐴) = 𝑃(𝐵)
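A quick check of the two-ace calculation (a minimal Python sketch using only the standard library; it is an illustration, not part of the original slides):

```python
from fractions import Fraction

# P(A1 and A2) = P(A1) * P(A2 | A1): drawing two aces without replacement
p_first = Fraction(4, 52)     # 4 aces in a full deck of 52
p_second = Fraction(3, 51)    # 3 aces left among the remaining 51 cards
p_both = p_first * p_second
print(p_both, float(p_both))  # 1/221, ~0.0045
```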
• 11. Random Variable & Probability Distribution
Some Basic Concepts:
• Variable: A letter (symbol) which represents the elements of a specific set.
• Random Variable: A variable whose values appear randomly, according to a probability distribution.
• Probability Distribution: A rule (function) which assigns a probability to the values of a random variable.
• Variables (including random variables) are divided into two general categories: 1) Discrete Variables, and 2) Continuous Variables
• 12. Random Variable & Probability Distribution
• A discrete variable is a variable whose elements (values) can be put in correspondence with the set of natural numbers or a subset of it. So it is possible to order and count its values; the number of values can be finite or infinite.
• For a discrete variable it is not possible to define a neighbourhood, however small, around any value in its domain: there is a jump from one value to the next.
• If the elements of the domain of a variable can be put in correspondence with the set of real numbers or a subset of it, the variable is called continuous. It is not possible to order and count the elements of a continuous variable. A variable is continuous if a neighbourhood, however small, can be defined around any value in its domain.
• 13. Random Variable & Probability Distribution
• Probability Distribution: A rule (function) that assigns a probability either to each possible value of a random variable (RV) individually or to a set of values in an interval.*
• For a discrete RV this rule assigns a probability to each possible individual outcome. For example, the probability distribution of the number of heads when flipping a fair coin (note: Σ𝑃ᵢ = 1):
In one trial {𝐻, 𝑇}:   𝑥: 0, 1   𝑃(𝑥): 0.5, 0.5
In two trials {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}:   𝑥: 0, 1, 2   𝑃(𝑥): 0.25, 0.5, 0.25
o The probability distribution for the change in the price of a share in one day on the stock market:
𝑥 = price change: +1, 0, −1   𝑃(𝑥): 0.6, 0.1, 0.3
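The two-trial distribution above can be generated by enumerating the equally likely outcomes (a short Python sketch, added here for illustration):

```python
from itertools import product
from collections import Counter

# Enumerate the equally likely outcomes of tossing two fair coins
# and count how many heads each outcome contains.
outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT
counts = Counter(o.count("H") for o in outcomes)
dist = {x: n / len(outcomes) for x, n in sorted(counts.items())}
print(dist)   # {0: 0.25, 1: 0.5, 2: 0.25}
```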
• 14. Probability Distributions (Continuous)
• The probability that a continuous random variable takes exactly one of the values in its domain is zero, because the number of all possible outcomes 𝑛 is infinite and 𝑚/𝑛 → 0 as 𝑛 → ∞.
• For this reason, the probability for a continuous random variable has to be calculated over an interval.
• The probability distribution of a continuous random variable is often called a probability density function (PDF), or simply a probability function; it is usually denoted by 𝑓(𝑥) and has the following properties:
I. 𝑓(𝑥) ≥ 0 (similar to 𝑃(𝑥) ≥ 0 for a discrete RV*)
II. ∫_{−∞}^{+∞} 𝑓(𝑥) 𝑑𝑥 = 1 (similar to Σ𝑃(𝑥) = 1 for a discrete RV)
III. ∫_{𝑎}^{𝑏} 𝑓(𝑥) 𝑑𝑥 = 𝑃(𝑎 ≤ 𝑥 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎) (the probability given to the set of values in an interval [𝑎, 𝑏])**
• 15. Probability Distributions (Continuous)
• where 𝐹(𝑥) is the integral of the PDF 𝑓(𝑥) and is called the Cumulative Distribution Function (CDF); for any real value of 𝑥 it is defined as:
𝐹(𝑥) ≡ 𝑃(𝑋 ≤ 𝑥)
The CDF gives the area under the PDF 𝑓(𝑥) from −∞ to 𝑥. For a discrete random variable, the CDF is the sum of all probabilities up to and including the value 𝑥.
Adopted from http://beyondbitsandatomsblog.stanford.edu/spring2010/tag/embodied-artifacts/
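Property III and the PDF–CDF relation can be verified numerically; a minimal sketch for the standard normal distribution, assuming scipy is available:

```python
from scipy.integrate import quad
from scipy.stats import norm

# Integrating the PDF over [a, b] reproduces F(b) - F(a)
a, b = -1.0, 1.0
area, _ = quad(norm.pdf, a, b)    # numerical integral of f(x)
print(area)                       # ~0.6827
print(norm.cdf(b) - norm.cdf(a))  # same value from the CDF
```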
• 16. Some Characteristics of Probability Distributions
• Expected Value (Probabilistic Mean Value): One of the most important measures of the central tendency of a distribution. It is the weighted average of all possible values of the random variable 𝑥 and is denoted by 𝐸(𝑥).
• For a discrete RV (with 𝑛 possible outcomes):
𝐸(𝑥) = 𝑥₁𝑃(𝑥₁) + 𝑥₂𝑃(𝑥₂) + ⋯ + 𝑥ₙ𝑃(𝑥ₙ) = Σᵢ₌₁ⁿ 𝑥ᵢ𝑃(𝑥ᵢ)
• For a continuous RV:
𝐸(𝑥) = ∫_{−∞}^{+∞} 𝑥·𝑓(𝑥) 𝑑𝑥
• 17. Some Characteristics of Probability Distributions
• Properties of 𝐸(𝑥):
i. If 𝑐 is a constant then 𝐸(𝑐) = 𝑐.
ii. If 𝑎 and 𝑏 are constants then 𝐸(𝑎𝑥 + 𝑏) = 𝑎𝐸(𝑥) + 𝑏.
iii. If 𝑎₁, …, 𝑎ₙ are constants then 𝐸(𝑎₁𝑥₁ + ⋯ + 𝑎ₙ𝑥ₙ) = 𝑎₁𝐸(𝑥₁) + ⋯ + 𝑎ₙ𝐸(𝑥ₙ), or 𝐸(Σᵢ₌₁ⁿ 𝑎ᵢ𝑥ᵢ) = Σᵢ₌₁ⁿ 𝑎ᵢ𝐸(𝑥ᵢ)
iv. If 𝑥 and 𝑦 are independent random variables then 𝐸(𝑥𝑦) = 𝐸(𝑥)·𝐸(𝑦)
• 18. Some Characteristics of Probability Distributions
v. If 𝑔(𝑥) is a function of the random variable 𝑥 then:
𝐸(𝑔(𝑥)) = Σ 𝑔(𝑥)·𝑃(𝑥) for a discrete RV
𝐸(𝑔(𝑥)) = ∫ 𝑔(𝑥)·𝑓(𝑥) 𝑑𝑥 for a continuous RV
• Variance: Variance measures how the random variable 𝑥 is dispersed around its expected value. If we write 𝐸(𝑥) = 𝜇, then:
𝑣𝑎𝑟(𝑥) = 𝜎² = 𝐸[(𝑥 − 𝐸(𝑥))²] = 𝐸[(𝑥 − 𝜇)²] = 𝐸[𝑥² − 2𝑥𝜇 + 𝜇²] = 𝐸(𝑥²) − 2𝜇𝐸(𝑥) + 𝜇² = 𝐸(𝑥²) − 𝜇²
• 19. Some Characteristics of Probability Distributions
𝑣𝑎𝑟(𝑥) = Σᵢ₌₁ⁿ (𝑥ᵢ − 𝜇)²·𝑃(𝑥ᵢ) for a discrete RV
𝑣𝑎𝑟(𝑥) = ∫_{−∞}^{+∞} (𝑥 − 𝜇)²·𝑓(𝑥) 𝑑𝑥 for a continuous RV
• Properties of Variance:
i. If 𝑐 is a constant then 𝑣𝑎𝑟(𝑐) = 0.
ii. If 𝑎 and 𝑏 are constants then 𝑣𝑎𝑟(𝑎𝑥 + 𝑏) = 𝑎²𝑣𝑎𝑟(𝑥).
iii. If 𝑥 and 𝑦 are independent random variables then 𝑣𝑎𝑟(𝑥 ± 𝑦) = 𝑣𝑎𝑟(𝑥) + 𝑣𝑎𝑟(𝑦) (this can be extended to more variables).
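Using the share-price distribution from slide 13, the expected value and the two equivalent variance formulas can be checked directly (a small Python sketch, added for illustration):

```python
# Share-price-change distribution from slide 13: x in {+1, 0, -1}
xs = [1, 0, -1]
ps = [0.6, 0.1, 0.3]

mu = sum(x * p for x, p in zip(xs, ps))                       # E(x) = 0.3
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))      # E[(x - mu)^2]
var_short = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2  # E(x^2) - mu^2
print(mu, var_def, var_short)   # 0.3, 0.81, 0.81
```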
• 20. Probability Distributions (Discrete RV)
• Some of the well-known probability distributions are:
• The Binomial Distribution:
1. The probability of the occurrence of an event is 𝑝 and does not change.
2. The experiment is repeated 𝑛 times.
3. The probability that out of 𝑛 trials the event appears 𝑥 times is:
𝑃(𝑥) = [𝑛!/(𝑥!(𝑛 − 𝑥)!)] 𝑝ˣ(1 − 𝑝)ⁿ⁻ˣ
The mean value and standard deviation of the binomial distribution are:
𝜇 = Σᵢ₌₀ⁿ 𝑥ᵢ·𝑃(𝑥ᵢ) = 𝑛𝑝 and 𝜎 = √(Σᵢ₌₀ⁿ (𝑥ᵢ − 𝜇)²·𝑃(𝑥ᵢ)) = √(𝑛𝑝(1 − 𝑝))
So, to show that the probability distribution of the random variable 𝑋 is binomial we can write: 𝑋 ~ Bi(𝑛𝑝, 𝑛𝑝(1 − 𝑝))
• 21. Probability Distributions (Discrete RV)
• A gambler thinks his chance of getting a 1 in rolling a die is high. What is his chance of getting four 1s out of six rolls of a fair die? The probability of getting a 1 in an individual trial is 1/6 and it remains the same in all 6 trials. So,
𝑃(𝑥 = 4) = [6!/(4!·2!)] (1/6)⁴(5/6)² = 375/46656 ≈ 0.008 ≈ 0.8%
• The Poisson Distribution:
1. It is used to calculate the probability of a number of desired events (no. of successes) in a specific period of time.
2. The average number of desired events (no. of successes) per unit of time remains constant.
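A check of the binomial calculation above, both by the formula and with a library call (a Python sketch, assuming scipy is installed):

```python
from math import comb
from scipy.stats import binom

n, p, x = 6, 1/6, 4
manual = comb(n, x) * p**x * (1 - p)**(n - x)
print(manual)               # ~0.00804, i.e. about 0.8%
print(binom.pmf(x, n, p))   # same value from scipy
```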
• 22. Probability Distributions (Discrete RV)
• So, the probability of having 𝑥 successes is calculated by:
𝑃(𝑥) = 𝜆ˣ𝑒^{−𝜆}/𝑥!
where 𝜆 is the average number of successes in the specific period of time and 𝑒 ≈ 2.7182.
• The mean value and standard deviation of the Poisson distribution are:
𝜇 = Σᵢ₌₀ⁿ 𝑥ᵢ·𝑃(𝑥ᵢ) = 𝜆 and 𝜎 = √(Σᵢ₌₀ⁿ (𝑥ᵢ − 𝜇)²·𝑃(𝑥ᵢ)) = √𝜆
So, to show that the probability distribution of the random variable 𝑋 is Poisson we can write: 𝑋 ~ Poi(𝜆, 𝜆).
o The emergency section of a hospital receives 2 calls per half hour (4 calls per hour). The probability of getting exactly 2 calls in a randomly chosen hour on a random day is:
𝑃(𝑥 = 2) = 4²𝑒⁻⁴/2! ≈ 0.146 ≈ 15%
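The hospital example in code (a sketch, assuming scipy is available):

```python
from scipy.stats import poisson

lam = 4                     # average number of calls per hour
print(poisson.pmf(2, lam))  # ~0.1465, i.e. about 15%
```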
• 23. The Normal Distribution (Continuous RV)
• The Normal Distribution: The best-known probability distribution, and a good approximation to the behaviour of many random variables in practice. The probability density function (PDF) of the normal distribution is:
1. Symmetrical around its mean value (𝜇).
2. Bell-shaped, with two tails approaching the horizontal axis asymptotically as we move further away from the mean.
Adopted from http://www.pdnotebook.com/2010/06/statistical-tolerance-analysis-root-sum-square/
• 24. The Normal Distribution (Continuous RV)
3. The probability density function (PDF) of the normal distribution can be represented by:
𝑓(𝑥) = (1/(𝜎√(2𝜋))) 𝑒^{−(𝑥−𝜇)²/(2𝜎²)}   (−∞ < 𝑥 < +∞)
where 𝜇 and 𝜎 are the mean and standard deviation respectively:
𝜇 = ∫_{−∞}^{+∞} 𝑥·𝑓(𝑥) 𝑑𝑥 and 𝜎 = √(∫_{−∞}^{+∞} (𝑥 − 𝜇)²·𝑓(𝑥) 𝑑𝑥)
So, 𝑋 ~ 𝑁(𝜇, 𝜎²).
• A linear combination of independent normally distributed random variables is itself normally distributed; that is, if 𝑋 ~ 𝑁(𝜇₁, 𝜎₁²) and 𝑌 ~ 𝑁(𝜇₂, 𝜎₂²) and 𝑍 = 𝑎𝑋 + 𝑏𝑌, then 𝑍 ~ 𝑁(𝑎𝜇₁ + 𝑏𝜇₂, 𝑎²𝜎₁² + 𝑏²𝜎₂²).
• This can be extended to more than two random variables.
• 25. The Normal Distribution (Continuous RV)
• Recalling the last property of the PDF (∫_{𝑎}^{𝑏} 𝑓(𝑥) 𝑑𝑥 = 𝑃(𝑎 ≤ 𝑥 ≤ 𝑏)), it is difficult to calculate probabilities using the above PDF for every different pair of values of 𝜇 and 𝜎. The solution is to transform the normal variable 𝑥 into the standardised normal (or simply, standard normal) random variable 𝑧, by:
𝑧 = (𝑥 − 𝜇)/𝜎
whose parameters do not depend on the parameters of other normally distributed random variables, because we always have 𝐸(𝑧) = 0 and 𝑣𝑎𝑟(𝑧) = 1 (why?).
• The probability distribution of the standard normal variable is defined as:
𝑓(𝑧) = (1/√(2𝜋)) 𝑒^{−𝑧²/2}, with 𝑍 ~ 𝑁(0, 1)
Standardised: 𝑋 ~ 𝑁(𝜇, 𝜎²) → 𝑍 ~ 𝑁(0, 1)
Adopted and amended from http://www.mathsisfun.com/data/standard-normal-distribution.html
• 26. The Standard Normal Distribution
• Properties of the standard normal distribution curve:
1. It is symmetrical around the y-axis.
2. The area under the curve can be split into two equal areas, that is:
∫_{−∞}^{0} 𝑓(𝑧) 𝑑𝑧 = ∫_{0}^{+∞} 𝑓(𝑧) 𝑑𝑧 = 0.5
• To find the area under the curve to the left of 𝑧₁ = 1.26, using the z-table (next slide), we have:
𝑃(𝑧 ≤ 𝑧₁ = 1.26) = ∫_{−∞}^{0} 𝑓(𝑧) 𝑑𝑧 + ∫_{0}^{𝑧₁} 𝑓(𝑧) 𝑑𝑧 = 0.5 + 0.3962 = 0.8962 ≈ 90%
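The same lookup can be done in code instead of the printed table (a sketch, assuming scipy is available):

```python
from scipy.stats import norm

# Area to the left of z = 1.26 under the standard normal curve
print(norm.cdf(1.26))        # ~0.8962
# The z-table on the next slide tabulates the area from 0 to z:
print(norm.cdf(1.26) - 0.5)  # ~0.3962
```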
• 27. Z-table: area under the standard normal curve between 0 and 𝑧 (rows give the first decimal of 𝑧, columns the second decimal)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
• 28. Working with the Z-Table
• To find the probability 𝑃(0.89 < 𝑧 < 1.5):
𝑃(0.89 < 𝑧 < 1.5) = ∫_{0}^{1.5} 𝑓(𝑧) 𝑑𝑧 − ∫_{0}^{0.89} 𝑓(𝑧) 𝑑𝑧 = 𝐹(1.5) − 𝐹(0.89) = 0.4332 − 0.3133 = 0.1199 ≈ 12%
as both values are positive.
• To find a probability in the negative area we use the equivalent area on the positive side:
𝑃(−1.32 < 𝑧 < −1.25) = 𝑃(1.25 < 𝑧 < 1.32) = 𝐹(1.32) − 𝐹(1.25) = 0.4066 − 0.3944 = 0.0122 ≈ 1%
• 29. Working with the Z-Table
• To find 𝑃(𝑧 < −2.15) we can write:
∫_{−∞}^{−2.15} 𝑓(𝑧) 𝑑𝑧 = ∫_{−∞}^{0} 𝑓(𝑧) 𝑑𝑧 − ∫_{−2.15}^{0} 𝑓(𝑧) 𝑑𝑧 = 0.5 − 0.4842 = 0.0158 ≈ 2%
(by symmetry, ∫_{−2.15}^{0} 𝑓(𝑧) 𝑑𝑧 ≡ ∫_{0}^{2.15} 𝑓(𝑧) 𝑑𝑧)
• And finally, to find 𝑃(𝑧 ≥ 1.93), we have:
∫_{1.93}^{+∞} 𝑓(𝑧) 𝑑𝑧 = ∫_{0}^{+∞} 𝑓(𝑧) 𝑑𝑧 − ∫_{0}^{1.93} 𝑓(𝑧) 𝑑𝑧 = 0.5 − 0.4732 = 0.0268
• 30. An Example
o If the income of employees in a big company is normally distributed with 𝜇 = £20000 and 𝜎 = £4000, what is the probability that a randomly picked employee has an income a) above £22000, b) between £16000 and £24000?
a) We first need to transform 𝑥 to 𝑧:
𝑃(𝑥 > 22000) = 𝑃((𝑥 − 20000)/4000 > (22000 − 20000)/4000) = 𝑃(𝑧 > 0.5) = 0.5 − 0.1915 = 0.3085 ≈ 31%
b) 𝑃(16000 < 𝑥 < 24000) = 𝑃((16000 − 20000)/4000 < (𝑥 − 20000)/4000 < (24000 − 20000)/4000) = 𝑃(−1 < 𝑧 < 1) = 0.3413 + 0.3413 = 0.6826 ≈ 68%
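Both parts of the income example, computed directly on the 𝑁(20000, 4000²) distribution rather than via the z-table (a Python sketch, assuming scipy is available):

```python
from scipy.stats import norm

mu, sigma = 20000, 4000
print(norm.sf(22000, mu, sigma))                                # a) ~0.3085
print(norm.cdf(24000, mu, sigma) - norm.cdf(16000, mu, sigma))  # b) ~0.6827
```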
• 31. The χ² (Chi-Squared) Distribution
• The χ² (Chi-Squared) Distribution: Let 𝑍₁, 𝑍₂, …, 𝑍ₖ be 𝑘 independent standardised normally distributed random variables; then the sum of their squares
𝑋 = Σᵢ₌₁ᵏ 𝑍ᵢ²
has a chi-squared distribution with degrees of freedom equal to the number of random variables (𝑑𝑓 = 𝑘). So, 𝑋 ~ χ²ₖ. The mean value and standard deviation of an RV with a chi-squared distribution are 𝑘 and √(2𝑘) respectively, so we can write: 𝑋 ~ χ²(𝑘, 2𝑘)
Probability Density Function (PDF) of the χ² Distribution
Adopted from http://2012books.lardbucket.org/books/beginning-statistics/s15-chi-square-tests-and-f-tests.html
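The defining sum of squares can be simulated to confirm the mean and standard deviation (a Python sketch added for illustration, assuming numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
# Sum of squares of k independent standard normal variables
x = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
print(x.mean(), x.std())   # ~k = 5 and ~sqrt(2k) = 3.16
```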
• 33. The t-Distribution
• If 𝑍 ~ 𝑁(0, 1) and 𝑋 ~ χ²ₖ, and the two random variables 𝑍 and 𝑋 are independent, then the random variable
𝑡 = 𝑍/√(𝑋/𝑘) = 𝑍·√(𝑘/𝑋)
follows Student's t-distribution (the t-distribution) with 𝑘 degrees of freedom. For a sample of size 𝑛 we have 𝑑𝑓 = 𝑘 = 𝑛 − 1.
• The mean value and standard deviation of this distribution are:
𝜇 = 0 for 𝑛 > 2, undefined for 𝑛 = 1, 2
𝜎 = √((𝑛 − 1)/(𝑛 − 3)) for 𝑛 > 3, ∞ for 𝑛 = 3, undefined for 𝑛 = 1, 2
• 34. The t-Distribution
• Like the standard normal distribution, the t-distribution is bell-shaped and symmetrical with zero mean (𝑛 > 2), but it is flatter. As the degrees of freedom increase (i.e. as 𝑛 increases), it approaches the standard normal distribution, and for 𝑛 ≥ 30 their behaviours are similar.
• From the table (next slide): 𝑃(𝑡 ≥ 1.706 | 𝑑𝑓 = 26) = 0.05 ≈ 5%, or 𝑡₀.₀₅,₂₆ = 1.706
Adopted from http://education-portal.com/academy/lesson/what-is-a-t-test-procedure-interpretation-examples.html#lesson
• 35. t-table: critical values 𝑡_{𝛼,𝑑𝑓} (upper-tail probability 𝛼 across the top, degrees of freedom down the side)
df 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
1 1.376 1.963 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.578
2 1.061 1.386 1.886 2.920 4.303 6.965 9.925 14.089 22.328 31.600
3 0.978 1.250 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.924
4 0.941 1.190 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5 0.920 1.156 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869
6 0.906 1.134 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 0.896 1.119 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 0.889 1.108 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 0.883 1.100 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 0.879 1.093 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 0.876 1.088 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 0.873 1.083 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 0.870 1.079 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.768
24 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689
28 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.660
30 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
31 0.853 1.054 1.309 1.696 2.040 2.453 2.744 3.022 3.375 3.633
32 0.853 1.054 1.309 1.694 2.037 2.449 2.738 3.015 3.365 3.622
33 0.853 1.053 1.308 1.692 2.035 2.445 2.733 3.008 3.356 3.611
34 0.852 1.052 1.307 1.691 2.032 2.441 2.728 3.002 3.348 3.601
35 0.852 1.052 1.306 1.690 2.030 2.438 2.724 2.996 3.340 3.591
36 0.852 1.052 1.306 1.688 2.028 2.434 2.719 2.990 3.333 3.582
37 0.851 1.051 1.305 1.687 2.026 2.431 2.715 2.985 3.326 3.574
38 0.851 1.051 1.304 1.686 2.024 2.429 2.712 2.980 3.319 3.566
39 0.851 1.050 1.304 1.685 2.023 2.426 2.708 2.976 3.313 3.558
40 0.851 1.050 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
50 0.849 1.047 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496
60 0.848 1.045 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
80 0.846 1.043 1.292 1.664 1.990 2.374 2.639 2.887 3.195 3.416
100 0.845 1.042 1.290 1.660 1.984 2.364 2.626 2.871 3.174 3.390
150 0.844 1.040 1.287 1.655 1.976 2.351 2.609 2.849 3.145 3.357
Infinity 0.842 1.036 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.290
• 36. The F Distribution
• If 𝑍₁ ~ χ²ₖ₁ and 𝑍₂ ~ χ²ₖ₂, and 𝑍₁ and 𝑍₂ are independent, then the random variable
𝐹 = (𝑍₁/𝑘₁)/(𝑍₂/𝑘₂)
follows the F distribution with 𝑘₁ and 𝑘₂ degrees of freedom, i.e. 𝐹 ~ 𝐹ₖ₁,ₖ₂ or 𝐹 ~ 𝐹(𝑘₁, 𝑘₂).
• This distribution is skewed to the right, like the chi-squared distribution, but as 𝑘₁ and 𝑘₂ increase (𝑛 → ∞) it approaches the normal distribution.
Adopted from http://www.vosesoftware.com/ModelRiskHelp/index.htm#Distributions/Continuous_distributions/F_distribution.htm
• 37. The F Distribution
• The mean and standard deviation of the F distribution are:
𝜇 = 𝑘₂/(𝑘₂ − 2) for 𝑘₂ > 2
𝜎 = (𝑘₂/(𝑘₂ − 2))·√(2(𝑘₁ + 𝑘₂ − 2)/(𝑘₁(𝑘₂ − 4))) for 𝑘₂ > 4
• Relation of the t and chi-squared distributions to the F distribution:
• For a random variable 𝑋 ~ 𝑡ₖ it can be shown that 𝑋² ~ 𝐹₁,ₖ. This can also be written as 𝑡ₖ² = 𝐹₁,ₖ.
• If 𝑘₂ is large enough, then 𝑘₁·𝐹ₖ₁,ₖ₂ ~ χ²ₖ₁.
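The relation 𝑡ₖ² = 𝐹₁,ₖ can be checked on the critical values (a Python sketch, assuming scipy is available; 𝑘 = 26 is an arbitrary choice for illustration):

```python
from scipy.stats import t, f

k, alpha = 26, 0.05
t_crit = t.ppf(1 - alpha / 2, k)   # two-tailed t critical value, ~2.056
f_crit = f.ppf(1 - alpha, 1, k)    # upper-tail F(1, k) critical value
print(t_crit ** 2, f_crit)         # both ~4.23
```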
• 38. [F-distribution critical-value tables, starting with 𝛼 = 0.25; all adopted from http://www.stat.purdue.edu/~yuzhu/stat514s05/tables.html]
• 43. Statistical Inference (Estimation)
• Statistical inference, or statistical induction, is one of the most important aspects of decision making; it refers to the process of drawing a conclusion about the unknown parameters of a population from a sample of randomly chosen data.
• The idea is that a sample of randomly chosen data provides the best available information about the parameters of the population, and it can be considered a representative of the population when its size is reasonably (appropriately) large.
• The first step in statistical inference (induction) is estimation, which is the process of finding an estimate or approximation for the population parameters (such as the mean value and standard deviation) using the data in the sample.
• 44. Statistical Inference (Estimation)
• The value of 𝑋̄ (the sample mean) in a randomly chosen and appropriately large sample is a good estimator of the population mean 𝜇. The value of 𝑠² (the sample variance) is likewise a good estimator of the population variance 𝜎².
• Before taking any sample from the population (when the sample is not yet realised or observed) we can talk about the probability distribution of a hypothetical sample. The probability distribution of a random variable 𝑥 in a hypothetical sample follows the probability distribution of the population, even if the sampling process is repeated many times.
• But the probability distribution of the sample mean 𝑋̄ in repeated sampling does not necessarily follow the probability distribution of its population as the number of samples increases.
• 45. Central Limit Theorem
• Central Limit Theorem: Imagine a random variable 𝑋, with any probability distribution, defined in a population with mean 𝜇 and variance 𝜎². Suppose we take 𝑛 independent samples 𝑋₁, 𝑋₂, …, 𝑋ₙ and for each sample we calculate the mean values 𝑋̄₁, 𝑋̄₂, …, 𝑋̄ₙ (see figure below).
𝑋 ~ i.i.d(𝜇, 𝜎²)   (i.i.d ≡ independent & identically distributed RVs)
• 46. Central Limit Theorem
As the number of samples increases indefinitely, the random variable 𝑋̄ has a normal distribution (regardless of the population distribution), and we have:
𝑋̄ ~ 𝑁(𝜇, 𝜎²/𝑛) when 𝑛 → +∞
And in the standard form:
𝑍 = (𝑋̄ − 𝜇_𝑋̄)/𝜎_𝑋̄ = (𝑋̄ − 𝜇)/(𝜎/√𝑛) = √𝑛(𝑋̄ − 𝜇)/𝜎 ~ 𝑁(0, 1)
o Taking a sample of 36 elements from a population with mean 20 and standard deviation 12, what is the probability that the sample mean falls between 18 and 24?
𝑃(18 < 𝑥̄ < 24) = 𝑃(−1 < (𝑥̄ − 20)/(12/√36) < 2) = 0.3413 + 0.4772 ≈ 82%
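The theorem can be illustrated by simulation with a deliberately non-normal population (a Python sketch added for illustration; the gamma population is an assumption chosen only to match the example's mean 20 and standard deviation 12):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sd, n = 20, 12, 36
theta = sd ** 2 / mu   # gamma scale chosen so the mean is 20 and sd is 12
k = mu / theta         # gamma shape
# 100,000 samples of size 36; compute each sample's mean
means = rng.gamma(k, theta, size=(100_000, n)).mean(axis=1)
print(np.mean((means > 18) & (means < 24)))   # ~0.82, as on the slide
```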
• 47. Estimation
• In the previous slides we introduced some of the most important probability distributions for discrete and continuous random variables.
• In many cases we know the nature of the probability distribution of a random variable defined in a population, but have no idea about its parameters, such as the mean value or/and standard deviation.
• Point Estimation:
• To estimate the unknown parameters of the probability distribution of a random variable we can have either a point estimate or an interval estimate, using an estimator.
• The estimator is a function of the sample values 𝑥₁, 𝑥₂, …, 𝑥ₙ and is often called a statistic. If 𝜃̂ represents that estimator, we have:
𝜃̂ = 𝑓(𝑥₁, 𝑥₂, …, 𝑥ₙ)
• 48. Estimation
• 𝜃̂ is said to be an unbiased estimator of the true 𝜃 (the parameter of the population) if 𝐸(𝜃̂) = 𝜃, because the bias itself is defined as 𝐵𝑖𝑎𝑠 = 𝐸(𝜃̂) − 𝜃.
o For example, the sample mean 𝑋̄ is a point and unbiased estimator of the unknown parameter 𝜇 (the population mean):
𝜃̂ = 𝑋̄ = 𝑓(𝑥₁, 𝑥₂, …, 𝑥ₙ) = (1/𝑛)(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ)
It is unbiased because 𝐸(𝑋̄) = 𝜇.
• 49. Estimation
• The sample variance in the form 𝑠² = Σ(𝑥ᵢ − 𝑥̄)²/𝑛 is a point but biased estimator of the population variance 𝜎² in a small sample:
𝐸(𝑠²) = 𝜎²(1 − 1/𝑛) ≠ 𝜎²
But it is a consistent estimator, because it approaches 𝜎² as the sample size 𝑛 increases indefinitely (𝑛 → ∞).
• With Bessel's correction (changing 𝑛 to 𝑛 − 1) we can define another sample variance which is unbiased even for small sample sizes:
𝑠² = Σ(𝑥ᵢ − 𝑥̄)²/(𝑛 − 1)
• The most common methods for finding point estimators are the least-squares method and the maximum likelihood method, of which the first will be discussed later.
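The bias and Bessel's correction can be seen by averaging both versions of 𝑠² over many small samples (a Python sketch added for illustration, assuming numpy is available):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                      # small samples
samples = rng.normal(0, 2, size=(200_000, n))   # true variance sigma^2 = 4
print(samples.var(axis=1, ddof=0).mean())  # ~sigma^2 * (1 - 1/n) = 3.2 (biased)
print(samples.var(axis=1, ddof=1).mean())  # ~4.0 (Bessel-corrected, unbiased)
```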
• 50. Interval Estimation
• Interval Estimation:
• Interval estimation, by contrast, provides an interval or range of possible estimates at a specific level of probability, called the level of confidence, within which the true value of the population parameter may lie.
• If 𝜃̂₁ and 𝜃̂₂ are respectively the lowest and highest estimates of 𝜃, the probability that 𝜃 is covered by the interval (𝜃̂₁, 𝜃̂₂) is:
Pr(𝜃̂₁ ≤ 𝜃 ≤ 𝜃̂₂) = 1 − 𝛼   (0 < 𝛼 < 1)
where 1 − 𝛼 is the level of confidence and 𝛼 itself is called the level of significance. The interval (𝜃̂₁, 𝜃̂₂) is called the confidence interval.
• 51. Interval Estimation
 How do we find 𝜃̂₁ and 𝜃̂₂? In order to find the lower and upper limits of a confidence interval we need prior knowledge about the nature of the distribution of the random variable in the population.
 If the random variable 𝑥 is normally distributed in the population and the population standard deviation (𝜎) is known, the 95% confidence interval for the unknown population mean (𝜇) can be constructed by finding the symmetric z-values associated with 95% of the area under the standard normal curve:
1 − 𝛼 = 95% → 𝛼 = 5% → 𝛼/2 = 2.5%, so ±𝑍₀.₀₂₅ = ±1.96
We know that 𝑍 = (𝑋̄ − 𝜇_𝑋̄)/𝜎_𝑋̄ = (𝑋̄ − 𝜇)/(𝜎/√𝑛), so:
𝑃(−𝑍_{𝛼/2} ≤ 𝑍 ≤ 𝑍_{𝛼/2}) = 95%
Adopted & altered from http://upload.wikimedia.org/wikipedia/en/b/bf/NormalDist1.96.png
• 52. Interval Estimation
• So we can write:
𝑃(𝑥̄ − 1.96𝜎_𝑥̄ ≤ 𝜇 ≤ 𝑥̄ + 1.96𝜎_𝑥̄) = 0.95
Or
𝑃(𝑥̄ − 1.96𝜎/√𝑛 ≤ 𝜇 ≤ 𝑥̄ + 1.96𝜎/√𝑛) = 0.95
Therefore, the interval (𝑥̄ − 1.96𝜎/√𝑛, 𝑥̄ + 1.96𝜎/√𝑛) represents a 95% confidence interval (𝐶𝐼₉₅%) for the unknown value of 𝜇. It means that in repeated random sampling (say 100 times) we expect 95 out of 100 such intervals to cover the unknown value of the population mean 𝜇.
Adopted and altered from http://forums.anarchy-online.com/showthread.php?t=604728
• 53. Interval Estimation for the Population Proportion
 A confidence interval can also be constructed for the population proportion (see the graph below), where 𝑋 ~ Bi(𝑛𝑝, 𝑛𝑝(1 − 𝑝)).
𝑝̂ in each sample represents a sample proportion. In repeated random sampling 𝑝̂ has its own probability distribution, with mean value and variance:
𝜇_𝑝̂ = 𝐸(𝑝̂) = 𝑝 = 𝜇/𝑛
𝜎²_𝑝̂ = 𝑣𝑎𝑟(𝑝̂) = 𝜎²/𝑛² = 𝑝(1 − 𝑝)/𝑛
• 54. Interval Estimation for the Population Proportion
• The 90% confidence interval for the population proportion 𝑝, when the sample size is bigger than 30 (𝑛 > 30) and there is no information about the population variance, is constructed as follows:
𝑍 = (𝑝̂ − 𝑝)/√(𝑝̂(1 − 𝑝̂)/𝑛)
𝑃(−𝑍_{𝛼/2} ≤ 𝑍 ≤ +𝑍_{𝛼/2}) = 1 − 𝛼
𝑃(𝑝̂ − 𝑍_{𝛼/2}·√(𝑝̂(1 − 𝑝̂)/𝑛) ≤ 𝑝 ≤ 𝑝̂ + 𝑍_{𝛼/2}·√(𝑝̂(1 − 𝑝̂)/𝑛)) = 0.9
So, with 𝛼/2 = 0.05 and ±𝑍_{𝛼/2} = ±1.645, the confidence interval can be written simply as:
𝐶𝐼₉₀% = 𝑝̂ ∓ 1.645·√(𝑝̂(1 − 𝑝̂)/𝑛)
Obviously, if we had knowledge about the population variance we would be able to estimate the population proportion 𝑝 directly. Why?
Adopted and altered from http://www.stat.wmich.edu/s216/book/node83.html
• 55. Examples
o Imagine the weight of people in a society is normally distributed. A random sample of 25, with sample mean 72 kg, is taken from this society. If the standard deviation of the population is 6 kg, find a) the 90%, b) the 95% and c) the 99% confidence interval for the unknown population mean.
a) 1 − 𝛼 = 0.9 → 𝛼/2 = 0.05 → 𝑍_{𝛼/2} = 1.645, so 𝐶𝐼₉₀% = 72 ± 1.645 × 6/√25 = (70.03, 73.97)
b) 1 − 𝛼 = 0.95 → 𝛼/2 = 0.025 → 𝑍_{𝛼/2} = 1.96, so 𝐶𝐼₉₅% = 72 ± 1.96 × 6/√25 = (69.65, 74.35)
c) 1 − 𝛼 = 0.99 → 𝛼/2 = 0.005 → 𝑍_{𝛼/2} = 2.58, so 𝐶𝐼₉₉% = 72 ± 2.58 × 6/√25 = (68.9, 75.1)
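All three intervals in one loop (a Python sketch, assuming scipy is available):

```python
from scipy.stats import norm

xbar, sigma, n = 72, 6, 25
for conf in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - conf) / 2)   # 1.645, 1.960, 2.576
    half = z * sigma / n ** 0.5
    print(conf, round(xbar - half, 2), round(xbar + half, 2))
```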
• 56. Examples
o Samples from one of the production lines in a factory suggest that 10% of products are defective. If a 1% difference between the sample and population proportions is acceptable, what sample size do we need to construct a 95% confidence interval for the population proportion? What if the acceptable gap between the sample and population proportions increased to 3%?
1 − 𝛼 = 0.95 → 𝛼/2 = 0.025 → 𝑍_{𝛼/2} = 1.96
𝑍_{𝛼/2} = (𝑝̂ − 𝑝)/√(𝑝̂(1 − 𝑝̂)/𝑛) → 1.96 = 0.01/√(0.1 × 0.9/𝑛) → 𝑛 = (1.96 × 0.3/0.01)² = 58.8² ≈ 3458
If the gap increases to 3%, then: 1.96 = 0.03/√(0.1 × 0.9/𝑛) → 𝑛 = (1.96 × 0.3/0.03)² = 19.6² ≈ 385
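The same sample-size calculation in code (a sketch, assuming scipy is available):

```python
from math import ceil
from scipy.stats import norm

p_hat = 0.10
z = norm.ppf(0.975)   # 1.96
for margin in (0.01, 0.03):
    n = p_hat * (1 - p_hat) * (z / margin) ** 2
    print(margin, ceil(n))   # 3458 and 385
```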
• 57. Interval Estimation (Using the t-Distribution)
• If the population standard deviation 𝜎 is unknown and we use the sample standard deviation 𝑠 instead, and the size of the sample is less than 30 (𝑛 < 30), then the random variable
(𝑥̄ − 𝜇)/(𝑠/√𝑛) ~ 𝑡ₙ₋₁
has a t-distribution with 𝑑𝑓 = 𝑛 − 1. This means a confidence interval for the population mean 𝜇 will be of the form:
𝐶𝐼₍₁₋𝛼₎ = (𝑥̄ − 𝑡_{𝛼/2,𝑛−1}·𝑠/√𝑛 , 𝑥̄ + 𝑡_{𝛼/2,𝑛−1}·𝑠/√𝑛)
Adopted and altered from http://cnx.org/content/m46278/latest/?collection=col11521/latest
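A t-based variant of the slide-55 example, under the hypothetical assumption that the 6 kg there were the sample standard deviation rather than the population value (a sketch, assuming scipy is available):

```python
from scipy.stats import t

xbar, s, n = 72, 6, 25                     # hypothetical: s is now a sample sd
half = t.ppf(0.975, n - 1) * s / n ** 0.5  # t(0.025, 24) = 2.064
print(xbar - half, xbar + half)            # slightly wider than the z interval
```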
• 58. Interval Estimation
• The following flowchart can help in choosing between the Z- and t-distributions when an interval estimate is constructed for 𝜇 in the population (and, if neither applies, to use nonparametric methods).
Adopted from http://www.expertsmind.com/questions/flow-chart-for-confidence-interval-30112489.aspx
• 59. Interval Estimation
• Here is a list of confidence intervals for the relevant parameters in the population.
Adopted from http://www.bls-stats.org/uploads/1/7/6/7/1767713/250709.image0.jpg
• 60. Hypothesis Testing
• Hypothesis testing is one of the important aspects of statistical inference. The main idea is to find out whether some claims/statements (in the form of hypotheses) about population parameters can be statistically rejected by the evidence from the sample, using a test statistic (a function of the sample).
• Claims are made in the form of a null hypothesis (𝐻₀) against an alternative hypothesis (𝐻₁), and they can only be rejected, never proven. These two hypotheses should be mutually exclusive and collectively exhaustive. For example:
𝐻₀: 𝜇 = 0.8 against 𝐻₁: 𝜇 ≠ 0.8
𝐻₀: 𝜇 ≥ 2.1 against 𝐻₁: 𝜇 < 2.1
𝐻₀: 𝜎² ≤ 0.4 against 𝐻₁: 𝜎² > 0.4
 Always remember that the equality sign comes with 𝐻₀.
• If the value of the test statistic lies in the rejection region(s), the null hypothesis must be rejected; otherwise the sample does not provide sufficient evidence to reject the null hypothesis.
• 61. Hypothesis Testing
• Assuming we know the distribution of the random variable in the population, and also have statistical independence between the different random variables, hypothesis testing follows these steps:
1. State the relevant null and alternative hypotheses. The form of the null hypothesis (whether it uses =, ≥ or ≤) indicates how many rejection regions we will have: for the = sign there are two regions, otherwise just one. Depending on the difference between the value of the estimator and the claimed value of the population parameter, the rejection region can be on the right or the left of the distribution curve.
𝐻₀: 𝜇 = 0.5, 𝐻₁: 𝜇 ≠ 0.5 (two-tail) or 𝐻₀: 𝜇 ≥ 0.5 (or 𝜇 ≤ 0.5), 𝐻₁: 𝜇 < 0.5 (or 𝜇 > 0.5) (one-tail)
Graphs adopted from http://www.soc.napier.ac.uk/~cs181/Modules/CM/Statistics/Statistics%203.html
• 62. Hypothesis Testing
2. Identify the level of significance of the test (𝛼); it is usually taken to be 5% or 1%, depending on the nature of the test and the goals of the researcher. When 𝛼 is known, together with prior knowledge about the sampling distribution, the critical region(s) (or rejection region(s)) can be identified.
Here we have the two critical values of the standard normal distribution associated with the one-tail significance levels 𝛼 = 5% and 𝛼 = 1%: 𝑍₀.₀₅ = 1.65 and 𝑍₀.₀₁ = 2.33
Adopted from http://www.psychstat.missouristate.edu/introbook/sbk26.htm
• 63. Hypothesis Testing
3. Construct a test statistic (a function based on the sample distribution and the sample size). This function is used to decide whether or not to reject 𝐻₀.
Here we have a list of some of the test statistics for testing different hypotheses.
Table adopted from http://www.bls-stats.org/uploads/1/7/6/7/1767713/250714.image0.jpg
• 64. Hypothesis Testing
4. Take a random sample from the population and calculate the value of the test statistic. If the value is in the rejection region, the null hypothesis 𝐻₀ will be rejected in favour of the alternative 𝐻₁ at the predetermined significance level 𝛼; otherwise the sample does not provide sufficient evidence to reject 𝐻₀ (this does not mean that we accept 𝐻₀). The critical values are:
−𝑍_𝛼 or −𝑡_{𝛼,𝑑𝑓} for a left-tail test; +𝑍_𝛼 or +𝑡_{𝛼,𝑑𝑓} for a right-tail test; ±𝑍_{𝛼/2} or ±𝑡_{𝛼/2,𝑑𝑓} for a two-tail test
Adopted from http://www.onekobo.com/Articles/Statistics/03-Hypotheses/Stats3%20-%2010%20-%20Rejection%20Region.htm
• 65. Example
o A chocolate factory claims that its new tin of cocoa powder contains at least 500 gr of powder. A standards checking agency takes a random sample of 𝑛 = 25 tins and finds that the sample mean weight is 𝑋̄ = 520 gr and the sample standard deviation is 𝑠 = 75 gr. If we assume the weight of cocoa powder in the tins has a normal distribution, does the sample provide enough evidence to reject the claim at the 95% level of confidence?
1. 𝐻₀: 𝜇 ≥ 500, 𝐻₁: 𝜇 < 500 (so it is a one-tail, left-tail test)
2. Level of significance 𝛼 = 5% → 𝑡_{𝛼,𝑛−1} = 𝑡₀.₀₅,₂₄ = 1.711 (it is the t-distribution because 𝑛 < 30 and we have no prior knowledge of the population standard deviation)
3. The value of the test statistic is: 𝑡 = (𝑋̄ − 𝜇)/(𝑠/√𝑛) = (520 − 500)/(75/√25) = 1.33
4. The rejection region is 𝑡 < −1.711. As 1.33 > −1.711 we are not in the rejection region, so the claim cannot be rejected at the 5% level of significance.
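The same test in code (a Python sketch, assuming scipy is available):

```python
from scipy.stats import t

xbar, mu0, s, n = 520, 500, 75, 25
t_stat = (xbar - mu0) / (s / n ** 0.5)   # 1.33
t_crit = -t.ppf(0.95, n - 1)             # left-tail critical value, -1.711
print(t_stat, t_crit, t_stat < t_crit)   # False -> cannot reject H0
```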
• 66. Type I & Type II Errors
• Two types of error can occur in hypothesis testing:
A. Type I error: when, based on our sample, we reject a true null hypothesis.
B. Type II error: when, based on our sample, we fail to reject a false null hypothesis.
• By reducing the level of significance 𝛼 we can reduce the probability of making a type I error (why?); however, at the same time we increase the probability of making a type II error.
• What would happen to the type I and type II errors if we increased the sample size? (Hint: look at the confidence intervals.)
Adopted from http://whatilearned.wikia.com/wiki/Hypothesis_Testing?file=Type_I_and_Type_II_Error_Table.jpg
• 67. Type I & Type II Errors
• The following graph shows how a change of the critical line (critical value) changes the probabilities of making type I and type II errors:
𝑃(Type I error) = 𝛼 and 𝑃(Type II error) = 𝛽
Adopted from http://www.weibull.com/hotwire/issue88/relbasics88.htm
• The Power of a Test: The power of a test is the probability that the test will correctly reject a false null hypothesis; it is the probability of not committing a type II error. The power is equal to 1 − 𝛽, which means that by reducing 𝛽 the power of the test increases.
• 68. The P-Value
• It is not unusual to reject 𝐻₀ at some level of significance, for example 𝛼 = 5%, but to be unable to reject it at some other level, e.g. 𝛼 = 1%. The dependence of the final decision on the value of 𝛼 is the weak point of the classical approach.
• In the new approach, we try to find the p-value, which is the lowest significance level at which 𝐻₀ can be rejected. If the level of significance is set at 5% and the lowest significance level at which 𝐻₀ can be rejected (the p-value) is 2%, then the null hypothesis should be rejected; i.e. reject 𝐻₀ if 𝑝-value < 𝛼.
 To understand this concept better, let's look at an example:
• Suppose we believe that the mean life expectancy of the people in a city is 75 years (𝐻₀: 𝜇 = 75). But our observation shows a sample mean of 76 years for a sample size of 100, with a sample standard deviation of 4 years.
• 69. The P-Value
• The Z-score (test statistic) can be calculated as follows:
𝑍 = (𝑋̄ − 𝜇)/(𝑠/√𝑛) = (76 − 75)/(4/√100) = 2.5
• At the 5% level of significance the critical Z-value is 1.96, so we must reject 𝐻₀. But we should not have had this result (or should not have had those observations in our random sample) in the first place if our assumption about the population mean 𝜇 was correct.
• The p-value is the probability of having this type of result, or one even more extreme (i.e. a Z-score bigger than 2.5), given that the null hypothesis is correct:
𝑃(𝑍 ≥ 2.5 | 𝜇 = 75) = 𝑝-value ≈ 0.006
(it means that in 1000 samples this type of result should theoretically happen about 6 times; yet it happened in our very first random sample).
http://faculty.elgin.edu/dkernler/statistics/ch10/10-2.html
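The slide's upper-tail p-value in code (a sketch, assuming scipy is available):

```python
from scipy.stats import norm

z = (76 - 75) / (4 / 100 ** 0.5)   # 2.5
p_value = norm.sf(z)               # upper-tail P(Z >= 2.5)
print(z, p_value)                  # 2.5, ~0.0062
```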
• 70. The P-Value
• As we cannot deny what we have observed and obtained from the sample, eventually we need to change our belief about the population mean and reject our assumption about it.
• The smaller the p-value, the stronger the evidence against 𝐻₀.