1. BUSINESS STATISTICS I
Lectures Part 1 — Weeks 36 – 42
Antonio Rivero Ostoic
School of Business and Social Sciences
September − October
AARHUS UNIVERSITY
2. BUSINESS STATISTICS I
Lecture – Week 36
Antonio Rivero Ostoic
School of Business and Social Sciences
September
AARHUS UNIVERSITY
3. Today’s Agenda
Fundamental Concepts in Statistics
Types of Data
Computing and Representing Data Types
and practical information about the course...
2 / 34
4. Definition of statistics
“Statistics is a way to get information from data” (Keller, p. 1)
ª Data is a set of observations, whereas information is the message
However, statistics can also be viewed as methods for
collecting and analysing data
ª in this case the research design is a statistical procedure as well
Statistics draws conclusions from numbers...
3 / 34
5. Types of statistics
Statistical analysis is classified as descriptive or inferential
Descriptive
The goal is to describe the data, i.e. organize, summarize, present...
Instead of listing all observations, we summarize the data through
numerical techniques or represent it graphically
Descriptive statistics provides a summary of the data that is more
meaningful than the complete listing
We can also find patterns in the data as part of an explorative
phase of the analysis
4 / 34
6. Types of statistics II
Inferential
In this case the goal is to make conclusions from the data
Inferential statistics allows making generalizations from samples
to the population values
Statistical inferences are made through different kinds of tests
ª like hypothesis testing or tests of significance, tests of reliability, etc.
We can also make predictions based on the data
5 / 34
7. Key statistical concepts
Population
The final goal in statistics is to learn about populations
Population constitutes the total set of subjects of interest in the
study
However in statistics, population —rather than being a particular
group of individuals or cases— refers to the values of a variable
ª e.g. the teenagers downloading an app onto their mobile devices
6 / 34
8. Key statistical concepts II
Sample
Inferences about populations are based on sample data
Samples are selected individuals or cases of the population on
which the study collects data
Through samples it is possible to study populations in a practical
manner
7 / 34
9. Key statistical concepts III
Descriptive measures
• For populations they are called parameters
• For samples they are called statistics
We use statistics to make inferences about parameters
8 / 34
12. Types of data
• Both populations and samples are described in
terms of variables
• A variable is a characteristic that consists of two or
more observed values, which constitute the data
11 / 34
13. Types of data II
A data type is classified according to the kind of variable or to the scale
Variables:
Quantitative
– continuous
– discrete
Qualitative or categorical
– discrete
Scales:
• Nominal
• Ordinal
• Interval (and Ratio)
a score is the numerical value which indicates the quantity of a variable
12 / 34
14. Data scales
Types of variables are measured according to different scales
Nominal
labels used for categorical variables
do not represent degree of difference
cannot be ranked
it is only possible to calculate the frequency of occurrences and
compare these measures
usually the responses are recorded using codes
ª e.g. Gender, Nationality
13 / 34
15. Data scales II
Interval (and Ratio)
used for quantitative variables
there is order and the adjacent intervals between the points of
the scale are of equal extent
there is a degree of difference (for Interval, differences are meaningful but ratios are not)
the measure has an arbitrary zero point (Interval), and an
absolute zero point (Ratio)
it is possible to calculate measures of location and dispersion
ª e.g. Temperature °C (Interval), Age (Ratio)
14 / 34
16. Data scales III
Ordinal
for qualitative variables where the order of the values is significant
there is a degree of difference with ranks
typically measures of non-numeric concepts
it is possible to calculate measures of location
ª e.g. Degree of satisfaction, TRUE/FALSE (?)
It is important to identify the scale and type of the data
because it determines which statistical procedure we are going to use
15 / 34
18. Computing and representing data types
Nominal data
Recall that with nominal data we can only count the frequency of
the different categories, which is typically given in a frequency
distribution table
The percentage of the counts represents a relative frequency
Since the variable is qualitative, we can code the responses with
numbers
A single set of data, i.e. one nominal variable, is called univariate
17 / 34
19. Frequency table (Example of interval data treated as nominal)
COUNTRY % 2011
--------------------
1 Belgium 27.3
2 Bulgaria 23.2
3 Czech Rep. 25.0
4 Denmark 30.7
5 Germany 26.1
6 Estonia 25.5
7 Spain 27.5
8 France 26.9
9 Croatia 25.7
10 Italy 24.0
11 Cyprus 40.3
12 Latvia 26.5
13 Lithuania 24.3
14 Malta 43.5
15 Netherlands 26.3
16 Austria 29.2
17 Poland 28.4
18 Romania 17.5
19 Slovenia 32.3
20 Slovakia 22.5
21 Finland 26.6
22 Sweden 27.3
23 Gr. Britain 29.5
24 Iceland 26.1
25 Norway 22.3
26 Turkey 19.1
27 United States 30.2
28 Japan 31.3
Source: Eurostat
18 / 34
20. Bar chart (Example of interval data treated as nominal)
Graphical methods
Expenditure on education per capita (year 2011)
[Bar chart: %GDP by COUNTRY for Belgium, Bulgaria, Czech Rep., Denmark, Germany, Estonia; y-axis 0 to 30]
19 / 34
21. Pie chart (Example of interval data treated as nominal)
Graphical methods
Expenditure on education per capita (year 2011)
[Pie chart: Belgium 27.3%, Bulgaria 23.2%, Czech Rep. 25%, Denmark 30.7%, Germany 26.1%, Estonia 25.5%]
20 / 34
22. Computing and representing data types II
Ordinal data
Ordinal data should be treated as nominal
Frequency tables and charts are also used, but arranged in
ascending or descending ordinal values
ª bar charts with descending frequencies are also known as Pareto plots
21 / 34
23. Pareto plot (Example of interval data treated as nominal)
Graphical methods
Expenditure on education per capita (year 2011)
[Pareto plot: %GDP by COUNTRY in descending order: Denmark, Belgium, Germany, Estonia, Czech Rep., Bulgaria; y-axis 0 to 30]
22 / 34
24. Bivariate nominal data
With bivariate nominal data there are either two variables or one
pair of data sets in the analysis
The relationship between two nominal variables is represented
by a cross-tabulation table
ª another term used is contingency table
Graphically the bar charts are represented with two dimensions
Tables and graphics with more than two dimensions are difficult
to interpret
23 / 34
25. Frequency table with two data sets
Expenditure on education
as a percentage of GDP per capita
YEAR
=================
COUNTRY 2001 2011
-----------------------------
1 Belgium 25.6 27.3
2 Bulgaria 22.5 23.2
3 Czech Rep. 19.3 25.0
4 Denmark 28.9 30.7
5 Germany 25.3 26.1
6 Estonia NA 25.5
Source: Eurostat
24 / 34
26. Bivariate bar chart (Example of interval data treated as nominal)
Expenditure on education per capita
[Grouped (dodged) bar chart: %GDP for 2001 vs. 2011 by COUNTRY
(Belgium, Bulgaria, Czech Rep., Denmark, Estonia, Germany); y-axis 0 to 30]
25 / 34
27. Computing and representing data types III
Interval data
Interval scale is used to represent quantitative information
As with nominal data, we can still use frequency distribution tables
for interval data, but categorizing the information in series of
intervals called classes
Classes of intervals do not overlap, and they cover the complete
range of information
The number of class intervals to choose is a function of the number
of observations
The width of the class intervals (called bins in the histogram) is given by
(largest obs. − smallest obs.) / number of class intervals
26 / 34
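A quick Python sketch of this width rule (not on the slide), reusing the %GDP figures from the frequency table above; choosing 8 class intervals for the n = 28 observations is our own illustrative assumption:

```python
# % GDP figures from the frequency table above (28 countries, 2011)
data = [27.3, 23.2, 25.0, 30.7, 26.1, 25.5, 27.5, 26.9, 25.7, 24.0,
        40.3, 26.5, 24.3, 43.5, 26.3, 29.2, 28.4, 17.5, 32.3, 22.5,
        26.6, 27.3, 29.5, 26.1, 22.3, 19.1, 30.2, 31.3]

n_classes = 8                           # an illustrative choice for n = 28
width = (max(data) - min(data)) / n_classes
print(max(data) - min(data), width)     # range = 26.0, class width = 3.25
```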
28. Histogram
Graphical methods
Expenditure on education per capita
(28 countries in Europe, USA, Japan)
% GDP, year 2011
[Histogram: frequency of countries per %GDP class, classes from 15 to 45; frequencies 0 to 14]
27 / 34
29. Types of histograms
Histograms can be either symmetric or skewed
A special type of symmetric histogram is bell-shaped
A skewed histogram to the right is positively skewed, and to the left
is negatively skewed
Histograms with a single peak are called unimodal, and with two
peaks are bimodal
28 / 34
30. Other graphical techniques for interval data
• Stem-and-Leaf Display
ª to represent small samples of scores
• Ogive
ª for (cumulative) relative frequency distribution
• Line chart
ª for time-series data
• Pictograms...
29 / 34
31. Bivariate interval data
The relationship between two interval variables is very
common in statistical analysis
Graphically such relation is represented by a scatter diagram or
scatterplot
Scatter diagrams reveal forms of association among the
observations
Associations can be linear (weak or strong) with a
positive or a negative direction, or else nonlinear
Typically one variable depends on the other variable(s)
30 / 34
33. Graphical presentation
Recommendations
A good graphical presentation should be concise, coherent,
and clear to the viewer
Well-designed graphics reveal substance rather than just form,
and illustrate patterns of relations between variables
Include scales and tick marks on every axis, with a caption that
relates directly to what the diagram is meant to display
Avoid distortion of the data
32 / 34
34. Applications of statistics
Traditional fields of applications of statistics in social sciences:
• demography
• econometrics
• psychometrics
• ...
A rapidly growing discipline for organisation and marketing is
business analytics = business + technology
Emphasis on statistical exploration of big data
Meaningful analysis in short time (or on the fly)
Data-driven decision making
33 / 34
35. Summary
Statistics is usually concerned with studying populations by means
of particular samples
The main concerns of statistics are to describe the sampled data
and to make inferences about the population
There are different kinds of data, which can be classified
according to the type of variable or its scale
Nominal and ordinal data are for qualitative variables, whereas
interval data is for quantitative information
Each data type has its own graphical techniques, where it is
possible to combine two or more variables
34 / 34
36. BUSINESS STATISTICS I
Lecture – Week 37
Antonio Rivero Ostoic
School of Business and Social Sciences
September
AARHUS UNIVERSITY
38. Review
lecture week 36
Recall that our final goal in statistics is to learn about populations
To make inferences on populations, we base the analysis on
sample data
We refer to the different measures for populations and samples
as parameters and statistics, respectively
3 / 36
39. Measures of central location
Measures or indices of central location seek to describe
the center of the data
The most popular measure of central location is the mean
There are (at least) three types of means:
• arithmetic mean, • geometric mean, • harmonic mean
The arithmetic mean is simply called the ‘mean,’ and
another name for it is the average
• The mean equals the sum of the scores in a sample
divided by the sample size
4 / 36
40. Arithmetic mean
To define the (arithmetic) mean, sample observations are
represented as x1, x2, . . . , xn where n is the sample size
Population mean
µ = ( Σ_{i=1}^{N} x_i ) / N
where Σ means ‘sum of’
Sample mean
x̄ = ( Σ_{i=1}^{n} x_i ) / n
* thus we use Greek letters for parameters and Latin letters for statistics
5 / 36
41. Median
The median is another popular measure of central location and
it refers to the score in the ‘middle’
• The median is calculated by ranking the scores: half
of them are above it and half of them are below
for an even number of scores, the median equals the average
of the two scores in the middle
19, 17, 23, 29, 12
ranked: 12, 17, 19, 23, 29 (median = 19)
6 / 36
42. Mode
Another measure of central location that – unlike mean and
median – can be applied to qualitative or nominal data is the mode
• The mode is the most frequently occurring score or
category in the data
however the mode is not the frequency itself but a value
12 electricians, 30 nurses, and 4 clerks
7 / 36
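A minimal Python sketch (ours, not the lecture's) of the three location measures, reusing the median and mode examples above:

```python
import statistics

scores = [19, 17, 23, 29, 12]             # the median example above
print(statistics.mean(scores))             # arithmetic mean: 20
print(statistics.median(scores))           # middle ranked score: 19

jobs = ["electrician"] * 12 + ["nurse"] * 30 + ["clerk"] * 4
print(statistics.mode(jobs))               # most frequent category: 'nurse'
```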
43. Location measures
Summary
For interval data the mean is the most appropriate
measure of central location
ª i.e. the average
For ordinal data the median is a more suitable statistic
For nominal data the only measure of location we have
seen so far is the mode
ª cf. Keller’s book view on this
8 / 36
44. Measures of variability
In descriptive statistics we are not only interested in measuring the
central location of the data but also in the spread or dispersion of
the observations
Many times if we want to compare measures of location in different
data sets or variables, we need to know the variability in the data
For establishing the widths of the bins in a histogram (cf. 26 in week 36)
we have already seen a variability measure
ª here we considered the numerical difference between the extreme values in
class intervals
9 / 36
45. Range
The difference between the largest and the smallest score in a
data set is known as the range
Range = Largest observation − Smallest observation
The range is the simplest measure of variability since it
considers only two scores
The disadvantage of the range is that it tells us nothing
about the other scores
ª hence we can have two entirely different data sets with identical range
10 / 36
46. Variance
Variance is the average of the squared distances from the mean
That is
variance = [ sum of (each score − mean score)² ] / number of scores
Thus the larger the variance is, the more the scores differ on
average from the mean
11 / 36
48. Sample variance
Sample variances are computed to estimate population variances
The sample variance is, however, corrected for the use of the sample mean,
which implies that the sum of squared differences from the mean is divided by n − 1
ª it turns out that this formula gives a better estimate than without the correction
But why do we square the differences from the mean before averaging?
One reason is to avoid the canceling effect, in which the sum of the
positive and the negative deviations equals 0
The interpretation of the variance is not straightforward since by squaring
the differences from the mean we transform the unit of the data set
13 / 36
49. Standard deviation
The standard deviation is the square root of the variance
ª ‘deviation’ is the difference between each score and the mean
standard deviation = √( [ sum of (each score − mean score)² ] / number of scores )
By computing the square root of the variance we preserve the
original unit of the data set
14 / 36
50. Standard deviation II
Population standard deviation
σ = √σ² = √( Σ_{i=1}^{N} (x_i − µ)² / N )
Sample standard deviation
s = √s² = √( Σ_{i=1}^{n} (x_i − x̄)² / (n − 1) )
15 / 36
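A short sketch of these formulas with Python's standard library (the scores are the same illustrative ones used for the median):

```python
import statistics

sample = [19, 17, 23, 29, 12]
print(statistics.variance(sample))    # sample variance s^2 (divides by n - 1): 41
print(statistics.stdev(sample))       # sample standard deviation s = sqrt(s^2)
print(statistics.pvariance(sample))   # population version (divides by N): 32.8
```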
51. The empirical rule
By knowing the mean, the standard deviation, and the type of
distribution we can deduce relevant information about the data
For a normal distribution – with a bell shaped curve – it is
possible to apply the empirical rule
[Bell curve: ≈ 68% of scores fall within µ ± σ, ≈ 95% within µ ± 2σ,
and ≈ 99.7% within µ ± 3σ]
16 / 36
52. Dispersion measures
Last issues
Range, variance, and standard deviation are the most common
measures of variability for interval data
Are there measures of variation for ordinal and nominal data?
The ratio of the standard deviation to the mean provides
the coefficient of variation of the set of scores
ª this statistic tells us the magnitude of the standard deviation
relative to the mean
A generalization of the empirical rule that applies to all distribution
shapes is found in Chebysheff's Theorem
17 / 36
53. Percentiles
Measure of relative standing
In order to provide the position of particular values relative to the
entire data set, we have measures of relative standing
ª applicable to interval and ordinal data
The Pth percentile is the value for which P% of the scores fall
below that value and (100 − P)% fall above it
Actually the median of a data set is a special case of percentile
ª it is the 50th percentile
Besides the median, there are other special cases of percentiles
called quartiles that correspond to the 25th and 75th percentiles
ª called lower and upper quartile respectively
18 / 36
54. Percentiles II
Location of a Percentile
L_P = (n + 1) × P / 100
Lower quartile, median, and upper quartile are labelled
respectively as Q1, Q2, and Q3
The interquartile range is the difference between the upper and
lower quartile, i.e. Q3 − Q1
19 / 36
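A tiny sketch of the location formula (the sample size n = 10 is an illustrative assumption):

```python
def percentile_location(n, P):
    """L_P = (n + 1) * P / 100, the slide's location formula
    for the P-th percentile in a ranked sample of size n."""
    return (n + 1) * P / 100

n = 10                                     # illustrative sample size
for P in (25, 50, 75):                     # Q1, median (Q2), Q3
    print(P, percentile_location(n, P))    # 2.75, 5.5, 8.25
# a fractional location means interpolating between two ranked scores
```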
55. Box Plots and Outliers
• A box plot is a graphical method used to represent
quartiles together with the most extreme observations
ª Box plots are useful in comparing different data sets
• Outliers are unusually large or small observations
ª ...and we suspect their validity
20 / 36
56. Box Plot without outliers
Example
[Box plots comparing the 'score' variable for exercises 4.19 and 4.21, scale 0 to 30]
21 / 36
57. Measures of Linear Relationship
Between variables
We have seen that it is possible to relate two quantitative
variables in a statistical analysis
The scatter diagram gives us an idea of the strength and
direction of the linear relationship between two variables
However the graphical information is quite loose, and we can
obtain more precise information about the linear relationship with
different numerical measures
22 / 36
58. Covariance
As the name suggests, covariance is the variance that is
shared between two variables, X and Y
Population covariance
σ_xy = Σ_{i=1}^{N} (x_i − µ_x)(y_i − µ_y) / N
Sample covariance
s_xy = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / (n − 1)
23 / 36
59. Correlation
The ratio of the covariance to the product of the standard
deviations results in the coefficient of correlation, or just correlation
ª that is another measure of linear relationship
Population correlation
ρ = σ_xy / (σ_x σ_y)
Sample correlation
r = s_xy / (s_x s_y)
where −1 ≤ { r, ρ } ≤ +1
If the statistic equals 0 then there is no linear relationship, whereas the
extreme values denote a perfect positive or negative relationship
ª otherwise the linear relationship is considered just as ‘weak’
24 / 36
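A brief sketch of these formulas; the paired scores are illustrative, and statistics.covariance/statistics.correlation require Python 3.10+:

```python
import statistics  # covariance/correlation need Python 3.10+

x = [1, 2, 3, 4, 5]                        # illustrative paired scores
y = [2, 4, 5, 4, 5]

s_xy = statistics.covariance(x, y)         # sample covariance (n - 1 divisor): 1.5
r = s_xy / (statistics.stdev(x) * statistics.stdev(y))
print(s_xy, r)                             # r ≈ 0.77: fairly strong, positive
print(statistics.correlation(x, y))        # the same r, computed directly
```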
60. Coefficient of Determination
If we square the correlation between two quantitative variables
we obtain another measure of linear association called the
coefficient of determination, r²
The coefficient of determination measures the amount of
variation in Y that is explained by the variation in X
ª thus a clearer indication of the relationship than the correlation coefficient
Explained variation is important in statistics and it is at the core of
regression analysis
25 / 36
61. Method of Least Squares
A linear relationship can be visualized in a scatter diagram
by drawing a straight line through the scores
We need to estimate the line that represents ‘best’ the
sample data points
An objective method of producing a straight line is the
least squares method
ª which minimizes the sum of squared deviations between the scores
and the line
26 / 36
62. Method of Least Squares II
In a least squares line:
ŷ = b0 + b1x
ŷ represents the fitted value of y, whereas b0 and b1 are known as
the regression coefficients that we need to estimate
• b0 is the y-intercept: where the line crosses the y-axis
• b1 is the slope: the rise/run of the line
Least squares line coefficients
b1 = s_xy / s_x²
b0 = ȳ − b1 x̄
27 / 36
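A minimal sketch of these coefficient formulas, reusing the illustrative data from the covariance sketch above (Python 3.10+):

```python
import statistics

x = [1, 2, 3, 4, 5]                        # same illustrative data as before
y = [2, 4, 5, 4, 5]

b1 = statistics.covariance(x, y) / statistics.variance(x)  # b1 = s_xy / s_x^2
b0 = statistics.mean(y) - b1 * statistics.mean(x)          # b0 = y-bar - b1 x-bar
print(f"y-hat = {b0:.2f} + {b1:.2f} x")                    # y-hat = 2.20 + 0.60 x
```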
63. Guidelines for Exploring Data
We have seen that descriptive statistics includes the
exploration of the data
Besides knowing the type of data we are dealing with, there
are important aspects to take into account in the exploration
– the central location in the data
– the dispersion (or lack of) in the observations
– the shape of the distribution for the data set
All this information guides us in choosing the appropriate
numerical techniques to apply in further statistical analysis
28 / 36
64. Data collection and Sampling
We want to learn about populations when using statistics
Except for the census – where all members of the population
are the observations – inferences about populations are made
by means of samples
Sampling techniques are crucial for the statistical analysis,
and they are part of the research design
29 / 36
65. Methods for data collection
There are different methods of collecting data that correspond
to the variable(s) of interest
Observational data
Direct Observation either by ‘observing’ behaviour in natural settings,
or by ‘asking’ questions to certain people
ª simple and inexpensive, but it has drawbacks
• we do not have control over the subject (main disadvantage)
• comparing groups is difficult
30 / 36
66. Methods for data collection II
Observational data
Surveys are based on samples of people from populations from
which we solicit information
There are different types of sample survey:
– Interviews (personal or telephonic)
– Self-administered questionnaire
the questionnaires should be well designed for all survey types
ª questions should be short, simple, and concise;
starting with demographic questions is recommended
31 / 36
67. Methods for data collection III
Experimental data
Experiments provide data from which we can compare responses
from subjects under different conditions that are known as treatments
ª more expensive than direct observation, but provides better results
• requires special planning to assign individuals
to treatments
• randomization in the sampling is needed
• it is however quite uncommon in social sciences
32 / 36
68. Sampling
When we perform sampling, we make use of different sampling plans
In a simple random sample or just random sample each case
in the population has an equal chance of being selected
ª this can be done through random number tables
A stratified random sampling separates the population into
mutually exclusive sets called strata, and then it draws a
random sample in each stratum
ª the population is ‘naturally’ divided into groups, i.e. the strata are
recognized by an easily identifiable variable
33 / 36
69. Sampling II
With cluster random sampling the population is divided
into clusters and random samples are drawn within each cluster
ª useful for large populations with samples across strata
In a systematic sampling each subject of the population is
assigned a number and – starting at a random number –
every kth member from then onwards is selected
Many times a random sampling is based on a sampling frame,
which is the list of cases from which the sample is selected
All sampling techniques that involve a random sample
constitute probability sampling
34 / 36
70. Sampling Errors
When taking observations from a population we may be confronted
with different types of variability
Sampling error is the variability of samples from the
characteristics of the population from which they came
ª e.g. the variability from the population mean
Nonsampling error, which results from:
– mistakes made in the acquisition of data
– when responses are not obtained from some subjects in
the sample (nonresponse error or bias)
– when some parts of the population are not selected for
the sample (selection bias)
35 / 36
71. Summary
We have seen important descriptive indices for the data, which
include location, variability, and measures of dispersion
ª these measures together with the shape of the distribution are crucial parts
of the descriptive statistics of the data
There are also numerical indicators with which to calculate the
strength of the linear relationship between two quantitative variables
Different methods for data collection and sampling are at the
core of the research design
ª we should avoid sampling and nonsampling variability
36 / 36
72. BUSINESS STATISTICS I
Lecture – Week 38a (38)
Antonio Rivero Ostoic
School of Business and Social Sciences
September
AARHUS UNIVERSITY
74. What is probability?
Recall that we make inferences about populations (parameters)
based on sample data (statistics)
A link between population and sample can be found in
probability theory
Probability is a branch of mathematics that was created to study
the chance that a particular outcome will occur
ª for gambling strategies in the 17th century, which is actually before the
development of statistics itself
Many times probability is associated with random phenomena
3 / 31
75. Assigning probabilities
a) In the classical approach the probability is expressed in terms of the
proportion of a particular outcome to all possible outcomes
ª this is sort of an objective method
b) For the relative frequency approach we are interested in the
long-run relative frequency of the probability for an outcome to occur
c) The subjective approach includes a certain degree of belief to the
probability
4 / 31
76. Probability of events
Relative frequency
In order to obtain all possible outcomes we perform a random
experiment
Random experiments have an exhaustive and mutually
exclusive list of outcomes that constitutes the sample space,
S = { O1, O2, . . . , Ok }
Probabilities are expressed per event, and for each outcome
they vary between 0 and 1
The sum of probabilities of all outcomes in a sample space
must be 1
5 / 31
77. Probability of events II
Example
The toss of a die
– Sample space: S = {1, 2, 3, 4, 5, 6}
– Probability of events (classical approach): P(i) = 1/6, for i = 1, . . . , 6
how would these probabilities look in case we apply a relative
frequency approach?
6 / 31
78. Joint probability
The joint probability is the probability of having two distinct traits,
and in this case we perform the intersection of simple events
For events A and B the intersection means that the event occurs
when both A and B occur, i.e. A and B
ª on the other hand, the union of events A and B corresponds to A or B
Joint probability table:
B1 B2
A1 P(A1 and B1) P(A1 and B2)
A2 P(A2 and B1) P(A2 and B2)
7 / 31
80. Marginal probability
Marginal probabilities are computed by adding
across rows or down columns of the joint probability table
e.g. in the joint probability table:
– Across row: P(A1 and B1) + P(A1 and B2)
– Down column: P(A1 and B2) + P(A2 and B2)
9 / 31
81. Conditional probability
Conditional probability is the probability of an event occurring
given that another event has occurred
For events A and B, the conditional probability of event A given B is:
P(A | B) = P(A and B) / P(B)
i.e., the ratio of the joint probability of both events to the probability of the given event
• In case the probability of one event is not affected by another
event, then these events are said to be independent of each other
P(A | B) = P(A)
P(B | A) = P(B)
10 / 31
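A small sketch of marginal and conditional probabilities computed from a joint probability table; the four joint probabilities are illustrative assumptions:

```python
# Illustrative joint probabilities for events A1/A2 and B1/B2 (they sum to 1)
P = {("A1", "B1"): 0.20, ("A1", "B2"): 0.30,
     ("A2", "B1"): 0.30, ("A2", "B2"): 0.20}

P_B1 = P[("A1", "B1")] + P[("A2", "B1")]     # marginal: down the B1 column
P_A1 = P[("A1", "B1")] + P[("A1", "B2")]     # marginal: across the A1 row
P_A1_given_B1 = P[("A1", "B1")] / P_B1       # conditional probability
print(P_A1_given_B1, P_A1)                   # 0.4 vs 0.5: A1 and B1 are dependent
```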
82. Other probability rules
Complement
P(Aᶜ) = 1 − P(A)
that is, the event that occurs when A does not occur
Addition
P(A or B) = P(A) + P(B) − P(A and B)
ª here we subtract the joint probability of the events because it is
taken twice in the marginal probabilities
for mutually exclusive events their joint probability is zero
11 / 31
83. Other probability rules II
Multiplication for dependent events
P(A and B) = P(B) P(A | B) = P(A) P(B | A)
Multiplication for independent events
P(A and B) = P(A) P(B)
multiplication is used to compute the joint probability of two events
12 / 31
84. Probability tree
We can use a probability tree diagram to apply the probability
rules to a given problem
ª it displays all the outcomes of a sequence of branches
Toss of a coin (twice)
[Probability tree: first toss head or tail, each with probability 1/2;
each branch is followed by a second toss, again head or tail with probability 1/2]
head, head: 1/2 · 1/2 = 1/4
head, tail: 1/2 · 1/2 = 1/4
tail, head: 1/2 · 1/2 = 1/4
tail, tail: 1/2 · 1/2 = 1/4
13 / 31
86. The Monty Hall problem
In the original TV show game ‘Let’s Make a Deal’ the
participant has the choice to open one of three doors. One door
is concealing a car and the other two doors a goat
[three closed doors]
Once the participant makes his/her decision the host reveals
another door containing a goat and then asks the participant:
“Do you want to switch door or stay with your choice? ”
For us the main question here is: Does it make any difference
to switch from the original choice?
15 / 31
87. Probability tree diagram of the Monty Hall problem
[Probability tree: the first pick hides a goat with probability 2/3 and the
car with probability 1/3; the host then opens a remaining door, always
revealing a goat]
Initial probability for the car is 1/3, and for a goat 2/3. However, when the host
opens a door, the probability for this door becomes 0, and 2/3 for the other door
ª hence staying wins with probability 1/3, whereas switching wins with probability 2/3
16 / 31
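A simulation sketch of the game (our own illustration, assuming the host always opens a goat door), which reproduces the 1/3 vs. 2/3 result:

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the probability of winning the car with a stay/switch strategy."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)             # door hiding the car
        pick = random.randrange(3)            # participant's first choice
        # host opens a door that is neither the pick nor the car
        # (his choice among two goats is deterministic here; the result is the same)
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(monty_hall(switch=True))    # ≈ 2/3
print(monty_hall(switch=False))   # ≈ 1/3
```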
88. Bayesian probability
Bayesian probability is one of the different interpretations of the
concept of probability
Here a probability is assigned to a hypothesis, whereas under
the frequentist view a hypothesis is typically tested without
being assigned a probability
ª hence Bayesian probability is related to the subjective approach
We have already seen that with conditional probability we can
measure the chance that an event occurs given the occurrence
of another event
With Bayesian probability we can compute the chance of the
possible causes for a particular event to occur
17 / 31
89. Bayes’ Theorem
Bayes' Theorem is a rule that provides the conditional probability
of A occurring given that B already happened
P(A | B) = P(A) P(B | A) / P(B)
Event A constitutes the hypothesis, whereas B is the observation
• P(A | B) is the posterior probability of A
ª an updated degree of belief
• P(A) is the prior probability of A
ª that is before the observation (with known probabilities)
• P(B | A) is the likelihood of A
ª i.e. the probability that the hypothesis confers upon the observation
• P(B) is the unconditional probability of B
ª i.e. the probability of the observation irrespective of any hypothesis
18 / 31
90. Bayes’ Theorem II
More specifically, the Bayes’ Theorem can be restated for a given
event B and events A1, A2, . . . , Ak where:
• P(A1 | B), P(A2 | B), . . . , P(Ak | B) are the posterior probabilities we seek
• P(A1), P(A2), . . . , P(Ak) are the prior probabilities
• P(B | A1), P(B | A2), . . . , P(B | Ak) are the likelihoods
Bayes Formula
P(Ai | B) = P(Ai) P(B | Ai) / [ P(A1) P(B | A1) + P(A2) P(B | A2) + · · · + P(Ak) P(B | Ak) ]
19 / 31
91. Example
Bayes’ Theorem
• In a class where 60% were female, the probability of passing
the test was 90% for females and 70% for males. What is the
probability of someone passing the test being female?
• A1 and A2 are the proportions of being female and male
respectively, whereas B represents passing the test
• We need to calculate P(A1 | B):
P(A1 | B) = (.9 × .6) / ( (.9 × .6) + (.7 × .4) ) = .66
the prediction for a female passing the test has increased
due to the added information
20 / 31
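A small sketch of this computation; the numbers reuse the class example above:

```python
def posterior(priors, likelihoods, i):
    """P(A_i | B) from priors P(A_j) and likelihoods P(B | A_j), via Bayes' formula."""
    p_b = sum(p * l for p, l in zip(priors, likelihoods))  # unconditional P(B)
    return priors[i] * likelihoods[i] / p_b

# female/male priors and the pass-rate likelihoods from the example
print(round(posterior([0.6, 0.4], [0.9, 0.7], 0), 2))      # P(female | passed) = 0.66
```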
92. Summarizing Probabilities
Correct method
When the joint probabilities are given we can compute the marginal
probabilities, which allows us to compute conditional probabilities
ª with conditional probability we can see whether the events are
independent or dependent
When the joint probabilities are required we can apply probability
rules and probability trees
– multiplication for intersection
– addition for mutually exclusive events
– Bayes formula for new conditional probabilities
21 / 31
93. Discrete Random Variables
A random variable is a variable whose values have not
been chosen in advance
ª in a fixed variable on the other hand the values are previously selected
Examples of random variables are the outcomes of flipping a
coin or rolling a die, and such actions are called experiments
Two types of random variables:
• Discrete: with countable number of values
ª e.g. flipping a coin whose values are the number of
occurrences for each possible outcome of the random variable
• Continuous: when the values are uncountable
ª e.g. amount of time to complete a task
22 / 31
94. Discrete Probability Distributions
For a discrete random variable the probability distribution describes
the associated probability of its possible outcome values
Each probability of a random variable is a quantity between 0 and 1,
and the sum of the probabilities of all possible values equals 1
That is, for x representing the outcome of a random variable X, and
P(x) the probability of that outcome:
0 ≤ P(x) ≤ 1 and Σ_{all x} P(x) = 1
23 / 31
95. Population and probability distributions
Probability distributions can be used as representatives of
populations as well
Both the population mean and variance have their counterparts
on parameters corresponding to a given probability distribution
• The mean of a probability distribution for a discrete variable X is
called the expected value of X and it is represented by E(X)
Mean of a probability distribution
E(X) = µ = Σ_{all x} x P(x)
24 / 31
96. Population and probability distributions II
Variance of a probability distribution
V(X) = σ² = Σ_{all x} (x − µ)² P(x)
• The standard deviation of a probability distribution (σ) equals
the square root of the variance, i.e. σ = √σ²
If the probability distribution is approximately bell shaped then we
can apply the Empirical Rule to interpret σ
25 / 31
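A tiny sketch of these two expectations, using a fair die as an illustrative discrete distribution:

```python
# A fair die as (value, probability) pairs; the distribution is illustrative
dist = [(x, 1 / 6) for x in range(1, 7)]

mu = sum(x * p for x, p in dist)                  # E(X) = sum of x P(x): 3.5
var = sum((x - mu) ** 2 * p for x, p in dist)     # V(X) = sum of (x - mu)^2 P(x)
print(mu, var, var ** 0.5)                        # 3.5, ≈ 2.917, σ ≈ 1.708
```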
97. Laws of Expected Value and Variance
To quickly determine the expected value and variance of a
given constant or a random variable we use specific rules
X → a random variable
c → a constant
Expected Value
1. E(c) = c
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
Variance
1. V(c) = 0
2. V(X + c) = V(X)
3. V(cX) = c² V(X)
26 / 31
98. Probability distributions involving two variables
Recall that P(x) represents the probability that a random
variable X equals x
ª in this case we are considering a single variable
It is possible to determine the probabilities for combinations
involving two variables X and Y, which is represented as:
P(x, y)
with conditions that the outcome for all pairs of values vary between
0 and 1, and the sum of probabilities in the sample space is 1
i.e. 0 ≤ P(x, y) ≤ 1 and Σ_{all x} Σ_{all y} P(x, y) = 1
27 / 31
99. Probability distributions involving two variables II
While univariate distributions represent the distribution of
one variable, with two variables such representation is made by
a bivariate distribution
It is possible to obtain both the joint probability and the marginal
probability of any bivariate probability distribution
The marginal distribution will provide us with the expected
mean, variance, and SD for each variable
However, with the association of two variables we can compute
the covariance and the coefficient of correlation as well
ª as for the linear relationship (cf. last lecture), but now involving probabilities
28 / 31
100. Covariance and Correlation
Covariance
Cov(X, Y) = σ_xy = Σ_{all x} Σ_{all y} (x − µ_X)(y − µ_Y) P(x, y)
i.e. the product of the deviations from the mean for X and Y,
weighted by their joint probability
Correlation
ρ = σ_xy / (σ_x σ_y)
as before, the ratio of the covariance of the two variables to the
product of their SDs
29 / 31
101. Sum of Two Variables
and expected outcomes
One combination that has practical applications is the sum of
two variables
Laws of Expected Value and Variance
1. E(X + Y) = E(X) + E(Y)
2. V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)
if the variables are independent, then the covariance is 0
30 / 31
102. Summary
Probability deals with computing the chance for particular
outcomes in a given set of events, and there are different
approaches in assigning probabilities of events
With joint probabilities we can calculate marginal and conditional
probabilities and hence determine whether or not events are
independent
Probability trees are useful to compute probability models with
sequence of actions
Random phenomena are associated with probability through
random variables, and such types of variables are described by
probability distributions and rules of expected values
31 / 31
103. BUSINESS STATISTICS I
Lecture – Week 38b (39)
Antonio Rivero Ostoic
School of Business and Social Sciences
September
AARHUS UNIVERSITY
105. Probability Distributions
Recall that we introduced the concept of probability distribution,
which describes the associated probability of possible outcomes
for a random variable
ª a random variable is a numerical outcome of an experiment
The distribution of such values is a very important piece of
information since it provides us with the pattern of the variable's
values over the sample or population
Depending on the type of variable and the scale of the data, we
find a variety of distributions of populations or sampled data
In case we consider a single variable, the data is represented
by a univariate distribution
3 / 27
106. Binomial experiment
If e.g. we toss a coin n times and count the number of ‘heads’
then we perform a binomial experiment in a set number of trials
ª ‘binomial’ because there are two possible outcomes
In this case the outcomes of the trials are not affected by the
outcomes of other trials, which means that the trials are
independent of each other
By counting the number of heads, a head is regarded as a
success and a tail is considered a failure
ª it can certainly be the other way around
The probability of success is denoted by p, and the probability of
failure as 1 − p
ª we try to estimate the value of p, which is between 0 and 1
4 / 27
107. Binomial random variable
The number of successes in the binomial experiment is called
the binomial random variable
ª A binomial random variable is discrete and can take on countable values
0, 1, 2, . . . , n
If we represent the number of successes in n trials as a random
variable X, then the number of failures becomes n − X
The probability for each sequence of branches in the probability
tree representing x successes and n − x failures is:
p^x (1 − p)^(n−x)
and the number of branch sequences for these outcomes is the
binomial coefficient:
C(n, x) = n! / ( x! (n − x)! )
where e.g. n! = n(n − 1)(n − 2) . . . (2)(1) (called ‘n factorial’), and 0! = 1
5 / 27
108. Binomial Random Variable Examples
Examples of a binomial random variable are:
• the number of correct guesses at n true/false questions when
you randomly guess all answers
• the number of left-handers in a randomly selected sample of n
unrelated people with replacement
• the number of (...)
6 / 27
109. Binomial distribution
Any binomial experiment is described by a binomial distribution
The probability of x successes in a binomial experiment is:
P(x) = C(n, x) p^x (1 − p)^(n−x)
where P(x) is the probability of x = 0, 1, 2, . . . , n successes
The formula above is known as the probability mass function
(p.m.f.) corresponding to the binomial distribution
7 / 27
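A minimal sketch of this p.m.f. using math.comb (Python 3.8+); the second check reproduces the cumulative value P(X ≤ 2) = .9914 for n = 5, p = .1 found in the binomial table below:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x) = C(n, x) p^x (1 - p)^(n - x), the binomial p.m.f."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(sum(binom_pmf(x, 10, 0.5) for x in range(11)))           # probabilities sum to 1
print(round(sum(binom_pmf(x, 5, 0.1) for x in range(3)), 4))   # P(X <= 2) = 0.9914
```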
112. Binomial distribution
n = 30
[Plots of the binomial p.m.f. P(x) for x = 0, . . . , 30 with p = .25, .5, and .75;
P(x) ranges from 0.00 to about 0.15]
10 / 27
114. Expectations of the Binomial distribution
If a random variable X has a binomial distribution, we write
X ∼ B(n, p)
meaning that X is a binomially distributed random variable
Each random variable with a binomial distribution has the same
p.m.f., but they may have different values for the parameters
The parameters of the distribution are n and p, and both are
known, which means that for a binomial random variable we
can determine the expected values
• E(X) = µ = np
• V(X) = σ2 = np(1 − p)
• SD(X) = σ = √V(X)
12 / 27
115. Example
Binomial distribution
The testing center (cf. Ex 15) shows that 14% of the new
cars have a defect
Suppose that the center tests 20 new cars on a daily basis
• What is the probability that the center finds just one defective
new car in a day?
i.e. P(X = 1) = C(20, 1) × .14^1 × (1 − .14)^(20−1) = .16
That is 16%
13 / 27
116. Cumulative Binomial Probabilities
We can use the cumulative probability if we want to calculate the
probability that a random variable is less than or equal to a value
That is, if we wish to determine P(X ≤ x)
P(X ≤ x) = Σ_{i=0}^{⌊x⌋} C(n, i) p^i (1 − p)^(n−i)
where ⌊x⌋ is the greatest integer ≤ x (called the ‘floor’ under x)
this means that P(X ≤ 1) = P(0) + P(1), and so on...
All such values are recorded in tables of binomial probabilities with
tabulated scores for different probabilities and trials with diverse size
14 / 27
117. Binomial Table
n = 5; p = .01, .05, .10, .20, .25 and x ≤ 4
TABLE 1 Binomial Probabilities
Tabulated values are P(X ≤ k) (rounded to four decimal places)
n = 5
k    p = 0.01   0.05     0.10     0.20     0.25     0.30     0.40
0    0.9510     0.7738   0.5905   0.3277   0.2373   0.1681   0.0778
1    0.9990     0.9774   0.9185   0.7373   0.6328   0.5282   0.3370
2    1.0000     0.9988   0.9914   0.9421   0.8965   0.8369   0.6826
3    1.0000     1.0000   0.9995   0.9933   0.9844   0.9692   0.9130
4    1.0000     1.0000   1.0000   0.9997   0.9990   0.9976   0.9898
P(X ≤ k) = Σ_{x=0}^{k} p(x)
thus for n = 5, p = .1: P(X ≤ 2) = .9914 (...)
15 / 27
118. Poisson Random Variable
Like a binomial random variable, a Poisson random variable
corresponds to the number of occurrences of events or successes
ª named after S. D. Poisson
However in a Poisson random variable the number of successes is
counted in an interval of time or a specific region of space in a
Poisson experiment
Intervals are independent of each other, do not overlap, and the
probability of a success in an interval is proportional to its size
ª hence intervals of equal size have the same probability, and the probability
approaches 0 as the interval becomes very small
16 / 27
119. Poisson Random Variable Examples
Examples of a Poisson random variable are:
• the number of customers’ queueing (in a shop, a call center, a
public service) in a unit of time
• the number of hits on a Web site in a day
• the number of goals scored by a football team in a match
17 / 27
120. Poisson distribution
A Poisson experiment is described by a Poisson distribution
X ∼ P(µ)
µ is the expected value (mean) parameter of the underlying rate of
occurrence in an interval or region (also written as λ)
ª in this case the rate of occurrence is known and constant
The probability mass function for the Poisson distribution is:
P(x) = e^(−µ) µ^x / x!
for a value of x = 0, 1, 2, . . . successes in a given interval or region
(theoretically with no upper limit)
e is a constant approx. 2.718 (Euler’s number) that is the base of the
natural logarithm
18 / 27
122. Expectations of the Poisson distribution
In the Poisson distribution the variance is equal to the mean,
and the standard deviation is equal to the square root of the
mean
E(X) = V(X) = µ, i.e. σ² = µ
20 / 27
123. Example
Poisson distribution
Suppose a Website has 1.8 hits on average per minute
• What is the probability of receiving 5 hits in a given minute?
i.e. P(X = 5) = e^(−1.8) × 1.8^5 / 5! = .026
That is about 3%
21 / 27
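A one-liner sketch of this p.m.f., verifying the Website example above:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = e^(-mu) mu^x / x!, the Poisson p.m.f."""
    return exp(-mu) * mu**x / factorial(x)

print(round(poisson_pmf(5, 1.8), 3))   # 5 hits at an average of 1.8/minute: 0.026
```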
124. Hypergeometric experiment
The hypergeometric distribution is a probability distribution used to
describe the outcomes produced with a hypergeometric experiment
Here a sample of size n is randomly selected without replacement
from a population of N items, which means that once a particular
outcome has been selected it cannot be picked again
ª this contrasts with the binomial experiment, where the probability of x successes
in the trials assumes replacement
– Sampling with replacement: it is possible to select the same item
again, and the size of the population remains the same
ª e.g. tossing a coin
– Sampling without replacement: it is not possible to select the
same item again, thus the size of the remaining population
changes as we remove each item
ª e.g. picking a black ball from an urn containing black and white balls
22 / 27
125. Hypergeometric random variable
In a given population size N, k items are classified as successes and
N − k items are categorised as failures
A hypergeometric random variable X corresponds to the number of
successes in a sample size n, and it can take one of 0, 1, 2, . . . , n values
The probability of x = 0, 1, 2, . . . , n successes is described by the p.m.f.
of the hypergeometric distribution of X as:
P(x) = [ C(k, x) × C(N − k, n − x) ] / C(N, n)
this is the ratio of • the number of samples of n items that contain
exactly x successes chosen from k and n − x failures chosen from (N − k)
• to the number of possible samples that can be drawn from the population
23 / 27
127. Expectations of the hypergeometric distribution
• E(X) = µ = n(k/N)
• V(X) = σ² = n(k/N)(1 − k/N)((N − n)/(N − 1))
• SD(X) = σ = √V(X)
25 / 27
128. Example
Hypergeometric distribution
A graduate statistics course has 7 male and 3 female students.
The teacher wants to select 2 students at random to help her
conduct a research project.
• What is the probability that the two students chosen are female?
(solved)
• What is the probability that exactly one of the students chosen is female?
i.e. P(X = 1) = [ C(3, 1) × C(10 − 3, 2 − 1) ] / C(10, 2) = .4666667
that is 21/90 + 21/90 ≈ 47% (cf. fig. 6.2, p. 193)
26 / 27
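A short sketch of this p.m.f. with math.comb, reproducing the class example above:

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """P(x) = C(k, x) C(N - k, n - x) / C(N, n), the hypergeometric p.m.f."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# N = 10 students, k = 3 female (successes), sample of n = 2
print(hypergeom_pmf(2, 10, 3, 2))            # both female: 3/45 ≈ 0.067
print(round(hypergeom_pmf(1, 10, 3, 2), 2))  # exactly one female: ≈ 0.47
```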
129. Summary
Discrete probability distributions
The binomial distribution measures the probability of the
number of successes over a specific number of trials with
replacement
The Poisson distribution measures the probability of a
number of events occurring within a given time interval
The hypergeometric distribution measures the probability of a
specified number of successes over a specific number of
trials without replacement from a finite population
27 / 27
130. BUSINESS STATISTICS I
Lecture – Week 39 (40)
Antonio Rivero Ostoic
School of Business and Social Sciences
September
AARHUS UNIVERSITY
131. Today’s Agenda
1. Review on Distributions
2. Continuous Random Variables
3. Uniform Distribution
4. Normal Distribution
2 / 31
132. Review on Distributions
Recall that probability distributions serve to describe random
variables
From a given probability distribution, there are two important
pieces of information that we can obtain:
– what are the values that the variable takes
– how often the variable takes these values
By depicting such information with graphical methods
we get a shape that characterizes either the sample or
the population representing the random variable
3 / 31
133. Modality of a distribution
An important characteristic of a probability distribution is its modality
The data depicted graphically by a bell shaped curve is an example
of unimodal distribution
ª this is because there is a single peak that represents the mode
Hence a data set which has two equally common modes produces
a bimodal distribution with two peaks
Also, a multimodal distribution is a distribution of scores having
more than two modes
4 / 31
134. Skewness and Kurtosis
Other important measures of the shape of a probability
distribution are:
1) Skewness that measures the degree of asymmetry of a
distribution
ª each type of probability distribution has its own formula to calculate
the skewness of the shape distribution, and perfectly symmetric
distributions have zero skewness
2) Kurtosis that measures the degree of ‘peakedness’ of the
distribution
ª as with the skewness, each distribution has a formula to calculate
the kurtosis
5 / 31
135. Continuous (and Discrete) Data
Recall that quantitative variables can be discrete or continuous
Continuous data is uncountable in the sense that it has a continuum
of possible values in a range
ª as opposed to discrete data, which can take relatively few different values
Examples of continuous data are time, the height of a person, etc.
However continuous variables must be rounded when measuring
and we usually think of them as discrete
ª we say e.g. that an individual is 20 years old and not between 20 and 21
continuous data always has an interval scale, whereas discrete
data can take any scale
6 / 31
136. Continuous Random Variable
A continuous random variable serves to represent continuous
data, and it takes an infinite number of possible values
Probability distributions treat discrete and continuous random
variables very differently
Since there are theoretically an infinite number of values in a
continuous random variable then
ª it is not possible to list all possible values, and
ª the probability of each individual value is practically 0
Thus probability distributions for continuous random variables
consider just the range of the values
Then we estimate the probability that a randomly selected
outcome falls within a determinate range, i.e. an interval
7 / 31
137. Approximation Function
for continuous random variables
Recall that in the case of discrete distributions a probability mass
function was used to approximate the probability distribution for P(x)
In the case of continuous random variables the probability
distribution is characterized by a curve that is determined by a
function as well
However such approximation is made by a probability density
function (p.d.f.) or just density that is represented as f(x)
The conditions for a probability density function with a range
a ≤ x ≤ b are that f(x) ≥ 0 for all x, and that the total area under the curve
between a and b is 1
ª a and b represent the most extreme values of the data
8 / 31
138. Approximation Function II
In order to calculate the probability of any interval in a
probability continuous distribution, we need to find the area
under the curve
In such case the integral of the density of the variable over
the range provides the probability of the random variable
falling within this particular range
ª with the use of integral calculus
9 / 31
139. Continuous Uniform distribution
A distribution that has constant probability is found in the
uniform distribution
ª actually there is both a discrete and a continuous version of this distribution
Another name for the uniform distribution is the rectangular distribution
Examples of the uniform distribution are:
• the amount of milk distributed daily in a given town
• the amount of electricity that a soft drink cooler machine
consumes per month
• (...)
10 / 31
140. Probability density function: Uniform distribution
The probability density function of the uniform distribution for
a ≤ x ≤ b is
f(x) = 1 / (b − a)
and f(x) = 0 for x < a or x > b
Besides, the probability that a continuous random variable that is
uniformly distributed equals any individual value is 0
The uniform distribution is depicted by a rectangle with height
f(x), and for P(x1 ≤ X ≤ x2) the base is x2 − x1
i.e. P(x1 ≤ X ≤ x2) = (x2 − x1) × 1 / (b − a)
11 / 31
143. Example
Uniform distribution
A vending machine consumes per year between 420 kWh and 500 kWh.
• What is the probability that the vending machine consumes at
least 480 kWh?
P(X ≥ 480) = (500 − 480) × (1/80) = 0.25
• What is the probability that the vending machine consumes at
most 480 kWh?
P(X ≤ 480) = 1 − P(X ≥ 480) = 0.75
• What about the probability that the vending machine consumes
precisely 500 kWh?
P(X = 500) = 0
14 / 31
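A small sketch of the rectangle rule, reproducing the vending machine example; the clipping to [a, b] is our own convenience:

```python
def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) for X ~ Uniform(a, b): base times constant height 1/(b - a)."""
    lo, hi = max(x1, a), min(x2, b)       # clip the interval to [a, b]
    return max(hi - lo, 0) / (b - a)

print(uniform_prob(480, 500, 420, 500))   # at least 480 kWh: 0.25
print(uniform_prob(420, 480, 420, 500))   # at most 480 kWh: 0.75
```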
144. Normal distribution
The most important distribution in statistics is the normal distribution
ª which is symmetric and it has a bell shaped curve
Its importance is partly because it approximates well the distributions
of many types of variables
ª in such cases the sampled data tend to be approximately bell-shaped
The properties of the normal distribution play a crucial role in
statistical inference
ª that is even when the sample data are not bell-shaped
Other names for the normal distribution are the Gaussian
distribution, the Z distribution...
15 / 31
145. Normal distribution II
Each normal distribution has two parameters, the mean µ and the
standard deviation σ, and the exact form of the distribution depends
on the values of these parameters
ª we know from the empirical rule that most of the scores will fall within e.g. three
standard deviations of the mean
A special case of the normal distribution is the standard normal
distribution Z that is a normal distribution with mean µ = 0 and
standard deviation σ = 1
Examples of the normal distribution are:
• heights of people
• errors in measurements
• (...)
16 / 31
146. Normal Density Function
The normal distribution is the probability distribution for a
normal random variable
X ∼ N(µ, σ²)
The probability density function of a normal random variable
for −∞ < x < ∞ is:
f(x) = ( 1 / (σ √(2π)) ) e^( −(1/2) ((x − µ)/σ)² )
where e ≈ 2.7183 (Euler's number), and π ≈ 3.1416 (Pi)
And Z ∼ N(0, 1)
17 / 31
148. Normal distributions
different means, same σ
[Two normal density curves f(x) with means 3 and 6 and the same σ]
the shape remains the same when only the mean changes its value
ª increasing [decreasing] mean shifts the curve to the right [left]
19 / 31
150. Computing normal probabilities
Like the uniform distribution, the probability that a normal random
variable falls into an interval is the area of the interval under the
curve, which we must calculate
However since the shape of the normal distribution is not a
rectangle anymore, in this case the function is more complicated
Then we use a probability table to calculate the probability of a
normal random variable
ª as we did with binomial and Poisson probabilities
Fortunately we just need one table of probabilities by standardizing
the random variable
21 / 31
151. Standard normal random variable
A standard normal random variable Z equals the difference
between the score and the mean of X divided by its standard
deviation:
Z = (X − µ) / σ
A positive standardized score (or z-score) indicates a datum above
the mean, and a negative standardized score indicates a datum
below the mean
A ‘Z transformation’ means that the probabilities for X are now
translated into statements for Z
We use the cumulative standardized normal probabilities for
P(Z ≤ z), which indicate the relative frequency of z-scores
ª Keller's book has a table for −3.09 ≤ z ≤ 3.09 (others by approximation)
22 / 31
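A brief sketch of standardization using the standard library's NormalDist (Python 3.8+); the score and parameters are illustrative assumptions:

```python
from statistics import NormalDist

x, mu, sigma = 1.2, 1.0, 0.4         # illustrative score and parameters
z = (x - mu) / sigma                 # standardization: z = (x - mu) / sigma
print(z, NormalDist(0, 1).cdf(z))    # P(Z <= 0.5) ≈ 0.6915

print(NormalDist(mu, sigma).cdf(x))  # the same probability without standardizing
```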
158. Expectations of the normal distribution
Normal distribution
• E(X) = µ
• V(X) = σ2
• SD(X) = σ
Standard normal distribution
• E(Z) = 0
• V(Z) = 1
• SD(Z) = 1
29 / 31
160. Summary
Probability distributions representing random variables have their
own modality, and measures of skewness and kurtosis
Continuous random variables represent uncountable data that is
in theory infinite, and the data is treated in terms of intervals
Uniform distributions have a constant probability density and are
represented by a rectangle whose height is the density and whose
base is the difference between the extreme values
The normal distribution has a bell shaped curve and it plays a
central role in statistics because of its properties
By standardizing normal random variables it is possible to obtain
cumulative normal probabilities for relative frequency scores
31 / 31
161. BUSINESS STATISTICS I
Lecture – Week 40a (41)
Antonio Rivero Ostoic
School of Business and Social Sciences
September
AARHUS UNIVERSITY
162. Today’s Agenda
Continuous Random Variable Distributions
and Exponential distribution
1. Student t Distribution
2. Chi-Squared Distribution
3. F Distribution
2 / 32
163. Why other continuous distributions?
and not just the normal distribution
Despite many nice properties of the normal distribution, a major
concern is the derivation of the p.d.f. for this distribution
ª we need the values of two parameters (µ and σ²)
However, many times we do not have a variability parameter for
the population
Even more importantly, we need to have the appropriate statistics
for small samples
There are other distributions that represent small samples,
asymmetric data, or data with many outliers better than N
ª moreover, some distributions need only a single parameter
3 / 32
164. Example: Exponential distribution
For instance, the exponential distribution requires one parameter
whose reciprocal transformation equals both the mean and the
standard deviation of the random variable
ª i.e. µ = σ = 1/λ
Thus the distribution is completely specified by a known parameter
Its probability density function for x ≥ 0 and a parameter λ > 0 is:
f(x) = λ e^(−λx)
e ≈ 2.7183
The associated probabilities are:
• P(X ≥ x) = e^(−λx)
• P(X ≤ x) = 1 − e^(−λx)
• P(x1 ≤ X ≤ x2) = e^(−λx1) − e^(−λx2)
4 / 32
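A tiny sketch of these probabilities; the rate λ = 0.5 is an illustrative assumption:

```python
from math import exp

lam = 0.5                          # illustrative rate parameter lambda

def p_ge(x):                       # P(X >= x) = e^(-lambda x)
    return exp(-lam * x)

print(p_ge(2))                     # e^-1 ≈ 0.368
print(1 - p_ge(2))                 # P(X <= 2) ≈ 0.632
print(p_ge(1) - p_ge(3))           # P(1 <= X <= 3)
```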
166. Student t Distribution
The t distribution or Student t distribution is a distribution
commonly used in statistical inference
ª ‘Student’ was the pseudonym of W.S. Gosset who derived this distribution,
and he used the letter t to represent the random variable
The Student t distribution represents the distribution of t values,
which varies according to the sample size
ª the larger the sample is, the more it resembles the Z or normal distribution
ª the smaller the sample is, the flatter the distribution becomes
The t distribution depends on a single parameter called the
degrees of freedom that is represented by ν or sometimes by df
ª and the exact shape of a distribution is determined by this parameter
degrees of freedom – loosely speaking – implies values that are
‘free to vary’ or ‘not fixed by any parameter or scores’
6 / 32
167. t density function
The probability density function of the t distribution is:
f(t) = [ Γ((ν + 1)/2) / ( Γ(ν/2) √(νπ) ) ] × ( 1 + t²/ν )^( −(ν + 1)/2 )
with a parameter ν > 0, which is the degrees of freedom;
π ≈ 3.1416, and the gamma function Γ
hence T ∼ t_ν has a t distribution with ν degrees of freedom
7 / 32
169. t and Z distributions
[Curves of the Z and t densities, both centred at 0; the t curve is lower
at the peak and has heavier tails]
While N is bell shaped, the t distribution is mound shaped
9 / 32
170. t distribution, different degrees of freedom
[t densities for ν = 2, 5, and 50, all centred at 0]
The shape resembles the normal distribution more closely for larger ν values
10 / 32
171. Student t random variable
As with the other random variables we have seen so far, we can
produce values for a Student t random variable through a
statistical experiment
ª then we can calculate the probability for such variable
The expected value and variance for a Student t random
variable with ν degrees of freedom are:
E(t) = 0
V(t) = ν / (ν − 2) for ν > 2
11 / 32
173. Computing t probabilities
Computing probabilities now implies calculating areas under the
t distribution curve, and to achieve this we use probability tables
The probability table for the t distribution gives the probabilities
of exceeding critical values that are determined by different ν
ª in Keller’s book the table is given for some degrees of freedom from 1 to 200
and ∞ (otherwise use approximation)
Only the right tail probability is given but, since the t distribution
curve is symmetric around 0, the left-tail point equals −tA,ν
Notice as well that the t values approximate z scores as ν
approaches ∞
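The table look-ups can be reproduced in software; a small sketch with
scipy.stats, where the tail area A and degrees of freedom ν are illustrative
assumptions:

    # Critical value t_{A,nu}: the point with right-tail area A
    from scipy import stats

    A, nu = 0.05, 10
    t_crit = stats.t.ppf(1 - A, df=nu)   # ppf inverts the cdf, so 1 - A gives the right tail
    print(t_crit)                        # ~1.812
    print(-t_crit)                       # left-tail point, by symmetry around 0
    print(stats.norm.ppf(1 - A))         # ~1.645: t values approach z as nu grows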
13 / 32
174. Applications of the t distribution
We can use the t distribution to test and obtain confidence
intervals of the mean in a normally distributed population
when the variance is unknown
ª (cf. lecture week 46)
Also we can compare two expected values for normally
distributed populations with unknown variances
ª (cf. lectures week 49 and 50)
We can perform tests and confidence intervals in correlation
and regression analyses
14 / 32
175. Chi-squared distribution
The Chi-squared or χ2 distribution is another type of distribution
commonly used in hypothesis testing
ª as with the t distribution, a χ² distribution depends on a single parameter,
which is the number of degrees of freedom that shapes the distribution
The χ² distribution is the sum of squares of ν independent
standard normal random variables
i.e., for Z1, Z2, . . . , Zν where ν > 0, and each Zi ∼ N(0, 1)
independent of each other, then
Z1² + Z2² + · · · + Zν² ∼ χ²(ν)
thus there are ν variables that represent the number of degrees of
freedom we can choose independently or ‘freely’
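This definition lends itself to a quick simulation check; a sketch assuming
ν = 5 and one million replications:

    # chi-squared as the sum of nu squared standard normals
    import numpy as np

    rng = np.random.default_rng(seed=1)
    nu, n = 5, 1_000_000                # illustrative df and replication count
    z = rng.standard_normal((n, nu))
    chi2 = (z ** 2).sum(axis=1)         # Z1^2 + ... + Znu^2 per replication
    print(chi2.mean())                  # close to nu
    print(chi2.var())                   # close to 2*nu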
15 / 32
176. Chi-squared density function
The χ² density function is:
f(χ²) = [1 / (Γ(ν/2) · 2^(ν/2))] · (χ²)^((ν/2)−1) · e^(−χ²/2)
for χ² > 0 with ν > 0, e ≈ 2.7183, and the gamma function
16 / 32
179. Chi-squared distribution shape
In this case – unlike the t distribution – the values of the
random variable in the χ² distribution are not positioned around 0,
but are rather concentrated on positive values
The values of the random variable for a particular sample
range from 0 to ∞
Although with a large number of degrees of freedom the
shape of the chi-squared distribution resembles the normal
distribution, it nevertheless remains skewed to the right
In this case the mean is greater than the median, which
means that it is positively skewed
19 / 32
180. Chi-squared random variable
A χ2 random variable is produced by a statistical experiment
The expected value and variance for a chi-squared random
variable with ν degrees of freedom are:
E(χ2) = ν
V(χ2) = 2ν
20 / 32
181. Computing χ² probabilities
To calculate probability values for a χ² random variable implies
(again) computing areas under the distribution curve
χ²A,ν represents the point with area A under the right tail of the
chi-squared curve
However, since the shape of the distribution is not symmetric,
we can no longer use −χ²A,ν for the left tail
This means that if we compute A, then χ²1−A,ν represents the
point such that the area to its left is A
Critical values of the chi-squared distribution for different
probabilities and df are recorded in the χ2 probability tables
ª for ν > 100 it can be approximated by N with µ = ν and σ = √(2ν)
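As with the t table, these critical values can be obtained in software;
a sketch with illustrative A and ν:

    # chi-squared critical values for the right and left tails
    from scipy import stats

    A, nu = 0.05, 8
    print(stats.chi2.ppf(1 - A, df=nu))  # chi2_{A,nu}, right-tail area A (~15.51)
    print(stats.chi2.ppf(A, df=nu))      # chi2_{1-A,nu}, left area A (~2.73)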
21 / 32
183. Applications of the χ² distribution
The chi-squared distribution allows us to test and compute
confidence interval estimators of the variance for a random
variable that is normally distributed
ª with a sufficiently large sample
Goodness-of-fit tests
Homogeneity and independence tests
23 / 32
184. F distribution
The F distribution is another continuous probability distribution
commonly used in statistical inference
ª ‘F’ stands for R.A. Fisher who described this distribution
In this case the shape of the distribution is determined by two
parameters, which are the degrees of freedom
That is because the F distribution arises as the ratio of two
independent chi-squared variables, each divided by its respective df
That is:
(χ²ν1 / ν1) / (χ²ν2 / ν2) ∼ F(ν1,ν2)
24 / 32
185. F density function
The probability density function for the F distribution is:
f(F) = [Γ((ν1+ν2)/2) / (Γ(ν1/2) · Γ(ν2/2))] · (ν1/ν2)^(ν1/2) · F^((ν1−2)/2) · (1 + ν1F/ν2)^(−(ν1+ν2)/2)
for F > 0, where ν1 and ν2 are called the numerator and
denominator degrees of freedom respectively
25 / 32
186. F distribution plot
[Figure: F density curve f(F)]
As with the chi-squared distribution, the shape of the F distribution
is asymmetric and positively skewed
26 / 32
187. F distribution, different degrees of freedom
[Figure: F density curves for (ν1, ν2) = (1, 1), (3, 3), (9, 9), and (25, 25)]
27 / 32
188. F random variable
The F random variable generates its values through a
statistical experiment as well
The expected value and variance for the F random variable
are:
E(F) = ν2/(ν2 − 2) for ν2 > 2
V(F) = [2ν2²(ν1 + ν2 − 2)] / [ν1(ν2 − 4)(ν2 − 2)²] for ν2 > 4
thus the mean parameter depends on the denominator degrees of
freedom only, and it approaches 1 for large values of ν2
28 / 32
189. Computing F probabilities
We can calculate the areas under the distribution curve
corresponding to probability values of the F distribution
In this case we also have an asymmetric distribution, which means
that the critical values for the two tails are FA,ν1,ν2 and F1−A,ν1,ν2
The following relation exists between these two critical values:
F1−A,ν1,ν2 = 1 / FA,ν2,ν1
And we use a different probability table for each value of A with
different numerator and denominator degrees of freedom
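The relation between the two tails can be verified numerically; a sketch
with illustrative values of A, ν1, and ν2:

    # F critical values and the relation F_{1-A,nu1,nu2} = 1 / F_{A,nu2,nu1}
    from scipy import stats

    A, nu1, nu2 = 0.05, 5, 10
    left = stats.f.ppf(A, dfn=nu1, dfd=nu2)         # F_{1-A,nu1,nu2}: left area A
    swapped = stats.f.ppf(1 - A, dfn=nu2, dfd=nu1)  # F_{A,nu2,nu1}: right-tail area A
    print(left, 1 / swapped)                        # the two numbers agree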
29 / 32
191. Applications of the F distribution
We can compare two variances from populations that
are normally distributed with the F distribution and
related statistics
Analysis of variance, which is used to compare the
means of two or more populations, is based on the F
distribution and statistics
31 / 32
192. Summary
We have seen various continuous distributions that are
important for inferential statistics and for small samples
The t distribution is symmetric around zero, and it depends
on a single parameter, its degrees of freedom
The χ² distribution is the sum of squared independent Z random
variables, and it is positively skewed
The F distribution has an asymmetric shape and it depends on
the numerator and the denominator degrees of freedom
All these distributions are related to their respective statistics,
which have applications in statistical inference
ª Remember Ex. 33, questions 2, 6, 7, 8, 10, 12, and Ex. 2 from Re-exam2013
32 / 32
193. BUSINESS STATISTICS I
Lecture – Week 40b (42)
Antonio Rivero Ostoic
School of Business and Social Sciences
October
AARHUS
UNIVERSITYAU
194. Today’s Agenda
1. Sampling Distributions:
– of the Mean
– of a Proportion
– of the Difference between Two Means
2 / 32
195. Sampling Distributions
We used probability distributions to summarize probabilities of
possible outcomes for a random variable
However, using sample data from a population we estimate
characteristics of the distribution that are expressed as parameters
A sampling distribution is a probability distribution that determines
probabilities of the possible values of a sample statistic
ª it is obtained either by taking repeated random samples of a particular size
from a population, or else by relying on the associated probability rules
Each sample statistic has a sampling distribution
ª so there is a sampling distribution of a sample mean, a sampling distribution
of a sample proportion, a sampling distribution of a sample median, etc.
3 / 32
196. Sampling Distributions II
Unlike the distributions we have seen so far, a sampling
distribution refers to the values of a statistic computed from
observations in sample after sample
Sampling distributions play a fundamental role in statistical
inference because they allow us to measure how close a sample
statistic is to the population parameter
In other words, the sampling distribution determines the
probability that the statistic falls within any given distance of
the parameter it estimates
4 / 32
197. Distribution of the Sample Mean
probability rules
Recall that the sample space of a random trial is the set of all
possible outcomes
E.g. sample space for two dice thrown
S =
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
5 / 32
198. Distribution of the Sample Mean II
probability rules
Means of samples of size 2:
x̄ =
1 1.5 2 2.5 3 3.5
1.5 2 2.5 3 3.5 4
2 2.5 3 3.5 4 4.5
2.5 3 3.5 4 4.5 5
3 3.5 4 4.5 5 5.5
3.5 4 4.5 5 5.5 6
6 / 32
199. Expectations of X
When using the probability approach we depend on the
laws of expected value and variance for the population
parameters of X:
µ = Σ xP(x)
σ² = Σ (x − µ)²P(x)
σ = √σ²
7 / 32
200. Sampling Random Variables
In a sampling distribution, X̄ constitutes a new random variable
created by sampling, and x̄ is a statistic corresponding to the
sample mean
Even though each sample may have an equal probability, some
samples will have identical values of x̄
Thus we can draw the probabilities of the different values of x̄
that correspond to the sampling distribution of the sample mean
8 / 32
201. Expectations of X̄
The expected value and variance of the sampling distribution
of X̄ are:
µx̄ = Σ x̄P(x̄)
σ²x̄ = Σ (x̄ − µx̄)²P(x̄)
σx̄ = √(σ²x̄)
thus for n = 2 we have µx̄ = µ, whereas σ²x̄ = σ²/2
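These results can be verified by enumerating all 36 equally likely samples;
a Python sketch (no assumptions beyond the two fair dice):

    # Sampling distribution of the mean of two dice, by enumeration
    from itertools import product
    import statistics

    die = range(1, 7)
    means = [(a + b) / 2 for a, b in product(die, die)]  # all 36 sample means
    mu = statistics.mean(die)                            # population mean, 3.5
    var = statistics.pvariance(die)                      # population variance, ~2.917
    print(statistics.mean(means), mu)                    # mu_xbar equals mu
    print(statistics.pvariance(means), var / 2)          # sigma^2_xbar equals sigma^2 / 2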
9 / 32
202. Distribution of X and X̄
[Figure: probability distribution of the score from rolling a single die
(left) and of the mean of two dice scores (right)]
The distribution of X̄ is different from the distribution of X, even though
these variables are related
10 / 32
203. Sampling distributions of X̄
different sizes
It is possible to obtain sampling distributions of X̄ for different
sample sizes, and the sample statistic for the mean equals the
population parameter
µx̄ = µ
However in the case of the variance of the sampling
distribution, this parameter equals the ratio of the
population variance to the sample size
σ²x̄ = σ²/n
11 / 32
204. Standard Error of the Mean
The standard deviation of a sampling distribution is called the
standard error of the mean, and for infinitely large populations is
defined as:
σx̄ = σ/√n
When the size of the population is finite and known there is a finite
population correction factor applied to the expression, and the
standard error becomes:
σx̄ = (σ/√n) · √((N − n)/(N − 1))
However, such factor is close to 1 when N is large relative to n (like
20 times larger)
ª thus it can be omitted
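A small helper capturing both cases (the function name is my own):

    # Standard error of the mean, with an optional finite population correction
    import math

    def std_error(sigma, n, N=None):
        se = sigma / math.sqrt(n)
        if N is not None:                       # finite, known population of size N
            se *= math.sqrt((N - n) / (N - 1))  # finite population correction factor
        return se

    print(std_error(5, 3))          # infinite population: 5/sqrt(3), ~2.89
    print(std_error(5, 3, N=1000))  # N >> n, correction ~1, nearly the same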
12 / 32
206. Central Limit Theorem
We observe that as the sample size gets larger, the sampling
distribution of X̄ becomes narrower
ª this is because the values are more concentrated around the mean
Even more significant, the larger the sample size is, the more
the distribution resembles a bell shaped distribution
The Central Limit Theorem states:
For a sufficiently large sample size, the sampling distribution
of the mean of a random variable drawn from any population
is approximately normal
Thus the larger the sample size, the more closely the sampling
distribution of X̄ resembles N
14 / 32
207. Central Limit Theorem II
The approximation given in the central limit theorem applies
also if the population is nonnormally distributed, given a
sufficiently large sample size
ª for nonnormal populations a ‘sufficiently large’ sample is n ≥ 30
ª for highly skewed populations we need moderately large sample size
Because of the central limit theorem we benefit from the
properties of the standard normal distribution in order to
compute sample probabilities
In this case the Z score is computed with the standard error of
the mean:
Z = (X̄ − µ) / (σ/√n)
15 / 32
208. Example
Using X
A company’s vending machines consume on average 460 kwh of
electricity with a standard deviation of 5 kwh
• What is the probability that a vending machine in a given
location consumes less than 470 kwh?
P(X < 470) = P( (X − µ)/σ < (470 − 460)/5 )
= P(Z < 2) = 0.9772 ≈ 98%
• ...and the probability for using more than 470 kwh?
P(X > 470) = P( (X − µ)/σ > (470 − 460)/5 )
= P(Z > 2) = 1 − P(Z < 2)
= 1 − .9772 = 0.0228 ≈ 2%
16 / 32
209. Example II
Using X̄
• What is the probability that 3 vending machines consume on
average less than 465 kwh?
i.e. P(X̄ < 465)
Since we assume that X is normally distributed, the standard
error of the mean must consider the sample size
σx̄ = σ/√n = 5/√3 = 2.89
P(X̄ < 465) = P( (X̄ − µx̄)/σx̄ < (465 − 460)/2.89 )
= P(Z < 1.73) = .9582 ≈ 96%
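Both machine examples reduce to one call to the standard normal cdf;
a sketch:

    # P(X < 470) for one machine and P(Xbar < 465) for the mean of n = 3
    import math
    from scipy import stats

    mu, sigma = 460, 5
    print(stats.norm.cdf(470, loc=mu, scale=sigma))   # ~0.9772
    se = sigma / math.sqrt(3)                         # standard error, ~2.89
    print(stats.norm.cdf(465, loc=mu, scale=se))      # ~0.958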
17 / 32
210. P(X > 470) and P(X̄ < 465)
last two examples
[Figure: normal curves centred at µ = 460, marking x = 470 (left panel)
and x̄ = 465 (right panel)]
18 / 32
211. Inference with the Sampling Distribution
Many times we do not know the values of µ and σ when we want
to calculate probabilities
However we can infer such values through the sampling
distribution, and for instance the value of µ can be deduced on
the basis of the distribution of the sample mean
More specifically, we can obtain a particular probability that the
sample mean falls between two values by using the properties of Z
A general formula for this problem is:
P(µ − zα/2·σ/√n < X̄ < µ + zα/2·σ/√n) = 1 − α
where α is the probability that X̄ does not fall into the interval
19 / 32
212. Inference of the sample mean
Example
A sample with n = 3 tells us that a vending machine
consumes on average 470 kwh, with σ = 5 kwh
Then we can compute the range in which the sample mean is
located with 95% probability
Since z.025 = 1.96, then P(−1.96 < Z < 1.96) = .95
By multiplying all terms in the probability statement by σ/√n and
then adding µ we get:
P(µ − 1.96·σ/√n < X̄ < µ + 1.96·σ/√n) = .95
P(470 − 1.96·5/√3 < X̄ < 470 + 1.96·5/√3) = .95
P(464.3 < X̄ < 475.7) = .95
ª hence the sample mean will fall between 464.3 and 475.7 with 95%
probability, which means that a computed sample mean within this
range is supported by the sampling distribution
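The same interval in code; a sketch reproducing the arithmetic above:

    # 95% range for the sample mean around mu = 470 with sigma = 5, n = 3
    import math
    from scipy import stats

    mu, sigma, n, alpha = 470, 5, 3, 0.05
    z = stats.norm.ppf(1 - alpha / 2)   # z_{alpha/2} = 1.96
    half = z * sigma / math.sqrt(n)
    print(mu - half, mu + half)         # ~464.3 and ~475.7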
20 / 32
213. Sampling Distribution of a Proportion
Recall that the binomial distribution depends on a parameter
p that represents the probability of success in any trial
As with the previous example with the mean, typically the
value of this parameter is unknown and it needs to be
estimated
For that, we conduct a binomial experiment where we count
the number of successes X in n trials, and this random
variable is binomially distributed
21 / 32
214. Sampling Distribution of a Proportion II
The estimator of p is the proportion of the number of successes to
the sample size
P̂ = X/n
For instance, for a sample size n and a probability of success p,
we can find the probability that X is at most x by using a binomial
probability table as:
P(P̂ ≤ x/n) = P(X ≤ x)
However, as we have seen with quantitative variables, there exists
a normal approximation to the binomial distribution from which we
can benefit in our calculations
22 / 32
215. Normal approximation to the binomial
Any X ∼ B(n, 0.5) is symmetrically distributed and it produces a
bell shaped curve by smoothing the ends of the rectangles
Calculating probabilities of X using the normal distribution
requires finding the area under the normal curve by applying a
continuity correction factor of .5 to each value x
• Hence P(X = x) ≈ P(x − .5 < Y < x + .5) where Y is a normal
random variable approximating X
• Also P(X ≤ x) ≈ P(Y < x + .5) and P(X ≥ x) ≈ P(Y > x − .5)
In the case of a range of values of X we can omit the correction
factor, but the accuracy of the estimation is decreased
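A quick numeric check of the correction, assuming n = 100 and p = .5 as in
the plot on the next slide (the cut-off x is an arbitrary choice):

    # Normal approximation with continuity correction vs the exact binomial
    import math
    from scipy import stats

    n, p, x = 100, 0.5, 55
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    exact = stats.binom.cdf(x, n, p)           # P(X <= x), exact
    approx = stats.norm.cdf(x + 0.5, mu, sd)   # P(Y < x + .5), with correction
    print(exact, approx)                       # ~0.864 for both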
23 / 32
216. Binomial distribution and normal approximation
n = 100, p = .5
[Figure: binomial probability histogram with the superimposed
approximating normal curve]
24 / 32
217. Sampling distribution of a sample proportion
The expectations of P̂ assume that the population is normally
distributed
E(P̂) = p
V(P̂) = σ²p̂ = p(1 − p)/n
σp̂ = √(p(1 − p)/n)
The standard deviation of P̂ is known as the standard error of
the proportion
Here we omitted the finite population correction factor (cf. the
definition of σx̄) assuming that the population is large relative to
the sample size
25 / 32
218. Example
finding P̂
Last year 30% of the schools in town installed our vending
machine cooler, and we want to study the proportion of schools
that will continue using our machine next year
• If we make a random sample of 25 schools, what is the probability
that more than 35% of the sample schools will choose our machine?
Since we have just a success or failure we have a binomial
experiment with p = .30 and n = 25
We want to find P(P̂ > .35)
σp̂ = √(p(1 − p)/n) = √((.30)(.70)/25) = .0917
P(P̂ > .35) = P( (P̂ − p)/√(p(1 − p)/n) > (.35 − .30)/.0917 )
= P(Z > .545) = 1 − P(Z < .545) ≈ 1 − .705 = .295 ≈ 30%
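The same computation in code; a sketch:

    # P(Phat > .35) for p = .30, n = 25, via the normal approximation
    import math
    from scipy import stats

    p, n = 0.30, 25
    se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion, ~.0917
    z = (0.35 - p) / se               # ~0.545
    print(stats.norm.sf(z))           # right-tail probability, ~.29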
26 / 32
219. Difference between Two Means
Sampling distribution of X̄1 − X̄2
We can use a sampling distribution to calculate the
difference between two means
The assumptions are that the random samples are
independent of each other and that they represent
normally distributed populations
Since both populations have a normal distribution, the
difference between two sample means is also normally
distributed
27 / 32
220. Expectations of X̄1 − X̄2
The expectations of the sampling distribution of the difference
between two sample means are:
µx̄1−x̄2 = µ1 − µ2
and the variance is:
σ²x̄1−x̄2 = σ²1/n1 + σ²2/n2
The standard error of the difference between two means is:
σx̄1−x̄2 = √(σ²1/n1 + σ²2/n2)
for nonnormal populations we need sufficiently large sample sizes (≥ 30)
28 / 32
221. Example X̄1 − X̄2
Our company’s vending machines electricity consumption is normally
distributed with mean of 460 kwh and standard deviation of 5 kwh.
A rival company produces vending machine coolers with normally
distributed consumption of electricity with 455 kwh on average and
10 kwh as standard deviation.
• What is the probability that the average electricity consumption
of our company’s machines exceeds that of the rival machines if we
take random samples of size 30 and 10 respectively?
i.e. P(X̄1 − X̄2 > 0) with µ1 − µ2 = 460 − 455 = 5 and
σx̄1−x̄2 = √(σ²1/n1 + σ²2/n2) = √(5²/30 + 10²/10) = √10.833 = 3.29
P(X̄1 − X̄2 > 0) = P( ((X̄1 − X̄2) − (µ1 − µ2)) / √(σ²1/n1 + σ²2/n2) > (0 − 5)/3.29 )
= P(Z > −1.52) = 1 − P(Z < −1.52) = 1 − .0643 = .9357
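And the difference-of-means probability in code; a sketch:

    # P(Xbar1 - Xbar2 > 0) for two independent normal samples
    import math
    from scipy import stats

    mu_d = 460 - 455                              # mu1 - mu2 = 5
    se_d = math.sqrt(5**2 / 30 + 10**2 / 10)      # standard error, ~3.29
    print(stats.norm.sf(0, loc=mu_d, scale=se_d)) # ~.936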
29 / 32
222. Sampling distribution and Inference
Sampling distributions – rather than probability distributions –
are commonly used for statistical inference
While a probability distribution refers to individual
observations, a sampling distribution refers to the values of
a statistic computed from those observations
Statistics computed through sampling distributions allow
us to make inferences about the population parameters,
which are usually unknown
30 / 32
223. Sampling distribution and Inference II
[Diagram: a probability distribution relates individual observations to
the population parameters, whereas a sampling distribution relates a
sample statistic to the population parameter it estimates]
31 / 32
224. Summary
Sampling distributions determine the probabilities for
sample statistics
Mean, proportion, and the difference between two means
are illustrations of sample statistics with their own type of
sampling distributions
We can make inferences about population parameters
through sample statistics and their sampling distributions
32 / 32