BUSINESS STATISTICS I
Lectures Part 1 — Weeks 36 – 42
Antonio Rivero Ostoic
School of Business and Social Sciences
 September −  October 
AARHUS UNIVERSITY
BUSINESS STATISTICS I
Lecture – Week 36
Antonio Rivero Ostoic
School of Business and Social Sciences
 September 
AARHUS UNIVERSITY
Today’s Agenda
Fundamental Concepts in Statistics
Types of Data
Computing and Representing Data Types
 and practical information about the course...
2 / 34
Definition of statistics
“Statistics is a way to get information from data” (Keller, p. 1)
ª Data is a set of observations, whereas information is the message
However, statistics can also be viewed as methods for
collecting and analysing data
ª in this case the research design is a statistical procedure as well
 Statistics draws conclusions from numbers...
3 / 34
Types of statistics
Statistical analysis is classified as descriptive or inferential
Descriptive
The goal is to describe the data, i.e. organize, summarize, present...
Instead of listing all observations, we summarize the data through
numerical techniques or represent it graphically
Descriptive statistics provides a typical picture of the data that is more
meaningful than the complete listing
We can also find patterns in the data as an explorative
phase of the analysis
4 / 34
Types of statistics II
Inferential
In this case the goal is to draw conclusions from the data
Inferential statistics allows making generalizations from samples
to the population values
Statistical inferences are made through different kinds of tests
ª like hypothesis testing or tests of significance, tests of reliability, etc.
We can also make predictions based on the data
5 / 34
Key statistical concepts
Population
The final goal in statistics is to learn about populations
Population constitutes the total set of subjects of interest in the
study
However in statistics, population —rather than being a particular
group of individuals or cases— refers to a variable
ª e.g. the teenagers downloading an app onto their mobile devices
6 / 34
Key statistical concepts II
Sample
Inferences about populations are based on sample data
Samples are selected individuals or cases of the population on
which the study collects data
Through samples it is possible to study populations in a practical
manner
7 / 34
Key statistical concepts III
Descriptive measures
• For populations, the descriptive measures are called parameters
• For samples, they are called statistics
 We use statistics to make inferences about parameters
8 / 34
Population and sample
[Diagram: a sample drawn from a population]
9 / 34
Population and sample II
[Diagram: sampling from the population and inference from the sample]
10 / 34
Types of data
• Both populations and samples are described in
terms of variables
• A variable is a characteristic that consists of two or
more observed values, which constitute the data
11 / 34
Types of data II
A data type is classified according to the kind of variable or to the scale
Variables:
Quantitative
– continuous
– discrete
Qualitative or categorical
– discrete
Scales:
• Nominal
• Ordinal
• Interval (and Ratio)
 a score is the numerical value which indicates the quantity of a variable
12 / 34
Data scales
Types of variables are measured according to different scales
Nominal
labels used for categorical variables
do not represent degree of difference
cannot be ranked
it is just possible to calculate the frequency of occurrences and
compare these measures
usually the responses are recorded using codes
ª e.g. Gender, Nationality
13 / 34
Data scales II
Interval (and Ratio)
used for quantitative variables
there is order and the adjacent intervals between the points of
the scale are of equal extent
there is a degree of difference (Interval without a ratio)
the measure has an arbitrary zero point (Interval), and an
absolute zero point (Ratio)
it is possible to calculate measures of location and dispersion
ª e.g. Temperature °C (Interval), Age (Ratio)
14 / 34
Data scales III
Ordinal
for qualitative variables where the order of the values is significant
there is a degree of difference with ranks
typically measures of non-numeric concepts
it is possible to calculate measures of location
ª e.g. Degree of satisfaction, TRUE/FALSE (?)
 It is important to identify the scale and type of the data
because this determines which statistical procedure we are going to use
15 / 34
Hierarchy in treatment of data types
Interval & Ratio
Ordinal
Nominal
16 / 34
Computing and representing data types
Nominal data
Recall that with nominal data we can only count the frequency of
the different categories, which is typically given in a frequency
distribution table
The percentage of the counts represents a relative frequency
Since the variable is qualitative, we can code the responses with
numbers
A single set of data, i.e. one nominal variable, is called univariate
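To make this concrete, here is a minimal Python sketch (not from the slides, with made-up category codes) that builds a univariate frequency distribution and the relative frequencies for nominal data:

```python
from collections import Counter

# Hypothetical nominal responses, e.g. coded nationality categories
responses = ["DK", "SE", "DK", "NO", "SE", "DK", "FI", "NO", "DK"]

counts = Counter(responses)   # frequency of each category
n = len(responses)

for category, freq in counts.most_common():
    print(f"{category}: {freq} ({freq / n:.1%})")   # relative frequency
```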
17 / 34
Frequency table (Example of interval data treated as nominal)
COUNTRY % 2011
--------------------
1 Belgium 27.3
2 Bulgaria 23.2
3 Czech Rep. 25.0
4 Denmark 30.7
5 Germany 26.1
6 Estonia 25.5
7 Spain 27.5
8 France 26.9
9 Croatia 25.7
10 Italy 24.0
11 Cyprus 40.3
12 Latvia 26.5
13 Lithuania 24.3
14 Malta 43.5
15 Netherlands 26.3
16 Austria 29.2
17 Poland 28.4
18 Romania 17.5
19 Slovenia 32.3
20 Slovakia 22.5
21 Finland 26.6
22 Sweden 27.3
23 Gr. Britain 29.5
24 Iceland 26.1
25 Norway 22.3
26 Turkey 19.1
27 United States 30.2
28 Japan 31.3
Source: Eurostat
18 / 34
Bar chart (Example of interval data treated as nominal)
Graphical methods
[Bar chart: Expenditure on education per capita, % GDP (0 to 30), year 2011, for Belgium, Bulgaria, Czech Rep., Denmark, Germany, Estonia]
19 / 34
Pie chart (Example of interval data treated as nominal)
Graphical methods
[Pie chart: Expenditure on education per capita (year 2011): Belgium 27.3%, Bulgaria 23.2%, Czech Rep. 25%, Denmark 30.7%, Germany 26.1%, Estonia 25.5%]
20 / 34
Computing and representing data types II
Ordinal data
Ordinal data should be treated as nominal
Frequency tables and charts are also used, but arranged in
ascending or descending ordinal values
ª bar charts with descending frequencies are also known as Pareto plots
21 / 34
Pareto plot (Example of interval data treated as nominal)
Graphical methods
[Pareto plot: Expenditure on education per capita, % GDP (year 2011), in descending order: Denmark, Belgium, Germany, Estonia, Czech Rep., Bulgaria]
22 / 34
Bivariate nominal data
With bivariate nominal data there are either two variables or one
pair of data sets in the analysis
The relationship between two nominal variables is represented
by a cross-tabulation table
ª another term used is contingency table
Graphically the bar charts are represented with two dimensions
Tables and graphics with more than two dimensions are difficult
to interpret
23 / 34
Frequency table with two data sets
Expenditure on education
as a percentage of GDP per capita
YEAR
=================
COUNTRY 2001 2011
-----------------------------
1 Belgium 25.6 27.3
2 Bulgaria 22.5 23.2
3 Czech Rep. 19.3 25.0
4 Denmark 28.9 30.7
5 Germany 25.3 26.1
6 Estonia NA 25.5
Source: Eurostat
24 / 34
Bivariate bar chart (Example of interval data treated as nominal)
[Dodged bar chart: % GDP (0 to 30) by year (2001, 2011) for Belgium, Bulgaria, Czech Rep., Denmark, Estonia, Germany; caption: Expenditure on education per capita]
25 / 34
Computing and representing data types III
Interval data
Interval scale is used to represent quantitative information
As with nominal data, we can still use frequency distribution tables
for interval data, but categorizing the information into a series of
intervals called classes
Classes of intervals do not overlap, and they cover the complete
range of information
The number of class intervals to choose is a function of the number
of observations
The width of the class intervals (called bins in the histogram) is given by
$$\frac{\text{largest obs.} - \text{smallest obs.}}{\text{number of class intervals}}$$
26 / 34
Histogram
Graphical methods
[Histogram: Expenditure on education per capita, % GDP year 2011, for the 28 countries (Europe, USA, Japan); frequency on the vertical axis, bins from 15 to 45]
27 / 34
Types of histograms
Histograms can be either symmetric or skewed
A special type of symmetric histogram is bell-shaped
A histogram skewed to the right is positively skewed, and one skewed
to the left is negatively skewed
Histograms with a single peak are called unimodal, and with two
peaks are bimodal
28 / 34
Other graphical techniques for interval data
• Stem-and-Leaf Display
ª to represent small samples of scores
• Ogive
ª for (cumulative) relative frequency distribution
• Line chart
ª for time-series data
• Pictograms...
29 / 34
Bivariate interval data
This is the relationship between two interval variables, which is
very common in statistical analysis
Graphically such relation is represented by a scatter diagram or
scatterplot
Scatter diagrams reveal forms of association among the
observations
Associations can be linear (weak or strong) with a
positive or a negative direction, or else nonlinear
Typically one variable depends on the other variable(s)
30 / 34
Scatter diagram
Graphical methods
[Scatter diagram: years (x-axis, 1 to 6) vs. Bonus ($ × 1000, 0 to 15); from Keller (xm16 1)]
31 / 34
Graphical presentation
Recommendations
A good graphical presentation should be concise, coherent,
and clear to the viewer
Well-designed graphics reveal substance rather than just form,
and illustrate patterns of relations between variables
Include the scales and marks on every axis, with a caption that
relates directly to what the diagram is meant to display
Avoid distortion of the data
32 / 34
Applications of statistics
Traditional fields of applications of statistics in social sciences:
• demography
• econometrics
• psychometrics
• ...
 A rapidly growing discipline for organisation and marketing is
business analytics = business + technology
Emphasis on statistical exploration of big data
Meaningful analysis in short time (or on the fly)
Data-driven decision making
33 / 34
Summary
Statistics is usually concerned with studying populations by means
of particular samples
The main concern of statistics is to describe the sampled data
or to make inferences about the population
There are different kinds of data, which can be classified
according to the type of variable or its scale
Nominal and ordinal data are for qualitative variables, whereas
interval data is for quantitative information
Each data type has its own graphical techniques, where it is
possible to combine two or more variables
34 / 34
BUSINESS STATISTICS I
Lecture – Week 37
Antonio Rivero Ostoic
School of Business and Social Sciences
 September 
AARHUS UNIVERSITY
Today’s Agenda
1. Numerical descriptive techniques
2. Data collection and sampling methods
2 / 36
Review
lecture week 36
Recall that our final goal in statistics is to learn about populations
To make inferences on populations, we base the analysis on
sample data
We refer to the different measures for populations and samples
as parameters and statistics, respectively
3 / 36
Measures of central location
Measures or indices of central location seek to describe
the center of the data
The most popular measure of central location is the mean
There are (at least) three types of means:
• arithmetic mean, • geometric mean, • harmonic mean
The arithmetic mean is simply called the ‘mean,’ and
another name for it is the average
• The mean equals the sum of the scores in a sample
divided by the sample size
4 / 36
Arithmetic mean
To define the (arithmetic) mean, sample observations are
represented as x1, x2, . . . , xn where n is the sample size
Population mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $\sum$ means ‘sum of’
Sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
* thus we use Greek letters for parameters and Latin letters for statistics
5 / 36
Median
The median is another popular measure of central location and
it refers to the score in the ‘middle’
• The median is calculated by ranking the scores:
half of them lie above it and half below
 for an even number of scores, the median equals the average
of the two scores in the middle
19, 17, 23, 29, 12 → ranked: 12, 17, 19, 23, 29 → median = 19
6 / 36
Mode
Another measure of central location that – unlike mean and
median – can be applied to qualitative or nominal data is the mode
• The mode is the most frequently occurring score or
category in the data
 however the mode is not the frequency itself but a value
12 electricians, 30 nurses, and 4 clerks → the mode is ‘nurse’
7 / 36
Location measures
Summary
For interval data the mean is the most appropriate
measure of central location
ª i.e. the average
For ordinal data the median is a more suitable statistic
For nominal data the only measure of location we have
seen so far is the mode
ª cf. Keller’s book view on this
8 / 36
Measures of variability
In descriptive statistics we are not only interested in measuring the
central location of the data but also in the spread or dispersion of
the observations
Many times if we want to compare measures of location in different
data sets or variables, we need to know the variability in the data
For establishing the widths of the bins in a histogram (cf. slide 26 in week 36)
we have already seen a variability measure
ª here we considered the numerical difference between the extreme values in
class intervals
9 / 36
Range
The difference between the largest and the smallest score in a
data set is known as the range
Range = Largest observation − Smallest observation
The range is the simplest measure of variability since it
considers only two scores
The disadvantage of the range is that it tells us nothing
about the other scores
ª hence we can have two entirely different data sets with identical range
10 / 36
Variance
Variance is the average of the squared distances from the mean
That is
$$\frac{\text{sum of}\,(\text{each score} - \text{mean score})^2}{\text{number of scores}}$$
 Thus the larger the variance is, the more the scores differ on
average from the mean
11 / 36
Variance II
Population variance
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$
Sample variance
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
12 / 36
Sample variance
Sample variances are computed to estimate population variances
The sample variance is however corrected for the use of the sample mean,
which implies that the sum of squared differences from the mean is divided by n − 1
ª it turns out that this formula gives a better estimate than one without the correction
But why do we square the differences from the mean before averaging?
One reason is to avoid the canceling effect, in which the sum of the
positive and the negative deviations equals 0
 The interpretation of the variance is not straightforward since by squaring
the differences from the mean we transform the unit of the data set
13 / 36
Standard deviation
The standard deviation is the square root of the variance
ª ‘deviation’ is the difference between each score and the mean
$$\sqrt{\frac{\text{sum of}\,(\text{each score} - \text{mean score})^2}{\text{number of scores}}}$$
 By computing the square root of the variance we preserve the
original unit of the data set
14 / 36
Standard deviation II
Population standard deviation
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$$
Sample standard deviation
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$
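As an illustration (not part of the slides), Python's standard statistics module computes these measures directly; the scores reuse the small sample from the median example in this lecture:

```python
import statistics

scores = [19, 17, 23, 29, 12]

print(statistics.mean(scores))       # arithmetic mean: 20
print(statistics.median(scores))     # middle ranked score: 19
print(statistics.variance(scores))   # sample variance s^2, divides by n - 1
print(statistics.stdev(scores))      # sample standard deviation s
print(statistics.pvariance(scores))  # population variance sigma^2, divides by N
print(statistics.pstdev(scores))     # population standard deviation sigma
```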
15 / 36
The empirical rule
By knowing the mean, the standard deviation, and the type of
distribution we can deduce relevant information of the data
For a normal distribution – with a bell shaped curve – it is
possible to apply the empirical rule
[Bell curve: ≈ 68% of the scores fall within µ ± σ, ≈ 95% within µ ± 2σ, and ≈ 99.7% within µ ± 3σ]
16 / 36
Dispersion measures
Last issues
Range, variance, and standard deviation are the most common
measures of variability for interval data
Are there measures of variation for ordinal and nominal data?
The ratio of the standard deviation to the mean provides
the coefficient of variation of the set of scores
ª this statistic can guide us about the magnitude of the standard deviation in
relation to the mean
 A generalization of the empirical rule that applies to distributions
of all shapes is found in Chebysheff’s Theorem
17 / 36
Percentiles
Measure of relative standing
In order to provide the position of particular values relative to the
entire data set we use measures of relative standing
ª applicable to interval and ordinal data
The Pth percentile is the value for which P% of the scores fall
below that value and (100 − P)% fall above it
Actually the median of a data set is a special case of percentile
ª it is the 50th percentile
Besides the median, there are other special cases of percentiles
called quartiles that correspond to the 25th and 75th percentiles
ª called lower and upper quartile respectively
18 / 36
Percentiles II
Location of a Percentile
$$L_P = (n + 1)\frac{P}{100}$$
Lower quartile, median, and upper quartile are labelled
respectively as Q1, Q2, and Q3
The interquartile range is the difference between the upper and
lower quartile, i.e. Q3 − Q1
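A small sketch of the percentile-location formula, using a hypothetical ranked sample (this example is mine, not Keller's):

```python
def percentile_location(n, p):
    """Location of the Pth percentile among n ranked scores: L_P = (n + 1) P / 100."""
    return (n + 1) * p / 100

scores = sorted([12, 17, 19, 23, 29, 31, 35])   # hypothetical sample, n = 7
n = len(scores)

q1 = scores[int(percentile_location(n, 25)) - 1]   # L_25 = 2 -> Q1 = 17
q2 = scores[int(percentile_location(n, 50)) - 1]   # L_50 = 4 -> Q2 (median) = 23
q3 = scores[int(percentile_location(n, 75)) - 1]   # L_75 = 6 -> Q3 = 31
print(q1, q2, q3, q3 - q1)                         # interquartile range = 14
```

Note that the location is a rank, so in general it may fall between two scores and require interpolation; here n = 7 was chosen so that the locations are whole numbers.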
19 / 36
Box Plots and Outliers
• A box plot is a graphical method used to represent
quartiles together with the most extreme observations
ª Box plots are useful in comparing different data sets
• Outliers are unusually large or small observations
ª ...and we suspect their validity
20 / 36
Box Plot without outliers
Example
[Box plots comparing the scores of exercises 4.19 and 4.21 on a 0-30 scale]
21 / 36
Measures of Linear Relationship
Between variables
We have seen that it is possible to relate two quantitative
variables in a statistical analysis
The scatter diagram gives us an idea of the strength and
direction of the linear relationship between two variables
However the graphical information is quite loose, and we can
obtain more precise information about the linear relationship with
different numerical measures
22 / 36
Covariance
As the name suggests, covariance is the variance that is
shared between two variables, X and Y
Population covariance
$$\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$$
Sample covariance
$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
23 / 36
Correlation
The ratio of the covariance to the product of the standard
deviations results in the coefficient of correlation or just correlation
ª that is another measure of linear relationship
Population correlation
$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Sample correlation
$$r = \frac{s_{xy}}{s_x s_y}$$
where $-1 \leq r, \rho \leq +1$
 If the statistic equals 0 then there is no linear relationship, whereas the
extreme values denote perfect positive, and negative relationship
ª otherwise the linear relationship is considered just as ‘weak’
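A hedged sketch of both statistics on a hypothetical paired sample (the data and variable names are mine):

```python
import math

x = [1, 2, 3, 4, 5, 6]      # e.g. years of service
y = [6, 1, 9, 5, 17, 12]    # e.g. bonus in $1000s

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# sample covariance: cross-deviations from the means, divided by n - 1
s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# sample standard deviations
s_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

r = s_xy / (s_x * s_y)      # coefficient of correlation, -1 <= r <= +1
print(s_xy, r)
```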
24 / 36
Coefficient of Determination
If we square the correlation between two quantitative variables
we obtain another measure of linear association called the
coefficient of determination, r²
The coefficient of determination measures the amount of
variation in Y that is explained by the variation in X
ª thus a clearer indication of the relationship than the correlation coefficient
Explained variation is important in statistics and it is at the core of
regression analysis
25 / 36
Method of Least Squares
A linear relationship can be visualized in a scatter diagram
by drawing a straight line through the scores
We need to estimate the line that represents ‘best’ the
sample data points
An objective method of producing a straight line is the
least squares method
ª which minimizes the sum of squared deviations between the scores
and the line
26 / 36
Method of Least Squares II
In a least squares line:
$$\hat{y} = b_0 + b_1 x$$
 ˆy represents the fitted value of y, whereas b0 and b1 are known as
the regression coefficients that we need to estimate
• b0 is the y-intercept: where the line crosses the y-axis
• b1 is the slope: the rise/run of the line
Least squares line coefficients
$$b_1 = \frac{s_{xy}}{s_x^2} \qquad b_0 = \bar{y} - b_1\bar{x}$$
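The coefficients follow directly from the sample covariance and variance; a minimal sketch with hypothetical data (mine, not Keller's):

```python
def least_squares(x, y):
    """Least squares coefficients: b1 = s_xy / s_x^2 and b0 = y-bar - b1 * x-bar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - mx) ** 2 for xi in x) / (n - 1)   # sample variance of x
    b1 = s_xy / s_xx                                   # slope
    b0 = my - b1 * mx                                  # y-intercept
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4, 5, 6], [6, 1, 9, 5, 17, 12])
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```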
27 / 36
Guidelines for Exploring Data
We have seen that descriptive statistics includes the
exploration of the data
Besides knowing the type of data we are dealing with, there
are important aspects to take into account in the exploration
– the central location in the data
– the dispersion (or lack of) in the observations
– the shape of the distribution for the data set
All this information guides us in choosing the appropriate
numerical techniques to apply in further statistical analysis
28 / 36
Data collection and Sampling
We want to learn about populations when using statistics
Except for the census – where all members of the population
are the observations – inferences about populations are made
by means of samples
Sampling techniques are crucial for the statistical analysis,
and they are part of the research design
29 / 36
Methods for data collection
There are different methods of collecting data that correspond
to the variable(s) of interest
Observational data
Direct Observation either by ‘observing’ behaviour in natural settings,
or by ‘asking’ questions to certain people
ª simple and inexpensive, but it has drawbacks
• we do not have control over the subject (main disadvantage)
• comparing groups is difficult
30 / 36
Methods for data collection II
Observational data
Surveys are based on samples of people from populations from
which we solicit information
There are different types of sample survey:
– Interviews (personal or telephonic)
– Self-administered questionnaire
 the questionnaires should be well designed for all survey types
ª questions should be short, simple, and concise;
starting with demographic questions is recommended
31 / 36
Methods for data collection III
Experimental data
Experiments provide data from which we can compare responses
from subjects under different conditions that are known as treatments
ª more expensive than direct observation, but provides better results
• requires special planning to assign individuals
to treatments
• randomization in the sampling is needed
• it is however quite uncommon in social sciences
32 / 36
Sampling
When we perform sampling, we make use of different sampling plans
In a simple random sample or just random sample each case
in the population has an equal chance of being selected
ª this can be done through random number tables
A stratified random sampling separates the population into
mutually exclusive sets called strata, and then it draws a
random sample in each stratum
ª the population is ‘naturally’ divided into groups, i.e. the strata are
recognized by an easily identifiable variable
33 / 36
Sampling II
With cluster random sampling the population is divided
into clusters, and random samples are drawn from each cluster
ª useful for large populations with samples across strata
In a systematic sampling each subject of the population is
assigned a number and – starting at a random number –
every kth member from then onwards is selected
 Many times a random sampling is based on a sampling frame,
which is the list of cases from which the sample is selected
All sampling techniques that involve a random sample
constitutes a probability sampling
34 / 36
Sampling Errors
When taking observations from a population we may be confronted
with different types of variability
Sampling error is the variability of samples from the
characteristics of the population from which they came
ª e.g. the variability from the population mean
Nonsampling error, which results from:
– mistakes made in the acquisition of data
– when responses are not obtained from some subjects in
the sample (nonresponse error or bias)
– when some parts of the population are not selected for
the sample (selection bias)
35 / 36
Summary
We have seen important descriptive indices for the data, which
include measures of location, variability, and relative standing
ª these measures together with the shape of the distribution are crucial parts
of the descriptive statistics of the data
There are also numerical indicators for calculating the
strength of the linear relationship between two quantitative variables
Different methods for data collection and sampling are at the
core of the research design
ª we should avoid sampling and nonsampling variability
36 / 36
BUSINESS STATISTICS I
Lecture – Week 38a (38)
Antonio Rivero Ostoic
School of Business and Social Sciences
 September 
AARHUS UNIVERSITY
Today’s Agenda
Probability
Discrete Random Variables
(Binomial distribution)
2 / 31
What is probability?
Recall that we make inferences about populations (parameters)
based on sample data (statistics)
A link between population and sample can be found in
probability theory
Probability is a branch of mathematics that was created to study
the chance that a particular outcome will occur
ª for gambling strategies in the 17th century, which is actually before the
development of statistics itself
Many times probability is associated with random phenomena
3 / 31
Assigning probabilities
a) In the classical approach the probability is expressed in terms of the
proportion of a particular outcome to all possible outcomes
ª this is sort of an objective method
b) For the relative frequency approach we are interested in the
long-run relative frequency of the probability for an outcome to occur
c) The subjective approach includes a certain degree of belief to the
probability
4 / 31
Probability of events
Relative frequency
In order to obtain all possible outcomes we perform a random
experiment
Random experiments have an exhaustive and mutually
exclusive list of outcomes that constitutes the sample space,
S = { O1, O2, . . . , Ok }
Probabilities are expressed per event, and for each outcome
they vary between 0 and 1
The sum of probabilities of all outcomes in a sample space
must be 1
5 / 31
Probability of events II
Example
The toss of a die
– Sample space: S = {1, 2, 3, 4, 5, 6}
– Probability of events (classical approach): P(i) = 1/6 for each face i
 how would these probabilities be in case we apply a relative
frequency approach?
6 / 31
Joint probability
The joint probability is the probability of having two distinct traits,
and in this case we perform the intersection of simple events
For events A and B the intersection means that the event occurs
when both A and B occur, i.e. A and B
ª on the other hand, the union of events A and B corresponds to A or B
Joint probability table:
B1 B2
A1 P(A1 and B1) P(A1 and B2)
A2 P(A2 and B1) P(A2 and B2)
7 / 31
Joint probability II
[Venn diagrams in the sample space S: the intersection ‘A and B’ and the union ‘A or B’]
8 / 31
Marginal probability
Marginal probabilities are computed by adding
across rows or down columns of the joint probability table
 e.g. in the joint probability table:
– Across row: P(A1 and B1) + P(A1 and B2)
– Down column: P(A1 and B2) + P(A2 and B2)
9 / 31
Conditional probability
Conditional probability is the probability of an event occurring
given that another event has occurred
For events A and B, the conditional probability of event A given B is:
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$
 i.e., the proportion of the joint probability for both events to the given event
• In case the probability of one event is not affected by another
event, then these events are said to be independent of each other
P(A | B) = P(A)
P(B | A) = P(B)
10 / 31
Other probability rules
Complement
$P(A^C) = 1 - P(A)$
 that is, the event that occurs when A does not occur
Addition
P(A or B) = P(A) + P(B) − P(A and B)
ª here we subtract the joint probability of the events because it is
counted twice in the marginal probabilities
 for mutually exclusive events their joint probability is zero
11 / 31
Other probability rules II
Multiplication for dependent events
P(A and B) = P(B) P(A | B) = P(A) P(B | A)
Multiplication for independent events
P(A and B) = P(A) P(B)
 multiplication is used to compute the joint probability of two events
12 / 31
Probability tree
We can use a probability tree diagram to apply the probability
rules to a given problem
ª it displays all the outcomes of a sequence of branches
Toss of a coin (two tosses):

[Probability tree: the first toss branches into head/tail with probability 1/2 each; each branch splits again into head/tail with probability 1/2]

head, head: 1/2 · 1/2 = 1/4
head, tail: 1/2 · 1/2 = 1/4
tail, head: 1/2 · 1/2 = 1/4
tail, tail: 1/2 · 1/2 = 1/4
13 / 31
Exercise 15
The Monty Hall problem
In the original TV show game ‘Let’s Make a Deal’ the
participant has the choice to open one of three doors. One door
conceals a car and the other two doors conceal goats
Once the participant makes his/her decision the host reveals
another door containing a goat and then asks the participant:
“Do you want to switch door or stay with your choice? ”
For us the main question here is: Does it make any difference
to switch from the original choice?
15 / 31
Probability tree diagram of the Monty Hall problem
[Probability tree: the first pick is the car with probability 1/3 or a goat with probability 2/3; the host then opens a door with a goat, and the branch probabilities are split over the Stay and Switch outcomes]

 Initial probability for the car is 1/3, and for a goat 2/3. However, when the host opens
a door, the probability for this door becomes 0, and 2/3 for the other unopened door
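The tree can be checked by simulation; this Monte Carlo sketch (mine, not from the slides) estimates the win probabilities of staying versus switching:

```python
import random

def monty_hall(trials=100_000):
    """Estimate P(win | stay) and P(win | switch) by simulation."""
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)        # door hiding the car
        choice = random.randrange(3)     # participant's first pick
        # the host opens a door that hides a goat and was not chosen
        opened = next(d for d in range(3) if d != car and d != choice)
        switched = next(d for d in range(3) if d != choice and d != opened)
        stay_wins += (choice == car)
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

print(monty_hall())   # roughly (0.333, 0.667): switching doubles the chance
```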
16 / 31
Bayesian probability
Bayesian probability is one of the different interpretations of the
concept of probability
Here a probability is assigned to a hypothesis, whereas under
the frequentist view a hypothesis is typically tested without
being assigned a probability
ª hence Bayesian probability is related to the subjective approach
We have already seen that with conditional probability we can
measure the chance that an event occurs given the occurrence
of another event
With Bayesian probability we can compute the chance of the
possible causes for a particular event to occur
17 / 31
Bayes’ Theorem
Bayes’ Theorem is a rule that provides the conditional probability
of A occurring given that B has already happened
$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}$$
 Event A constitutes the hypothesis, whereas B is the observation
• P(A | B) is the posterior probability of A
ª an updated degree of belief
• P(A) is the prior probability of A
ª that is before the observation (with known probabilities)
• P(B | A) is the likelihood of A
ª i.e. the probability that the hypothesis confers upon the observation
• P(B) is the unconditional probability of B
ª i.e. the probability of the observation irrespective of any hypothesis
18 / 31
Bayes’ Theorem II
More specifically, the Bayes’ Theorem can be restated for a given
event B and events A1, A2, . . . , Ak where:
• P(A1 | B), P(A2 | B), . . . , P(Ak | B) are the posterior probabilities we seek
• P(A1), P(A2), . . . , P(Ak) are the prior probabilities
• P(B | A1), P(B | A2), . . . , P(B | Ak) are the likelihoods
Bayes Formula
$$P(A_i \mid B) = \frac{P(A_i)\, P(B \mid A_i)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \cdots + P(A_k)P(B \mid A_k)}$$
19 / 31
Example
Bayes’ Theorem
• In a class where 60% were female, the probability of passing
the test was 90% for females and 70% for males. What is the
probability of someone passing the test being female?
• A1 and A2 are the proportions of being female and male
respectively, whereas B represents passing the test
• We need to calculate P(A1 | B):
$$P(A_1 \mid B) = \frac{.9 \times .6}{(.9 \times .6) + (.7 \times .4)} \approx .66$$
 the prediction for a female passing the test has increased
due to the added information
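The same computation as a small Python sketch (the function name is mine), applying the Bayes formula from the previous slide:

```python
def bayes_posterior(priors, likelihoods, i):
    """P(A_i | B) = P(A_i) P(B | A_i) / sum_k P(A_k) P(B | A_k)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))   # P(B)
    return priors[i] * likelihoods[i] / total

# A1 = female (prior .6), A2 = male (prior .4); B = passing the test
print(bayes_posterior([0.6, 0.4], [0.9, 0.7], 0))   # about 0.66
```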
20 / 31
Summarizing Probabilities
Correct method
When the joint probabilities are given we can compute the marginal
probabilities, which allows us to compute conditional probabilities
ª with conditional probability we can see whether the events are
independent or dependent
When the joint probabilities are required we can apply probability
rules and probability trees
– multiplication for intersection
– addition for mutually exclusive events
– Bayes formula for new conditional probabilities
21 / 31
Discrete Random Variables
A random variable is a variable whose values have not
been chosen beforehand
ª in a fixed variable, on the other hand, the values are selected in advance
Examples of random variables are the outcomes of flipping a
coin or rolling a die, and such actions are called experiments
Two types of random variables:
• Discrete: with a countable number of values
ª e.g. flipping a coin whose values are the number of
occurrences for each possible outcome of the random variable
• Continuous: when the values are uncountable
ª e.g. amount of time to complete a task
22 / 31
Discrete Probability Distributions
For a discrete random variable the probability distribution describes
the associated probability of its possible outcome values
Each probability of a random variable is a quantity between 0 and 1,
and the sum of the probabilities of all possible values equals 1
That is, for x representing the outcome of a random variable X, and
P(x) the probability of that outcome:
$$0 \leq P(x) \leq 1 \quad \text{and} \quad \sum_{\text{all } x} P(x) = 1$$
23 / 31
Population and probability distributions
Probability distributions can be used as representatives of
populations as well
Both the population mean and variance have their counterparts
on parameters corresponding to a given probability distribution
• The mean of a probability distribution for a discrete variable X is
called the expected value of X and it is represented by E(X)
Mean of a probability distribution
$$E(X) = \mu = \sum_{\text{all } x} x\, P(x)$$
24 / 31
Population and probability distributions II
Variance of a probability distribution
$$V(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2\, P(x)$$
• The standard deviation of a probability distribution (σ) equals
the square root of the variance, i.e. $\sigma = \sqrt{\sigma^2}$
 If the probability distribution is approximately bell shaped then we
can apply the Empirical Rule to interpret σ
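A minimal sketch (with a hypothetical p.m.f., not from the slides) of the expected value, variance, and standard deviation of a discrete distribution:

```python
import math

pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}      # hypothetical P(x) for each x
assert abs(sum(pmf.values()) - 1) < 1e-12   # probabilities must sum to 1

mu = sum(x * p for x, p in pmf.items())                # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # V(X)
print(mu, var, math.sqrt(var))                         # 1.7, 0.81, 0.9
```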
25 / 31
Laws of Expected Value and Variance
To quickly determine the expected value and variance of a
given constant or a random variable we use specific rules
X → a random variable
c → a constant
Expected Value
1. E(c) = c
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
Variance
1. V(c) = 0
2. V(X + c) = V(X)
3. V(cX) = c²V(X)
26 / 31
Probability distributions involving two variables
Recall that P(x) represents the probability that a random
variable X equals x
ª in this case we are considering a single variable
It is possible to determine the probabilities for combinations
involving two variables X and Y, which is represented as:
P(x, y)
with conditions that the outcome for all pairs of values vary between
0 and 1, and the sum of probabilities in the sample space is 1
i.e. $0 \leq P(x, y) \leq 1$ and $\sum_{\text{all } x}\sum_{\text{all } y} P(x, y) = 1$
27 / 31
Probability distributions involving two variables II
While univariate distributions represent the distribution of
one variable, with two variables such representation is made by
a bivariate distribution
It is possible to obtain both the joint probability and the marginal
probability of any bivariate probability distribution
The marginal distribution will provide us with the expected
mean, variance, and SD for each variable
However, with the association of two variables we can compute
the covariance and the coefficient of correlation as well
ª as for the linear relationship (cf. last lecture), but now involving probabilities
28 / 31
Covariance and Correlation
Covariance
$$\text{Cov}(X, Y) = \sigma_{xy} = \sum_{\text{all } x}\sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\, P(x, y)$$
 i.e. the product of the deviations from the mean for X and Y, and
their joint probability
Correlation
$$\rho = \frac{\sigma_{xy}}{\sigma_X \sigma_Y}$$
 as before, the ratio of the covariance of the two variables
to the product of their SDs
29 / 31
Sum of Two Variables
and expected outcomes
One combination that has practical applications is the sum of
two variables
Laws of Expected Value and Variance
1. E(X + Y) = E(X) + E(Y)
2. V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)
 if the variables are independent, then the covariance is 0
30 / 31
Summary
Probability deals with computing the chance for particular
outcomes in a given set of events, and there are different
approaches in assigning probabilities of events
With joint probabilities we can calculate marginal and conditional
probabilities and hence determine whether or not events are
independent
Probability trees are useful to compute probability models with
sequence of actions
Random phenomena are associated with probability through
random variables, and such types of variables are described by
probability distributions and rules of expected values
31 / 31
BUSINESS STATISTICS I
Lecture – Week 38b (39)
Antonio Rivero Ostoic
School of Business and Social Sciences
 September 
AARHUS UNIVERSITY
Today’s Agenda
1. Binomial distribution
2. Poisson distribution
3. Hypergeometric distribution
2 / 27
Probability Distributions
Recall that we introduced the concept of probability distribution,
which describes the associated probability of possible outcomes
for a random variable
ª a random variable is a numerical outcome of an experiment
The distribution of such values is a very important piece of
information since it provides us with the pattern of characteristics
of the variable values over the sample or population
Depending on the type of variable and the scale of the data, we
find a variety of distributions of populations or sampled data
In case we consider a single variable then the data is allocated
in univariate distributions
3 / 27
Binomial experiment
If e.g. we toss a coin n times and count the number of ‘heads’
then we perform a binomial experiment in a set number of trials
ª ‘binomial’ because there are two possible outcomes
In this case the outcomes of the trials are not affected by the
outcomes of other trials, which means that the trials are
independent of each other
By counting the number of heads, a head is regarded as a
success and a tail is considered as a failure
ª it can certainly be the other way around
The probability of success is denoted by p, and the probability of
failure as 1 − p
ª we try to estimate the value of p, which is between 0 and 1
4 / 27
Binomial random variable
The number of successes in the binomial experiment is called
the binomial random variable
ª A binomial random variable is discrete and can take on countable values
0, 1, 2, . . . , n
If we represent the number of successes in n trials as a random
variable X, then the number of failures becomes n − X
The probability for each sequence of branches in the probability
tree representing x successes and n − x failures is:
$$p^x (1 - p)^{n - x}$$
and the number of branch sequences for these outcomes is:
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
where e.g. $n! = n(n-1)(n-2)\cdots(2)(1)$ (called ‘n factorial’) (but 0! = 1)
5 / 27
Binomial Random Variable Examples
Examples of a binomial random variable are:
• the number of correct guesses at n true/false questions when
you randomly guess all answers
• the number of left-handers in a randomly selected sample of n
unrelated people with replacement
• the number of (...)
6 / 27
Binomial distribution
Any binomial experiment is described by a binomial distribution
The probability of x successes in a binomial experiment is:
$$P(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
where P(x) is the probability of success for x = 0, 1, 2, . . . , n
The formula above is known as the probability mass function
(p.m.f.) corresponding to the binomial distribution
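A direct transcription of the p.m.f. into Python (a sketch of mine, using the standard library's math.comb for the binomial coefficient):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# e.g. probability of exactly 15 heads in 30 tosses of a fair coin
print(binom_pmf(15, 30, 0.5))   # about 0.144
```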
7 / 27
Binomial distribution
n = 30
[Plot: binomial p.m.f. P(x), x = 0 to 30, for p = .25 and p = .5]
8 / 27
Binomial distribution
n = 30
[Plot: binomial p.m.f. P(x), x = 0 to 30, for p = .25]
9 / 27
Binomial distribution
n = 30
[Plot: binomial p.m.f. P(x), x = 0 to 30, for p = .25, p = .5, and p = .75]
10 / 27
Expectations of the Binomial distribution
If a random variable X has a binomial distribution, we write
X ∼ B(n, p)
meaning that X is a binomially distributed random variable
Each random variable with a binomial distribution has the same
p.m.f., but they may have different values for the parameters
The parameters of the distribution are n and p, and both are
known, which means that for a binomial random variable we
can determine the expected values
• E(X) = µ = np
• V(X) = σ² = np(1 − p)
• SD(X) = σ = √(np(1 − p))
12 / 27
Example
Binomial distribution
The testing center (cf. Ex 15) shows that 14% of the new
cars have a defect
Suppose that the center tests 20 new cars on a daily basis
• What is the probability that the center finds just one defective
new car in a day?
i.e. $P(X = 1) = \binom{20}{1} (.14)^1 (1 - .14)^{20-1} \approx .16$
 That is 16%
13 / 27
Cumulative Binomial Probabilities
We can use the cumulative probability if we want to calculate the
probability that a random variable is less than or equal to a value
That is, if we wish to determine P(X ≤ x):
$$P(X \leq x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i}\, p^i (1 - p)^{n - i}$$
where ⌊x⌋ is the greatest integer ≤ x (called the ‘floor’ under x)
 this means that P(X ≤ 1) = P(0) + P(1), and so on...
All such values are recorded in tables of binomial probabilities with
tabulated scores for different probabilities and trials with diverse size
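Summing the p.m.f. reproduces a table entry; a minimal sketch (mine) of the cumulative probability:

```python
from math import comb, floor

def binom_cdf(x, n, p):
    """P(X <= x) = sum over i = 0 .. floor(x) of C(n, i) p^i (1 - p)^(n - i)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(floor(x) + 1))

print(binom_cdf(2, 5, 0.10))   # 0.9914, as in the binomial table below
```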
14 / 27
Binomial Table
n = 5; p = .01 to .40 and k ≤ 4

TABLE 1 Binomial Probabilities (excerpt)
Tabulated values are P(X ≤ k) = Σ_{x=0}^{k} p(x) (rounded to four decimal places)

k     p = 0.01   0.05     0.10     0.20     0.25     0.30     0.40
0     0.9510     0.7738   0.5905   0.3277   0.2373   0.1681   0.0778
1     0.9990     0.9774   0.9185   0.7373   0.6328   0.5282   0.3370
2     1.0000     0.9988   0.9914   0.9421   0.8965   0.8369   0.6826
3     1.0000     1.0000   0.9995   0.9933   0.9844   0.9692   0.9130
4     1.0000     1.0000   1.0000   0.9997   0.9990   0.9976   0.9898

 thus for n = 5, p = .1: P(X ≤ 2) = .9914 (...)
15 / 27
Poisson Random Variable
As with a binomial random variable, a Poisson random variable
corresponds to the number of occurrences of events or successes
ª named after S. D. Poisson
However in a Poisson random variable the number of successes is
considered within an interval of time or a specific region of space in a
Poisson experiment
Intervals are independent of each other, do not overlap, and the
probability of a success in an interval is proportional to its size
ª hence intervals of equal size have the same probability, and the probability of
a success in an interval approaches 0 as the interval becomes very small
16 / 27
Poisson Random Variable Examples
Examples of a Poisson random variable are:
• the number of customers’ queueing (in a shop, a call center, a
public service) in a unit of time
• the number of hits on a Web site in a day
• the number of goals scored by a football team in a match
17 / 27
Poisson distribution
A Poisson experiment is described by a Poisson distribution
X ∼ P(µ)
µ is the expected value (mean) parameter of the underlying rate of
occurrence in an interval or region (also written as λ)
ª in this case the rate of occurrence is known and constant
The probability mass function for the Poisson distribution is:
$$P(x) = \frac{e^{-\mu}\mu^x}{x!}$$
for a value of x = 0, 1, 2, . . . successes in a given interval or region
(theoretically with no upper limit)
e is a constant approx. 2.718 (Euler’s number) that is the base of the
natural logarithm
18 / 27
Poisson distribution
[Plot: Poisson p.m.f. P(x), x = 0 to 10, for µ = 1, µ = 2, and µ = 3]
19 / 27
Expectations of the Poisson distribution
In the Poisson distribution the variance is equal to the mean,
and the standard deviation is equal to the square root of the
mean
$$E(X) = V(X) = \mu = \sigma^2$$
20 / 27
Example
Poisson distribution
Suppose a Website has 1.8 hits on average per minute
• What is the probability of receiving 5 hits in a given minute?
i.e. $P(X = 5) = \frac{e^{-1.8}(1.8)^5}{5!} \approx .026$
 That is about 3%
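The same calculation as a Python sketch (mine), directly from the Poisson p.m.f.:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = e^(-mu) mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

print(poisson_pmf(5, 1.8))   # about 0.026: five hits in a minute at rate 1.8
```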
21 / 27
Hypergeometric experiment
The hypergeometric distribution is a probability distribution used to
describe the outcomes produced with a hypergeometric experiment
Here a sample of size n is randomly selected without replacement
from a population of N items, which means that once a particular
outcome has been selected it cannot be picked again
ª this contrasts with the binomial experiment, where the probability of x successes in
the trials is with replacement
– Sampling with replacement: it is possible to select the same item
again, and the size of the population remains the same
ª e.g. tossing a coin
– Sampling without replacement: it is not possible to select the
same item again, thus the size of the remaining population
changes as we remove each item
ª e.g. picking a black ball from an urn containing black and white balls
22 / 27
Hypergeometric random variable
In a given population size N, k items are classified as successes and
N − k items are categorised as failures
A hypergeometric random variable X corresponds to the number of
successes in a sample size n, and it can take one of 0, 1, 2, . . . , n values
The probability of x = 0, 1, 2, . . . , n successes is described by the p.m.f.
of the hypergeometric distribution of X as:
$$P(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$
 this is the ratio of • the number of samples of n items that contain
exactly x successes (chosen from the k successes) and n − x failures (chosen from the N − k failures)
• to the number of possible samples that can be drawn from the population
23 / 27
Hypergeometric distribution
[Plot: hypergeometric p.m.f. for N = 52, n = 10, k = 16]
24 / 27
Expectations of the hypergeometric distribution
• E(X) = µ = n(k/N)
• V(X) = σ² = n(k/N)(1 − k/N)((N − n)/(N − 1))
• SD(X) = σ = √V(X)
25 / 27
Example
Hypergeometric distribution
A graduate statistics course has 7 male and 3 female students.
The teacher wants to select 2 students at random to help her
conduct a research project.
• What is the probability that the two students chosen are female?
(solved)
• What is the probability that exactly one of the students chosen is female?
i.e. $P(X = 1) = \frac{\binom{3}{1}\binom{10-3}{2-1}}{\binom{10}{2}} = .4666667$
 that is 21/90 + 21/90 ≈ 47% (cf. fig. 6.2, p. 193)
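A sketch (mine) of the hypergeometric p.m.f. applied to this example:

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """P(x) = C(k, x) C(N - k, n - x) / C(N, n)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# N = 10 students, k = 3 female, n = 2 chosen without replacement
print(hypergeom_pmf(1, 10, 3, 2))   # 21/45 ~ 0.467: exactly one female
print(hypergeom_pmf(2, 10, 3, 2))   # 3/45 ~ 0.067: both female
```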
26 / 27
Summary
Discrete probability distributions
The binomial distribution measures the probability of the
number of successes over a specific number of trials with
replacement
The Poisson distribution measures the probability of a
number of events occurring within a given time interval
The hypergeometric distribution measures the probability of a
specified number of successes over a specific number of
trials without replacement from a finite population
27 / 27
BUSINESS STATISTICS I
Lecture – Week 39 (40)
Antonio Rivero Ostoic
School of Business and Social Sciences
 September 
AARHUS UNIVERSITY
Today’s Agenda
1. Review on Distributions
2. Continuous Random Variables
3. Uniform Distribution
4. Normal Distribution
2 / 31
Review on Distributions
Recall that probability distributions serve to describe random
variables
From a given probability distribution, there are two important
pieces of information that we can obtain:
– what are the values that the variable takes
– how often the variable takes these values
By depicting such information with graphical methods,
we get a shape that characterizes either the sample or
the population representing the random variable
3 / 31
Modality of a distribution
An important characteristic of a probability distribution is its modality
The data depicted graphically by a bell shaped curve is an example
of unimodal distribution
ª this is because there is a single peak that represents the mode
Hence a data set which has two equally common modes produces
a bimodal distribution with two peaks
Also, a multimodal distribution is a distribution of scores having
more than two modes
4 / 31
Skewness and Kurtosis
Other important measures of the shape of a probability
distribution are:
1) Skewness that measures the degree of asymmetry of a
distribution
ª each type of probability distribution has its own formula to calculate
the skewness of its shape, and perfectly symmetric
distributions have zero skewness
2) Kurtosis that measures the degree of ‘peakedness’ of the
distribution
ª as with the skewness, each distribution has a formula to calculate
the kurtosis
5 / 31
Continuous (and Discrete) Data
Recall that quantitative variables can be discrete or continuous
Continuous data is uncountable in the sense that it has a continuum
of possible values in a range
ª as opposed to discrete data, which can take relatively few different values
Examples of continuous data are time, the height of a person, etc.
However continuous variables must be rounded when measuring
and we usually think of them as discrete
ª we say e.g. that an individual is 20 years old and not between 20 and 21
 continuous data always has an interval scale, whereas discrete
data can take any scale
6 / 31
Continuous Random Variable
A continuous random variable serves to represent continuous
data, and it takes an infinite number of possible values
Probability distributions treat discrete and continuous random
variables very differently
Since there are theoretically an infinite number of values in a
continuous random variable then
ª it is not possible to list all possible values, and
ª the probability of each individual value is practically 0
Thus probability distributions for continuous random variables
consider just the range of the values
We then estimate the probability that a randomly selected
outcome falls within a determinate range, i.e. an interval
7 / 31
Approximation Function
for continuous random variables
Recall that in the case of discrete distributions a probability mass
function was used to approximate the probability distribution for P(x)
In the case of continuous random variables the probability
distribution is characterized by a curve that is determined by a
function as well
However such approximation is made by a probability density
function (p.d.f.) or just density that is represented as f(x)
The conditions for a probability density function with a range
a ≤ x ≤ b are that f(x) ≥ 0 for all x, and that the total area under the curve
between a and b is 1
ª a and b represent the most extreme values of the data
8 / 31
Approximation Function II
In order to calculate the probability of any interval in a
continuous probability distribution, we need to find the area
under the curve
In such case the integral of the density of the variable over
the range provides the probability of the random variable
falling within this particular range
ª with the use of integral calculus
9 / 31
Continuous Uniform distribution
A distribution that has constant probability is found in the
uniform distribution
ª actually there is both a discrete and a continuous version of this distribution
Another name for the uniform distribution is the rectangular distribution
Examples of the uniform distribution are:
• the amount of milk distributed daily in a given town
• the amount of electricity that a soft drink cooler machine
consumes per month
• (...)
10 / 31
Probability density function: Uniform distribution
The probability density function of the uniform distribution for
a ≤ x ≤ b is
$$f(x) = \frac{1}{b - a}$$
and f(x) = 0 if x < a or x > b
Besides, the probability that a continuous random variable that is
uniformly distributed equals any individual value is 0
The uniform distribution is depicted by a rectangle with height
f(x), and for P(x₁ < X < x₂) the base is x₂ − x₁
 i.e. $P(x_1 < X < x_2) = (x_2 - x_1) \times \frac{1}{b - a}$
11 / 31
Uniform distribution plot
[Plot: rectangle of height 1/(b − a) over the interval from a to b]
12 / 31
Uniform distribution plot
[Plot: shaded area P(x₁ < X < x₂) under the rectangle of height 1/(b − a)]
13 / 31
Example
Uniform distribution
A vending machine consumes per year between 420 kWh and 500 kWh.
• What is the probability that the vending machine consumes at
least 480 kWh?
P(X ≥ 480) = (500 − 480) × (1/80) = 0.25
• What is the probability that the vending machine consumes at
most 480 kWh?
P(X ≤ 480) = 1 − P(X ≥ 480) = 0.75
• What about the probability that the vending machine consumes
precisely 500 kWh?
P(X = 500) = 0
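The rectangle geometry makes these probabilities one-liners; a sketch (mine) of the example:

```python
def uniform_prob(x1, x2, a, b):
    """P(x1 < X < x2) = (x2 - x1) / (b - a) for X uniform on [a, b]."""
    return (x2 - x1) / (b - a)

a, b = 420, 500                         # kWh per year
print(uniform_prob(480, b, a, b))       # P(X >= 480) = 0.25
print(1 - uniform_prob(480, b, a, b))   # P(X <= 480) = 0.75
```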
14 / 31
Normal distribution
The most important distribution in statistics is the normal distribution
ª which is symmetric and it has a bell shaped curve
Its importance is partly because it approximates well the distributions
of many types of variables
ª in such cases the sampled data tend to be approximately bell-shaped
The properties of the normal distribution play a crucial role in
statistical inference
ª that is even when the sample data are not bell-shaped
Other names for the normal distribution are the Gaussian
distribution, the Z distribution...
15 / 31
Normal distribution II
Each normal distribution has two parameters, the mean µ and the
standard deviation σ, and the exact form of the distribution depends
on the values of these parameters
ª we know from the empirical rule that most of the scores will fall within e.g. three
standard deviations of the mean
A special case of the normal distribution is the standard normal
distribution Z that is a normal distribution with mean µ = 0 and
standard deviation σ = 1
Examples of the normal distribution are:
• heights of people
• errors in measurements
• (...)
16 / 31
Normal Density Function
The normal distribution is the probability distribution for a
normal random variable
$$X \sim N(\mu, \sigma^2)$$
The probability density function of a normal random variable for
−∞ < x < ∞ is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
where e ≈ 2.7183 (Euler’s number), and π ≈ 3.1416 (Pi)
 And Z ∼ N(0, 1)
17 / 31
Normal distribution plot
[Plot: bell-shaped density f(x) centred at µ]
18 / 31
Normal distributions
different means, same σ
[Plot: two bell curves with means 3 and 6 and the same σ]
 the shape remains the same when only the mean changes its value
ª increasing [decreasing] mean shifts the curve to the right [left]
19 / 31
Normal distributions
different variances, same µ
[Plot: bell curves with the same mean µ but different variances]
 the shape becomes flatter, the bigger the variance (and SD) is
20 / 31
Computing normal probabilities
As with the uniform distribution, the probability that a normal random
variable falls into an interval is the area of the
interval under the curve
However since the shape of the normal distribution is not a
rectangle anymore, in this case the function is more complicated
Then we use a probability table to calculate the probability of a
normal random variable
ª as we did with binomial and Poisson probabilities
Fortunately we just need one table of probabilities by standardizing
the random variable
21 / 31
Standard normal random variable
A standard normal random variable Z equals the difference
between the score and the mean of X divided by its standard
deviation:
$$Z = \frac{X - \mu}{\sigma}$$
A positive standardized score (or z-score) indicates a datum above
the mean, and a negative standardized score indicates a datum
below the mean
A ‘Z transformation’ means that the probabilities for X are now
translated into statements for Z
We use the cumulative standardized normal probabilities for
P(Z < z), which indicate the relative frequency of z-scores
ª Keller’s book has a table for −3.09 ≤ z ≤ 3.09 (others by approximation)
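Instead of the printed table, the standard library can evaluate the same cumulative probabilities; a sketch (mine) using statistics.NormalDist, which reproduces several values used in the following slides:

```python
from statistics import NormalDist

Z = NormalDist()              # standard normal: mu = 0, sigma = 1

print(Z.cdf(1))               # P(Z < 1)  ~ 0.8413
print(1 - Z.cdf(2))           # P(Z > 2)  ~ 0.0228
print(Z.cdf(1) - Z.cdf(-1))   # P(-1 < Z < 1) ~ 0.6826
print(Z.inv_cdf(0.925))       # z_.075 ~ 1.44 (the 92.5th percentile)

X = NormalDist(mu=100, sigma=15)   # a hypothetical normal variable
print((130 - X.mean) / X.stdev)    # z-score of the value 130: 2.0
```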
22 / 31
Normal distribution for P(X < µ + σ)
[Plot: area under the normal curve to the left of µ + σ]
23 / 31
Normal distribution for P(Z < 1)
[Plot: area under the standard normal curve to the left of z = 1]
24 / 31
Cumulative standardized normal probabilities
for P(Z < z)

TABLE 3 Cumulative Standardized Normal Probabilities (excerpt)
Tabulated values are P(−∞ < Z < z)

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
−3.0  0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
−2.9  0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
−2.8  0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
−2.7  0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
−2.6  0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
−2.5  0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
−2.4  0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
−2.3  0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
−2.2  0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
−2.1  0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
−2.0  0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183

(the table continues for 0.0 ≤ z ≤ 3.09; e.g. P(Z < 1.00) = 0.8413)
25 / 31
Normal distribution for P(Z > 2)
[Plot: right-tail area beyond z = 2]
 P(Z > 2) = 1 − P(Z < 2) = .0228
26 / 31
Normal distribution for P(Z > z_A) = A
[Plot: right-tail area A beyond z_A]
 z_A equals the 100(1 − A)th percentile of Z
27 / 31
Normal distribution for P(−1 < Z < 1)
[Plot: area between z = −1 and z = 1]
 P(−1 < Z < 1) = P(Z < 1) − P(Z < −1) = .6826
28 / 31
Expectations of the normal distribution
Normal distribution
• E(X) = µ
• V(X) = σ2
• SD(X) = σ
Standard normal distribution
• E(Z) = 0
• V(Z) = 1
• SD(Z) = 1
29 / 31
Example
Standard normal distribution
• Find z.075
1 − .075 = .925 → z.075 ≈ 1.44
• Find −z.075
 by symmetry, −z.075 ≈ −1.44
30 / 31
Summary
Probability distributions representing random variables have their
own modality, and measures of skewness and kurtosis
Continuous random variables represent uncountable data, theoretically
with an infinite number of values, and the data is treated in terms of intervals
Uniform distributions have a constant probability and are represented
by a rectangle whose height is the density and whose base is the
difference between the extreme values
The normal distribution has a bell shaped curve and plays a
central role in statistics because of its properties
By standardizing normal random variables it is possible to obtain
cumulative normal probabilities for relative frequency scores
31 / 31
BUSINESS STATISTICS I
Lecture – Week 40a (41)
Antonio Rivero Ostoic
School of Business and Social Sciences
 September 
AARHUS UNIVERSITY
Today’s Agenda
Continuous Random Variable Distributions
and Exponential distribution
1. Student t Distribution
2. Chi-Squared Distribution
3. F Distribution
2 / 32
Why other continuous distributions?
and not just the normal distribution
Despite the many nice properties of the normal distribution, a major
concern is the derivation of the p.d.f. for this distribution
ª we need to know the values of two parameters (µ and σ²)
However, many times we do not have a variability parameter for
the population
Even more importantly, we need appropriate statistics
for small samples
Other distributions represent small samples, asymmetric data, or
data with many outliers better than N does
ª moreover, some distributions need only a single parameter
3 / 32
Example: Exponential distribution
For instance, the exponential distribution requires one parameter
whose reciprocal transformation equals both the mean and the
standard deviation of the random variable
ª i.e. µ = σ = 1/λ
Thus the distribution is completely specified by a known parameter
Its probability density function, for x ≥ 0 and a parameter λ > 0, is:

f(x) = λe^{−λx}, with e ≈ 2.7183

The associated probabilities are:
• P(X > x) = e^{−λx}
• P(X ≤ x) = 1 − e^{−λx}
• P(x₁ ≤ X ≤ x₂) = e^{−λx₁} − e^{−λx₂}
4 / 32
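These three formulas are easy to verify numerically; a small sketch, assuming SciPy is available (the rate λ = 2 and the points x1, x2 are arbitrary illustrative values):

```python
# Sketch of the exponential probability formulas; lam = 2 and the
# points x1, x2 are arbitrary illustrative values
import math
from scipy.stats import expon

lam = 2.0
X = expon(scale=1 / lam)   # SciPy parametrizes by scale = 1/lambda

x1, x2 = 0.5, 1.5
print(X.sf(x1), math.exp(-lam * x1))              # P(X > x1) = e^(-lam x1)
print(X.cdf(x1), 1 - math.exp(-lam * x1))         # P(X <= x1) = 1 - e^(-lam x1)
print(X.cdf(x2) - X.cdf(x1),
      math.exp(-lam * x1) - math.exp(-lam * x2))  # P(x1 <= X <= x2)
print(X.mean(), X.std())                          # both equal 1/lam
```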
Example: Exponential distribution plot
different values of λ
[Plot: exponential densities f(x) for λ = .5, λ = 1, and λ = 2]
5 / 32
Student t Distribution
The t distribution or Student t distribution is a distribution
commonly used in statistical inference
ª ‘Student’ was the pseudonym of W.S. Gosset, who derived this distribution
and used the letter t to represent the random variable
The Student t distribution represents the distribution of t values,
which varies with the sample size
ª the larger the sample, the more it resembles the Z (standard normal) distribution
ª the smaller the sample, the flatter the distribution becomes
The t distribution depends on a single parameter called the
degrees of freedom, represented by ν or sometimes by df
ª the exact shape of the distribution is determined by this parameter
 degrees of freedom – loosely speaking – are values that are
‘free to vary’, i.e. not fixed by any parameter or other scores
6 / 32
t density function
The probability density function of the t distribution is:

f(t) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ))] · (1 + t²/ν)^{−(ν+1)/2}

with parameter ν > 0 for the degrees of freedom,
π ≈ 3.1416, and the gamma function Γ
 hence T ∼ t_ν has a t distribution with ν degrees of freedom
7 / 32
Student t distribution plot
[Plot: t density over t, symmetric and centered at 0]
8 / 32
t and Z distributions
[Plot: Z and t densities overlaid, both centered at 0]
 While N is bell shaped, the t-distribution is mound shaped
9 / 32
t distribution, different degrees of freedom
[Plot: t densities for ν = 2, ν = 5, and ν = 50]
 The shape resembles more the normal distribution with larger ν values
10 / 32
Student t random variable
As with the other random variables we have seen so far, we can
produce values for a Student t random variable through a
statistical experiment
ª then we can calculate probabilities for such a variable
The expected value and variance for a Student t random
variable with ν degrees of freedom are:
E(t) = 0
V(t) = ν/(ν − 2) for ν > 2
11 / 32
t distribution for t_A with ν degrees of freedom
[Plot: t density with the right-tail area A beyond t_A shaded]
 P(t > t_{A,ν}) = A
12 / 32
Computing t probabilities
Computing probabilities now implies calculating areas under the
t distribution curve, and to achieve this we use probability tables
The probability table for the t distribution gives the probabilities
of exceeding critical values, determined for different ν
ª in Keller’s book the table is given for selected degrees of freedom from 1 to 200
and ∞ (otherwise use approximation)
Only the right-tail probability is given but, since the t distribution
curve is symmetric around 0, the left-tail critical value equals −t_{A,ν}
Notice as well that the t values approximate the z-scores
as ν approaches ∞
13 / 32
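Instead of the printed table, the same critical values can be computed directly; a minimal sketch, assuming SciPy is available:

```python
# Right-tail critical values t_{A,nu}, converging to z_A as nu grows
from scipy.stats import norm, t

A = 0.05
for nu in (2, 5, 50, 200):
    print(nu, t.ppf(1 - A, df=nu))    # t_{A,nu}
print("z:", norm.ppf(1 - A))          # z_A ~ 1.645, the limiting value

# By symmetry, the left-tail critical value is -t_{A,nu}
print(t.ppf(A, df=10), -t.ppf(1 - A, df=10))
```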
Applications of the t distribution
We can use the t distribution to test hypotheses about, and obtain
confidence intervals for, the mean of a normally distributed
population when the variance is unknown
ª (cf. lecture week 46)
Also we can compare two expected values for normally
distributed populations with unknown variances
ª (cf. lectures week 49 and 50)
We can perform tests and construct confidence intervals in correlation
and regression analyses
14 / 32
Chi-squared distribution
The Chi-squared or χ² distribution is another type of distribution
commonly used in hypothesis testing
ª as with the t distribution, a χ² distribution depends on a single parameter,
which is the number of degrees of freedom that shapes the distribution
The χ² distribution is the sum of squares of ν independent
standard normal random variables
i.e., for Z₁, Z₂, . . . , Zν with ν > 0, where each Zᵢ ∼ N(0, 1) and the Zᵢ
are independent of each other, then

Z₁² + Z₂² + · · · + Zν² ∼ χ²(ν)

 thus there are ν variables that represent the number of degrees of
freedom we can choose independently or ‘freely’
15 / 32
Chi-squared density function
The χ² density function is:

f(χ²) = [1 / (Γ(ν/2) 2^{ν/2})] (χ²)^{ν/2 − 1} e^{−χ²/2}

for χ² > 0, with ν > 0, e ≈ 2.7183, and the gamma function Γ
16 / 32
Chi-squared distribution plot
[Plot: χ² density f(χ²) over χ² ≥ 0]
17 / 32
Chi-squared distribution, different df
[Plot: χ² densities f(χ²) for ν = 1, 4, 5, and 10]
18 / 32
Chi-squared distribution shape
In this case – unlike the t distribution – the values of the
random variable in the χ2 are not positioned around 0, but
they are rather concentrated on positive values
The values of the random variable for a particular sample
range from 0 to ∞
Although with a large number of degrees of freedom the
shape of the chi-squared distribution resembles the normal
distribution, it nevertheless remains skewed to the right
In this case the mean is greater than the median, which
means that the distribution is positively skewed
19 / 32
Chi-squared random variable
A χ2 random variable is produced by a statistical experiment
The expected value and variance for a chi-squared random
variable with ν degrees of freedom are:
E(χ²) = ν
V(χ²) = 2ν
20 / 32
Computing χ² probabilities
To calculate probability values for a χ² random variable implies
(again) computing areas under the density curve
χ²_{A,ν} represents the point with area A to its right under the
chi-squared curve
ª i.e. the right tail
However, since the shape of the distribution is not symmetric,
we can no longer use −χ²_{A,ν} for the left tail
This means that for a left-tail area A we compute χ²_{1−A,ν}, the
point such that the area to its left is A
 Critical values of the chi-squared distribution for different
probabilities and df are recorded in the χ² probability tables
ª for ν > 100 they can be approximated by N with µ = ν and σ = √(2ν)
21 / 32
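A small sketch of these chi-squared critical values and the large-ν normal approximation, assuming SciPy is available (A = .05 and ν = 10, 200 are illustrative choices):

```python
# Chi-squared critical values: the left-tail point is chi2_{1-A,nu},
# not -chi2_{A,nu}, because the distribution is not symmetric
from scipy.stats import chi2, norm

A, nu = 0.05, 10
print(chi2.ppf(1 - A, df=nu))   # right-tail critical value chi2_{A,nu}
print(chi2.ppf(A, df=nu))       # left-tail critical value chi2_{1-A,nu}

# Normal approximation for large nu, with mu = nu and sigma = sqrt(2 nu)
nu = 200
print(chi2.ppf(0.95, df=nu), norm.ppf(0.95, loc=nu, scale=(2 * nu) ** 0.5))
```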
Chi-squared critical regions
[Plot: χ² density f(χ²) with the left-tail critical value χ²_{1−A} and the
right-tail critical value χ²_A, each cutting off an area A]
22 / 32
Applications of the χ² distribution
The chi-squared distribution allows us to test and compute
confidence interval estimators of the variance of a
normally distributed random variable
ª with a sufficiently large sample
Goodness-of-fit tests
Homogeneity and independence tests
23 / 32
F distribution
The F distribution is another continuous probability distribution
commonly used in statistical inference
ª ‘F’ stands for R.A. Fisher who described this distribution
In this case the shape of the distribution is determined by two
parameters, which are the degrees of freedom
That is because the F distribution arises as the ratio of two
independent chi-squared variables, each divided by its df
That is:

(χ²_{ν₁}/ν₁) / (χ²_{ν₂}/ν₂) ∼ F(ν₁, ν₂)
24 / 32
F density function
The probability density function for the F distribution is:

f(F) = [Γ((ν₁ + ν₂)/2) / (Γ(ν₁/2) Γ(ν₂/2))] (ν₁/ν₂)^{ν₁/2} · F^{(ν₁−2)/2} / (1 + ν₁F/ν₂)^{(ν₁+ν₂)/2}

 for F > 0, where ν₁ and ν₂ are called the numerator and
denominator degrees of freedom, respectively
25 / 32
F distribution plot
[Plot: F density f(F) over F ≥ 0]
 As with the chi-squared distribution, the shape of the F distribution
is asymmetric and positively skewed
26 / 32
F distribution, different degrees of freedom
[Plot: F densities f(F) for (ν₁, ν₂) = (1, 1), (3, 3), (9, 9), and (25, 25)]
27 / 32
F random variable
The F random variable generates its values through a
statistical experiment as well
The expected value and variance for the F random variable
are:
E(F) = ν₂/(ν₂ − 2) for ν₂ > 2
V(F) = [2ν₂²(ν₁ + ν₂ − 2)] / [ν₁(ν₂ − 4)(ν₂ − 2)²] for ν₂ > 4

 thus the mean depends on the denominator degrees of
freedom only, and it approaches 1 for large values of ν₂
28 / 32
Computing F probabilities
We can calculate the areas under the density curve
corresponding to probability values of the F distribution
In this case we also have an asymmetric distribution, which
means that the critical values in the two tails are F_{A,ν₁,ν₂}
and F_{1−A,ν₁,ν₂}
The following relation exists between these two critical values:

F_{1−A,ν₁,ν₂} = 1 / F_{A,ν₂,ν₁}

And we use a different probability table for each value of A with
different numerator and denominator degrees of freedom
29 / 32
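The relation between the two tails is easy to verify numerically; a minimal sketch, assuming SciPy is available (A = .05, ν₁ = 5, ν₂ = 10 are illustrative choices):

```python
# Verify F_{1-A,nu1,nu2} = 1 / F_{A,nu2,nu1}
from scipy.stats import f

A, nu1, nu2 = 0.05, 5, 10
left = f.ppf(A, dfn=nu1, dfd=nu2)               # F_{1-A,nu1,nu2}: area 1-A to its right
right_swapped = f.ppf(1 - A, dfn=nu2, dfd=nu1)  # F_{A,nu2,nu1}
print(left, 1 / right_swapped)                  # the two values agree
```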
Critical regions of the F distribution
[Plot: F density f(F) with the left-tail critical value F_{1−A} and the
right-tail critical value F_A, each cutting off an area A]
30 / 32
Applications of the F distribution
We can compare two variances from populations that
are normally distributed with the F distribution and
related statistics
Analysis of variance, which is used to compare the
means of two or more populations, is based on the F
distribution and statistics
31 / 32
Summary
We have seen various continuous distributions that are
important for inferential statistics and for small samples
The t distribution is symmetric around zero, and it depends
on a single parameter, its degrees of freedom
The χ² distribution is the sum of squared independent standard
normal (Z) random variables, and it is positively skewed
The F distribution has an asymmetric shape, and it depends on
the numerator and the denominator degrees of freedom
 All these distributions are related to their respective statistics,
which have applications in statistical inference
ª Remember Ex. 33, questions 2, 6, 7, 8, 10, 12, and Ex. 2 from Re-exam2013
32 / 32
BUSINESS STATISTICS I
Lecture – Week 40b (42)
Antonio Rivero Ostoic
School of Business and Social Sciences
 October 
AARHUS
UNIVERSITYAU
Today’s Agenda
1. Sampling Distributions:
– of the Mean
– of a Proportion
– of the Difference between Two Means
2 / 32
Sampling Distributions
We used probability distributions to summarize probabilities of
possible outcomes for a random variable
However, using sample data from a population, we estimate
characteristics of these distributions, expressed as parameters
A sampling distribution is a probability distribution that determines
probabilities of the possible values of a sample statistic
ª it is obtained either by taking repeated random samples of a particular size
from a population, or else by relying on the associated probability rules
Each sample statistic has a sampling distribution
ª so there is a sampling distribution of a sample mean, a sampling distribution
of a sample proportion, a sampling distribution of a sample median, etc.
3 / 32
Sampling Distributions II
Unlike the distributions we have seen so far, a sampling
distribution refers to the values of a statistic computed from
observations in sample after sample
Sampling distributions play a fundamental role in statistical
inference because they allow us to measure how close a sample
statistic is to the population parameter
In other words, the sampling distribution determines the
probability that the statistic falls within any given distance of
the parameter it estimates
4 / 32
Distribution of the Sample Mean
probability rules
Recall that the sample space of a random trial is the set of all
possible outcomes
E.g. sample space for two dice thrown
S = { (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
      (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
      (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
      (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
      (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
      (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) }
5 / 32
Distribution of the Sample Mean II
probability rules
Means of samples of size 2:
x̄ = { 1    1.5  2    2.5  3    3.5
      1.5  2    2.5  3    3.5  4
      2    2.5  3    3.5  4    4.5
      2.5  3    3.5  4    4.5  5
      3    3.5  4    4.5  5    5.5
      3.5  4    4.5  5    5.5  6   }
6 / 32
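The two matrices above can be reproduced by enumerating all 36 equally likely samples; a minimal sketch in plain Python:

```python
# Sampling distribution of the mean of two dice, by enumeration
from collections import Counter
from itertools import product

counts = Counter((d1 + d2) / 2 for d1, d2 in product(range(1, 7), repeat=2))
dist = {xbar: c / 36 for xbar, c in sorted(counts.items())}
for xbar, p in dist.items():
    print(xbar, round(p, 4))

mu = sum(x * p for x, p in dist.items())
var = sum((x - mu) ** 2 * p for x, p in dist.items())
print(mu, var)   # mu_xbar = 3.5 = mu; var = 35/24 = sigma^2 / 2
```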
Expectations of X
When using the probability approach we depend on the
laws of expected value and variance for the population
parameters of X:
µ = Σ x P(x)
σ² = Σ (x − µ)² P(x)
σ = √σ²
7 / 32
Sampling Random Variables
In a sampling distribution, X̄ constitutes a new random variable
created by sampling, and x̄ is a statistic corresponding to the
sample mean
Even though each sample may have an equal probability, some
samples will have identical values of x̄
Thus we can derive the probabilities of the different values of x̄,
which constitute the sampling distribution of the sample mean
8 / 32
Expectations of X̄
The expected value and variance of the sampling distribution
of X̄ are:

µ_x̄ = Σ x̄ P(x̄)
σ²_x̄ = Σ (x̄ − µ_x̄)² P(x̄)
σ_x̄ = √(σ²_x̄)

 thus for n = 2 we have µ_x̄ = µ, whereas σ²_x̄ = σ²/2
9 / 32
Distribution of X and X̄
[Plots: the distribution of the score from rolling a single die, and the
distribution of the mean of two dice scores]
 The distribution of X̄ is different from the distribution of X, even though
these variables are related
10 / 32
Sampling distributions of X̄
different sizes
It is possible to obtain sampling distributions of X̄ for different
sample sizes, and the sample statistic for the mean equals the
population parameter:

µ_x̄ = µ

However, the variance of the sampling distribution equals the
ratio of the population variance to the sample size:

σ²_x̄ = σ²/n
11 / 32
Standard Error of the Mean
The standard deviation of a sampling distribution is called the
standard error of the mean, and for infinitely large populations it is
defined as:

σ_x̄ = σ/√n

When the size of the population is finite and known, a finite
population correction factor is added to the expression, and the
standard error becomes:

σ_x̄ = (σ/√n) √((N − n)/(N − 1))

However, this factor is close to 1 when N is large relative to n (say,
20 times larger)
ª thus it can be omitted
12 / 32
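A small sketch of both versions of the standard error (the numbers are illustrative; σ = 5 echoes the vending-machine example used later):

```python
# Standard error of the mean, with and without the finite
# population correction (illustrative numbers)
import math

def standard_error(sigma, n, N=None):
    se = sigma / math.sqrt(n)
    if N is not None:                        # finite, known population size
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction
    return se

print(standard_error(5, 30))          # infinite population
print(standard_error(5, 30, N=600))   # N = 20n: factor ~0.98, nearly 1
print(standard_error(5, 30, N=60))    # small N: the correction matters
```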
Sampling distributions of X̄ for n = 6
[Plot: sampling distribution of X̄ for n = 6, concentrated around 3.5]
13 / 32
Central Limit Theorem
We observe that as the sample size gets larger, the sampling
distribution of X̄ becomes narrower
ª this is because the values are more concentrated around the mean
Even more significant, the larger the sample size, the more
the distribution resembles a bell-shaped distribution
The Central Limit Theorem states:
For a sufficiently large sample size, the sampling distribution
of the mean of a random variable drawn from any population
is approximately normal
 Thus the larger the sample size, the more closely the sampling
distribution of X̄ resembles N
14 / 32
Central Limit Theorem II
The approximation given in the central limit theorem also applies
if the population is nonnormally distributed, given a sufficiently
large sample size
ª for nonnormal populations a ‘sufficiently large’ sample is n > 30
ª for highly skewed populations we need a moderately large sample size
Because of the central limit theorem we benefit from the
properties of the standard normal distribution in order to
compute sample probabilities
In this case we use Z with the standard error of the mean:

Z = (X̄ − µ) / (σ/√n)
15 / 32
Example
Using X
A company’s vending machines consume on average 460 kWh of
electricity with a standard deviation of 5 kWh
• What is the probability that a vending machine in a given
location consumes less than 470 kWh?

P(X < 470) = P((X − µ)/σ < (470 − 460)/5) = P(Z < 2) = .9772 ≈ 98%

• ...and the probability of using more than 470 kWh?

P(X > 470) = P((X − µ)/σ > (470 − 460)/5) = P(Z > 2) = 1 − P(Z < 2)
= 1 − .9772 = .0228 ≈ 2%
16 / 32
Example II
Using X̄
• What is the probability that the average consumption of 3 vending
machines is less than 465 kWh?
 i.e. P(X̄ < 465)
Since we assume that X is normally distributed, the standard
error of the mean must take the sample size into account:

σ_x̄ = σ/√n = 5/√3 = 2.89

P(X̄ < 465) = P((X̄ − µ_x̄)/σ_x̄ < (465 − 460)/2.89) = P(Z < 1.73) = .9582 ≈ 96%
17 / 32
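A minimal sketch reproducing the two vending-machine calculations above, assuming SciPy is available:

```python
# Vending-machine example: single machine vs. mean of n = 3 machines
from math import sqrt
from scipy.stats import norm

mu, sigma = 460, 5
print(norm.cdf(470, loc=mu, scale=sigma))  # P(X < 470) ~ .9772
print(norm.sf(470, loc=mu, scale=sigma))   # P(X > 470) ~ .0228

n = 3
print(norm.cdf(465, loc=mu, scale=sigma / sqrt(n)))  # P(Xbar < 465) ~ .958
```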
P(X > 470) and P(X̄ < 465)
last two examples
[Plots: normal curve with µ = 460 and the area above x = 470 shaded, and the
sampling distribution with µ = 460 and the area below x̄ = 465 shaded]
18 / 32
Inference with the Sampling Distribution
Many times we do not know the values of µ and σ when we want
to calculate probabilities
However we can infer such values through the sampling
distribution, and for instance the value of µ can be deduced on
the basis of the distribution of the sample mean
More specifically, we can obtain a particular probability that the
sample mean falls between two values by using the properties of Z
A general formula for this problem is:

P(µ − z_{α/2} σ/√n < X̄ < µ + z_{α/2} σ/√n) = 1 − α

where α is the probability that X̄ does not fall into the interval
19 / 32
Inference of the sample mean
Example
A sample distribution with n = 3 tells us that a vending machine
consumes on average 470 kWh with σ = 5 kWh
Then we can compute the 95% probability that the mean is located
in a certain range around the sample mean
Since z.025 = 1.96, then P(−1.96 < Z < 1.96) = .95
By multiplying all terms in the probability statement by σ/√n and
then adding µ we get:

P(µ − 1.96 σ/√n < X̄ < µ + 1.96 σ/√n) = .95
P(470 − 1.96 · 5/√3 < X̄ < 470 + 1.96 · 5/√3) = .95
P(464.3 < X̄ < 475.7) = .95

ª hence the sample mean will fall between 464.3 and 475.7 with 95%
probability, so a computed sample mean in this range is supported
by the sampling distribution
20 / 32
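The same interval can be computed directly; a minimal sketch, assuming SciPy is available:

```python
# 95% interval for the sample mean around xbar = 470
from math import sqrt
from scipy.stats import norm

xbar, sigma, n, alpha = 470, 5, 3, 0.05
z = norm.ppf(1 - alpha / 2)           # z_{alpha/2} ~ 1.96
half = z * sigma / sqrt(n)
print(xbar - half, xbar + half)       # ~ (464.3, 475.7)
```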
Sampling Distribution of a Proportion
Recall that the binomial distribution depends on a parameter
p that represents the probability of success in any trial
As with the previous example with the mean, typically the
value of this parameter is unknown and needs to be
estimated
To do so, we conduct a binomial experiment of n trials in which
we count the number of successes X; this random
variable is binomially distributed
21 / 32
Sampling Distribution of a Proportion II
The estimator of p is the proportion of the number of successes to
the sample size:

P̂ = X/n

For instance, for a sample size n and a probability of success p,
we can find the probability that X is at most x by using a binomial
probability table, since:

P(P̂ ≤ x/n) = P(X ≤ x)

However, as we have seen with quantitative variables, there exists
a normal approximation to the binomial distribution from which we
can benefit in our calculations
22 / 32
Normal approximation to the binomial
Any X ∼ B(n, 0.5) is symmetrically distributed, and it produces a
bell-shaped curve by smoothing the tops of the histogram rectangles
Calculating probabilities of X using the normal distribution
requires finding the area under the normal curve, applying a
continuity correction factor of .5 to each interval of x
• Hence P(X = x) ≈ P(x − .5 < Y < x + .5), where Y is a normal
random variable approximating X
• Also P(X ≤ x) ≈ P(Y ≤ x + .5) and P(X ≥ x) ≈ P(Y ≥ x − .5)
 In the case of a range of values of X we can omit the correction
factor, but the accuracy of the estimation decreases
23 / 32
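A small sketch comparing exact binomial probabilities with the continuity-corrected normal approximation, assuming SciPy is available (n = 100, p = .5 as in the plot that follows, and x = 55 as an illustrative point):

```python
# Exact binomial vs. normal approximation with continuity correction
from math import sqrt
from scipy.stats import binom, norm

n, p, x = 100, 0.5, 55
Y = norm(loc=n * p, scale=sqrt(n * p * (1 - p)))  # approximating normal

print(binom.pmf(x, n, p), Y.cdf(x + .5) - Y.cdf(x - .5))  # P(X = x)
print(binom.cdf(x, n, p), Y.cdf(x + .5))                  # P(X <= x)
print(binom.sf(x - 1, n, p), Y.sf(x - .5))                # P(X >= x)
```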
Binomial distribution and normal approximation
n = 100, p = .5
[Plot: binomial probabilities P(x) for x between 30 and 70, with the
approximating normal curve overlaid]
24 / 32
Sampling distribution of a sample proportion
The expectations of P̂ assume that the sampling distribution of P̂
is approximately normal:

E(P̂) = p
V(P̂) = σ²_p̂ = p(1 − p)/n
σ_p̂ = √(p(1 − p)/n)

The standard deviation of P̂ is known as the standard error of
the proportion
 Here we omitted the finite population correction factor (cf. the definition
of σ_x̄), assuming that the sample size is relatively large
25 / 32
Example
finding P̂
Last year 30% of the schools in town installed our vending
machine cooler, and we want to see whether or not a proportion of
schools will continue using our machine next year
• If we take a random sample of 25 schools, what is the probability
that more than 35% of the sampled schools will choose our machine?
Since we have just a success or a failure, we have a binomial
experiment with p = .30 and n = 25
We want to find P(P̂ > .35)

σ_p̂ = √(p(1 − p)/n) = √((.30)(.70)/25) = .0917

P(P̂ > .35) = P((P̂ − p)/√(p(1 − p)/n) > (.35 − .30)/.0917)
= P(Z > .545) = 1 − P(Z < .545) ≈ 1 − .705 = .295 ≈ 30%
26 / 32
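A minimal sketch reproducing the school-cooler calculation, assuming SciPy is available:

```python
# P(Phat > .35) for p = .30, n = 25 via the normal approximation
from math import sqrt
from scipy.stats import norm

p, n = 0.30, 25
se = sqrt(p * (1 - p) / n)        # standard error ~ .0917
print(norm.sf((0.35 - p) / se))   # P(Z > .545) ~ .29, about 30%
```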
Difference between Two Means
Sampling distribution of X̄₁ − X̄₂
We can use a sampling distribution to calculate probabilities for
the difference between two means
The assumptions are that the random samples are
independent of each other and that they represent
normally distributed populations
Since both populations have a normal distribution, the
difference between two sample means is also normally
distributed
27 / 32
Expectations of X̄₁ − X̄₂
The expectations of the sampling distribution of the difference
between two sample means are:

µ_{x̄₁−x̄₂} = µ₁ − µ₂

and the variance is:

σ²_{x̄₁−x̄₂} = σ₁²/n₁ + σ₂²/n₂

The standard error of the difference between two means is:

σ_{x̄₁−x̄₂} = √(σ₁²/n₁ + σ₂²/n₂)

 for nonnormal populations we need sufficiently large sample sizes (> 30)
28 / 32
Example X̄₁ − X̄₂
Our company’s vending machines’ electricity consumption is normally
distributed with a mean of 460 kWh and a standard deviation of 5 kWh.
A rival company produces vending machine coolers with normally
distributed electricity consumption of 455 kWh on average and
10 kWh as standard deviation.
• What is the probability that the average electricity consumption
of our company’s machines exceeds that of the rival’s machines, if we
take random samples of size 30 and 10 respectively?
i.e. P(X̄₁ − X̄₂ > 0), with µ₁ − µ₂ = 460 − 455 = 5 and

σ_{x̄₁−x̄₂} = √(σ₁²/n₁ + σ₂²/n₂) = √(5²/30 + 10²/10) = √10.833 = 3.29

P(X̄₁ − X̄₂ > 0) = P(((X̄₁ − X̄₂) − (µ₁ − µ₂))/σ_{x̄₁−x̄₂} > (0 − 5)/3.29)
= P(Z > −1.52) = 1 − P(Z < −1.52) = 1 − .0643 = .9357
29 / 32
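A minimal sketch reproducing this calculation, assuming SciPy is available:

```python
# P(Xbar1 - Xbar2 > 0) for the two companies' machines
from math import sqrt
from scipy.stats import norm

mu_diff = 460 - 455
se = sqrt(5**2 / 30 + 10**2 / 10)    # standard error ~ 3.29
print(norm.sf((0 - mu_diff) / se))   # P(Z > -1.52) ~ .9357
```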
Sampling distribution and Inference
Sampling distributions – rather than probability distributions –
are commonly used for statistical inference
While a probability distribution refers to individual
observations, a sampling distribution refers to the values of
a statistic computed from those observations
Statistics and their sampling distributions allow us
to make inferences about the population parameters, which
usually are unknown
30 / 32
Sampling distribution and Inference II
[Diagram: a probability distribution links individual observations to
population parameters, whereas a sampling distribution links a sample
statistic to the parameter it estimates]
Summary
Sampling distributions determine the probabilities of
sample statistics
The mean, a proportion, and the difference between two means
are examples of sample statistics, each with its own type of
sampling distribution
We can make inferences about population parameters
through sample statistics and their sampling distributions
32 / 32
More Related Content

What's hot

Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE Smitha Sumod
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boaraileeanne
 
Descriptive Statistics and Data Visualization
Descriptive Statistics and Data VisualizationDescriptive Statistics and Data Visualization
Descriptive Statistics and Data VisualizationDouglas Joubert
 
Basic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture NotesBasic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture NotesDr. Nirav Vyas
 
Descriptive Statistics, Numerical Description
Descriptive Statistics, Numerical DescriptionDescriptive Statistics, Numerical Description
Descriptive Statistics, Numerical Descriptiongetyourcheaton
 
Data collection and presentation
Data collection and presentationData collection and presentation
Data collection and presentationferdaus44
 
Statistics
StatisticsStatistics
Statisticsitutor
 
Bcs 040 Descriptive Statistics
Bcs 040 Descriptive StatisticsBcs 040 Descriptive Statistics
Bcs 040 Descriptive StatisticsNarayan Thapa
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSScsula its training
 
Business Statistics Chapter 3
Business Statistics Chapter 3Business Statistics Chapter 3
Business Statistics Chapter 3Lux PP
 
Business Statistics Chapter 1
Business Statistics Chapter 1Business Statistics Chapter 1
Business Statistics Chapter 1Lux PP
 
Data presentation and interpretation I Quantitative Research
Data presentation and interpretation I Quantitative ResearchData presentation and interpretation I Quantitative Research
Data presentation and interpretation I Quantitative ResearchJimnaira Abanto
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpointSarah Hallum
 
Exploratory data analysis project
Exploratory data analysis project Exploratory data analysis project
Exploratory data analysis project BabatundeSogunro
 
introduction to statistical theory
introduction to statistical theoryintroduction to statistical theory
introduction to statistical theoryUnsa Shakir
 

What's hot (20)

Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boa
 
Day 3 descriptive statistics
Day 3  descriptive statisticsDay 3  descriptive statistics
Day 3 descriptive statistics
 
Descriptive Statistics and Data Visualization
Descriptive Statistics and Data VisualizationDescriptive Statistics and Data Visualization
Descriptive Statistics and Data Visualization
 
Basic concepts of statistics
Basic concepts of statistics Basic concepts of statistics
Basic concepts of statistics
 
Basic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture NotesBasic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture Notes
 
Descriptive Statistics, Numerical Description
Descriptive Statistics, Numerical DescriptionDescriptive Statistics, Numerical Description
Descriptive Statistics, Numerical Description
 
Data collection and presentation
Data collection and presentationData collection and presentation
Data collection and presentation
 
Panel data content
Panel data contentPanel data content
Panel data content
 
Statistics
StatisticsStatistics
Statistics
 
Bcs 040 Descriptive Statistics
Bcs 040 Descriptive StatisticsBcs 040 Descriptive Statistics
Bcs 040 Descriptive Statistics
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Business Statistics Chapter 3
Business Statistics Chapter 3Business Statistics Chapter 3
Business Statistics Chapter 3
 
Business Statistics Chapter 1
Business Statistics Chapter 1Business Statistics Chapter 1
Business Statistics Chapter 1
 
STATISTICS
STATISTICSSTATISTICS
STATISTICS
 
Data presentation and interpretation I Quantitative Research
Data presentation and interpretation I Quantitative ResearchData presentation and interpretation I Quantitative Research
Data presentation and interpretation I Quantitative Research
 
Data Analysis, Intepretation
Data Analysis, IntepretationData Analysis, Intepretation
Data Analysis, Intepretation
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpoint
 
Exploratory data analysis project
Exploratory data analysis project Exploratory data analysis project
Exploratory data analysis project
 
introduction to statistical theory
introduction to statistical theoryintroduction to statistical theory
introduction to statistical theory
 

Similar to Business statistics-i-part1-aarhus-bss

Introduction to statistics and graphical representation
Introduction to statistics and graphical representationIntroduction to statistics and graphical representation
Introduction to statistics and graphical representationAMNA BUTT
 
STATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptxSTATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptxMulbahKromah
 
Business statistics (Basics)
Business statistics (Basics)Business statistics (Basics)
Business statistics (Basics)AhmedToheed3
 
Introduction to Statistics information.pptx
Introduction to Statistics information.pptxIntroduction to Statistics information.pptx
Introduction to Statistics information.pptxHarshiHarshika1
 
QUANTITATIVE METHODS NOTES.pdf
QUANTITATIVE METHODS NOTES.pdfQUANTITATIVE METHODS NOTES.pdf
QUANTITATIVE METHODS NOTES.pdfBensonNduati1
 
Chapter one Business statistics referesh
Chapter one Business statistics refereshChapter one Business statistics referesh
Chapter one Business statistics refereshYasin Abdela
 
Unit 11. Interepreting the Research Findings.pptx
Unit 11. Interepreting the Research Findings.pptxUnit 11. Interepreting the Research Findings.pptx
Unit 11. Interepreting the Research Findings.pptxshakirRahman10
 
Intro to quant_analysis_students
Intro to quant_analysis_studentsIntro to quant_analysis_students
Intro to quant_analysis_studentsmstegman
 
Business mathematics and statistics by G.Reka
Business mathematics and statistics by G.RekaBusiness mathematics and statistics by G.Reka
Business mathematics and statistics by G.RekaPOLIKAIYOOR REKA
 
Chapter 4 MMW.pdf
Chapter 4 MMW.pdfChapter 4 MMW.pdf
Chapter 4 MMW.pdfRaRaRamirez
 
Graphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatisticsGraphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatisticsVandanaGupta127
 
Quatitative Data Analysis
Quatitative Data Analysis Quatitative Data Analysis
Quatitative Data Analysis maneesh mani
 
2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptx2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptxssuser03ba7c
 
Statistics Based On Ncert X Class
Statistics Based On Ncert X ClassStatistics Based On Ncert X Class
Statistics Based On Ncert X ClassRanveer Kumar
 
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptx
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptxGraphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptx
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptxRanggaMasyhuriNuur
 

Similar to Business statistics-i-part1-aarhus-bss (20)

Introduction to statistics and graphical representation
Introduction to statistics and graphical representationIntroduction to statistics and graphical representation
Introduction to statistics and graphical representation
 
STATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptxSTATS 101 WK7 NOTE.pptx
STATS 101 WK7 NOTE.pptx
 
Business statistics (Basics)
Business statistics (Basics)Business statistics (Basics)
Business statistics (Basics)
 
Ch 3 DATA.doc
Ch 3 DATA.docCh 3 DATA.doc
Ch 3 DATA.doc
 
Introduction to Statistics information.pptx
Introduction to Statistics information.pptxIntroduction to Statistics information.pptx
Introduction to Statistics information.pptx
 
QUANTITATIVE METHODS NOTES.pdf
QUANTITATIVE METHODS NOTES.pdfQUANTITATIVE METHODS NOTES.pdf
QUANTITATIVE METHODS NOTES.pdf
 
Chapter one Business statistics referesh
Chapter one Business statistics refereshChapter one Business statistics referesh
Chapter one Business statistics referesh
 
Bio stat
Bio statBio stat
Bio stat
 
Unit 11. Interepreting the Research Findings.pptx
Unit 11. Interepreting the Research Findings.pptxUnit 11. Interepreting the Research Findings.pptx
Unit 11. Interepreting the Research Findings.pptx
 
Intro to quant_analysis_students
Intro to quant_analysis_studentsIntro to quant_analysis_students
Intro to quant_analysis_students
 
Business mathematics and statistics by G.Reka
Business mathematics and statistics by G.RekaBusiness mathematics and statistics by G.Reka
Business mathematics and statistics by G.Reka
 
Chapter 4 MMW.pdf
Chapter 4 MMW.pdfChapter 4 MMW.pdf
Chapter 4 MMW.pdf
 
Basic Statistics to start Analytics
Basic Statistics to start AnalyticsBasic Statistics to start Analytics
Basic Statistics to start Analytics
 
Graphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatisticsGraphs in pharmaceutical biostatistics
Graphs in pharmaceutical biostatistics
 
Quatitative Data Analysis
Quatitative Data Analysis Quatitative Data Analysis
Quatitative Data Analysis
 
2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptx2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptx
 
Module 8-S M & T C I, Regular.pptx
Module 8-S M & T C I, Regular.pptxModule 8-S M & T C I, Regular.pptx
Module 8-S M & T C I, Regular.pptx
 
Statistics Based On Ncert X Class
Statistics Based On Ncert X ClassStatistics Based On Ncert X Class
Statistics Based On Ncert X Class
 
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptx
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptxGraphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptx
Graphical Presentation of Data - Rangga Masyhuri Nuur LLU 27.pptx
 
Presentation of data
Presentation of dataPresentation of data
Presentation of data
 

Recently uploaded

EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationAadityaSharma884161
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 

Recently uploaded (20)

EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint Presentation
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 

Business statistics-i-part1-aarhus-bss

  • 1. BUSINESS STATISTICS I Lectures Part 1 — Weeks 36 – 42 Antonio Rivero Ostoic School of Business and Social Sciences  September −  October  AARHUS UNIVERSITYAU
  • 2. BUSINESS STATISTICS I Lecture – Week 36 Antonio Rivero Ostoic School of Business and Social Sciences  September  AARHUS UNIVERSITYAU
  • 3. Today’s Agenda Fundamental Concepts in Statistics Types of Data Computing and Representing Data Types and practical information about the course... 2 / 34
  • 4. Definition of statistics “Statistics is a way to get information from data” (Keller, p. 1) ª Data is a set of observations, whereas information is the message However, statistics can also be viewed as methods for collecting and analysing data ª in this case the research design is a statistical procedure as well Statistics draw conclusion from numbers... 3 / 34
  • 5. Types of statistics Statistical analysis is classified as descriptive or inferential Descriptive The goal is to describe the data, i.e. organize, summarize, present... Instead for listing all observations, we summarize the data through numerical techniques or represent it through a graphical picture Descriptive statistics provides a typical mark of the data that is more meaningful than the complete listing We can also find patterns of the data through as an explorative phase of the analysis 4 / 34
  • 6. Types of statistics II Inferential In this case the goal is to make conclusions from the data Inferential statistics allows making generalizations from samples to the population values Statistical inferences are made through different kinds of test ª like hypothesis testing or tests of significance, tests of reliability, etc. We can also make predictions based on the data 5 / 34
  • 7. Key statistical concepts Population The final goal in statistics is to learn about populations Population constitutes the total set of subjects of interest in the study However in statistics, population —rather than be a particular group of individuals or cases— refers to a variable ª e.g. the teenagers downloading an app onto their mobile devices 6 / 34
  • 8. Key statistical concepts II Sample Inferences about populations are based on sample data Samples are selected individuals or cases of the population on which the study collects data Through samples is possible to study populations in a practical manner 7 / 34
  • 9. Key statistical concepts III Descriptive measures • For populations are parameters • For samples are statistics We use statistics to make inferences about parameters 8 / 34
  • 11. Population and sample II Population Sample SamplingInference 10 / 34
  • 12. Types of data • Both populations and samples are described in terms of variables • A variable is a characteristic that consist of two or more observed values, which constitute the data 11 / 34
  • 13. Types of data II A data type is classified according to the kind of variable or to the scale Variables: Quantitative – continuous – discrete Qualitative or categorical – discrete Scales: • Nominal • Ordinal • Interval ( Ratio) a score is the numerical value which indicates the quantity of a variable 12 / 34
  • 14. Data scales Types of variables are measured according to different scales Nominal labels used for categorical variables do not represent degree of difference cannot be ranked it is just possible to calculate the frequency of occurrences and compare these measures usually the responses are recorded using codes ª e.g. Gender, Nationality 13 / 34
  • 15. Data scales II Interval (and Ratio) used for quantitative variables there is order and the adjacent intervals between the points of the scale are of equal extent there is a degree of difference (Interval without a ratio) the measure has an arbitrary zero point (Interval), and an absolute zero point (Ratio) it is possible to calculate measures of location and dispersion ª e.g. Temperature ◦ C (Interval), Age (Ratio) 14 / 34
  • 16. Data scales III Ordinal for qualitative variables where the order of the values is significant there is a degree of difference with ranks typically measures of non-numeric concepts it is possible to calculate measures of location ª e.g. Degree of satisfaction, TRUE/FALSE (?) It is important to identify the scale and type of data to produce because it determines which statistical procedure we are going to use 15 / 34
  • 17. Hierarchy in treatment of data types Interval Ratio Ordinal Nominal 16 / 34
  • 18. Computing and representing data types Nominal data Recall that with nominal data we can only count the frequency of the different categories, which is typically given in a frequency distribution table The percentage of the counts represents a relative frequency Since the variable is qualitative, we can code the responses with numbers Single sets of data or one nominal variable are called univariate 17 / 34
  • 19. Frequency table (Example of interval data treated as nominal) COUNTRY % 2011 -------------------- 1 Belgium 27.3 2 Bulgaria 23.2 3 Czech Rep. 25.0 4 Denmark 30.7 5 Germany 26.1 6 Estonia 25.5 7 Spain 27.5 8 France 26.9 9 Croatia 25.7 10 Italy 24.0 11 Cyprus 40.3 12 Latvia 26.5 13 Lithuania 24.3 14 Malta 43.5 15 Netherlands 26.3 16 Austria 29.2 17 Poland 28.4 18 Romania 17.5 19 Slovenia 32.3 20 Slovakia 22.5 21 Finland 26.6 22 Sweden 27.3 23 Gr. Britain 29.5 24 Iceland 26.1 25 Norway 22.3 26 Turkey 19.1 27 United States 30.2 28 Japan 31.3 Source: Eurostat 18 / 34
  • 20. Bar chart (Example of interval data treated as nominal) Graphical methods Belgium Bulgaria Czech Rep. Denmark Germany Estonia Expenditure on education per capita (year 2011) COUNTRY %GDP 051015202530 19 / 34
  • 21. Pie chart (Example of interval data treated as nominal) Graphical methods Belgium 27.3 % Bulgaria 23.2 % Czech Rep. 25 % Denmark 30.7 % Germany 26.1 % Estonia 25.5 % Expenditure on education per capita (year 2011) 20 / 34
  • 22. Computing and representing data types II Ordinal data Ordinal data should be treated as nominal Frequency tables and charts are also used, but arranged in ascending or descending ordinal values ª bar charts with descending frequencies are also known as Pareto plots 21 / 34
  • 23. Pareto plot (Example of interval data treated as nominal) Graphical methods Denmark Belgium Germany Estonia Czech Rep. Bulgaria Expenditure on education per capita (year 2011) COUNTRY %GDP 051015202530 22 / 34
  • 24. Bivariate nominal data With bivariate nominal data there are either two variables or one pair of data sets in the analysis The relationship between two nominal variables are represented by a cross-tabulation table ª another term used is contingency table Graphically the bar charts are represented with two dimensions Tables and graphics with more than two dimensions are difficult to interpret 23 / 34
  • 25. Frequency table with two data sets Expenditure on education as a percentage of GDP per capita YEAR ================= COUNTRY 2001 2011 ----------------------------- 1 Belgium 25.6 27.3 2 Bulgaria 22.5 23.2 3 Czech Rep. 19.3 25.0 4 Denmark 28.9 30.7 5 Germany 25.3 26.1 6 Estonia NA 25.5 Source: Eurostat 24 / 34
  • 26. Bivariate bar chart (Example of interval data treated as nominal) Dodge 25.6 22.5 19.3 28.9 25.3 27.3 23.2 25 30.7 26.1 25.5 0 10 20 30 2001 2011 Year %GDP COUNTRY Belgium Bulgaria Czech Rep. Denmark Estonia Germany Expenditure on education per capita 25 / 34
  • 27. Computing and representing data types III Interval data Interval scale is used to represent quantitative information As with nominal data, we can still use frequency distribution tables for interval data, but categorizing the information in series of intervals called classes Classes of intervals do not overlap, and they cover the complete range of information The number of class intervals to choose is a function of the amount of observations Widths for the class intervals (called bins in the bar chart) is given by largest obs. − smallest obs. n 26 / 34
  • 28. Histogram Graphical methods Expenditure on education per capita (28 countries in Europe, USA, Japan) % GDP, year 2011 Frequency 15 20 25 30 35 40 45 02468101214 27 / 34
  • 29. Types of histograms Histograms can be either symmetric or skewed A special type of symmetric histogram is bell-shaped A skewed histogram to the right is positively skewed, and to the left is negatively skewed Histograms with a single peak are called unimodal, and with two peaks are bimodal 28 / 34
  • 30. Other graphical techniques for interval data • Stem-and-Leaf Display ª to represent small samples of scores • Ogive ª for (cumulative) relative frequency distribution • Line chart ª for time-series data • Pictograms... 29 / 34
  • 31. Bivariate interval data This is the relationship between two interval variables that is very common in the statistical analysis Graphically such relation is represented by a scatter diagram or scatterplot Scatter diagrams reveal forms of association among the observations Associations can be linearly related (weak or strong) with a positive or a negative direction, or else a nonlinear relationship Typically one variable depends on other variables(s) 30 / 34
  • 32. Scatter diagram Graphical methods 1 2 3 4 5 6 0 5 10 15 years Bonus($×.000) from Keller (xm16 1) 31 / 34
  • 33. Graphical presentation Recommendations A good graphical presentation should be concise, coherent, and clear to the viewer Well-designed graphics reveal substance rather than just form, and illustrate patterns of relations between variables Include the scales and marks in every axis with a caption that has a direct link to what the diagram wants to display Avoid distortion of the data 32 / 34
  • 34. Applications of statistics Traditional fields of applications of statistics in social sciences: • demography • econometrics • psychometrics • ... A rapidly growing discipline for organisation and marketing is business analytics = business + technology Emphasis on statistical exploration of big data Meaningful analysis in short time (or on the fly) Data-driven decision making 33 / 34
  • 35. Summary Statistics is usually concerned in studying populations by means of particular samples The main concerns of statistics is to describe the sampled data or to make inferences about the population There are different kinds of data, which can be classified according to the type of variable or its scale Nominal and ordinal data is for a qualitative variable, whereas interval data is for quantitative information Each data type has its own graphical techniques where is possible to combine two or more variables 34 / 34
  • 36. BUSINESS STATISTICS I Lecture – Week 37 Antonio Rivero Ostoic School of Business and Social Sciences  September  AARHUS UNIVERSITYAU
  • 37. Today’s Agenda 1. Numerical descriptive techniques 2. Data collection and sampling methods 2 / 36
  • 38. Review lecture week 36 Recall that our final goal in statistics is to learn about populations To make inferences on populations, we base the analysis on sample data We refer respectively as parameters and statistics the different measures for populations and samples 3 / 36
  • 39. Measures of central location Measures or indices of central location search for describe the center of the data The most popular measure of central location is the mean There are (at least) three types of means: • arithmetic mean, • geometric mean, • harmonic mean The arithmetic mean is simply called the ‘mean,’ and another name for it is the average • The mean equals to the sum of the scores in a sample divided by the sample size 4 / 36
  • 40. Arithmetic mean To define the (arithmetic) mean, sample observations are represented as x1, x2, . . . , xn where n is the sample size Population mean µ = N i=1 xi N where means ‘sum of’ Sample mean ¯x = n i=1 xi n * thus we use Greek letters for parameters and Latin letters for statistics 5 / 36
  • 41. Median The median is another popular measure of central location and it refers to the score in the ‘middle’ • The median calculated by ranking the scores and where half of them are above it and half of them are below for even number of scores, the median may equals to the average of the two scores in the middle 19, 17, 23, 29, 12 12, 17, 19, 23, 29 6 / 36
  • 42. Mode Another measure of central location that – unlike mean and median – can be applied to qualitative or nominal data is the mode • The mode is the most frequently occurring score or category in the data however the mode is not the frequency itself but a value 12 electricians, 30 nurses, and 4 clerks 7 / 36
  • 43. Location measures Summary For interval data the mean is the most appropriate measure of central location ª i.e. the average For ordinal data the median is a more suitable statistic For nominal data the only measure of location we have seen so far is the mode ª cf. Keller’s book view on this 8 / 36
  • 44. Measures of variability In descriptive statistics we are not only interested in measuring the central location of the data but also in the spread or dispersion of the observations Many times if we want to compare measures of location in different data sets or variables, we need to know the variability in the data For establishing the widths of the bins in a bar chart (cf. 26 in week 36) we have already seen a variability measure ª here we considered the numerical difference between the extreme values in class intervals 9 / 36
  • 45. Range The difference between the largest and the smallest score in a data set is known as the range Range = Largest observation − Smallest observation The range is the simplest measure of variability since it considers only two scores The disadvantage of the range is that do not tell us nothing about the other scores ª hence we can have two entirely different data sets with identical range 10 / 36
  • 46. Variance Variance is the average of the squared distances from the mean That is sum of(each score − mean score)2 number of scores Thus the larger the variance is, the more the scores differ on average from the mean 11 / 36
  • 47. Variance II Population variance σ2 = N i=1(xi − µ)2 N Sample variance s2 = n i=1(xi − x)2 n − 1 12 / 36
  • 48. Sample variance Sample variances are computed to estimate population variances The sample variance is however corrected by the mean, which implies that the squared differences from the mean is divided by n−1 ª it results that this formula gives a better estimate than without a correction But why we square the differences from the mean before averaging? One reason is to avoid the canceling effect, in which the sum of the positive and the negative deviations equals 0 The interpretation of the variance is not straightforward since by squaring the differences from the mean we transform the unit of the data set 13 / 36
  • 49. Standard deviation The standard deviation is the square root of the variance ª ‘deviation’ is the difference between each score and the mean sum of(each score − mean score)2 number of scores By computing the square root of the variance we preserve the original unit of the data set 14 / 36
  • 50. Standard deviation II Population standard deviation σ = √ σ2 = N i=1(xi − µ)2 N Sample standard deviation s = √ s2 = n i=1(xi − x)2 n − 1 15 / 36
  • 51. The empirical rule By knowing the mean, the standard deviation, and the type of distribution we can deduce relevant information of the data For a normal distribution – with a bell shaped curve – it is possible to apply the empirical rule ≈ 68% ≈ 95% ≈ 99.7% µ − 3σ µ − 2σ µ − σ µ µ + σ µ + 2σ µ + 3σ 16 / 36
  • 52. Dispersion measures Last issues Range, variance, and standard deviation are the most common measures of variability for interval data Are there measures of variation for ordinal and nominal data? The proportion of the standard deviation and the mean provides the coefficient of variation of the set of scores ª this statistics can guide us about the magnitude of the standard deviation in relation to the sample size A generalization of the empirical rule that applies to all shape distributions is found in the Chebysheff’s Theorem 17 / 36
  • 53. Percentiles Measure of relative standing To describe the position of particular values relative to the entire data set we have measures of relative standing ª applicable to interval and ordinal data The Pth percentile is the value for which P% of the scores fall below that value and (100 − P)% fall above it Actually the median of a data set is a special case of percentile ª it is the 50th percentile Besides the median, there are other special cases of percentiles called quartiles that correspond to the 25th and 75th percentiles ª called lower and upper quartile respectively 18 / 36
  • 54. Percentiles II Location of a Percentile $L_P = (n + 1)\,\frac{P}{100}$ Lower quartile, median, and upper quartile are labelled respectively as Q1, Q2, and Q3 The interquartile range is the difference between the upper and lower quartile, i.e. Q3 − Q1 19 / 36
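A small sketch of the percentile-location formula in Python (the data values are invented; the formula assumes the scores are sorted in ascending order):

```python
data = sorted([12, 7, 21, 15, 9, 18, 11])   # n = 7 ordered observations
n = len(data)

# Location of the Pth percentile: L_P = (n + 1) * P / 100
def percentile_location(P, n):
    return (n + 1) * P / 100

print(percentile_location(25, n))  # 2.0 -> Q1 is the 2nd ordered score
print(percentile_location(50, n))  # 4.0 -> Q2 (median) is the 4th score
print(percentile_location(75, n))  # 6.0 -> Q3 is the 6th ordered score

q1, q2, q3 = data[1], data[3], data[5]       # 0-based indexing
print(q3 - q1)                               # interquartile range Q3 - Q1
```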
  • 55. Box Plots and Outliers • A box plot is a graphical method used to represent quartiles together with the most extreme observations ª Box plots are useful in comparing different data sets • Outliers are unusually large or small observations ª ...and we suspect their validity 20 / 36
  • 56. Box Plot without outliers Example [Figure: box plots on a 0–30 scale for the scores of exercises 4.21 and 4.19] 21 / 36
  • 57. Measures of Linear Relationship Between variables We have seen that it is possible to relate two quantitative variables in a statistical analysis The scatter diagram gives us an idea of the strength and direction of the linear relationship between two variables However the graphical information is quite loose and we can obtain more precise information of the linear relationship with different numerical measures 22 / 36
  • 58. Covariance As the name suggests, covariance is the variance that is shared between two variables, X and Y Population covariance $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$ Sample covariance $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ 23 / 36
  • 59. Correlation The ratio of the covariance to the product of the standard deviations gives the coefficient of correlation or just correlation ª that is another measure of linear relationship Population correlation $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ Sample correlation $r = \frac{s_{xy}}{s_x s_y}$ where $-1 \le r, \rho \le +1$ If the statistic equals 0 then there is no linear relationship, whereas the extreme values denote perfect positive and negative relationships ª otherwise the linear relationship is considered just ‘weak’ 24 / 36
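The sample covariance and correlation can be verified numerically; a minimal Python sketch with invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Sample covariance (division by n - 1); np.cov returns the 2x2 matrix
s_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation: covariance divided by the product of the two sample SDs
r = s_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

assert np.isclose(r, np.corrcoef(x, y)[0, 1])  # cross-check
print(s_xy, r)  # r close to +1: strong positive linear relationship
```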
  • 60. Coefficient of Determination If we square the correlation between two quantitative variables we obtain another measure of linear association called the coefficient of determination, r2 The coefficient of determination measures the amount of variation in Y that is explained by the variation in X ª thus a clearer indication of the relationship than the correlation coefficient Explained variation is important in statistics and it is at the core of regression analysis 25 / 36
  • 61. Method of Least Squares A linear relationship can be visualized in a scatter diagram by drawing a straight line through the scores We need to estimate the line that ‘best’ represents the sample data points An objective method of producing such a straight line is the least squares method ª which minimizes the sum of squared deviations between the scores and the line 26 / 36
  • 62. Method of Least Squares II In a least squares line: $\hat{y} = b_0 + b_1 x$ $\hat{y}$ represents the fitted value of y, whereas $b_0$ and $b_1$ are known as the regression coefficients that we need to estimate • $b_0$ is the y-intercept: where the line crosses the y-axis • $b_1$ is the slope: the rise/run of the line Least squares line coefficients $b_1 = \frac{s_{xy}}{s_x^2}$ and $b_0 = \bar{y} - b_1\bar{x}$ 27 / 36
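A sketch of the least squares coefficients in Python, reusing the invented data from the correlation example; np.polyfit serves only as a cross-check that the same line is obtained:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope: s_xy / s_x^2
b0 = y.mean() - b1 * x.mean()                        # intercept: ybar - b1*xbar

y_hat = b0 + b1 * x                                  # fitted values
print(b0, b1)

# np.polyfit minimizes the same sum of squared deviations (degree 1)
assert np.allclose(np.polyfit(x, y, 1), [b1, b0])
```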
  • 63. Guidelines for Exploring Data We have seen that descriptive statistics includes the exploration of the data Besides knowing the type of data we are dealing with, there are important aspects to take into account in the exploration – the central location of the data – the dispersion (or lack of it) in the observations – the shape of the distribution of the data set All this information guides us in choosing the appropriate numerical techniques to apply in further statistical analysis 28 / 36
  • 64. Data collection and Sampling We want to learn about populations when using statistics Except for the census – where all members of the population are the observations – inferences about populations are made by means of samples Sampling techniques are crucial for the statistical analysis, and they are part of the research design 29 / 36
  • 65. Methods for data collection There are different methods of collecting data on the variable(s) of interest Observational data Direct Observation either by ‘observing’ behaviour in natural settings, or by ‘asking’ questions to certain people ª simple and inexpensive, but it has drawbacks • we do not have control over the subjects (main disadvantage) • comparing groups is difficult 30 / 36
  • 66. Methods for data collection II Observational data Surveys are based on samples of people from populations from which we solicit information There are different types of sample survey: – Interviews (personal or telephonic) – Self-administered questionnaire the questionnaires should be well designed for all survey types ª questions should be short, simple, and concise; starting with demographic questions is recommended 31 / 36
  • 67. Methods for data collection III Experimental data Experiments provide data from which we can compare responses from subjects under different conditions that are known as treatments ª more expensive than direct observation, but provides better results • requires special planning to assign individuals to treatments • randomization in the sampling is needed • it is however quite uncommon in social sciences 32 / 36
  • 68. Sampling When we perform sampling, we choose among different sampling plans In a simple random sample or just random sample each case in the population has an equal chance of being selected ª this can be done through random number tables A stratified random sampling separates the population into mutually exclusive sets called strata, and then draws a random sample in each stratum ª the population is ‘naturally’ divided into groups, i.e. each stratum is recognized by an easily identifiable variable 33 / 36
  • 69. Sampling II With a cluster random sampling the population is divided into clusters and random samples are drawn within clusters ª useful for large populations with samples across strata In a systematic sampling each subject of the population is assigned a number and – starting at a random number – every kth member from then onwards is selected Many times a random sampling is based on a sampling frame, which is the list of cases from which the sample is selected All sampling techniques that involve a random sample constitute probability sampling 34 / 36
  • 70. Sampling Errors When taking observations from a population we may be confronted with different types of variability Sampling error is the deviation of sample characteristics from the characteristics of the population from which the samples came ª e.g. the variability of sample means around the population mean Nonsampling error, which results from: – mistakes made in the acquisition of data – when responses are not obtained from some subjects in the sample (nonresponse error or bias) – when some parts of the population are not selected for the sample (selection bias) 35 / 36
  • 71. Summary We have seen important descriptive indices for the data, which include measures of location, variability, and relative standing ª these measures together with the shape of the distribution are crucial parts of the descriptive statistics of the data There are also numerical indicators for calculating the strength of the linear relationship between two quantitative variables Different methods for data collection and sampling are at the core of the research design ª we should avoid sampling and nonsampling errors 36 / 36
  • 72. BUSINESS STATISTICS I Lecture – Week 38a (38) Antonio Rivero Ostoic School of Business and Social Sciences  September  AARHUS UNIVERSITYAU
  • 73. Today’s Agenda Probability Discrete Random Variables (Binomial distribution) 2 / 31
  • 74. What is probability? Recall that we make inferences about populations (parameters) based on sample data (statistics) A link between population and sample can be found in probability theory Probability is a mathematical branch that was created to study the chance that a particular outcome will occur ª for gambling strategies in the 17th century, that is actually before the development of statistics itself Many times probability is associated with random phenomena 3 / 31
  • 75. Assigning probabilities a) In the classical approach the probability is expressed in terms of the proportion of a particular outcome to all possible outcomes ª this is sort of an objective method b) For the relative frequency approach we are interested in the long-run relative frequency of the probability for an outcome to occur c) The subjective approach includes a certain degree of belief to the probability 4 / 31
  • 76. Probability of events Relative frequency In order to obtain all possible outcomes we perform a random experiment Random experiments have an exhaustive and mutually exclusive list of outcomes that constitutes the sample space, S = { O1, O2, . . . , Ok } Probabilities are expressed per event, and for each outcome they vary between 0 and 1 The sum of probabilities of all outcomes in a sample space must be 1 5 / 31
  • 77. Probability of events II Example The toss of a die – Sample space: S = {1, 2, 3, 4, 5, 6} – Probability of events (classical approach): P(i) = 1 6 how would these probabilities be in case we apply a relative frequency approach? 6 / 31
  • 78. Joint probability The joint probability is the probability of having two distinct traits, and in this case we perform the intersection of simple events For events A and B the intersection means that the event occurs when both A and B occur, i.e. A and B ª on the other hand, the union of events A and B corresponds to A or B Joint probability table:
           B1              B2
  A1   P(A1 and B1)   P(A1 and B2)
  A2   P(A2 and B1)   P(A2 and B2)
  7 / 31
  • 79. Joint probability II [Figure: Venn diagrams within the sample space S showing the intersection ‘A and B’ and the union ‘A or B’ of events A and B] 8 / 31
  • 80. Marginal probability Marginal probabilities are computed by adding across rows or down columns of the joint probability table e.g. in the joint probability table: – Across row: P(A1 and B1) + P(A1 and B2) = P(A1) – Down column: P(A1 and B2) + P(A2 and B2) = P(B2) 9 / 31
  • 81. Conditional probability Conditional probability reflects that the probability of an event occurring can be affected by another event For events A and B, the conditional probability of event A given B is: $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$ i.e., the ratio of the joint probability of both events to the probability of the given event • In case the probability of one event is not affected by the other event, these events are said to be independent of each other: P(A | B) = P(A) and P(B | A) = P(B) 10 / 31
  • 82. Other probability rules Complement $P(A^c) = 1 - P(A)$ that is, the event that occurs when A does not occur Addition P(A or B) = P(A) + P(B) − P(A and B) ª here we subtract the joint probability of the events because in the marginal probabilities it is counted twice for mutually exclusive events their joint probability is zero 11 / 31
  • 83. Other probability rules II Multiplication for dependent events P(A and B) = P(B) P(A | B) = P(A) P(B | A) Multiplication for independent events P(A and B) = P(A) P(B) multiplication is used to compute the joint probability of two events 12 / 31
  • 84. Probability tree We can use a probability tree diagram to apply the probability rules to a given problem ª it displays all the outcomes of a sequence of branches [Figure: probability tree for two tosses of a coin — each branch has probability 1/2, and each of the four outcomes (head–head, head–tail, tail–head, tail–tail) has probability 1/2 · 1/2 = 1/4] 13 / 31
  • 86. The Monty Hall problem In the original TV game show ‘Let’s Make a Deal’ the participant has the choice to open one of three doors. One door is concealing a car and the other two doors a goat Once the participant makes his/her decision the host reveals another door containing a goat and then asks the participant: “Do you want to switch door or stay with your choice?” For us the main question here is: Does it make any difference to switch from the original choice? 15 / 31
  • 87. Probability tree diagram of the Monty Hall problem [Figure: probability tree over the possible door placements and host reveals — staying wins the car with total probability 1/3, switching with total probability 2/3] Initial probability for the car is 1/3, and for a goat 2/3. However when the host opens a door then the probability for this door becomes 0, and 2/3 for the other door 16 / 31
  • 88. Bayesian probability Bayesian probability is one of the different interpretations of the concept of probability Here a probability is assigned to a hypothesis, whereas under the frequentist view a hypothesis is typically tested without being assigned a probability ª hence Bayesian probability is related to the subjective approach We have already seen that with conditional probability we can measure the chance that an event occurs given the occurrence of another event With Bayesian probability we can compute the chance of the possible causes for a particular event to occur 17 / 31
  • 89. Bayes’ Theorem Bayes’ Theorem is a rule that provides the conditional probability of A occurring given that B already happened $P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$ Event A constitutes the hypothesis, whereas B is the observation • P(A | B) is the posterior probability of A ª an updated degree of belief • P(A) is the prior probability of A ª that is before the observation (with known probabilities) • P(B | A) is the likelihood of A ª i.e. the probability that the hypothesis confers upon the observation • P(B) is the unconditional probability of B ª i.e. the probability of the observation irrespective of any hypothesis 18 / 31
  • 90. Bayes’ Theorem II More specifically, Bayes’ Theorem can be restated for a given event B and events A1, A2, . . . , Ak where: • P(A1 | B), P(A2 | B), . . . , P(Ak | B) are the posterior probabilities we seek • P(A1), P(A2), . . . , P(Ak) are the prior probabilities • P(B | A1), P(B | A2), . . . , P(B | Ak) are the likelihoods Bayes Formula $P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_k)\,P(B \mid A_k)}$ 19 / 31
  • 91. Example Bayes’ Theorem • In a class where 60% were female, the probability of passing the test was 90% for females and 70% for males. What is the probability that someone passing the test is female? • A1 and A2 are the proportions of being female and male respectively, whereas B represents passing the test • We need to calculate P(A1 | B): $\frac{.9 \times .6}{(.9 \times .6) + (.7 \times .4)} = .66$ the prediction for a female passing the test has increased due to the added information 20 / 31
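The arithmetic of this example is easy to reproduce; a minimal sketch in plain Python:

```python
# Priors: 60% female (A1), 40% male (A2); B = passing the test
p_a1, p_a2 = 0.6, 0.4
p_b_given_a1, p_b_given_a2 = 0.9, 0.7

# Unconditional P(B): weighted sum of the likelihoods over the priors
p_b = p_a1 * p_b_given_a1 + p_a2 * p_b_given_a2

# Bayes formula: posterior probability of 'female' given 'passed'
posterior = p_a1 * p_b_given_a1 / p_b
print(posterior)  # 0.6585... ~ .66, above the prior of .60
```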
  • 92. Summarizing Probabilities Correct method When the joint probabilities are given we can compute the marginal probabilities, which allows us to compute conditional probabilities ª with conditional probability we can see whether the events are independent or dependent When the joint probabilities are required we can apply probability rules and probability trees – multiplication for intersection – addition for mutually exclusive events – Bayes formula for new conditional probabilities 21 / 31
  • 93. Discrete Random Variables A random variable is a variable whose values have not been chosen in advance ª in a fixed variable, on the other hand, the values are previously selected Examples of random variables are the outcomes of flipping a coin or rolling a die, and such actions are called experiments Two types of random variables: • Discrete: with a countable number of values ª e.g. flipping a coin, whose values are the number of occurrences of each possible outcome of the random variable • Continuous: when the values are uncountable ª e.g. amount of time to complete a task 22 / 31
  • 94. Discrete Probability Distributions For a discrete random variable the probability distribution describes the associated probability of its possible outcome values Each probability of a random variable is a quantity between 0 and 1, and the sum of the probabilities of all possible values equals 1 That is, for x representing the outcome of a random variable X, and P(x) the probability of that outcome: $0 \le P(x) \le 1$ and $\sum_{\text{all } x} P(x) = 1$ 23 / 31
  • 95. Population and probability distributions Probability distributions can be used as representatives of populations as well Both the population mean and variance have their counterparts in parameters of a given probability distribution • The mean of a probability distribution for a discrete variable X is called the expected value of X and it is represented by E(X) Mean of a probability distribution $E(X) = \mu = \sum_{\text{all } x} x\,P(x)$ 24 / 31
  • 96. Population and probability distributions II Variance of a probability distribution $V(X) = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2\,P(x)$ • The standard deviation of a probability distribution (σ) equals the square root of the variance, i.e. $\sigma = \sqrt{\sigma^2}$ If the probability distribution is approximately bell shaped then we can apply the Empirical Rule to interpret σ 25 / 31
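A short sketch computing E(X), V(X), and σ for a discrete distribution — here the score of a fair die, an example chosen only for illustration:

```python
import numpy as np

# Fair die: outcomes 1..6, each with probability 1/6
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

mu = np.sum(x * p)                # E(X) = sum over x of x * P(x)
var = np.sum((x - mu) ** 2 * p)   # V(X) = sum over x of (x - mu)^2 * P(x)
sd = np.sqrt(var)
print(mu, var, sd)                # 3.5, ~2.917, ~1.708
```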
  • 97. Laws of Expected Value and Variance To quickly determine the expected value and variance of a given constant or a random variable we use specific rules X → a random variable c → a constant Expected Value 1. $E(c) = c$ 2. $E(X + c) = E(X) + c$ 3. $E(cX) = cE(X)$ Variance 1. $V(c) = 0$ 2. $V(X + c) = V(X)$ 3. $V(cX) = c^2\,V(X)$ 26 / 31
  • 98. Probability distributions involving two variables Recall that P(x) represents the probability that a random variable X equals x ª in this case we are considering a single variable It is possible to determine the probabilities for combinations involving two variables X and Y, which is represented as: P(x, y) with the conditions that the probability for all pairs of values varies between 0 and 1, and the sum of probabilities in the sample space is 1 i.e. $0 \le P(x, y) \le 1$ and $\sum_{\text{all } x}\sum_{\text{all } y} P(x, y) = 1$ 27 / 31
  • 99. Probability distributions involving two variables II While univariate distributions represented the distribution of one variable, with two variables such representation is made by a bivariate distribution It is possible to obtain both the joint probability and the marginal probability of any bivariate probability distribution The marginal distribution will provide us with the expected mean, variance, and SD for each variable However, with the association of two variables we can compute the covariance and the coefficient of correlation as well ª as for the linear relationship (cf. last lecture), but now involving probabilities 28 / 31
  • 100. Covariance and Correlation Covariance $\text{Cov}(X, Y) = \sigma_{xy} = \sum_{\text{all } x}\sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\,P(x, y)$ i.e. the product of the deviations from the mean for X and Y, weighted by their joint probability Correlation $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ as before, the ratio of the covariance of the two variables to the product of their SDs 29 / 31
  • 101. Sum of Two Variables and expected outcomes One combination that has practical applications is the sum of two variables Laws of Expected Value and Variance 1. E(X + Y) = E(X) + E(Y) 2. V(X + Y) = V(X) + V(Y) + 2Cov(X, Y) if the variables are independent, then the covariance is 0 30 / 31
  • 102. Summary Probability deals with computing the chance of particular outcomes in a given set of events, and there are different approaches to assigning probabilities of events With joint probabilities we can calculate marginal and conditional probabilities and hence determine whether or not events are independent Probability trees are useful to compute probability models with sequences of actions Random phenomena are associated with probability through random variables, and such types of variables are described by probability distributions and rules of expected values 31 / 31
  • 103. BUSINESS STATISTICS I Lecture – Week 38b (39) Antonio Rivero Ostoic School of Business and Social Sciences  September  AARHUS UNIVERSITYAU
  • 104. Today’s Agenda 1. Binomial distribution 2. Poisson distribution 3. Hypergeometric distribution 2 / 27
  • 105. Probability Distributions Recall that we introduced the concept of probability distribution, which describes the associated probability of possible outcomes for a random variable ª a random variable is a numerical outcome of an experiment The distribution of such values is a very important piece of information since it provides us with the pattern of characteristics of the variable values over the sample or population Depending on the type of variable and the scale of the data, we find a variety of distributions of populations or sampled data In case we consider a single variable then the data is allocated in univariate distributions 3 / 27
  • 106. Binomial experiment If e.g. we toss a coin n times and count the number of ‘heads’ then we perform a binomial experiment with a set number of trials ª ‘binomial’ because there are two possible outcomes In this case the outcomes of the trials are not affected by the outcomes of other trials, which means that the trials are independent of each other By counting the number of heads, a head is regarded as a success and a tail is considered a failure ª it can certainly be the other way around The probability of success is denoted by p, and the probability of failure is 1 − p ª we try to estimate the value of p, which is between 0 and 1 4 / 27
  • 107. Binomial random variable The number of successes in the binomial experiment is called the binomial random variable ª A binomial random variable is discrete and can take on countable values 0, 1, 2, . . . , n If we represent the number of successes in n trials as a random variable X, then the number of failures becomes n − X The probability for each sequence of branches in the probability tree representing x successes and n − x failures is: $p^x (1 - p)^{n-x}$ and the number of branch sequences for these outcomes is: $\binom{n}{x} = \frac{n!}{x!(n - x)!}$ where e.g. n! = n(n − 1)(n − 2) . . . (2)(1) (called ‘n factorial’) (but 0! = 1) 5 / 27
  • 108. Binomial Random Variable Examples Examples of a binomial random variable are: • the number of correct guesses at n true/false questions when you randomly guess all answers • the number of left-handers in a randomly selected sample of n unrelated people with replacement • the number of (...) 6 / 27
  • 109. Binomial distribution Any binomial experiment is described by a binomial distribution The probability of x successes in a binomial experiment is: $P(x) = \binom{n}{x} p^x (1 - p)^{n-x}$ where P(x) is the probability of x successes for x = 0, 1, 2, . . . , n The formula above is known as the probability mass function (p.m.f.) corresponding to the binomial distribution 7 / 27
  • 110–113. Binomial distribution n = 30 [Figure: binomial p.m.f. for n = 30 plotted for p = .25, p = .5, and p = .75 — skewed to the right for p = .25, symmetric for p = .5, and skewed to the left for p = .75] 8–11 / 27
  • 114. Expectations of the Binomial distribution If a random variable X has a binomial distribution, we write X ∼ B(n, p) meaning that X is a binomially distributed random variable Each random variable with a binomial distribution has the same p.m.f., but they may have different values for the parameters The parameters of the distribution are n and p, and both are known, which means that for a binomial random variable we can determine the expected values • $E(X) = \mu = np$ • $V(X) = \sigma^2 = np(1 - p)$ • $SD(X) = \sigma = \sqrt{V(X)}$ 12 / 27
  • 115. Example Binomial distribution The testing center (cf. Ex 15) shows that 14% of the new cars have a defect Suppose that the center tests 20 new cars on a daily basis • What is the probability that the center finds exactly one defective new car in a day? i.e. $P(X = 1) = \binom{20}{1} (.14)^1 (1 - .14)^{20-1} = .16$ That is 16% 13 / 27
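The same probability can be checked with SciPy's binomial distribution (a supplementary sketch, not part of the lecture materials):

```python
from scipy.stats import binom

# n = 20 cars tested daily, defect probability p = .14
print(binom.pmf(1, 20, 0.14))   # P(X = 1) ~ .16
print(binom.cdf(1, 20, 0.14))   # P(X <= 1) = P(0) + P(1)
```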
  • 116. Cumulative Binomial Probabilities We can use the cumulative probability if we want to calculate the probability that a random variable is less than or equal to a value That is, if we wish to determine $P(X \le x)$ $P(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1 - p)^{n-i}$ where $\lfloor x \rfloor$ is the greatest integer ≤ x (called the ‘floor’ under x) this means that P(X ≤ 1) = P(0) + P(1), and so on... All such values are recorded in tables of binomial probabilities with tabulated scores for different probabilities and trials of diverse size 14 / 27
  • 117. Binomial Table [Table 1 from Keller’s appendix: cumulative binomial probabilities $P(X \le k)$ tabulated for n = 5, 6, . . . and p = .01, .05, .10, .20, .25, .30, .40, . . . , rounded to four decimals] thus for n = 5, p = .1: $P(X \le 2) = .9914$ 15 / 27
  • 118. Poisson Random Variable Like a binomial random variable, a Poisson random variable corresponds to the number of occurrences of events or successes ª named after S. D. Poisson However in a Poisson random variable the number of successes is counted within a time interval or a specific region of space in a Poisson experiment Intervals are independent of each other, do not overlap, and the probability of a success in an interval is proportional to its size ª hence intervals of equal size have the same probability, and the probability of a success approaches 0 as the interval becomes very small 16 / 27
  • 119. Poisson Random Variable Examples Examples of a Poisson random variable are: • the number of customers’ queueing (in a shop, a call center, a public service) in a unit of time • the number of hits on a Web site in a day • the number of goals scored by a football team in a match 17 / 27
  • 120. Poisson distribution A Poisson experiment is described by a Poisson distribution X ∼ P(µ) µ is the expected value (mean) parameter for the underlying rate of occurrence in an interval or region (also written as λ) ª in this case the rate of occurrence is known and constant The probability mass function for the Poisson distribution is: $P(x) = \frac{e^{-\mu}\mu^x}{x!}$ for a value of x = 0, 1, 2, . . . successes in a given interval or region (theoretically with no upper limit) e is a constant approx. 2.718 (Euler’s number) that is the base of the natural logarithm 18 / 27
  • 121. Poisson distribution [Figure: Poisson p.m.f. plotted for µ = 1, µ = 2, and µ = 3] 19 / 27
  • 122. Expectations of the Poisson distribution In the Poisson distribution the variance is equal to the mean, and the standard deviation is equal to the square root of the mean $E(X) = V(X) = \mu = \sigma^2$ 20 / 27
  • 123. Example Poisson distribution Suppose a Website has 1.8 hits on average per minute • What is the probability of receiving 5 hits in a given minute? i.e. $P(X = 5) = \frac{e^{-1.8}(1.8)^5}{5!} = .026$ That is about 2.6% 21 / 27
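A one-line check with SciPy's Poisson distribution (supplementary sketch):

```python
from scipy.stats import poisson

mu = 1.8                      # average hits per minute
print(poisson.pmf(5, mu))     # P(X = 5) ~ .026
print(poisson.cdf(5, mu))     # P(X <= 5), the cumulative counterpart
```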
  • 124. Hypergeometric experiment The hypergeometric distribution is a probability distribution used to describe the outcomes produced by a hypergeometric experiment Here a sample of size n is randomly selected without replacement from a population of N items, which means that once a particular item has been selected it cannot be picked again ª this contrasts with the binomial experiment, where the probability of x successes in the trials corresponds to sampling with replacement – Sampling with replacement: it is possible to select the same item again, and the size of the population remains the same ª e.g. tossing a coin – Sampling without replacement: it is not possible to select the same item again, thus the size of the remaining population changes as we remove each item ª e.g. picking a black ball from an urn containing black and white balls 22 / 27
  • 125. Hypergeometric random variable In a given population of size N, k items are classified as successes and N − k items are categorised as failures A hypergeometric random variable X corresponds to the number of successes in a sample of size n, and it can take one of 0, 1, 2, . . . , n values The probability of x = 0, 1, 2, . . . , n successes is described by the p.m.f. of the hypergeometric distribution of X as: $P(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$ this is the ratio of • the number of samples of n items that contain exactly x successes chosen from k and n − x failures chosen from (N − k) • to the number of possible samples that can be drawn from the population 23 / 27
  • 126. Hypergeometric distribution [Figure: hypergeometric p.m.f. for N = 52, n = 10, k = 16] 24 / 27
  • 127. Expectations of the hypergeometric distribution • $E(X) = \mu = n\,\frac{k}{N}$ • $V(X) = \sigma^2 = n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1}$ • $SD(X) = \sigma = \sqrt{V(X)}$ 25 / 27
  • 128. Example Hypergeometric distribution A graduate statistics course has 7 male and 3 female students. The teacher wants to select 2 students at random to help her conduct a research project. • What is the probability that the two students chosen are female? (solved) • What is the probability that exactly one student chosen is female? i.e. $P(X = 1) = \frac{\binom{3}{1}\binom{10-3}{2-1}}{\binom{10}{2}} = .4666667$ that is $\frac{21}{90} + \frac{21}{90} \approx 47\%$ (c.f. fig. 6.2, pp. 193) 26 / 27
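A sketch of the same computation with SciPy; note SciPy's own argument order for the hypergeometric distribution:

```python
from scipy.stats import hypergeom

# Course population: N = 10 students, k = 3 females (successes), sample n = 2
# SciPy's pmf signature is hypergeom.pmf(x, M, n, N) where M = population
# size, n = successes in the population, N = sample size
print(hypergeom.pmf(1, 10, 3, 2))   # P(X = 1) ~ .467
print(hypergeom.pmf(2, 10, 3, 2))   # P(X = 2): both chosen are female
```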
  • 129. Summary Discrete probability distributions The binomial distribution measures the probability of the number of successes over a specific number of trials with replacement The Poisson distribution measures the probability of a number of events occurring within a given time interval The hypergeometric distribution measures the probability of a specified number of successes over a specific number of trials without replacement from a finite population 27 / 27
  • 130. BUSINESS STATISTICS I Lecture – Week 39 (40) Antonio Rivero Ostoic School of Business and Social Sciences  September  AARHUS UNIVERSITYAU
  • 131. Today’s Agenda 1. Review on Distributions 2. Continuous Random Variables 3. Uniform Distribution 4. Normal Distribution 2 / 31
  • 132. Review on Distributions Recall that probability distributions serve to describe random variables From a given probability distribution, there are two important pieces of information that we can obtain: – what are the values that the variable takes – how often the variable takes these values By depicting such information with graphical methods we get a shape that characterizes either the sample or the population representing the random variable 3 / 31
  • 133. Modality of a distribution An important characteristic of a probability distribution is its modality The data depicted graphically by a bell shaped curve is an example of unimodal distribution ª this is because there is a single peak that represents the mode Hence a data set which has two equally common modes produces a bimodal distribution with two peaks Also, a multimodal distribution is a distribution of scores having more than two modes 4 / 31
  • 134. Skewness and Kurtosis Other important measures of the shape of a probability distribution are: 1) Skewness that measures the degree of asymmetry of a distribution ª each type of probability distribution has its own formula to calculate the skewness of the shape distribution, and perfectly symmetric distributions have zero skewness 2) Kurtosis that measures the degree of ‘peakedness’ of the distribution ª as with the skewness, each distribution has a formula to calculate the kurtosis 5 / 31
  • 135. Continuous (and Discrete) Data Recall that quantitative variables can be discrete or continuous Continuous data is uncountable in the sense that it has a continuum of possible values in a range ª this is opposed to discrete data, which can take relatively few different values Examples of continuous data are time, the height of a person, etc. However continuous variables must be rounded when measuring and we usually think of them as discrete ª we say e.g. that an individual is 20 years old and not between 20 and 21 continuous data always has an interval scale, whereas discrete data can take any scale 6 / 31
  • 136. Continuous Random Variable A continuous random variable serves to represent continuous data, and it takes an infinite number of possible values Probability distributions treat discrete and continuous random variables very differently Since there are theoretically an infinite number of values in a continuous random variable ª it is not possible to list all possible values, and ª the probability of each individual value is practically 0 Thus probability distributions for continuous random variables consider just ranges of values We then estimate the probability that a randomly selected outcome falls within a determinate range, which is the interval 7 / 31
  • 137. Approximation Function for continuous random variables Recall that in the case of discrete distributions a probability mass function was used to approximate the probability distribution for P(x) In the case of continuous random variables the probability distribution is characterized by a curve that is determined by a function as well However such approximation is made by a probability density function (p.d.f.) or just density that is represented as f(x) The conditions for a probability density function with a range $a \le x \le b$ are that $f(x) \ge 0$ for all x, and the total area under the curve between a and b is 1 ª a and b represent the most extreme values of the data 8 / 31
  • 138. Approximation Function II In order to calculate the probability of any interval in a probability continuous distribution, we need to find the area under the curve In such case the integral of the density of the variable over the range provides the probability of the random variable falling within this particular range ª with the use of integral calculus 9 / 31
  • 139. Continuous Uniform distribution A distribution that has constant probability is found in the uniform distribution ª actually there is both a discrete and a continuous version of this distribution Another name for the uniform distribution is the rectangular distribution Examples of the uniform distribution are: • the amount of milk distributed daily in a given town • the amount of electricity that a soft drink cooler machine consumes per month • (...) 10 / 31
  • 140. Probability density function: Uniform distribution The probability density function of the uniform distribution for $a \le x \le b$ is $f(x) = \frac{1}{b-a}$ and $f(x) = 0$ if $x < a$ or $x > b$ Besides, the probability that a continuous random variable that is uniformly distributed equals any individual value is 0 The uniform distribution is depicted by a rectangle with height f(x), and for $P(x_1 \le X \le x_2)$ the base is $x_2 - x_1$ i.e. $P(x_1 \le X \le x_2) = (x_2 - x_1) \times \frac{1}{b-a}$ 11 / 31
  • 142. Uniform distribution plot [Figure: rectangle of height $\frac{1}{b-a}$ over the range a to b, with the shaded area between $x_1$ and $x_2$ representing $P(x_1 \le X \le x_2)$] 13 / 31
  • 143. Example Uniform distribution A vending machine consumes per year between 420 kWh and 500 kWh. • What is the probability that the vending machine consumes at least 480 kWh? $P(X \ge 480) = (500 - 480) \times \frac{1}{80} = 0.25$ • What is the probability that the vending machine consumes at most 480 kWh? $P(X \le 480) = 1 - P(X \ge 480)$ • What about the probability that the vending machine consumes precisely 500 kWh? $P(X = 500) = 0$ 14 / 31
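These uniform probabilities can be reproduced with SciPy (supplementary sketch; SciPy parametrizes the uniform distribution by loc and scale rather than a and b):

```python
from scipy.stats import uniform

# X ~ Uniform(420, 500): loc = 420, scale = 500 - 420 = 80
X = uniform(loc=420, scale=80)
print(X.sf(480))    # P(X >= 480) = 0.25 (survival function)
print(X.cdf(480))   # P(X <= 480) = 0.75
```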
  • 144. Normal distribution The most important distribution in statistics is the normal distribution ª which is symmetric and it has a bell shaped curve Its importance is partly because it approximates well the distributions of many types of variables ª in such cases the sampled data tend to be approximately bell-shaped The properties of the normal distribution play a crucial role in statistical inference ª that is even when the sample data are not bell-shaped Other names for the normal distribution are the Gaussian distribution, the Z distribution... 15 / 31
  • 145. Normal distribution II Each normal distribution has two parameters, the mean µ and the standard deviation σ, and the exact form of the distribution depends on the values of these parameters ª we know from the empirical rule that most of the scores will fall within e.g. three standard deviations of the mean A special case of the normal distribution is the standard normal distribution Z that is a normal distribution with mean µ = 0 and standard deviation σ = 1 Examples of the normal distribution are: • heights of people • errors in measurements • (...) 16 / 31
  • 146. Normal Density Function The normal distribution is the probability distribution for a normal random variable $X \sim N(\mu, \sigma^2)$ The probability density function of a normal random variable for $-\infty < x < \infty$ is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ where e ≈ 2.7183 (Euler’s number), and π ≈ 3.1416 (Pi) And Z ∼ N(0, 1) 17 / 31
  • 148. Normal distributions different means, same σ [Figure: two normal curves with means 3 and 6] the shape remains the same when only the mean changes its value ª increasing [decreasing] mean shifts the curve to the right [left] 19 / 31
  • 149. Normal distributions different variances, same µ [Figure: normal curves with the same mean µ but different variances] the shape becomes flatter, the bigger the variance (and SD) is 20 / 31
  • 150. Computing normal probabilities Like with the uniform distribution, the probability that a normal random variable falls into an interval corresponds to the area of the interval under the curve However since the shape of the normal distribution is not a rectangle anymore, in this case the computation is more complicated We therefore use a probability table to calculate the probability of a normal random variable ª as we did with binomial and Poisson probabilities Fortunately we just need one table of probabilities, by standardizing the random variable 21 / 31
  • 151. Standard normal random variable A standard normal random variable Z equals the difference between the score and the mean of X divided by its standard deviation: $Z = \frac{X - \mu}{\sigma}$ A positive standardized score (or z-score) indicates a datum above the mean, and a negative standardized score indicates a datum below the mean A ‘Z transformation’ means that the probabilities for X are now translated into statements for Z We use the cumulative standardized normal probabilities for $P(Z \le z)$, which indicate the relative frequency of z-scores ª Keller’s book has a table for $-3.09 \le z \le 3.09$ (others by approximation) 22 / 31
  • 152. Normal distribution for $P(X > \mu + \sigma)$ [Figure: normal curve with the area to the right of µ + σ shaded] 23 / 31
  • 153. Normal distribution for $P(Z > 1)$ [Figure: standard normal curve with the area to the right of z = 1 shaded] 24 / 31
  • 154. Cumulative standardized normal probabilities for $P(Z \le z)$ [Table 3 from Keller’s appendix B: cumulative standardized normal probabilities; e.g. $P(Z \le -2.00) = .0228$ and $P(Z \le 1.00) = .8413$] 25 / 31
  • 155. Normal distribution for $P(Z > 2)$ [Figure: standard normal curve with the area to the right of z = 2 shaded] $P(Z > 2) = 1 - P(Z \le 2) = .0228$ 26 / 31
  • 156. Normal distribution for $P(Z > z_A) = A$ [Figure: standard normal curve with right-tail area A beyond $z_A$] $z_A$ equals the 100(1 − A)th percentile of Z 27 / 31
  • 157. Normal distribution for $P(-1 \le Z \le 1)$ [Figure: standard normal curve with the area between −1 and 1 shaded] $P(-1 \le Z \le 1) = P(Z \le 1) - P(Z \le -1) = .6826$ 28 / 31
  • 158. Expectations of the normal distribution Normal distribution • E(X) = µ • V(X) = σ2 • SD(X) = σ Standard normal distribution • E(Z) = 0 • V(Z) = 1 • SD(Z) = 1 29 / 31
  • 159. Example Standard normal distribution • Find $z_{.075}$: the right-tail area is .075, so the cumulative area is $1 - .075 = .925$ and $z_{.075} \approx 1.44$ • Find $-z_{.075}$: by symmetry $-z_{.075} \approx -1.44$ 30 / 31
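Rather than reading Table 3, the same values can be obtained from the normal quantile function; a minimal SciPy sketch:

```python
from scipy.stats import norm

# z_.075 is the point with right-tail area .075, i.e. the .925 quantile
print(norm.ppf(1 - 0.075))   # ~ 1.44
print(norm.ppf(0.075))       # ~ -1.44 (by symmetry)
print(1 - norm.cdf(2))       # P(Z > 2) = .0228, as in the earlier slide
```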
  • 160. Summary Probability distributions representing random variables have their own modality, and measures of skewness and kurtosis Continuous random variables represent uncountable data that is in theory infinite, and the data is treated through intervals The uniform distribution has constant probability and is represented by a rectangle whose height is the density and whose base is the difference between the extreme values The normal distribution has a bell shaped curve and it plays a central role in statistics because of its properties By standardizing normal random variables it is possible to obtain cumulative normal probabilities for relative frequency scores 31 / 31
  • 161. BUSINESS STATISTICS I Lecture – Week 40a (41) Antonio Rivero Ostoic School of Business and Social Sciences  September  AARHUS UNIVERSITYAU
  • 162. Today’s Agenda Continuous Random Variable Distributions and Exponential distribution 1. Student t Distribution 2. Chi-Squared Distribution 3. F Distribution 2 / 32
  • 163. Why other continuous distributions? and not just the normal distribution Despite the many nice properties of the normal distribution, a major concern is the derivation of the p.d.f. for this distribution ª we need to know the values of two parameters (µ and σ2) However, many times we do not have a variability parameter for the population Even more important, we need appropriate statistics for small samples There are other distributions that represent small samples, asymmetric data, or data with many outliers better than N ª moreover, some distributions need only a single parameter 3 / 32
  • 164. Example: Exponential distribution For instance, the exponential distribution requires one parameter whose reciprocal transformation equals both the mean and the standard deviation of the random variable ª i.e. µ = σ = 1/λ Thus the distribution is completely specified by a known parameter Its probability density function for $x \ge 0$ and a parameter λ > 0 is: $f(x) = \lambda e^{-\lambda x}$ with e ≈ 2.7183 The associated probabilities are: • $P(X > x) = e^{-\lambda x}$ • $P(X \le x) = 1 - e^{-\lambda x}$ • $P(x_1 \le X \le x_2) = e^{-\lambda x_1} - e^{-\lambda x_2}$ 4 / 32
  • 165. Example: Exponential distribution plot [Figure: exponential densities for λ = .5, λ = 1, and λ = 2] 5 / 32
  • 166. Student t Distribution The t distribution or Student t distribution is a distribution commonly used in statistical inference ª ‘Student’ was the pseudonym of W.S. Gosset who derived this distribution, and he used the letter t to represent the random variable The Student t distribution represents the distribution of t values, which varies according to the sample size ª the larger the sample is, the more it resembles a Z or normal distribution ª the smaller the sample is, the flatter the distribution becomes The t distribution depends on a single parameter called the degrees of freedom that is represented by ν or sometimes by df ª and the exact shape of a distribution is determined by this parameter degrees of freedom – loosely speaking – implies values that are ‘free to vary’ or ‘not fixed by any parameter or scores’ 6 / 32
  • 167. t density function The probability density function of the t distribution is: $f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$ with a parameter ν > 0, which is the degrees of freedom; π ≈ 3.1416, and the gamma function Γ hence T ∼ tν has a t-distribution with ν degrees of freedom 7 / 32
  • 168. Student t distribution plot [Figure: t density curve centered at 0] 8 / 32
  • 169. t and Z distributions [Figure: standard normal (Z) and t density curves centered at 0; the t curve is flatter with heavier tails] While N is bell shaped, the t-distribution is mound shaped 9 / 32
  • 170. t distribution, different degrees of freedom [Figure: t density curves for ν = 2, ν = 5, and ν = 50] The shape resembles the normal distribution more closely for larger ν values 10 / 32
  • 171. Student t random variable As with the other random variables we have seen so far, we can produce values for a Student t random variable through a statistical experiment ª then we can calculate the probability for such a variable The expected value and variance for a Student t random variable with ν degrees of freedom are: $E(t) = 0$ and $V(t) = \frac{\nu}{\nu - 2}$ for ν > 2 11 / 32
  • 172. t distribution for $t_A$ with ν > 0 [Figure: t density curve with right-tail area A beyond $t_A$] $P(t > t_{A,\nu}) = A$ 12 / 32
  • 173. Computing t probabilities Computing probabilities now implies calculating areas under the t distribution curve, and to achieve this we use probability tables The probability table for the t distribution gives the critical values exceeded with given probabilities, determined for different ν ª in Keller’s book the table is given for degrees of freedom from 1 to 200 and ∞ (otherwise use approximation) Only the right-tail probability is given but, since the t distribution curve is symmetric around 0, the corresponding left-tail critical value equals $-t_{A,\nu}$ Notice as well that the t values approximate z-scores as ν approaches ∞ 13 / 32
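A sketch of t critical values with SciPy, illustrating the symmetry and the convergence to z-scores (the chosen ν and A values are arbitrary):

```python
from scipy.stats import t, norm

# Critical value with right-tail area A = .05 for nu = 10 degrees of freedom
print(t.ppf(1 - 0.05, df=10))    # ~ 1.812

# By symmetry, the left-tail critical value is just the negative
print(t.ppf(0.05, df=10))        # ~ -1.812

# With large nu the t critical value approaches the z-score
print(t.ppf(0.95, df=10_000), norm.ppf(0.95))  # both ~ 1.645
```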
  • 174. Applications of the t distribution We can use the t distribution to test and obtain confidence intervals of the mean in a normally distributed population when the variance is unknown ª (cf. lecture week 46) Also we can compare two expected values for normally distributed populations with unknown variances ª (cf. lectures week 49 and 50) We can perform tests and confidence intervals in correlation and regression analyses 14 / 32
  • 175. Chi-squared distribution The Chi-squared or χ2 distribution is another type of distribution commonly used in hypothesis testing ª as with the t distribution, a χ2 distribution depends on a single parameter, the number of degrees of freedom, which shapes the distribution The χ2 distribution is the sum of squares of ν independent standard normal random variables i.e., for Z1, Z2, . . . , Zν where ν > 0, each Zi ∼ N(0, 1) and independent of each other: $Z_1^2 + Z_2^2 + \cdots + Z_\nu^2 \sim \chi^2_{(\nu)}$ thus there are ν variables that represent the number of degrees of freedom — values we can choose independently or ‘freely’ 15 / 32
  • 176. Chi-squared density function The χ2 density function is: $f(\chi^2) = \frac{1}{\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}\, (\chi^2)^{\frac{\nu}{2}-1}\, e^{-\frac{\chi^2}{2}}$ for χ2 > 0 with ν > 0, e ≈ 2.7183, and the gamma function 16 / 32
  • 178. Chi-squared distribution, different df [Figure: χ2 density curves for ν = 1, ν = 4, ν = 5, and ν = 10] 18 / 32
  • 179. Chi-squared distribution shape In this case – unlike the t distribution – the values of the random variable in the χ2 are not positioned around 0, but are rather concentrated on positive values The values of the random variable for a particular sample range from 0 to ∞ Although with a large number of degrees of freedom the shape of the chi-squared distribution resembles the normal distribution, it nevertheless remains skewed to the right In this case the mean is greater than the median, which means that it is positively skewed 19 / 32
  • 180. Chi-squared random variable A χ2 random variable is produced by a statistical experiment The expected value and variance for a chi-squared random variable with ν degrees of freedom are: E(χ2) = ν V(χ2) = 2ν 20 / 32
  • 181. Computing χ2 probabilities To calculate probability values for a χ2 random variable implies (again) computing areas under the curve $\chi^2_{A,\nu}$ represents the point with area A to its right under the chi-squared curve ª i.e. the right tail However, since the shape of the distribution is not symmetric, the left-tail point is no longer simply $-\chi^2_{A,\nu}$ Instead, $\chi^2_{1-A,\nu}$ represents the point such that the area to its left is A Critical values of the chi-squared distribution for different probabilities and df are recorded in the χ2 probability tables ª for ν > 100 they can be approximated by N with µ = ν and σ = √(2ν) 21 / 32
  • 182. Chi-squared critical regions [Figure: χ2 density curve with left-tail area A below $\chi^2_{1-A}$ and right-tail area A above $\chi^2_A$] 22 / 32
  • 183. Applications of the χ2 distribution The chi-squared distribution allows us to test and compute confidence interval estimators of the variance for a random variable that is normally distributed ª with a sufficiently large sample Goodness-of-fit tests Homogeneity and independence tests 23 / 32
  • 184. F distribution The F distribution is another continuous probability distribution commonly used in statistical inference ª ‘F’ stands for R.A. Fisher who described this distribution In this case the shape of the distribution is determined by two parameters, which are the degrees of freedom That is because the F distribution arises as the ratio of two independent chi-squared variables, each divided by its respective df That is: $\frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2} \sim F_{(\nu_1,\nu_2)}$ 24 / 32
  • 185. F density function The probability density function for the F distribution is: $f(F) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{F^{\frac{\nu_1-2}{2}}}{\left(1 + \frac{\nu_1 F}{\nu_2}\right)^{\frac{\nu_1+\nu_2}{2}}}$ for F > 0, and where ν1 and ν2 are called the numerator and denominator degrees of freedom, respectively 25 / 32
  • 186. F distribution plot [Figure: F density curve] As with the chi-squared distribution, the shape of the F distribution is asymmetric and positively skewed 26 / 32
  • 187. F distribution, different degrees of freedom [Figure: F density curves for (ν1, ν2) = (1, 1), (3, 3), (9, 9), and (25, 25)] 27 / 32
  • 188. F random variable The F random variable generates its values through a statistical experiment as well The expected value and variance for the F random variable are: $E(F) = \frac{\nu_2}{\nu_2 - 2}$ for ν2 > 2 $V(F) = \frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 4)(\nu_2 - 2)^2}$ for ν2 > 4 thus the mean parameter depends on the denominator degrees of freedom only, and it approximates 1 for large values of ν2 28 / 32
  • 189. Computing F probabilities We can calculate the areas under the distribution curve corresponding to probability values of the F distribution In this case we also have an asymmetric distribution, which means that the critical points in the two tails are $F_{A,\nu_1,\nu_2}$ and $F_{1-A,\nu_1,\nu_2}$ The following relation exists between these two critical regions: $F_{1-A,\nu_1,\nu_2} = \frac{1}{F_{A,\nu_2,\nu_1}}$ And we use a different probability table for each value of A with different numerator and denominator degrees of freedom 29 / 32
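A short SciPy sketch verifying the reciprocal relation between the two tails (the values of A, ν1, and ν2 are chosen arbitrarily):

```python
from scipy.stats import f

A, nu1, nu2 = 0.05, 5, 10

# Right-tail critical value F_{A, nu1, nu2}
f_right = f.ppf(1 - A, nu1, nu2)

# Left-tail critical value via the reciprocal relation,
# with the degrees of freedom swapped
f_left = 1 / f.ppf(1 - A, nu2, nu1)

print(f_right, f_left, f.ppf(A, nu1, nu2))  # f_left equals f.ppf(A, nu1, nu2)
```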
  • 190. Critical regions of the F distribution [Figure: F density curve with left-tail area A below $F_{1-A}$ and right-tail area A above $F_A$] 30 / 32
  • 191. Applications of the F distribution We can compare two variances from populations that are normally distributed with the F distribution and related statistics Analysis of variance, which is used to compare the means of two or more populations, is based on the F distribution and statistics 31 / 32
  • 192. Summary We have seen various continuous distributions that are important for inferential statistics and for small samples The t distribution is symmetric around zero, and it depends on a single parameter, its degrees of freedom The χ2 distribution is the sum of squared independent Z random variables, and it is positively skewed The F distribution has an asymmetric shape and it depends on the numerator and the denominator degrees of freedom All these distributions are related to their respective statistics, which have applications in statistical inference ª Remember Ex. 33, questions 2, 6, 7, 8, 10, 12, and Ex. 2 from Re-exam2013 32 / 32
  • 193. BUSINESS STATISTICS I Lecture – Week 40b (42) Antonio Rivero Ostoic School of Business and Social Sciences  October  AARHUS UNIVERSITYAU
  • 194. Today’s Agenda 1. Sampling Distributions: – of the Mean – of a Proportion – of the Difference between Two Means 2 / 32
  • 195. Sampling Distributions We used probability distributions to summarize probabilities of possible outcomes for a random variable However, using sample data from a population we estimate characteristics of the distributions expressed in parameters A sampling distribution is a probability distribution that determines probabilities of the possible values of a sample statistic ª it is obtained either by taking repeated random samples of a particular size from a population, or else by relying on the associated probability rules Each sample statistic has a sampling distribution ª so there is a sampling distribution of a sample mean, a sampling distribution of a sample proportion, a sampling distribution of a sample median, etc. 3 / 32
  • 196. Sampling Distributions II Unlike the distributions we have seen so far, a sampling distribution refers to the values of a statistic computed from observations in sample after sample Sampling distributions play a fundamental role in statistical inference because they allow us to measure how close a sample statistic is to the population parameter In other words, the sampling distribution determines the probability that the statistic falls within any given distance of the parameter it estimates 4 / 32
  • 197. Distribution of the Sample Mean probability rules Recall that the sample space of a random trial is the set of all possible outcomes E.g. sample space for two dice thrown S =    (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6) (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6) (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6) (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6) (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6) (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)    5 / 32
  • 198. Distribution of the Sample Mean II probability rules Means of samples of size 2: x =    1 1.5 2 2.5 3 3.5 1.5 2 2.5 3 3.5 4 2 2.5 3 3.5 4 4.5 2.5 3 3.5 4 4.5 5 3 3.5 4 4.5 5 5.5 3.5 4 4.5 5 5.5 6    6 / 32
  • 199. Expectations of X When using the probability approach we depend on the laws of expected value and variance for the population parameters of X: $\mu = \sum x\,P(x)$ $\sigma^2 = \sum (x - \mu)^2\,P(x)$ $\sigma = \sqrt{\sigma^2}$ 7 / 32
  • 200. Sampling Random Variables In a sampling distribution, $\bar{X}$ constitutes a new random variable created by sampling, and $\bar{x}$ is a statistic corresponding to the sample mean Even though each sample may have an equal probability, some samples will have identical values of $\bar{x}$ Thus we can draw the probabilities of the different values of $\bar{x}$, which correspond to the sampling distribution of the sample mean 8 / 32
  • 201. Expectations of $\bar{X}$ The expected value and variance of the sampling distribution of $\bar{X}$ are: $\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x})$ $\sigma^2_{\bar{x}} = \sum (\bar{x} - \mu_{\bar{x}})^2\,P(\bar{x})$ $\sigma_{\bar{x}} = \sqrt{\sigma^2_{\bar{x}}}$ thus for n = 2 we have $\mu_{\bar{x}} = \mu$, whereas $\sigma^2_{\bar{x}} = \sigma^2/2$ 9 / 32
  • 202. Distribution of X and $\bar{X}$ [Figure: left — distribution of the score from rolling a single die (uniform over 1–6); right — distribution of the mean of two dice scores, peaked at 3.5] The distribution of $\bar{X}$ is different from the distribution of X, even though these variables are related 10 / 32
  • 203. Sampling distributions of $\bar{X}$ different sizes It is possible to obtain sampling distributions of $\bar{X}$ for different sample sizes, and the sample statistic for the mean equals the population parameter $\mu_{\bar{x}} = \mu$ However in the case of the variance of the sampling distribution, this parameter equals the ratio of the population variance to the sample size $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$ 11 / 32
  • 204. Standard Error of the Mean The standard deviation of a sampling distribution is called the standard error of the mean, and for infinitely large populations it is defined as: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ When the size of the population is finite and known, a finite population correction factor is added to the expression, and the standard error becomes: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$ However, such factor is close to 1 when N is large relative to n (like 20 times larger) ª thus it can be omitted 12 / 32
  • 205. Sampling distributions of $\bar{X}$ for n = 6 [Figure: sampling distribution of the mean of n = 6 die scores, concentrated around 3.5 on the 1-to-6 scale] 13 / 32
  • 206. Central Limit Theorem We observe that as the sample size gets larger, the sampling distribution of $\bar{X}$ becomes narrower ª this is because the values are more concentrated around the mean Even more significant, the larger the sample size is, the more the distribution resembles a bell shaped distribution The Central Limit Theorem states: For a sufficiently large sample size, the sampling distribution of the mean of a random variable drawn from any population is approximately normal Thus the larger the sample size, the more closely the sampling distribution of $\bar{X}$ resembles N 14 / 32
  • 207. Central Limit Theorem II The approximation given in the central limit theorem applies also if the population is nonnormally distributed, provided a sufficiently large sample size ª for nonnormal populations a ‘sufficiently large’ sample is n > 30 ª for highly skewed populations we need a moderately large sample size Because of the central limit theorem we benefit from the properties of the standard normal distribution in order to compute sample probabilities In this case we use with Z the standard error of the mean: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ 15 / 32
Example
Using X

A company’s vending machines consume on average 460 kWh of electricity with a standard deviation of 5 kWh

• What is the probability that a vending machine in a given location consumes less than 470 kWh?

P(X < 470) = P((X − µ)/σ < (470 − 460)/5) = P(Z < 2) = 0.9772 ≈ 98%

• ...and the probability for using more than 470 kWh?

P(X > 470) = P((X − µ)/σ > (470 − 460)/5) = P(Z > 2) = 1 − P(Z < 2) = 1 − .9772 = 0.0228 ≈ 2%

16 / 32
Example II
Using X̄

• What is the probability that 3 vending machines consume on average less than 465 kWh? i.e. P(X̄ < 465)

Since we assume that X is normally distributed, the standard error of the mean must consider the sample size:

σx̄ = σ/√n = 5/√3 = 2.89

P(X̄ < 465) = P((X̄ − µx̄)/σx̄ < (465 − 460)/2.89) = P(Z < 1.73) = .9582 ≈ 96%

17 / 32
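The slides use standard normal tables; the same three probabilities can be reproduced in software, for instance with this Python/SciPy sketch (the use of SciPy is our assumption, not the lecture's):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 460, 5, 3
print(norm.cdf(470, loc=mu, scale=sigma))            # P(X < 470)     = 0.9772
print(norm.sf(470, loc=mu, scale=sigma))             # P(X > 470)     = 0.0228
print(norm.cdf(465, loc=mu, scale=sigma / sqrt(n)))  # P(X-bar < 465) ~ 0.958
```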
P(X > 470) and P(X̄ < 465)
last two examples

[Figure: two normal curves, one for X with µ = 460 marking 470, and one for X̄ with µ = 460 marking 465]

18 / 32
Inference with the Sampling Distribution

Many times we do not know the values of µ and σ when we want to calculate probabilities

However we can infer such values through the sampling distribution; for instance the value of µ can be deduced on the basis of the distribution of the sample mean

More specifically, we can obtain a particular probability that the sample mean falls between two values by using the properties of Z

A general formula for this problem is:

P(µ − zα/2·σ/√n < X̄ < µ + zα/2·σ/√n) = 1 − α

where α is the probability that X̄ does not fall into the interval

19 / 32
Inference of the sample mean
Example

A sampling distribution with n = 3 tells us that a vending machine consumes on average 470 kWh with σ = 5 kWh

Then we can compute the range around µ in which the sample mean is located with 95% probability

Since z.025 = 1.96, then P(−1.96 < Z < 1.96) = .95

By multiplying all terms in the probability statement by σ/√n and then adding µ we get:

P(µ − 1.96·σ/√n < X̄ < µ + 1.96·σ/√n) = .95
P(470 − 1.96·5/√3 < X̄ < 470 + 1.96·5/√3) = .95
P(464.3 < X̄ < 475.7) = .95

ª hence the sample mean will fall between 464.3 and 475.7 with 95% probability, so a computed sample mean within this range is consistent with the assumed population mean

20 / 32
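The interval can be reproduced directly; a short sketch (again using SciPy by assumption) computes the same bounds:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n, alpha = 470, 5, 3, 0.05
z = norm.ppf(1 - alpha / 2)       # z_.025 = 1.96
half = z * sigma / sqrt(n)
print(mu - half, mu + half)       # ~464.3 and ~475.7
```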
Sampling Distribution of a Proportion

Recall that the binomial distribution depends on a parameter p that represents the probability of success in any trial

As with the previous example with the mean, typically the value of this parameter is unknown and it needs to be estimated

For that, we conduct a binomial experiment where we count the number of successes X in a sample of size n, and this random variable is binomially distributed

21 / 32
Sampling Distribution of a Proportion II

The estimator of p is the proportion of the number of successes to the sample size:

P̂ = X/n

For instance, for a sample size n and a probability of success p, we can find the probability that X is at most x by using a binomial probability table as:

P(P̂ ≤ x/n) = P(X ≤ x)

However, as we have seen with quantitative variables, there exists a normal approximation to the binomial distribution from which we can benefit in our calculations

22 / 32
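For a concrete instance of this identity, the sketch below (an illustration with values we chose: n = 25, p = .30, x = 8) reads the exact binomial probability in place of a table:

```python
from scipy.stats import binom

n, p, x = 25, 0.30, 8        # hypothetical values for illustration
print(binom.cdf(x, n, p))    # P(X <= 8) = P(P-hat <= 8/25) ~ 0.68
```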
Normal approximation to the binomial

Any X ∼ B(n, 0.5) is symmetrically distributed, and it produces a bell-shaped curve by smoothing the ends of the rectangles

Calculating probabilities of X using the normal distribution requires finding the area under the normal curve by applying a continuity correction factor of .5 to each value x

• Hence P(X = x) ≈ P(x − .5 < Y < x + .5), where Y is a normal random variable approximating X
• Also P(X ≤ x) ≈ P(Y < x + .5) and P(X ≥ x) ≈ P(Y > x − .5)

In the case of a range of values of X we can omit the correction factor, but the accuracy of the approximation is decreased

23 / 32
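The effect of the continuity correction is easy to quantify. The sketch below (our illustration) compares the exact binomial probability with the normal approximation, with and without the correction, for the B(100, .5) case plotted on the next slide:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sd = n * p, sqrt(n * p * (1 - p))   # mu = 50, sd = 5

x = 55
print(binom.cdf(x, n, p))          # exact P(X <= 55)        ~ 0.8644
print(norm.cdf(x + 0.5, mu, sd))   # with correction factor  ~ 0.8643
print(norm.cdf(x, mu, sd))         # without the correction  ~ 0.8413
```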
Binomial distribution and normal approximation
n = 100, p = .5

[Figure: binomial probabilities P(x) for x between 30 and 70 with the approximating normal curve superimposed]

24 / 32
Sampling distribution of a sample proportion

The expectations of P̂ assume that its sampling distribution is approximately normal:

E(P̂) = p
V(P̂) = σ²p̂ = p(1 − p)/n
σp̂ = √(p(1 − p)/n)

The standard deviation of P̂ is known as the standard error of the proportion

Here we omitted the finite population correction factor (cf. definition of σx̄), assuming that the population is large relative to the sample

25 / 32
Example
finding P̂

Last year 30% of the schools in town installed our vending machine cooler, and we want to see whether or not a proportion of schools will continue using our machine next year

• If we take a random sample of 25 schools, what is the probability that more than 35% of the sampled schools will choose our machine?

Since each school is either a success or a failure, we have a binomial experiment with p = .30 and n = 25

We want to find P(P̂ > .35)

σp̂ = √(p(1 − p)/n) = √((.30)(.70)/25) = .0917

P(P̂ > .35) = P((P̂ − p)/√(p(1 − p)/n) > (.35 − .30)/.0917) = P(Z > .545) = 1 − P(Z < .545) ≈ 1 − .705 = .295 ≈ 30%

26 / 32
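The calculation can be checked with a few lines of Python (SciPy assumed; the slide's table value .705 rounds P(Z < .545)):

```python
from math import sqrt
from scipy.stats import norm

p, n = 0.30, 25
se = sqrt(p * (1 - p) / n)    # standard error of the proportion ~ 0.0917
z = (0.35 - p) / se           # ~ 0.545
print(norm.sf(z))             # P(P-hat > .35) ~ 0.293
```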
Difference between Two Means
Sampling distribution of X̄1 − X̄2

We can use a sampling distribution to calculate the difference between two means

The assumptions are that the random samples are independent from each other and that they represent normally distributed populations

Since both populations have a normal distribution, the difference between the two sample means is also normally distributed

27 / 32
Expectations of X̄1 − X̄2

The expected value of the sampling distribution of the difference between two sample means is:

µx̄1−x̄2 = µ1 − µ2

and the variance is:

σ²x̄1−x̄2 = σ²1/n1 + σ²2/n2

The standard error of the difference between two means is:

σx̄1−x̄2 = √(σ²1/n1 + σ²2/n2)

ª for nonnormal populations we need sufficiently large sample sizes (≥ 30)

28 / 32
Example X̄1 − X̄2

Our company’s vending machines’ electricity consumption is normally distributed with a mean of 460 kWh and a standard deviation of 5 kWh. A rival company produces vending machine coolers with normally distributed electricity consumption of 455 kWh on average and a standard deviation of 10 kWh.

• What is the probability that the average electricity consumption of our company’s machines exceeds that of the rival’s machines if we take random samples of size 30 and 10 respectively? i.e. P(X̄1 − X̄2 > 0)

with µ1 − µ2 = 460 − 455 = 5 and

σx̄1−x̄2 = √(σ²1/n1 + σ²2/n2) = √(5²/30 + 10²/10) = √10.833 = 3.29

P(X̄1 − X̄2 > 0) = P(((X̄1 − X̄2) − (µ1 − µ2))/σx̄1−x̄2 > (0 − 5)/3.29) = P(Z > −1.52) = 1 − P(Z < −1.52) = 1 − .0643 = .9357

29 / 32
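As a final check, the same probability in a Python sketch (SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

mu_diff = 460 - 455                  # mu1 - mu2 = 5
se = sqrt(5**2 / 30 + 10**2 / 10)    # ~ 3.29
z = (0 - mu_diff) / se               # ~ -1.52
print(norm.sf(z))                    # P(X1-bar - X2-bar > 0) ~ 0.936
```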
Sampling distribution and Inference

Sampling distributions – rather than probability distributions – are commonly used for statistical inference

While a probability distribution refers to individual observations, a sampling distribution refers to the values of a statistic computed from those observations

Statistics computed through sampling distributions allow us to make inferences about population parameters, which usually are unknown

30 / 32
Sampling distribution and Inference II

[Diagram: an individual observation relates to the population parameters through its probability distribution, whereas a statistic relates to a population parameter through its sampling distribution]

31 / 32
Summary

Sampling distributions determine the probabilities for sample statistics

The mean, the proportion, and the difference between two means are examples of sample statistics, each with its own type of sampling distribution

We can make inferences about population parameters through sample statistics and their sampling distributions

32 / 32