Inferential Statistics
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 2-2
Statistics:
Descriptive and Inferential
Descriptive and Inferential
Inferential Statistics
• In inferential statistics and hypothesis testing, our goal is to find systematic reasons for differences and rule out random chance as the cause.
• Consider crop yield vs fertilizer use.
• For a given quantity of fertilizer (10 g), crop yield varies (20, 30, 40 kg), though we would expect a single number.
• The variation is due to many other factors that are random (water, temperature, etc.).
• We want to find the true (systematic) relationship between fertilizer and yield and eliminate random causes.
Relationships
Measuring Relationship Between 2 Variables
Coefficient of Correlation
• Measures the relative strength of the linear relationship between two variables
• Sample coefficient of correlation:
\[ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \]
where
\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \]
\[ S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \]
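The sample correlation formula can be sketched in a few lines of Python. This is a minimal illustration using n - 1 denominators for the covariance and standard deviations; the function name `pearson_r` and the data are hypothetical, not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = cov(X, Y) / (S_X * S_Y), all with n - 1 denominators."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```

Because the n - 1 terms cancel, the same value results from dividing the sum of cross-products by the square root of the product of the two sums of squares.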
r between depression and anxiety
Estimating r
\[ r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \]
Features of Correlation Coefficient, r
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
• r gives:
  ◦ form (linear), direction (+ or -) and magnitude (|r| from 0 to 1)
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and a second panel with r = 0]
Correlation between family size and stress
3 Correlation Family size & stress levels.xlsx

                        Family Size   Stress Level (0-100)
Family Size             1
Stress Level (0-100)    0.68          1
Pearson vs Spearman Correlation
• Pearson Correlation Coefficient (r):
  ◦ The Pearson correlation coefficient measures the linear relationship between two continuous variables.
  ◦ It assesses the degree to which the relationship between the variables can be described by a straight line.
  ◦ Significance tests on Pearson's r assume that the variables are approximately normally distributed and linearly related.
  ◦ Suitable for continuous data.
• Spearman Rank Correlation Coefficient (ρ or rs):
  ◦ The Spearman rank correlation coefficient measures the strength and direction of the monotonic association between two variables, whether or not the relationship is linear.
  ◦ It is suitable for both continuous and ordinal variables.
  ◦ It is more robust to outliers and non-normal distributions than Pearson's r.
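The difference between the two coefficients shows up on data that rise steadily but not in a straight line. The sketch below computes Spearman's coefficient as Pearson's r applied to ranks (with average ranks for ties); the helper names and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y = x**3 is monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(pearson_r(x, y), 3))     # below 1: the points are not on a straight line
print(round(spearman_rho(x, y), 3))  # 1.0: the ordering is perfectly preserved
```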
Hypothesis Testing and Relationship Testing
• Hypothesis testing
  ◦ t-test (2 groups)
  ◦ ANOVA (several groups)
• Relationship testing
  ◦ Regression (dependent vs independent variables)
    - Simple
    - Multiple
    - Dummy
    - Binary dependent …
Intuition on Test of Hypothesis
• Purpose of inferential statistical analysis:
  ◦ To use sample data to make inferences, objectively and with a stated level of confidence, about what is truly happening in the population.
• We will attempt to understand intuitively the theoretical basis for how this purpose is achieved.
Types of Inferences
• 1. Estimation: inferring population parameters from a sample.
  ◦ Estimate the average income of a Sri Lankan household.
• 2. Testing: inferring, from samples, whether parameters of populations differ.
  ◦ Test whether average household income differs between the Jaffna and Colombo districts.
• 3. Regression: inferring, from a sample, which independent variables significantly explain a dependent variable, and predicting the dependent variable from the independent variable(s).
  ◦ What are the factors that explain household income?
    - Income = f(Education, Assets owned, political affiliations, …, Xn)
ESTIMATION
Inferring on Mean of population from a sample
• Any phenomenon has variation (and is thus referred to as a variable).
  ◦ Ex: household income of the Sri Lankan population, academic performance among CIRP students, stress levels among CIRP students.
• Variation means that each individual score (observation) will differ from the mean. The mean is the expected (true) value.
• Consider the income of households in Colombo and observe the above (variation, mean, and difference of observations from the mean).
  ◦ 6 Basis of statistical testing.xlsx

• Generate 1 random sample of size 10 from a population of 1,868 household incomes in Colombo (numbered 1 … 1868). This data is from HIES (2019), Sri Lanka.
• Is your sample mean equal to the population mean?
• Ask your friends whether they got the same sample mean.
• Why is the sample mean not equal to the population mean?
• Just like an individual score will differ from its mean, an individual sample mean will differ from the true population mean.
• This deviation of the sample mean from the population mean is called sampling error.
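The exercise can be imitated in Python. The HIES data is not reproduced here, so the sketch below assumes a simulated population of 1,868 skewed "incomes" as a stand-in; the distribution and its parameters are purely illustrative.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the 1,868 Colombo household incomes (not HIES data)
population = [random.lognormvariate(11, 0.5) for _ in range(1868)]
population_mean = statistics.mean(population)

# One random sample of size 10, drawn without replacement
sample = random.sample(population, 10)
sample_mean = statistics.mean(sample)

# Sampling error: deviation of the sample mean from the population mean
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:,.0f}")
print(f"sample mean     = {sample_mean:,.0f}")
print(f"sampling error  = {sampling_error:,.0f}")
```

Changing the seed plays the role of asking a friend for their sample: each draw gives a different sample mean and therefore a different sampling error.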
Sampling Distribution of Sample Means
• Take 10 more samples of size 10 from the population of Colombo household incomes and estimate the sample mean for each of the 10 samples.
• Are the sample means the same? Why are they not the same?
• Now estimate the mean of the sample means (the distribution of sample means) and check whether it is equal to the population mean.
• It gets closer to the population mean as more samples are included. The behaviour of this distribution of sample means is the basis of the CENTRAL LIMIT THEOREM.
• Note that a sampling distribution can be estimated for any statistic, such as the mean, correlation coefficient, t-statistic, etc.
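A short simulation illustrates how the mean of the sample means settles near the population mean as more samples are taken. The population below is simulated (a hypothetical stand-in for the Colombo incomes), and `mean_of_sample_means` is an illustrative helper name.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical skewed population standing in for the household incomes
population = [random.lognormvariate(11, 0.5) for _ in range(1868)]
population_mean = statistics.mean(population)

def mean_of_sample_means(num_samples, sample_size=10):
    """Draw num_samples samples of sample_size and average their means."""
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(num_samples)]
    return statistics.mean(means)

for num_samples in (10, 100, 5000):
    estimate = mean_of_sample_means(num_samples)
    gap = abs(estimate - population_mean) / population_mean
    print(f"{num_samples:>5} samples: relative gap to population mean = {gap:.4f}")
```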
CENTRAL LIMIT THEOREM
• The central limit theorem states:
  ◦ For samples of a single size n, drawn from a population with a given mean μ and variance σ², the sampling distribution of sample means will have mean \( \mu_{\bar{X}} = \mu \) and variance \( \sigma_{\bar{X}}^2 = \sigma^2 / n \).
  ◦ This distribution will approach normality as n increases.
  ◦ The above is true even if the population distribution is not normally distributed.
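The two moments stated by the theorem can be checked numerically. The sketch below assumes a deliberately skewed (exponential) simulated population and draws samples with replacement; all names and sizes are illustrative choices.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical, strongly skewed population (clearly not normal)
population = [random.expovariate(1.0) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)

n = 30  # size of every sample
sample_means = [statistics.fmean(random.choices(population, k=n))
                for _ in range(5000)]

observed_mean = statistics.fmean(sample_means)
observed_var = statistics.variance(sample_means)
predicted_var = sigma2 / n  # variance of sample means predicted by the CLT

print(f"mean of sample means = {observed_mean:.3f} (population mean {mu:.3f})")
print(f"variance of sample means = {observed_var:.4f} (sigma^2 / n = {predicted_var:.4f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.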
Implications of CLT
 Confidence Interval Estimation: The CLT is used to construct
confidence intervals for population parameters, such as the
population mean or proportion. Confidence intervals provide a
range of plausible values for the population parameter and are
widely used in inferential statistics to quantify the uncertainty
associated with sample estimates.
 Hypothesis Testing: The CLT is fundamental for hypothesis
testing, especially when dealing with large sample sizes. It justifies
the use of parametric tests, such as the t-test and z-test, even
when the population distribution is not normal, as long as the
sample size is sufficiently large. Hypothesis tests rely on the
assumption of normality or the CLT to make inferences about
population parameters.
Confidence Interval on Sample Estimates
• 6 Basis of statistical testing.xlsx CLT Simple spread sheet
• Confidence Level: The confidence level is the proportion of confidence intervals, calculated from repeated samples, that would contain the true population parameter.
  ◦ i.e. a 95% confidence level means that if one takes many samples, 95% of the intervals built from them (sample mean ± about 2 standard errors) will contain the population mean. (pg 92 Gujarati)
• It is often expressed as a percentage (e.g., 95% confidence level). Commonly used confidence levels include 90%, 95%, and 99%.
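The repeated-sampling meaning of a 95% confidence level can be illustrated by simulation. This is a sketch under assumed conditions: a simulated normal population, samples of size 40, and intervals of the form mean ± 1.96 standard errors.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical population of scores
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def interval_contains_mu(n=40):
    """Draw one sample and check whether mean +/- 1.96 SE covers mu."""
    sample = random.choices(population, k=n)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - 1.96 * se <= mu <= mean + 1.96 * se

trials = 2000
coverage = sum(interval_contains_mu() for _ in range(trials)) / trials
print(f"share of intervals containing the population mean: {coverage:.3f}")
```

The printed share should come out close to 0.95, which is what the confidence level refers to.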
Estimating a mean of a population and reporting
• 1. Take a sample (n > 30)
• 2. Estimate the sample mean and standard error (SE)
• 3. Report the sample mean with its standard error
• Reporting Mean ± 2SE means that we are about 95% confident that the population mean lies within 2SE of the sample mean.
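The three reporting steps might look as follows in Python. The function name `mean_with_2se` and the sample values are hypothetical; the standard error is taken as the sample standard deviation divided by the square root of n.

```python
import math
import statistics

def mean_with_2se(sample):
    """Return the sample mean, its standard error, and the mean +/- 2 SE interval."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SE = S / sqrt(n)
    return mean, se, (mean - 2 * se, mean + 2 * se)

# Hypothetical sample of 40 scores (n > 30)
scores = [1, 2, 3, 4, 5] * 8
mean, se, (low, high) = mean_with_2se(scores)
print(f"mean = {mean:.2f}, SE = {se:.3f}, approx. 95% interval = ({low:.2f}, {high:.2f})")
```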
TESTING
Normal Distribution
Converting ND to SND (Z)
Conversion to Z Value (Standard Normal Distribution)
• Any normal distribution can be converted to the Standard Normal Distribution (Z distribution).
• So the sampling distribution can be converted to a Z distribution.
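The conversion is the usual standardization Z = (X - μ) / σ. A minimal sketch, with hypothetical function names and numbers:

```python
import statistics

def z_score(x, mu, sigma):
    """Standardize one value: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

def standardize(values):
    """Convert a list of values to Z scores using its own mean and SD."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [z_score(v, mu, sigma) for v in values]

print(z_score(9.0, 8.0, 0.5))  # 2.0: the value lies two SDs above the mean

# After standardizing, the values have mean 0 and standard deviation 1
z_values = standardize([4.0, 6.0, 8.0, 10.0, 12.0])
print(round(statistics.fmean(z_values), 6), round(statistics.pstdev(z_values), 6))
```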
(Z Distribution) Has Known Characteristics
• Mean (μ): The mean of the Z-distribution is always 0. This means that the central value of the distribution is located at 0.
• Standard Deviation (σ): The standard deviation of the Z-distribution is always 1.
• About 68% (probability) of the observations will lie within ±1 SD of the mean, etc.
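These known areas can be reproduced with Python's standard library, which provides the standard normal distribution as `statistics.NormalDist()`. The helper name `share_within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Probability that a Z value lies between -k and +k."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {share_within(k):.4f}")
# within +/-1 SD: 0.6827
# within +/-2 SD: 0.9545
# within +/-3 SD: 0.9973
```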
Probability and Z values
 6 Basis of statistical testing.xlsx
Using Z distribution to Test Hypothesis
• If the absolute value of Z is greater than 1.96, the probability of a value that extreme coming from that distribution is less than 0.05.
• So if an estimated Z value is more than 1.96 (say 2), we conclude that it is unlikely to have come from that distribution.
Z Value Table
 Using the z-table to find the area in the body to the left of z = 1.62
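A Z table lookup corresponds to evaluating the cumulative distribution function of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; `two_tailed_p` is an illustrative helper name.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Area to the left of z = 1.62, as read from the body of a Z table
area_left = z.cdf(1.62)
print(round(area_left, 4))  # 0.9474

def two_tailed_p(z_value):
    """Probability of a Z value at least this far from 0 in either direction."""
    return 2 * (1 - z.cdf(abs(z_value)))

print(round(two_tailed_p(1.96), 3))  # 0.05
```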
Example of Using the Z Distribution to Test a Hypothesis
• Consider a psychologist who knows, based on research, that the average stress level of a person on a working day is 8.00 stress units. It is also known that stress levels vary with a standard deviation of σ = 0.50. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50.
• The psychologist wants to test whether a new patient is stressed. Hence he/she observes the patient for 25 days (n = 25), measures stress each day, and finds the average stress to be \( \bar{X} = 7.75 \).
• Test whether the patient is stressed.
  ◦ If stressed, the patient's stress level should fall within population stress levels [with a confidence level of 95%, i.e. estimated |Z| < 2 (from the Z table)]
Stating the Hypothesis
• The null hypothesis (H0) is typically the opposite of the researcher's hypothesis. It is the idea that nothing is going on (no difference).
  ◦ H0: μ = 0
• The alternative hypothesis (HA) is simply the reverse of the null hypothesis.
• There are three options, depending on where we expect the difference to lie (direction).
  ◦ HA: μ ≠ 0
  ◦ HA: μ < 0
  ◦ HA: μ > 0
• The hypothesized value can be a number other than zero.
• Thus the hypotheses on the patient's stress are:
  ◦ H0: μ = 8 (stressed)
  ◦ HA: μ < 8 (not stressed)
  ◦ where μ is the patient's average stress and 8 is the population average stress.
Estimating the Z Score and Testing Hypothesis
• Estimate the Z score and compare it with the table Z value (for p = .05, Z is -1.96)
• Since the estimated Z is larger in absolute value than the table Z,
  ◦ we reject the null hypothesis (patient stressed), accept the alternative hypothesis, and conclude that the patient is not stressed.
• Rule of thumb (first look at results)
  ◦ Estimated |Z| > 2
  ◦ Estimated p < 0.05
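The Z score for the stress example can be worked out from the numbers given (μ = 8.00, σ = 0.50, n = 25, sample mean 7.75). The sketch assumes the usual formula Z = (X̄ - μ) / (σ / √n); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 8.00           # population mean stress
sigma = 0.50        # population standard deviation
n = 25              # days of observation
sample_mean = 7.75  # patient's average stress

standard_error = sigma / math.sqrt(n)       # 0.50 / 5 = 0.10
z_value = (sample_mean - mu) / standard_error
p_lower_tail = NormalDist().cdf(z_value)    # probability of a Z this low or lower

print(round(z_value, 2))       # -2.5
print(round(p_lower_tail, 4))  # 0.0062

table_z = -1.96  # table value quoted for p = .05
print("reject H0" if abs(z_value) > abs(table_z) else "do not reject H0")
```

With |Z| = 2.5 exceeding 1.96 and p well below 0.05, both rules of thumb point to rejecting H0.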
T Distribution
• The t distribution is used when the population variance is not known and the sample size is small.
• Different statistics have different distributions, which are used for hypothesis testing.
• df = n - 1 (n is the sample size)
F Distribution (p=0.05)
Hypothesis Testing Procedure
• Step 1: State the hypotheses
• Step 2: Decide the confidence level (99% or 95%, equivalently p = 0.01 or p = 0.05) and find the corresponding critical (table) values.
• Step 3: Calculate the test statistic
• Step 4: Make the decision: compare the estimated value with the table value
Statistical Tests
7 Inferential Stat data.xlsx
• Test of means
  ◦ One sample (t test)
  ◦ Two samples (t test)
    - Paired
    - Unpaired
  ◦ Many samples (ANOVA F test)
• Test of difference of categorical measures (frequency)
  ◦ Chi Square
• Regression: Test of association and quantification of change
Single sample (called a 1-sample t-test)
• It is known that patients with mean (µ) depression scores > 30 need clinical attention. You have data on the depression scores from 4 check-ups (X) of a patient. You want to infer whether the patient needs clinical attention.
• Hypothesis: H0: µ = 30 HA: µ > 30
• Standard error: \( SE = S / \sqrt{n} \)
• Standard deviation: \( S = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)} \)

• Hypothesis:
  ◦ H0: µ = 30 HA: µ > 30
• Estimated t = 3.46
• Critical t = 2.35
• 3.46 > 2.35
• Thus reject the null hypothesis
• Infer that the patient needs treatment.
• Use STATA 7 Inferential Stat data.xlsx
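The one-sample t statistic can be sketched as below. The patient's four scores are not given in the slides, so the values here are hypothetical, chosen only so that the result happens to match the quoted t of 3.46; the critical value 2.35 is the one quoted above (df = 3).

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (S / sqrt(n)), with df = n - 1."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (mean - mu0) / se, n - 1

# Hypothetical depression scores from 4 check-ups (not the slide's data)
scores = [31, 31, 33, 33]
t_value, df = one_sample_t(scores, mu0=30)
critical_t = 2.35  # table value quoted above

print(round(t_value, 2), df)  # 3.46 3
print("reject H0" if t_value > critical_t else "do not reject H0")
```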
Two sample paired t test
Depression scores/10

Name        Before Treatment   After Treatment   Difference (After - Before)
Ranil       3                  2                 -1
Sajith      3                  6                  3
Anura       5                  3                 -2
Hirunika    8                  4                 -4
Namal       3                  9                  6
Patali      1                  2                  1
Diana       4                  5                  1

Mean                             0.57
Std Dev                          3.31
N                                7
Sq Rt N                          2.65
Std Error (SD / Sq Rt N)         1.25
t value                          0.46

• Hypothesis:
  ◦ BT mean = AT mean
  ◦ or BT mean - AT mean = 0, or
  ◦ mean of difference µD = 0
• Estimated t (0.46) < table t (2), so the null hypothesis is not rejected.
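The figures in the table can be reproduced from the before and after scores. This sketch follows the same steps as the table (differences, their mean and standard deviation, standard error, t value); the variable names are illustrative.

```python
import math
import statistics

# Depression scores from the table above
before = [3, 3, 5, 8, 3, 1, 4]
after = [2, 6, 3, 4, 9, 2, 5]

differences = [a - b for a, b in zip(after, before)]  # After - Before
n = len(differences)
mean_diff = statistics.fmean(differences)
sd_diff = statistics.stdev(differences)   # n - 1 denominator
std_error = sd_diff / math.sqrt(n)
t_value = mean_diff / std_error           # tests H0: mean difference = 0

print(differences)                                    # [-1, 3, -2, -4, 6, 1, 1]
print(round(mean_diff, 2), round(sd_diff, 2))         # 0.57 3.31
print(round(std_error, 2), round(t_value, 2))         # 1.25 0.46
```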
Two sample unpaired t test
 7 Inferential Stat data.xlsx
Test of difference of means of many samples: ANOVA
• Scores on a job entrance test are shown below, where applicants have different qualifications. We want to infer whether different qualifications affect the test result.
• Observe the differences between groups (systematic variance) and within groups (unsystematic variance/random error). Our interest is to test the difference between groups.
• Between group variance:
• Within group variance:
• Total variance:
ANOVA Table: F interpretation
• If we expect a large difference between groups, then MSB > MSW,
  ◦ i.e. a large F value
• The significance of the F value can be checked against the F table.
• Hypothesis:
  ◦ Means of ND = RD = UR

• Hypothesis:
  ◦ H0: Means of ND = RD = UR
• Consider table F = 2
• Estimated F (36) > table F (2)
• Reject H0
• Therefore infer that there is a difference in performance between degrees.
• We do not, however, know which degrees differ from each other.
• 7 Inferential Stat data.xlsx
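The F value compares the between-group mean square (MSB) with the within-group mean square (MSW). The sketch below assumes the standard one-way ANOVA definitions, MSB = SSB / (k - 1) and MSW = SSW / (N - k); the three groups of scores are hypothetical and do not reproduce the slide's F of 36.

```python
def one_way_anova_f(groups):
    """Return F = MSB / MSW for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    k = len(groups)            # number of groups
    n_total = len(all_scores)  # total number of observations

    # Between-group sum of squares: group means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean
    ssw = sum((score - sum(g) / len(g)) ** 2 for g in groups for score in g)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return msb / msw

# Hypothetical test scores for three qualification groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_anova_f(groups)
print(round(f_value, 2))  # 13.0
```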
F Table
Test of difference of categorical measures (frequency measures): Chi Square (χ²) test
• The pet preferences of a sample of people are given below. The data are the number of people preferring each pet.
• We want to infer whether people's preferences differ.
• Expected values are what would be found if each category had equal representation (no difference in preference).

• Hypothesis: Pet preference is equal (not quantified)
• Estimated χ² is 6.49
• Table value is 5.99
• Est (6.49) > Tab (5.99)
• It is inferred that there is a difference in pet preference.
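The χ² statistic sums, over categories, the squared gap between observed and expected counts divided by the expected count. The pet counts from the slide are not available, so the counts below are hypothetical and give a different χ² from the 6.49 quoted; the table value 5.99 is the one quoted above.

```python
def chi_square_equal_preference(observed):
    """Chi-square statistic against equal expected counts in every category."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)

# Hypothetical numbers of people preferring each of three pets
counts = [20, 30, 40]
chi2 = chi_square_equal_preference(counts)
table_value = 5.99  # table value quoted above

print(round(chi2, 2))  # 6.67
print("preferences differ" if chi2 > table_value else "no evidence of a difference")
```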
REGRESSION: TEST OF ASSOCIATION AND
QUANTIFICATION OF CHANGE
Regression: Test of association and quantification of change
• Regression is estimating the relationship (line) that "best fits" the observations.
• How do we estimate the line, and how do we judge that it is the best fit?
• https://www.youtube.com/watch?v=PaFPbb66DxQ
Estimating coefficients
• \( \mathrm{cov}_{XY} \) is the covariance of X and Y we learned about with correlations; and \( s_X^2 \) is the variance of X.
• Significance of b can be tested.
• Generally used estimation methods include Ordinary Least Squares (OLS), Method of Moments (MoM), and Maximum Likelihood Estimation (MLE).
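For a simple regression Y = a + bX, the OLS slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the two means. The slide's own formula is not reproduced in the text, so this sketch assumes that standard form; the function name and data are hypothetical.

```python
def ols_simple(x, y):
    """Return intercept a and slope b of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b = cov_xy / var_x        # slope: covariance over variance of X
    a = mean_y - b * mean_x   # the line passes through the point of means
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
a, b = ols_simple(x, y)
print(round(a, 6), round(b, 6))  # 1.0 2.0
```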
Strength of the regression
• The "strength" of a regression model typically refers to how well the model fits the data or how strong the relationship is between the independent variable(s) and the dependent variable. There are several measures commonly used to assess the strength of a regression model:
• 1. Significance of Regression Coefficients: The significance of individual regression coefficients indicates whether each independent variable contributes significantly to the model's prediction of the dependent variable. A smaller p-value (typically less than 0.05) indicates stronger evidence that the coefficient differs from zero.
• 2. Coefficient of Determination (R^2): R^2 measures the proportion of the variance in the dependent variable that is explained by the independent variable(s) in the regression model. It ranges from 0 to 1, where 1 indicates a perfect fit and 0 indicates that the model explains none of the variance. Higher values of R^2 indicate a better fit.
• 3. Adjusted R^2: Adjusted R^2 is a modified version of R^2 that penalizes the inclusion of additional independent variables that do not improve the model's fit. It adjusts for the number of predictors in the model and is often preferred when comparing models with different numbers of predictors.
• 4. F-statistic: The F-statistic tests the overall significance of the regression model. A larger F-statistic indicates stronger evidence of a relationship between the independent variable(s) and the dependent variable.
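R^2 and adjusted R^2 can be computed directly from the observed and fitted values. The sketch assumes the usual definitions R^2 = 1 - SSE/SST and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where k is the number of predictors; the data and function names are hypothetical.

```python
def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data; fitted values come from the least-squares line y = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * xi for xi in x]

r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
print(round(r2, 3), round(adj_r2, 3))  # 0.6 0.467
```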
Significance of Regression Coefficients and signs
• The significance of regression coefficients is assessed through hypothesis testing, typically using t-tests or F-tests, and interpreting the associated p-values. Significant coefficients indicate that the corresponding independent variables have a statistically significant impact on the dependent variable.
Coefficient of Determination R^2
• The coefficient of determination, often denoted as R^2, is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model. It's a key metric used to assess the goodness-of-fit of a regression model.
Strength of the regression: F Value
• SST (Total) is the variation (deviation from the mean) that we want to explain with the regression line.
• SSR (Model) is the variation explained by the regression line (deviation of the fitted values from the mean).
• SSE (Error) is the residual variation not explained by the regression line, which is to be minimized. Then F will be large. The significance of the F value can be tested.
[Figure: total deviation split into model and residual components]
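The link between the three sums of squares and F can be sketched as follows. It assumes the standard regression form F = (SSR / k) / (SSE / (n - k - 1)) with k predictors; the observed and fitted values are hypothetical.

```python
def regression_f(y, fitted, k):
    """F statistic from SST = SSR + SSE, for a model with k predictors."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual (error)
    ssr = sst - sse                                         # explained by the model
    return (ssr / k) / (sse / (n - k - 1))

# Hypothetical observations and fitted values from a one-predictor regression
y = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
f_value = regression_f(y, fitted, k=1)
print(round(f_value, 2))  # 4.5
```

A smaller SSE for the same SST leaves more variation in SSR, which raises F.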
Estimating regressions and test of hypothesis
• Simple regression 7 Inferential Stat data.xlsx
• Multiple regression
• Simple
• Multiple
• Categorical independent variables (dummy variable)
• Categorical dependent variable (Logistic/Multinomial)
• …. ?
Parametric vs Nonparametric Tests
Parametric
 1. Assumptions:
 Parametric tests assume that the data are drawn from populations
with specific distributional characteristics, typically the normal
distribution.
 They also assume that the data have a specific level of measurement
(e.g., interval or ratio) and that the variances of the groups being
compared are equal.
 2. Examples: Parametric tests include:
 t-tests (e.g., independent samples t-test, paired samples t-test)
 Analysis of Variance (ANOVA)
 Pearson correlation
 Linear regression
 3. Advantages:
 Parametric tests are generally more powerful (i.e., have higher
statistical power) than nonparametric tests when the assumptions are
met.
 They often provide more precise estimates and smaller standard
errors.
 Parametric tests allow for more precise estimation of effect sizes
and confidence intervals.
 4. Limitations:
 Parametric tests are sensitive to violations of their assumptions. If
the data do not meet the assumptions (e.g., non-normality, unequal
variances), the results may be inaccurate.
 They may not be suitable for small sample sizes or data that do not
follow a normal distribution.
Nonparametric
 1. Assumptions:
 Nonparametric tests make fewer assumptions about the distribution
of the data. They are often referred to as distribution-free tests.
 They do not assume a specific probability distribution for the data
and are robust to violations of assumptions such as normality.
 2. Examples: Nonparametric tests include:
 Mann-Whitney U test (nonparametric alternative to the independent samples t-test)
 Wilcoxon signed-rank test (nonparametric alternative to the paired samples t-test)
 Kruskal-Wallis test (nonparametric alternative to ANOVA)
 Spearman rank correlation
 Dummy variables and logistic regression
 3. Advantages:
 Nonparametric tests are robust to violations of assumptions about
the distribution of the data.
 They can be used with ordinal or non-normally distributed data, as
well as with small sample sizes.
 Nonparametric tests are often simpler and more straightforward to
interpret.
 4. Limitations:
 Nonparametric tests are generally less powerful than parametric
tests when the assumptions of the latter are met.
 They may provide less precise estimates and wider confidence
intervals.
 Nonparametric tests may have lower efficiency (i.e., require larger
sample sizes) compared to parametric tests.
Degrees of freedom
• https://www.youtube.com/watch?v=92s7IVS6A34&t=2s
• https://www.youtube.com/watch?v=ke8nSbXUJjQ
• Degrees of freedom are taken into account (for example, dividing by n - 1 rather than n) to make estimates such as the standard deviation more precise.

3 Inferential Statistics Osychology.pptx

  • 1.
  • 2.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-2 Statistics: Descriptive and Inferential
  • 3.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-3 Descriptive and Inferential
  • 4.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-4 Inferential Statistics  In inferential statistics and hypothesis testing, our goal is to find systematic reasons for differences and rule out random chance as the cause.  Consider crop yield vs fertilizer use.  For a given quantity of fertilizer (10 grm) crop yield varies (20, 30, 40 Kg) though it should be one number.  The variation is due to many other factors that are random (water, temperature etc.).  We want to find the true (systematic) relationship between fertilizer and yield and eliminates random causes.
  • 5.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-5 Relationships
  • 6.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-6 Measuring Relationship Between 2 Variables Coefficient of Correlation  Measures the relative strength of the linear relationship between two variables  Sample coefficient of correlation: where Y X S S Y) , (X cov r  1 n ) X (X S n 1 i 2 i X      1 n ) Y )(Y X (X Y) , (X cov n 1 i i i       1 n ) Y (Y S n 1 i 2 i Y     
  • 7.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-7 R between depression and anxiety
  • 8.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-8 Estimating r Y X S S Y) , (X cov r 
  • 9.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-9 Features of Correlation Coefficient, r  Unit free  Ranges between –1 and 1  The closer to –1, the stronger the negative linear relationship  The closer to 1, the stronger the positive linear relationship  The closer to 0, the weaker the linear relationship  R gives:  form (linear), direction (+ or -) and magnitude (-1 to +1)
  • 10.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-10 Scatter Plots of Data with Various Correlation Coefficients Y X Y X Y X Y X Y X r = -1 r = -.6 r = 0 r = +.3 r = +1 Y X r = 0
  • 11.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-11 Correlation between family size and stress 3 Correlation Family size & stress levels.xlsx Family Size Sress Level (0- 100) Family Size 1 Stress Level (0-100) 0.68 1
  • 12.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-12 Pearson vs Spearman Correlation  Pearson Correlation Coefficient (r):  Pearson correlation coefficient measures the linear relationship between two continuous variables.  It assesses the degree to which the relationship between the variables can be described by a straight line.  Pearson correlation assumes that the variables are normally distributed and have a linear relationship.  Suitable for continuous data.  Spearman Rank Correlation Coefficient (ρ or rs):  Spearman rank correlation coefficient measures the strength and direction of association between two variables, regardless of the linearity of the relationship.  It is suitable for both continuous and ordinal variables.  It is robust to outliers and non-normal distributions.
  • 13.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-13 Hypothesis Testing and Relationship Testing  Hypothesis testing  T-Test (2 variables)  ANOVA (several variables)  Relationship testing  Regression (dependent vs in dependent variables)  Simple  Multiple  Dummy  Binary dependent …
  • 14.
    Intuition on Testof Hypothesis
  • 15.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-15  Purpose of inferential statistical analysis:  To use sample data and make inferences about what's truly (saying with confidence/ objectively) happening in the population.  We will attempt to understand intuitively the theoretical basis on how we achieve above purpose.
  • 16.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-16 Types of Inferences  1. Estimation and inferring of population parameters through a sample.  Estimate the average income of a Sri Lankan houshehold.  2. Testing and inferring whether parameters of populations differ, using samples.  Test whether average household income between households in Jaffna and Colombo districts are different.  3. Regression inferring which independent variables significantly explain a dependent variable and predicting the dependent variable using independent variable/s, using a sample.  What are the factors that explain household income?  Income f Education, Assets owned, political affiliations … Xn
  • 17.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-17 ESTIMATION
  • 18.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-18 Inferring on Mean of population from a sample
  • 19.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-19  Any phenomena has variation (thus referred to as variable).  Ex: household income of Sri Lankan population, academic performance among CIRP students, stress levels among CIRP students.  Variation is that each individual score (observation) will differ from its mean. Mean expected true value.  Consider the income of households in Colombo and observe above (variation, mean and difference of observations to mean).  6 Basis of statistical testing.xlsx
  • 20.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-20  Generate 1 random sample of the sample size 10 from a population of 1 … 1868 Household Incomes in Colombo. This data is from HIES (2019) Sri Lanka.  Is your sample mean equal to the population mean?  Ask your friends whether they got the same sample mean?  Why is it that the sample mean not equal to population mean?  Just like an individual score will differ from its mean, an individual sample mean will differ from the true population mean.  This deviation of sample mean from population mean is called sampling error.
  • 21.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-21 Sampling Distribution of Sample Means  Take 10 more samples of 10 sample size on the population of Colombo household income and estimate the sample means for all 10 samples.  Are the sample means the same? Why are they not the same?  Now estimate the mean of sample means (Distribution of sample means) and check whether the mean of sample means is equal to the population mean.  It is getting closer to being equal to the population mean. This is the basis of CENTRAL LIMIT THEOREM  Note sampling distribution can be estimated for any statistic as Mean, Correlation coefficient, T-stat etc.
  • 22.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-22 CENTRAL LIMIT THEOREM  The central limit theorem states:  For samples of a single size n, drawn from a population with a given mean μ and variance σ2, the sampling distribution of sample means will have a mean 𝜇𝑋 = μ and variance ̅ 𝜎𝑋 2 = σ2 /n.  This distribution will approach normality as n increases.  Above is true even if the population distribution is not normally distributed.
  • 23.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-23 Implications of CLT  Confidence Interval Estimation: The CLT is used to construct confidence intervals for population parameters, such as the population mean or proportion. Confidence intervals provide a range of plausible values for the population parameter and are widely used in inferential statistics to quantify the uncertainty associated with sample estimates.  Hypothesis Testing: The CLT is fundamental for hypothesis testing, especially when dealing with large sample sizes. It justifies the use of parametric tests, such as the t-test and z-test, even when the population distribution is not normal, as long as the sample size is sufficiently large. Hypothesis tests rely on the assumption of normality or the CLT to make inferences about population parameters.
  • 24.
    Basic Business Statistics,10e © 2006 Prentice-Hall, Inc. Chap 2-24 Confidence Interval on Sample Estimates  6 Basis of statistical testing.xlsx CLT Simple spread sheet  Confidence Level: The confidence level is the probability that the true population parameter falls within the calculated confidence interval.  i.e. 95% confidence level means that if one takes many samples 95% of the standard errors of the samples will contain the population mean. (pg 92 Gujarati)  It is often expressed as a percentage (e.g., 95% confidence level). Commonly used confidence levels include 90%, 95%, and 99%.
  • 25.
    Estimating a mean of a population and reporting
     1. Take a sample (n > 30)
     2. Estimate the sample mean and standard error (SE)
     3. Report the sample mean with its standard error
     Reporting mean ± 2 SE means we are about 95% confident that the population mean lies within 2 SE of the sample mean.
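The three reporting steps can be sketched as follows. The data are hypothetical (36 made-up observations), and the function name `mean_with_interval` is an assumption for illustration; the multiplier 2 is the rule-of-thumb value from the slide, not an exact critical value.

```python
import math
import statistics

def mean_with_interval(sample, k=2.0):
    """Return (mean, standard error, lower, upper) for mean +/- k * SE."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # SE = s / sqrt(n)
    return mean, se, mean - k * se, mean + k * se

# Hypothetical data: 36 made-up observations, for illustration only
sample = [48, 52, 50, 47, 53, 49] * 6
mean, se, low, high = mean_with_interval(sample)
print(f"mean = {mean:.2f}, SE = {se:.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```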
  • 26.
    TESTING
  • 27.
    Normal Distribution
  • 28.
    Converting ND to SND (Z)
  • 29.
    Conversion to Z Value (Standard Normal Distribution)
     Any normal distribution can be converted to the Standard Normal Distribution (Z distribution).
     So the sampling distribution can also be converted to a Z distribution.
  • 30.
    (Z Distribution) Has Known Characteristics
     Mean (μ): The mean of the Z distribution is always 0, so the distribution is centred at 0.
     Standard Deviation (σ): The standard deviation of the Z distribution is always 1.
     About 68% of observations lie within ±1 SD of the mean, about 95% within ±2 SD, and so on.
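A minimal sketch of the conversion and of the "known characteristics", using Python's standard-library `statistics.NormalDist`. The helper names `z_score` and `prob_within`, and the example values passed to `z_score`, are assumptions chosen for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Convert a value x from a normal distribution with mean mu and
    standard deviation sigma to the standard normal (Z) scale."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a standard normal value lies within +/- k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Illustrative value: 8.5 is one SD above a mean of 8.0 when sigma is 0.5
print(z_score(8.5, 8.0, 0.5))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The loop prints the probabilities behind the 68% and 95% figures quoted above.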
  • 31.
    Probability and Z values
     6 Basis of statistical testing.xlsx
  • 33.
    Using Z distribution to Test Hypothesis
     If |Z| > 1.96, the probability of a value that extreme coming from that distribution is less than 0.05 (both tails combined).
     So if an estimated Z value is beyond 1.96 (say 2), we conclude that it is unlikely to come from that distribution.
  • 34.
    Z Value Table
     Using the z-table to find the area in the body to the left of z = 1.62
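The table lookup can be reproduced without a printed table. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are assumptions for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1

# Area in the body of the distribution to the left of z = 1.62
area_left = std_normal.cdf(1.62)
print(round(area_left, 4))

# Two-tailed probability of a value at least as extreme as |z| = 1.96
p_two_tailed = 2 * (1 - std_normal.cdf(1.96))
print(round(p_two_tailed, 3))

# Going the other way: the z value that leaves 2.5% in the upper tail
z_critical = std_normal.inv_cdf(0.975)
print(round(z_critical, 2))
```

The first value should agree with the z-table entry for 1.62 (about 0.9474), and the last two show where the 1.96 and 0.05 thresholds come from.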
  • 35.
    Example of Using the Z Distribution to Test a Hypothesis
     A psychologist knows from research that the average stress level of a person on a working day is 8.00 stress units, with a standard deviation of 0.50. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50.
     The psychologist wants to test whether a new patient is stressed. He/she observes the patient for 25 days (n = 25), measures stress each day, and finds the average stress to be X̄ = 7.75.
     Test whether the patient is stressed.
     If stressed, the patient's stress level should fall within the population stress levels [at a confidence level of 95%, i.e. estimated |Z| < 2 (from the z table)].
  • 36.
    Stating the Hypothesis
     The null hypothesis (H0) is typically the opposite of the researcher's hypothesis. It is the idea that nothing is going on (no difference).
     H0: μ = 0
     The alternative hypothesis (HA) is simply the reverse of the null hypothesis.
     There are three options, depending on where we expect the difference to lie (direction).
     HA: μ ≠ 0
     HA: μ < 0
     HA: μ > 0
     The hypothesized value can be a number other than zero.
     Thus the hypotheses on the patient's stress are:
     H0: μ = 8 (stressed)
     HA: μ < 8 (not stressed)
     where μ is the patient's average stress and 8 is the population average stress.
  • 37.
    Estimating the Z Score and Testing Hypothesis
     Estimate the Z score and compare it with the table Z value (for p = .05, Z is -1.96).
     Here Z = (7.75 - 8.00) / (0.50 / √25) = -2.5.
     Since |Z estimated| (2.5) > |Z table| (1.96), we reject the null hypothesis (patient stressed), accept the alternative hypothesis, and conclude that the patient is not stressed.
     Rule of thumb (first look at results):
     Estimated |Z| > 2
     Estimated p < 0.05
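The stress example can be worked through in a few lines. The numbers (μ = 8.00, σ = 0.50, n = 25, sample mean 7.75) come from the example; the function name `one_sample_z` is an assumption, and the 1.96 cut-off is the table value quoted on the slide.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu, sigma, n):
    """Z statistic for a sample mean when the population SD is known."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Values from the stress example
z = one_sample_z(7.75, 8.00, 0.50, 25)

# Lower-tail p-value, matching the alternative hypothesis mu < 8
p_lower = NormalDist().cdf(z)

reject_h0 = abs(z) > 1.96
print(round(z, 2), round(p_lower, 4))
print("reject H0" if reject_h0 else "do not reject H0")
```

Both the Z comparison and the p-value lead to the same decision here: the estimated Z lies beyond the critical value, so H0 is rejected.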
  • 38.
    T Distribution
     The T distribution is used when the population variance is not known and the sample size is small.
     Different statistics have different distributions, which are used for hypothesis testing.
  • 39.
     df = n - 1 (n is the sample size)
  • 40.
    F Distribution (p=0.05)
  • 41.
    Hypothesis Testing Procedure
     Step 1: State the hypotheses
     Step 2: Decide the significance level (99% or 95% confidence, i.e. p = 0.01 or p = 0.05) and find the corresponding critical (table) value
     Step 3: Calculate the test statistic
     Step 4: Make the decision: compare the estimated value with the table value
  • 42.
    Statistical Tests (7 Inferential Stat data.xlsx)
     Test of means
       One sample (t test)
       Two samples (t test)
         Paired
         Unpaired
       Many samples (ANOVA F test)
     Test of difference of categorical measures (frequency)
       Chi Square
     Regression: Test of association and quantification of change
  • 43.
    Single sample (called a 1-sample t-test)
     It is known that patients with a mean (µ) depression score > 30 need clinical attention. You have depression scores (X) from 4 check-ups of a patient. You want to infer whether the patient needs clinical attention.
     Hypothesis: H0: µ = 30, HA: µ > 30
     Standard error: $SE = S_X / \sqrt{n}$
     Standard deviation: $S_X = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)}$
  • 44.
     Hypothesis:
     H0: µ = 30, HA: µ > 30
     Estimated t = 3.46
     Critical t = 2.35
     3.46 > 2.35
     Thus reject the null hypothesis.
     Infer that the patient needs treatment.
     Use STATA (7 Inferential Stat data.xlsx)
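A sketch of the one-sample t calculation. The four scores below are hypothetical, since the patient's actual check-up scores are not reproduced in the text, so the resulting t will not equal the 3.46 quoted above. The function name `one_sample_t` is an assumption; the critical value 2.35 is the one given on the slide.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)            # uses n - 1 in the denominator
    se = sd / math.sqrt(n)                   # standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical scores from 4 check-ups (not the slide's data)
scores = [34, 38, 33, 39]
t_value, df = one_sample_t(scores, 30)

critical_t = 2.35                            # table value quoted on the slide
print(round(t_value, 2), df, t_value > critical_t)
```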
  • 45.
    Two sample paired t test
    Depression scores/10

    Name       Before Treatment   After Treatment   Difference
    Ranil      3                  2                 -1
    Sajith     3                  6                  3
    Anura      5                  3                 -2
    Hirunika   8                  4                 -4
    Namal      3                  9                  6
    Patali     1                  2                  1
    Diana      4                  5                  1

    Mean of differences            0.57
    Std Dev of differences         3.31
    N                              7
    Sq Rt N                        2.65
    Std Error (SD / Sq Rt N)       1.25
    t value (Mean / Std Error)     0.46

     Hypothesis:
     BT mean = AT mean
     or BT mean - AT mean = 0
     or mean of difference µD = 0
     Estimated t (0.46) < table t (2), so the null hypothesis is not rejected: there is no significant difference between the before-treatment and after-treatment means.
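The figures on this slide can be reproduced directly from the before and after scores. This is a minimal sketch; the variable names are assumptions, and the difference is taken as after minus before, which matches the Difference column.

```python
import math
import statistics

# Depression scores from the paired t-test slide
before = [3, 3, 5, 8, 3, 1, 4]
after = [2, 6, 3, 4, 9, 2, 5]

diffs = [a - b for a, b in zip(after, before)]   # after minus before
n = len(diffs)
mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)                # n - 1 in the denominator
se_diff = sd_diff / math.sqrt(n)
t_value = mean_diff / se_diff                    # H0: mean difference = 0

print(diffs)
print(round(mean_diff, 2), round(sd_diff, 2), round(se_diff, 2), round(t_value, 2))
```

The second line of output should match the slide's summary values (0.57, 3.31, 1.25, 0.46).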
  • 46.
    Two sample unpaired t test
     7 Inferential Stat data.xlsx
  • 47.
    Test of difference of means of many samples: ANOVA
     Scores on a job entrance test are shown below for applicants with different qualifications. We want to infer whether different qualifications affect the test result.
  • 49.
     Observe the difference between groups (systematic variance) and also within groups (unsystematic variance/random error). Our interest is in testing the difference between groups.
  • 50.
     Between group variance:
     Within group variance:
     Total Variance:
  • 51.
    ANOVA Table: F interpretation
     If there is a large difference between groups, then MSB > MSW, i.e. F = MSB / MSW is large.
     The significance of the F value can be checked against the F table.
     Hypothesis:
     Means of ND = RD = UR
  • 52.
     Hypothesis:
     H0: Means of ND = RD = UR
     Consider table F = 2
     Estimated F (36) > table F (2)
     Reject H0
     Therefore infer that there is a difference in performance between degrees.
     However, we do not know which degrees differ from each other.
     7 Inferential Stat data.xlsx
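A sketch of how the between-group and within-group mean squares combine into F. The scores are hypothetical (the slide's data table is not reproduced in the text), so the F value here is not the 36 quoted above; the group labels ND, RD and UR are reused only as variable names, and `one_way_anova` is an assumed helper name.

```python
import statistics

def one_way_anova(groups):
    """Return (MSB, MSW, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)                 # number of groups
    n = len(all_values)             # total number of observations

    # Between-group sum of squares: group means around the grand mean
    ssb = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ssw = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    msb = ssb / (k - 1)             # between-group mean square
    msw = ssw / (n - k)             # within-group mean square
    return msb, msw, msb / msw

# Hypothetical test scores for three qualification groups (not the slide data)
nd = [4, 5, 6]
rd = [7, 8, 9]
ur = [10, 11, 12]
msb, msw, f_value = one_way_anova([nd, rd, ur])
print(msb, msw, f_value)
```

A large F arises here because the group means are far apart relative to the spread inside each group.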
  • 53.
    F Table
  • 54.
    Test of difference of categorical measures (frequency measures): Chi Square (χ²) test
     The pet preferences of a sample of people are given below. The data are the number of people preferring each pet.
     We want to infer whether people's preferences differ.
     Expected values are what would be found if each category had equal representation (no difference in preference).
  • 55.
     Hypothesis: Pet preference is equal (not quantified)
     Estimated χ² is 6.49
     Table value is 5.99
     Est (6.49) > Tab (5.99), so the null hypothesis is rejected.
     It is inferred that there is a difference in pet preference.
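A sketch of the chi-square calculation against equal expected counts. The observed counts are hypothetical (the slide's counts are not reproduced in the text), so the statistic differs from the 6.49 above; the function name is an assumption. The table value 5.99 is the one quoted on the slide and corresponds to 2 degrees of freedom at p = 0.05.

```python
def chi_square_equal_preference(observed):
    """Chi-square statistic against equal expected counts in every category."""
    total = sum(observed)
    expected = total / len(observed)          # equal representation
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1       # statistic, degrees of freedom

# Hypothetical counts for three pets (not the slide's data)
observed = [30, 20, 10]
chi2, df = chi_square_equal_preference(observed)

critical = 5.99                               # table value quoted on the slide
print(chi2, df, chi2 > critical)
```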
  • 56.
    REGRESSION: TEST OF ASSOCIATION AND QUANTIFICATION OF CHANGE
  • 57.
    Regression: Test of association and quantification of change
     Regression estimates the relationship (line) that "best fits" the observations.
     How do we estimate the line, and how do we judge that it is the best fit?
  • 61.
     https://www.youtube.com/watch?v=PaFPbb66DxQ
  • 62.
    Estimating coefficients
     The slope is estimated as $b = cov_{XY} / s_X^2$, where $cov_{XY}$ is the covariance of X and Y that we learned about with correlations, and $s_X^2$ is the variance of X.
     The significance of b can be tested.
     Commonly used estimation methods include Ordinary Least Squares (OLS), Method of Moments (MoM), and Maximum Likelihood Estimation (MLE).
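A minimal sketch of estimating a simple regression line from the covariance and variance. It assumes the usual least-squares expressions, slope b = cov(X, Y) / var(X) and intercept a = mean(Y) - b * mean(X); the data are hypothetical and the function name `simple_ols` is made up for this example.

```python
import statistics

def simple_ols(x, y):
    """Slope b = cov(X, Y) / var(X) and intercept a = mean(Y) - b * mean(X)."""
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # Sample covariance with n - 1 in the denominator
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = statistics.variance(x)           # sample variance, n - 1 denominator
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical fertilizer (X) and yield (Y) values, for illustration only
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]                         # lies exactly on y = 1 + 2x
a, b = simple_ols(x, y)
print(a, b)
```

Because the made-up points lie exactly on a line, the estimates recover that line's intercept and slope.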
  • 63.
    Strength of the regression
     The "strength" of a regression model refers to how well the model fits the data, or how strong the relationship is between the independent variable(s) and the dependent variable. Several measures are commonly used to assess it:
     1. Significance of regression coefficients: The significance of an individual coefficient indicates whether that independent variable contributes significantly to the model's prediction of the dependent variable. A small p-value (typically less than 0.05) indicates stronger evidence of a relationship.
     2. Coefficient of determination (R²): R² measures the proportion of the variance in the dependent variable that is explained by the independent variable(s) in the regression model. It ranges from 0 to 1, where 1 indicates a perfect fit and 0 indicates no relationship. Higher values of R² indicate stronger relationships.
     3. Adjusted R²: Adjusted R² is a modified version of R² that penalizes the inclusion of additional independent variables that do not improve the model's fit. It adjusts for the number of predictors and is often preferred when comparing models with different numbers of predictors.
     4. F-statistic: The F-statistic tests the overall significance of the regression model. A larger F-statistic indicates stronger evidence of a relationship between the independent variable(s) and the dependent variable.
  • 64.
    Significance of Regression Coefficients and signs
     The significance of regression coefficients is assessed through hypothesis testing, typically using t-tests or F-tests, and by interpreting the associated p-values. Significant coefficients indicate that the corresponding independent variables have a statistically significant impact on the dependent variable.
  • 65.
    Coefficient of Determination R²
     The coefficient of determination, often denoted R², is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model. It is a key metric used to assess the goodness-of-fit of a regression model.
  • 66.
    Strength of the regression: F Value
     SST (Total) is the variation (deviation from the mean) that we want to explain with the regression line.
     SSR (Model) is the variation explained by the regression line (deviation of the fitted values from the mean).
     SSE (Error) is the residual variation not explained by the regression line, which is to be minimized. When SSE is small relative to SSR, F will be large. The significance of the F value can be tested.
    [Figure: total, model, and residual deviations]
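A sketch of how the three sums of squares lead to R² and F. It assumes the fitted values come from a least-squares line with an intercept (so that SST = SSR + SSE) and uses the standard formula F = (SSR / k) / (SSE / (n - k - 1)) with k independent variables. The data, the fitted line 2.2 + 0.6x, and the function name are hypothetical.

```python
def regression_fit_summary(y, y_hat, k=1):
    """Return SST, SSR, SSE, R^2 and F for fitted values y_hat.

    k is the number of independent variables in the model.
    Assumes y_hat comes from a least-squares fit with an intercept.
    """
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)                  # total
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)              # model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))      # residual
    r_squared = ssr / sst
    f_value = (ssr / k) / (sse / (n - k - 1))
    return sst, ssr, sse, r_squared, f_value

# Hypothetical data; 2.2 + 0.6 * x is the least-squares line for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

sst, ssr, sse, r_squared, f_value = regression_fit_summary(y, y_hat)
print(round(sst, 2), round(ssr, 2), round(sse, 2),
      round(r_squared, 2), round(f_value, 2))
```

The smaller the residual SSE relative to the model SSR, the larger both R² and F become.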
  • 68.
    Estimating regressions and test of hypothesis
     Simple regression (7 Inferential Stat data.xlsx)
     Multiple regression
     Simple
     Multiple
     Categorical independent variables (dummy variable)
     Categorical dependent variable (Logistic/Multinomial)
     …. ?
  • 69.
    Parametric vs Nonparametric Tests
    Parametric
     1. Assumptions:
       Parametric tests assume that the data are drawn from populations with specific distributional characteristics, typically the normal distribution.
       They also assume that the data have a specific level of measurement (e.g., interval or ratio) and that the variances of the groups being compared are equal.
     2. Examples: Parametric tests include:
       t-tests (e.g., independent samples t-test, paired samples t-test)
       Analysis of Variance (ANOVA)
       Pearson correlation
       Linear regression
     3. Advantages:
       Parametric tests are generally more powerful (i.e., have higher statistical power) than nonparametric tests when the assumptions are met.
       They often provide more precise estimates and smaller standard errors.
       Parametric tests allow for more precise estimation of effect sizes and confidence intervals.
     4. Limitations:
       Parametric tests are sensitive to violations of their assumptions. If the data do not meet the assumptions (e.g., non-normality, unequal variances), the results may be inaccurate.
       They may not be suitable for small sample sizes or data that do not follow a normal distribution.
    Nonparametric
     1. Assumptions:
       Nonparametric tests make fewer assumptions about the distribution of the data. They are often referred to as distribution-free tests.
       They do not assume a specific probability distribution for the data and are robust to violations of assumptions such as normality.
     2. Examples: Nonparametric tests include:
       Mann-Whitney U test (nonparametric alternative to the independent samples t-test)
       Wilcoxon signed-rank test (nonparametric alternative to the paired samples t-test)
       Kruskal-Wallis test (nonparametric alternative to ANOVA)
       Spearman rank correlation
       Dummy variables and logistic regression
     3. Advantages:
       Nonparametric tests are robust to violations of assumptions about the distribution of the data.
       They can be used with ordinal or non-normally distributed data, as well as with small sample sizes.
       Nonparametric tests are often simpler and more straightforward to interpret.
     4. Limitations:
       Nonparametric tests are generally less powerful than parametric tests when the assumptions of the latter are met.
       They may provide less precise estimates and wider confidence intervals.
       Nonparametric tests may have lower efficiency (i.e., require larger sample sizes) compared to parametric tests.
  • 71.
    Degrees of freedom
     https://www.youtube.com/watch?v=92s7IVS6A34&t=2s
     https://www.youtube.com/watch?v=ke8nSbXUJjQ
     So degrees of freedom are taken into account to make estimates (standard deviation etc.) more precise.