2. STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS?
1. Examine Strength and Direction of Relationships
a. Bivariate (e.g., Pearson Correlation—r)
Between one variable and another:
rxy or Y = a + b1 x1
b. Multivariate (e.g., Multiple Regression Analysis)
Between one dep. var. and each of several indep. variables, while
holding all other indep. variables constant:
Y = a + b1 x1 + b2 x2 + b3 x3 + . . . + bk xk
2. Compare Groups
a. Compare Proportions (e.g., Chi-Square Test, χ²)
H0: P1 = P2 = P3 = … = Pk
b. Compare Means (e.g., Analysis of Variance)
H0: µ1 = µ2 = µ3 = …= µk
3. When Do You Use ANOVA?
• To compare the mean values of a certain characteristic among
two or more groups.
• To see whether two or more groups are equal (or different) on
a given metric characteristic.
ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a
British statistician and geneticist/evolutionary biologist.
4. H0 in ANOVA?
H0: There are no differences among the mean values of the
groups being compared (i.e., the group means are all equal):
H0: µ1 = µ2 = µ3 = … = µk
H1 (conclusion if H0 is rejected)?
Not all group means are equal
(i.e., at least one group mean is different from the rest).
5. So, the number of steps involved in ANOVA depends on
whether we are comparing 2 groups or > 2 groups:
• Scenario 1. When comparing 2 groups, it is a one-step test:
2 Groups: A B
Step 1: Check whether the two groups are different and, if so,
how.
• Scenario 2. When comparing > 2 groups, if H0 is rejected, it is
a two-step test:
> 2 Groups: A B C
Step 1: Overall test that examines whether all groups are equal.
And, if not all are equal (H0 rejected), then:
Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e.,
between which groups) the differences lie.
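The pair-wise step of Scenario 2 can be sketched in Python. This is an illustration only, not the deck's own procedure. It uses the EPS figures from Table 1 of the example later in this deck, and the helper names `pooled_msw` and `pairwise_t` are hypothetical. It computes an unadjusted t statistic for each pair of groups from the pooled within-group variance (MSW). A real post-hoc test (e.g., Tukey or Bonferroni) would also adjust the significance threshold for multiple comparisons, which is not shown here.

```python
from itertools import combinations
from math import sqrt

# EPS data from Table 1 of the example in this deck
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

def mean(xs):
    return sum(xs) / len(xs)

def pooled_msw(groups):
    """Within-group mean square: SSW / (N - K)."""
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    n_total = sum(len(g) for g in groups)
    return ssw / (n_total - len(groups))

def pairwise_t(data):
    """Hypothetical helper: unadjusted t statistic for every pair of groups."""
    msw = pooled_msw(list(data.values()))
    result = {}
    for a, b in combinations(data, 2):
        diff = mean(data[a]) - mean(data[b])
        # Standard error of the difference, based on the pooled variance
        se = sqrt(msw * (1 / len(data[a]) + 1 / len(data[b])))
        result[(a, b)] = diff / se
    return result

for pair, t in pairwise_t(eps).items():
    print(pair, round(t, 2))
```

A large absolute t for a pair suggests that those two groups differ. Whether it counts as significant depends on the post-hoc procedure chosen.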
6. Typical solution presented in statistics classes requires…
• Constructing an ANOVA TABLE

Sum of Squares                         df      Mean Squares          F-Ratio (Test Statistic)
SSB (Between Groups Sum of Squares)    K – 1   MSB = SSB / (K – 1)   F = MSB / MSW
SSW (Within Groups Sum of Squares)     N – K   MSW = SSW / (N – K)
SST (Total Sum of Squares)             N – 1

$$\mathrm{MSB} = \frac{n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2}{K - 1}$$

$$\mathrm{MSW} = \frac{\sum_i (x_{i1} - \bar{x}_1)^2 + \sum_i (x_{i2} - \bar{x}_2)^2 + \dots + \sum_i (x_{ik} - \bar{x}_k)^2}{N - K}$$
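The table entries above can be computed in a few lines of Python. This is a minimal sketch, assuming each group is given as a plain list of numbers. The function name `one_way_anova` and the dictionary keys are made up for this illustration, and the toy data at the bottom is invented purely to exercise the function.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(groups)                          # K: number of groups
    n = sum(len(g) for g in groups)          # N: total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: n_j * (group mean - grand mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: (observation - its own group mean)^2
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {
        "SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
        "df_between": k - 1, "df_within": n - k, "df_total": n - 1,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }

# Made-up toy data, for illustration only
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table["MSB"], table["MSW"], table["F"])
```

Returning every cell of the table, rather than only F, makes it easy to compare each intermediate value with a hand calculation.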
7. EXAMPLE: Was the average earnings per share (EPS) for commercial
banks, retailing operations, & utility companies (variable Industry) the same last
year?
• Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.
• Table 1. Earnings Per Share (EPS) of Sample Firms in the Three Industries
Banking    Retailing    Utility
6.42       3.52         3.55
2.83       4.21         2.13
8.94       4.36         3.24
6.80       2.67         6.47
5.70       3.49         3.06
4.65       4.68         1.80
6.20       3.30         5.29
2.71       2.68         2.96
8.34       7.25         2.90
-----      0.16         1.73
nB = 9     nR = 10      nU = 10      n = 29
H0: There were no differences in average EPS of Banks, Utilities, and Retailers.
First logical thing you do?
x̄B = 5.84  x̄R = 3.63    x̄U = 3.31    X̿ (grand mean) = 4.21
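The first step, computing each group's mean and the grand mean, can be checked with a short Python sketch. The data are the Table 1 values. The variable names are chosen for this illustration only.

```python
from statistics import mean

# EPS data from Table 1
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Mean EPS within each industry
group_means = {name: mean(values) for name, values in eps.items()}

# Grand mean: mean of all 29 firms pooled together
all_values = [x for values in eps.values() for x in values]
grand_mean = mean(all_values)

for name, m in group_means.items():
    print(f"{name}: n = {len(eps[name])}, mean = {m:.2f}")
print(f"All firms: n = {len(all_values)}, grand mean = {grand_mean:.2f}")
```

Note that the grand mean is the mean of all 29 observations, not the simple average of the three group means, because the groups differ in size.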
9. Why is it called ANOVA?
• Differences in EPS (Dep. Var.) among all 29 firms have
two components: differences among the groups and
differences within the groups. That is,
a. There are some differences in EPS among the three groups of
firms (Banks vs. Retailers vs. Utilities), and
b. There are also some differences/variations in EPS of the firms
within each of these groups (among banks themselves, among
retailers themselves, and among utilities themselves).
• ANOVA partitions/analyzes the variance of the dependent
variable (i.e., the differences in EPS) and traces it to its two
components/sources, i.e., to differences between groups vs.
differences within groups.
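The partition described above can be verified numerically. The sketch below uses the Table 1 data and plain Python, with names chosen for this illustration. It computes the total, between-groups, and within-groups sums of squares and shows that the first equals the sum of the other two.

```python
# EPS data from Table 1 (Banking, Retailing, Utility)
groups = [
    [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
]

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Total variation: every firm compared with the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)
# Between-groups variation: each group mean compared with the grand mean
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-groups variation: every firm compared with its own group mean
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(f"SST = {sst:.3f}, SSB = {ssb:.3f}, SSW = {ssw:.3f}")
print(f"SSB + SSW = {ssb + ssw:.3f}")
```

Up to floating-point rounding, SSB + SSW matches SST, which is the identity that gives the analysis of variance its name.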
10. Total WITHIN Group Variance (or Mean Square WITHIN)?
Using the EPS data in Table 1 (nB = 9, nR = 10, nU = 10; x̄B = 5.84, x̄R = 3.63, x̄U = 3.31):

$$\mathrm{MSW} = \frac{(6.42 - 5.84)^2 + \dots + (8.34 - 5.84)^2 + (3.52 - 3.63)^2 + \dots + (0.16 - 3.63)^2 + (3.55 - 3.31)^2 + \dots + (1.73 - 3.31)^2}{(9 + 10 + 10) - 3}$$
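One way to read MSW is as a pooled variance: a weighted average of the three sample variances, each weighted by its degrees of freedom (n − 1). The Python sketch below illustrates that reading with the Table 1 data. It relies on `statistics.variance`, which returns the sample variance (dividing by n − 1). The variable names are chosen for this illustration.

```python
from statistics import variance

# EPS data from Table 1 (Banking, Retailing, Utility)
groups = [
    [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
]

n_total = sum(len(g) for g in groups)   # N = 29
k = len(groups)                         # K = 3

# (n_j - 1) * s_j^2 is the sum of squared deviations within group j,
# so summing over groups gives SSW; dividing by N - K gives MSW.
ssw = sum((len(g) - 1) * variance(g) for g in groups)
msw = ssw / (n_total - k)

print(f"SSW = {ssw:.3f}, df = {n_total - k}, MSW = {msw:.3f}")
```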
14. Mean Square Between Groups = MSB = 17.698
MSB represents the portion of the total differences/variations in
EPS (the dependent variable) that is attributable to (or explained
by) differences BETWEEN groups (e.g., industries)
• That is, the part of the differences in companies’ EPS that results
from whether they are banks, retailers, or utilities.
15. Mean Square Within Groups (MS Residual/Error) =
MSW = 3.35
MSW represents:
a. The differences in EPS (the dependent variable) that are
due to all other factors that are not examined and not controlled for
in the study (e.g., diversification level, firm size, etc.)
Plus . . .
b. The natural variability of EPS (the dependent variable) among
members within each of the comparison groups (Note that even
banks with the same size and same level of diversification would
have different EPS levels).
16. Now, let’s compare MSB & MSW:
MSB = 17.698 and MSW = 3.35.
QUESTION:
Based on the logic of ANOVA, when would we consider two (or
more) groups as different/unequal?
When MSB is significantly larger than MSW.
QUESTION:
What would be a reasonable index (a single number) that will
show how large MSB is compared to MSW?
(i.e., a single number that will show if MSB is larger than, equal to,
or smaller than MSW)?
17. • Ratio of MSB and MSW (call it the F-ratio):

$$F = \frac{\mathrm{MSB}}{\mathrm{MSW}} = \frac{17.698}{3.350} = 5.282$$

• What can we infer when the F-ratio is close to 1?
• MSB and MSW are likely to be equal and, thus,
there is a strong likelihood that NO difference exists
among the comparison groups.
• How about when the F-ratio is significantly larger than 1?
• The more the F-ratio exceeds 1, the larger MSB is
compared to MSW and, thus, the stronger the
likelihood/evidence that group difference(s) exist.
• Results of the above computations are usually summarized
in an ANOVA TABLE such as the one that follows:
18.
Source           Sum of Squares   df           Mean Squares           F
Between Groups   35.397           K – 1 = 2    35.397 / 2 = 17.698    17.698 / 3.350 = 5.282
Within Groups    87.112           N – K = 26   87.112 / 26 = 3.350
Total            122.509          N – 1 = 28

$$\mathrm{MSB} = \frac{9(5.84 - 4.21)^2 + 10(3.63 - 4.21)^2 + 10(3.31 - 4.21)^2}{3 - 1} = \frac{35.397}{2} = 17.698$$

$$\mathrm{MSW} = \frac{(6.42 - 5.84)^2 + \dots + (8.34 - 5.84)^2 + (3.52 - 3.63)^2 + \dots + (0.16 - 3.63)^2 + (3.55 - 3.31)^2 + \dots + (1.73 - 3.31)^2}{(9 + 10 + 10) - 3} = \frac{87.112}{26} = 3.350$$
19. Interpretation and Conclusion:
QUESTION: What does the F = 5.28 mean, intuitively?
For our sample companies, the EPS difference across the three
industries (MSB) is more than 5 times the EPS difference
among firms within the industries (MSW).
• QUESTION: What is our null hypothesis?
• QUESTION: Is the above F-ratio of 5.28 large
enough to warrant rejecting the null?
• ANSWER: It would be if the chance of being wrong (in
rejecting the null) does not exceed 5%.
• So, look up the F-value in the table of the F-distribution (under
the appropriate degrees of freedom) to find out what the α-level
will be if, given this F-value, we decide to reject the null.
• Degrees of Freedom: v1 = k – 1 = 2
v2 = n – k = 26
20. F = 3.37 is significant at α = 0.05 (if F = 3.37 and we reject H0, there is a 5% chance of being wrong).
21. F = 4.27 is significant at α = 0.025.
That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.
But our F = 5.28 > 4.27.
So, what can we say about our α-level? Will it be larger or smaller than 0.025?
22. • Our F = 5.28 > 4.27
• The odds of being wrong, if we decide to reject the null, would be
less than 2.5% (i.e., α < 0.025).
Would rejecting the null be a safe bet?
Conclusion?
Reject the null and conclude that the average EPS is NOT EQUAL
FOR ALL GROUPS (industries) being compared.
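Instead of bracketing the α-level with table values, the tail probability can be computed directly in this particular case. When the numerator degrees of freedom equal 2 (as here, K − 1 = 2), the upper-tail probability of the F-distribution has the closed form P(F > f) = (v2 / (v2 + 2f))^(v2/2). The Python sketch below relies on that special case only. It does not work for other numerator degrees of freedom, and the function name `f_pvalue_df1_2` is made up for this illustration.

```python
def f_pvalue_df1_2(f_value, df2):
    """Upper-tail probability P(F > f_value) for an F(2, df2) distribution.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (df2 / (df2 + 2 * f_value)) ** (df2 / 2)

# Table critical values quoted in the deck, then our observed F-ratio
for f in (3.37, 4.27, 5.282):
    print(f"F = {f}: P(F > f) = {f_pvalue_df1_2(f, 26):.4f}")
```

The first two probabilities should come out close to 0.05 and 0.025, matching the table values quoted earlier, and the probability for F = 5.282 should fall below 0.025, in line with the conclusion to reject the null.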