Sumit Kumar Das
PhD Student
Dept of Biostatistics
NIMHANS
Analysis of qualitative data:
Contingency tables; Chi-square
test, Kappa measure of agreement
(Inference IV)
Types of Variables
Continuous variables:
• Always numeric
• Can be any number, positive or negative
• Examples: age in years, weight, blood pressure
readings, temperature, concentrations of
pollutants and other measurements
Categorical variables:
• Information that can be sorted into categories
• Types of categorical variables – ordinal, nominal
and dichotomous (binary)
Categorical data
Age (years) (< 15, 15-30, 30-45, 45+): Ordinal
Gender (M, F): Nominal
Diagnosis (Normal, Abnormal): Nominal
Improvement (Mild, Moderate, Fair): Ordinal
SES (Low, Medium, High): Ordinal
Locality (Rural / Urban): Nominal
Anxiety score (< 13, 13-23, 24-40, 41+): Ordinal
Bivariate Description
Usually we want to study associations between two or more variables.
 Quantitative variables: show the data using scatterplots and correlation.
 Categorical variables: show the data using contingency tables.
 Mixture of a categorical and a quantitative variable: give numerical summaries (mean, standard deviation) or side-by-side box plots for the groups, e.g.
   Men: mean = 7.0, s = 8.4
   Women: mean = 5.9, s = 6.0
Contingency Tables
 Cross classifications of categorical variables in which
   rows (typically) represent categories of the explanatory variable and
   columns represent categories of the response variable.
 Counts in the "cells" of the table give the numbers of individuals at the corresponding combination of levels of the two variables.
 Contingency tables enable us to compare one characteristic of the sample (e.g. smoking) across the levels of another categorical variable (e.g. gender).
Raw data (10 subjects):

Sl No   Gender   Smoking
1       M        Y
2       F        Y
3       M        N
4       F        Y
5       M        N
6       F        N
7       M        Y
8       F        N
9       F        N
10      M        Y

Contingency table (Bivariate)

Gender    Smoking: Yes   Smoking: No   Total
Male      3              2             5
Female    2              3             5
Total     5              5             10
Row and column totals are called Marginal counts
Happiness and Family Income

                   Happiness
Income             Very    Pretty   Not too   Row Total
Above Average      164     233      26        423
Average            293     473      117       883
Below Average      132     383      172       687
Col Total          589     1089     315       1993
Can summarize by percentages on response variable (happiness)
Happiness
Income Very Pretty Not too Total
--------------------------------------------
Above 164 (39%) 233 (55%) 26 (6%) 423
Average 293 (33%) 473 (54%) 117 (13%) 883
Below 132 (19%) 383 (56%) 172 (25%) 687
----------------------------------------------
Example: Percentage “very happy” is
39% for above average income (164/423 = 0.39)
33% for average income (293/883 = 0.33)
19% for below average income (132/687 = 0.19)
What can a contingency table do ?
Association between two categorical variables.
 For example, you want to know if there is any association between gender and smoking.
 Is there any association between taking aspirin and the risk of heart attack in the population?
 Is lung cancer associated with smoking or not?
 Is diabetes associated with type of occupation or not?
Observed frequencies
• Depending on the subjects’ response, the data could be
summarized in a table.
• The observed numbers or counts in the table are the
observed frequencies.
This is what we have observed in the random sample of 80 subjects.
Gender                    Headache: Yes   Headache: No   Marginal total (row)
Men                       10              30             40
Women                     23              17             40
Marginal total (column)   33              47             80
Sample to population …
 Knowing the incidence of headache in these 80 subjects with great certainty is of limited use to us.
 On the basis of observed frequencies (or %), we can make
claims about the sample itself, but we cannot generalize to
make claims about the population from which we drew our
sample, unless we submit our results to a test of statistical
significance.
 A test of statistical significance tells us how confidently we
can generalize to a larger (unmeasured) population from a
(measured) sample of that population.
Pearson’s Chi-Square test is used to test for a relationship
between 2 Categorical variables.
Ho: There is no association between the variables.
Ha: There is an association between the variables.
Relation between two categorical variables
What does it mean for two categorical variables to be
related?
• Determine whether being male or female makes an individual more likely to get a headache.
• If that is the case, then we can say that sex and headache are related or associated.
• But this does not show any causality.
Steps in test of hypothesis . .
1. Find out the type of problem and the question to be answered.
2. State the null & alternative hypotheses.
3. Calculate the standard error.
4. Calculate the critical ratio. Generally this is given by
   Critical ratio = (difference between the means or proportions) / (standard error of the difference)
5. Compare the value observed in the experiment with
that given by the table, at a predetermined significance
level.
6. Make inferences.
Testing of hypothesis
We need to measure how different our observed results are from what we would expect under the null hypothesis.
How does chi-square do this?
It compares the observed frequencies in our sample with the
expected frequencies.
What are expected frequencies ?
Expected frequencies
 If the null hypothesis were true, what would be the frequencies in each cell?
 If there were no relation between gender and headache, we would expect the numbers of men and women with and without headache to be the same.
 i.e., if men and women were equally affected by headache, we would have expected these numbers in our sample of 80 people.

Gender   Headache: Yes   Headache: No   TOTAL
Men      10              30             40
Women    23              17             40
TOTAL    33              47             80
Expected frequencies (Cont..)
 Under the assumption of no association between gender and headache, the expected numbers or counts in the table are

Expected number = (row total × column total) / (table total)

Observed counts, with expected counts in parentheses:

Gender   Headache: Yes   Headache: No   TOTAL
Men      10 (16.5)       30 (23.5)      40
Women    23 (16.5)       17 (23.5)      40
TOTAL    33              47             80
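As an illustration, a minimal sketch (assuming NumPy is available) that reproduces these expected counts from the marginal totals:

```python
import numpy as np

observed = np.array([[10, 30],    # men:   headache yes, no
                     [23, 17]])   # women: headache yes, no

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()
# expected -> [[16.5, 23.5],
#              [16.5, 23.5]]
```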
Chi-square value
2
cal = (10 – 16.5)2/16.5 + (30 – 23.5)2/23.5 + (23 – 16.5)2/16.5 +(17– 23.5)2/23.5
The Chi-square value is a single number that adds up all the
differences between the observed data and the expected
data.
Gender Headache TOTAL
Yes No
Men 10 16.5 30 23.5 40
Women 23 16.5 17 23.5 40
TOTAL 33 47 80
χ² = Σ_{i=1}^{n} (Oᵢ − Eᵢ)² / Eᵢ ,   where Oᵢ and Eᵢ are the observed and expected counts in cell i.
Chi-square distribution and level of significance (α, a predetermined value)

f(χ²) = [ (1/2)^(n/2) / Γ(n/2) ] · exp(−χ²/2) · (χ²)^(n/2 − 1),   0 ≤ χ² < ∞

(the χ² probability density with n degrees of freedom)
Chi-square table
Theoretical Chi-square value
Look up the theoretical Chi-square value in a χ² distribution table with d.f. = (r − 1)(c − 1), to see if the calculated value is big enough to indicate a significant association of headache and gender.
For a 2 x 2 table like this, d.f. = (2 − 1)(2 − 1) = 1.
Critical value χ²(1, 0.05) = 3.841.
Degrees of freedom
Degrees of freedom are the number of independent pieces of
information in the data set.
In a contingency table, the degrees of freedom are calculated as (number of rows − 1) × (number of columns − 1), or (r − 1)(c − 1).

Gender   Shirt colour: Black   Blue   White   TOTAL
Men      10                    14     ?       40
Women    ?                     ?      ?       40
TOTAL    26                    37     17      80

d.f. = (2 − 1)(3 − 1) = 2: once two cell counts (here 10 and 14) are known, the remaining cells are fixed by the marginal totals.

Gender   Headache: Yes   Headache: No   TOTAL
Men      10              30             40
Women    23              17             40
TOTAL    33              47             80

d.f. = (2 − 1)(2 − 1) = 1: a single cell count determines the rest.
Chi-square values….
If the observed data and expected data are identical (i.e., if
there is no difference), the Chi-square value is 0.
Greater differences between expected and observed data
produce a larger Chi-square value.
The larger the Chi-square value, the stronger the evidence that an association really exists.
Assumptions of 2 test
 The sample must be randomly drawn from the population.
 Data must be reported in raw frequencies (not percentages).
 Categories of the variables must be mutually exclusive &
exhaustive.
 Expected frequencies cannot be too small: the expected count should be 5 or more in at least 80% of the cells, and every individual expected count should be 1 or greater (Yates, Moore and McCabe, 1999, p. 734).
Tables of higher dimensions
Chi-square test can be employed for tables of higher dimensions too.
                 Happiness
Income           Very    Pretty   Not too   Total
Above Aver.      164     233      26        423
Average          293     473      117       883
Below Aver.      132     383      172       687
Total            589     1089     315       1993

For a higher-dimensional table, post-hoc comparisons can be made using a Bonferroni correction, for example:
 comparing all three pairs of income categories (corrected α = 0.05/3 comparisons = 0.017)
 comparing only two categories (corrected α = 0.05/2 = 0.025)
 considering all 9 cells (corrected α = 0.05/9 = 0.0056)
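A rough sketch of this idea (assuming SciPy is available), testing the overall 3 × 3 table and then each pair of income rows against a Bonferroni-corrected α:

```python
import numpy as np
from scipy import stats

happy = np.array([[164, 233, 26],     # above average income
                  [293, 473, 117],    # average
                  [132, 383, 172]])   # below average

chi2, p, dof, _ = stats.chi2_contingency(happy)
# chi2 ≈ 107, dof = 4, p far below 0.001 (values approximate)

alpha = 0.05 / 3                      # three pairwise row comparisons
for i, j in [(0, 1), (0, 2), (1, 2)]:
    _, p_pair, _, _ = stats.chi2_contingency(happy[[i, j], :])
    print(f"rows {i} vs {j}: p = {p_pair:.4g}, significant: {p_pair < alpha}")
```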
Higher dimension tables..
OCCUPATION     KNOWLEDGE: Poor   Good
Govt sector    3 (20)            -
Pvt sector     -                 6 (40)
Business       5 (33)            6 (40)
Unemployed     7 (47)            3 (20)

Chi-Square: Value = 10.691(a), df = 3, Asymp. Sig. (2-sided) = .014
(a) 4 cells (50.0%) have expected count < 5.

OCCUPATION     KNOWLEDGE: Poor   Good
Govt / Pvt     3 (20)            6 (40)
Business       5 (33)            6 (40)
Unemployed     7 (47)            3 (20)

Chi-Square: Value = 2.691(a), df = 2, Asymp. Sig. (2-sided) = .260
(a) 2 cells (33.3%) have expected count < 5.
Higher dimension tables..

Family type       Depression: Normal   Borderline   Abnormal
Nuclear (n=39)    1 (50)               2 (67)       36 (95)
Joint (n=4)       1 (50)               1 (33)       2 (5)

Chi-Square: Value = 6.715(a), df = 2, Asymp. Sig. (2-sided) = 0.035
(a) 5 cells (83.3%) have expected count < 5.
A Mann-Whitney U test would be appropriate here.
Collapsing tables
 Can often combine columns/rows to increase expected counts that are too low
   may increase or reduce interpretability
   may create or destroy structure in the table
 There are no clear guidelines
   avoid simply trying to identify the combination of cells that produces a “significant” result.
 Chi-square is basically a measure of significance.
 It is not a good measure of the strength of association.
 It can help you decide if an association exists, but not tell you how strong it is.
Yates' Correction
The Chi-square distribution is continuous, but the test statistic from a contingency table is discrete; the continuous approximation becomes poor if any one of the expected frequencies is less than 5.
In such cases, Yates' correction for continuity is applied to maintain the character of continuity of the distribution.
The formula for the Chi-square test with Yates' correction is:

χ² = Σ ( |observed − expected| − 0.5 )² / expected
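For comparison, a small sketch (SciPy assumed) of the corrected versus uncorrected statistic on the gender-headache table from earlier:

```python
from scipy import stats

observed = [[10, 30], [23, 17]]

chi2_plain, p1, dof, exp = stats.chi2_contingency(observed, correction=False)
chi2_yates, p2, _, _ = stats.chi2_contingency(observed, correction=True)
# chi2_plain ≈ 8.72, chi2_yates ≈ 7.43 (approximate); the continuity
# correction shrinks the statistic and makes the test more conservative
```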
Fisher’s Exact Test
            Column 1      Column 2      Total
Row 1       a             b             R1 = a + b
Row 2       c             d             R2 = c + d
Total       C1 = a + c    C2 = b + d    N = R1 + R2

The test can be used for 2 x 2 contingency tables when the sample sizes are small.
The exact probability of a given table (with the margins held fixed) is

p = (R1! R2! C1! C2!) / (N! a! b! c! d!),   where N! = 1 × 2 × 3 × … × (N − 1) × N.

The p value of the test is obtained by summing these probabilities over all tables that are as extreme as, or more extreme than, the observed table.
p = (a+b)! (a+c)! (b+d)! (c+d)! / (N! a! b! c! d!)
p = (5! 5! 6! 6!) / (11! 4! 1! 1! 5!)
p = 0.065

             Age group
Gender       < 20 yrs   > 20 yrs   Total
Male         4          1          5
Female       1          5          6
Total        5          6          11
If the p value is less than or equal to 0.05, the null hypothesis
is rejected and the difference between the rows (or columns)
is considered significant.
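As a cross-check, a minimal sketch with SciPy (note that the 0.065 above is the probability of the single observed table; the exact test sums over all tables at least as extreme):

```python
from scipy import stats

table = [[4, 1],   # male:   <20 yrs, >20 yrs
         [1, 5]]   # female: <20 yrs, >20 yrs

odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
# p_value ≈ 0.08 (approximate), so the association is not significant at α = 0.05
```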
Cochran (1954) suggests:
The decision regarding the use of Chi-square should be guided by the following considerations:
1. When N > 40, use Chi-square corrected for continuity.
2. When N is between 20 and 40, the Chi-square test may be used if all the expected frequencies are 5 or more.
If any expected frequency is less than 5, use Fisher's exact probability test.
3. When N < 20, use Fisher's test in all cases.
McNemar’s Test
Used in the case of two related samples or repeated measurements.
Can be used to test for the significance of changes in "before-after" designs in which each person is used as his or her own control.
Thus the test can be used
 to test the effectiveness of a treatment / training programme / therapy / intervention, or
 to compare the ratings of two judges on the same set of individuals.
McNemar's Test …

                 POST: Normal   POST: Abnormal   Total
PRE: Normal      a              b                R1
PRE: Abnormal    c              d                R2
Total            C1             C2               N

                 Post Rx: Mild   Post Rx: Severe   Total
Pre Rx: Mild     40              8                 48
Pre Rx: Severe   45              7                 52
Total            85              15                100

                 Post Rx: Severe   Post Rx: Mild   Total
Pre Rx: Severe   7                 19              26
Pre Rx: Mild     4                 20              24
Total            11                39              50
The McNemar test statistic, corrected for continuity (Edwards, 1948), is given by:

χ² = ( |b − c| − 1 )² / (b + c)

The uncorrected form is χ² = (b − c)² / (b + c).

• This follows a χ² distribution with 1 degree of freedom for large samples (b + c > 25).
• For b + c less than 25, the P value is computed using the binomial probability (McNemar, 1947):

P = 2 × Σ_{i=b}^{n} C(n, i) (0.5)^i (1 − 0.5)^(n−i),   where n = b + c, b ≥ c, and C(n, i) is the binomial coefficient;

P is used as the two-sided P value.
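A small sketch using statsmodels (assumed installed) on the pre/post treatment table above, where the discordant cells are b = 8 and c = 45:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

table = np.array([[40, 8],    # pre mild:   post mild, post severe
                  [45, 7]])   # pre severe: post mild, post severe

result = mcnemar(table, exact=False, correction=True)
# statistic = (|8 - 45| - 1)^2 / (8 + 45) ≈ 24.45, p-value well below 0.001

result_small = mcnemar(table, exact=True)   # binomial (exact) version for small b + c
print(result.statistic, result.pvalue)
```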
Phi Coefficient
• Only used on 2X2 contingency tables.
• Interpreted as a measure of the relative strength of an association between two variables, ranging from 0 to 1.

φ = √( χ² / n ) = (ad − bc) / √( (a + b)(c + d)(a + c)(b + d) )

n = total number of observations

Sex        Smoking: Yes   Smoking: No   Total
Male       a              b             a + b
Female     c              d             c + d
Total      a + c          b + d         n
Pearson’s Contingency Coefficient (C)
• It is interpreted as a measure of the relative strength of an association between two variables.
• The coefficient will always be less than 1, and its maximum varies according to the number of rows and columns.
• It can be used for general r x c tables.
• It ranges between 0 and 1.

C = √( χ² / (n + χ²) ) = √( φ² / (1 + φ²) )
Cramer’s V Coefficient (V)
• Useful for comparing multiple χ² test statistics and generalizable across tables of varying size.
• Because it is normalised by the sample size, it is useful when a statistically significant chi-square may simply reflect a large sample rather than any substantive relationship between the variables.
• It is interpreted as a measure of the relative strength of an association between variables.
• The coefficient ranges from 0 to 1 (perfect association).
• In practice, you may find that a Cramer's V of 0.10 provides a good minimum threshold for suggesting there is a substantive relationship between two variables.

V = √( χ² / (n (q − 1)) ),   where q = the smaller of the number of rows or columns.
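A minimal sketch (SciPy assumed) computing φ, C and Cramer's V for the 2 × 2 gender-headache table used earlier:

```python
import numpy as np
from scipy import stats

observed = np.array([[10, 30], [23, 17]])
chi2, _, _, _ = stats.chi2_contingency(observed, correction=False)

n = observed.sum()
q = min(observed.shape)                 # smaller of the number of rows/columns

phi = np.sqrt(chi2 / n)                 # ≈ 0.33
C   = np.sqrt(chi2 / (n + chi2))        # ≈ 0.31
V   = np.sqrt(chi2 / (n * (q - 1)))     # ≈ 0.33 (equals phi for a 2 x 2 table)
```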
Other types of association measures
• There are lots of other measures of association.
• When both variables are nominal, the previous measures are fine, and there are certainly many more.
• For cases where both variables are ordinal, common measures include Kendall's tau, Spearman's rank correlation, or the Goodman-Kruskal gamma.
• In some cases we wish to measure the degree of exact agreement between two nominal or ordinal variables measured using the same levels or scales, in which case we generally use Cohen's Kappa (k).

Describing strength of association (Characterizations)
0 to 0.1     Little if any association
0.1 to 0.3   Low association
0.3 to 0.5   Moderate association
> 0.5        High association
Measurement of agreement
• Agreement between measurements refers to the degree of concordance between two (or more) sets of measurements.
• This is one method to measure examiner reliability.

Percentage agreement
• The percentage of judgements on which the two examiners have agreed, compared to the total number of judgements made.
Percent agreement (Example)

                              Resident 1 - Lecture helpful?
Resident 2 -                  Yes    No    Total
Lecture helpful?    Yes       1      6     7
                    No        9      84    93
                    Total     10     90    100

Percentage agreement is equal to the sum of the diagonal values divided by the overall total and multiplied by 100.
Number of agreements = sum of diagonals = 1 + 84 = 85
Total number of cases = overall total = 100
Percentage agreement = 85%

Percent agreement = (Number of agreements / Total number of cases) × 100
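For instance, a one-line sketch (NumPy assumed) for the table above:

```python
import numpy as np

ratings = np.array([[1, 6],
                    [9, 84]])
percent_agreement = np.trace(ratings) / ratings.sum() * 100   # (1 + 84) / 100 * 100 = 85%
```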
Kappa Statistic
 The Kappa Statistic measures the agreement between the evaluations of
two examiners when both are rating the same objects.
 It describes agreement achieved beyond chance, as a proportion of that
agreement which is possible beyond chance.
 The value of the Kappa Statistic ranges from -1 to 1, with larger values
indicating better reliability.
A value of 1 indicates perfect agreement.
A value of 0 indicates that agreement is no better than chance.
 Generally, a Kappa > 0.60 is considered satisfactory.
0.00 Agreement is no better than chance
0.01-0.20 Slight agreement
0.21-0.40 Fair agreement
0.41-0.60 Moderate agreement
0.61-0.80 Substantial agreement
0.81-0.99 Almost perfect agreement
1.00 Perfect agreement
Kappa = (P0 − PE) / (1 − PE)

where:
P0 = proportion of observed agreement
PE = proportion of agreement expected by chance
Example - Kappa Statistic

                              Resident 1 - Lecture helpful?
Resident 2 -                  Yes    No    Total
Lecture helpful?    Yes       15     5     20
                    No        10     70    80
                    Total     25     75    100

P0 is the sum of the diagonals divided by the overall total.
PE is the sum of each row total multiplied by the corresponding column total, divided by the square of the overall total.
 Number of agreements = sum of diagonals = 85
 Total number of cases = overall total = 100
 P0 = Number of agreements / Total number of cases = 85 / 100 = 0.85

PE = Σ (row total × corresponding column total) / (Total number of cases)²
PE = [ (20 × 25) + (80 × 75) ] / 100² = (500 + 6000) / 10000 = 0.65

Kappa = (P0 − PE) / (1 − PE) = (0.85 − 0.65) / (1 − 0.65) = 0.57
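The same result can be reached with a short sketch (NumPy and scikit-learn assumed); cohen_kappa_score works on the two raters' label vectors rather than on the table:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

table = np.array([[15, 5],    # resident 2 = Yes: resident 1 Yes, No
                  [10, 70]])  # resident 2 = No:  resident 1 Yes, No

p0 = np.trace(table) / table.sum()                                    # 0.85
pe = (table.sum(axis=1) * table.sum(axis=0)).sum() / table.sum()**2   # 0.65
kappa = (p0 - pe) / (1 - pe)                                          # ≈ 0.57

# equivalent call on reconstructed rating vectors
resident2 = np.repeat(["Y", "N"], [20, 80])
resident1 = np.repeat(["Y", "N", "Y", "N"], [15, 5, 10, 70])
print(cohen_kappa_score(resident1, resident2))                        # ≈ 0.57
```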
Weighted Kappa
• Measures agreement across ordered categories between which there is a meaningful difference; disagreements between nearby categories receive larger weights than distant ones.

linear weight = 1 − |i − j| / (k − 1)          quadratic weight = 1 − ( (i − j) / (k − 1) )²

(k = 4 categories; weights shown relative to "No pain")
                    No pain   Mild pain   Moderate pain   Severe pain
Linear weight       1         0.67        0.33            0
Quadratic weight    1         0.89        0.56            0
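A brief sketch (NumPy and scikit-learn assumed) that reproduces these weights and shows the weighted-kappa call; the rating vectors are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

k = 4                                   # pain categories: none, mild, moderate, severe
i, j = np.indices((k, k))
linear_w    = 1 - np.abs(i - j) / (k - 1)
quadratic_w = 1 - ((i - j) / (k - 1)) ** 2
# first rows: [1, 0.67, 0.33, 0] and [1, 0.89, 0.56, 0]

rater1 = [0, 1, 1, 2, 3, 3, 2, 0]       # hypothetical ratings (0 = no pain ... 3 = severe)
rater2 = [0, 1, 2, 2, 3, 2, 2, 0]
print(cohen_kappa_score(rater1, rater2, weights="linear"))
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))
```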
Weighted Kappa

Modified Kappa = (ICVI − PC) / (1 − PC)

ICVI = item-level content validity index
     = (number of raters rating the item as relevant) / (total number of raters)

PC = percent agreement due to chance
   = [ N! / ( A! (N − A)! ) ] × (0.5)^N,   where N = number of raters (here 14) and A = number who agreed.
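As an illustration only (the slide does not give A, so A = 12 below is a made-up value), a short sketch of the calculation:

```python
from math import comb

N = 14                    # total number of raters (from the slide)
A = 12                    # hypothetical: raters who rated the item as relevant

icvi = A / N                              # ≈ 0.857
pc = comb(N, A) * 0.5 ** N                # chance agreement ≈ 0.0056
modified_kappa = (icvi - pc) / (1 - pc)   # ≈ 0.86
```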
Other Tests for Categorical Data
 Chi-square Test for Trend in Binomial Proportions – tests whether or not p1 < p2 < p3 < … < pk, where 1, 2, …, k are the levels of an ordinal variable, i.e. a 2 x k table.
 Chi-square Goodness-of-Fit Test – used to test whether observations come from some hypothesized distribution.
 Cochran-Mantel-Haenszel Test – looks at whether or not there is a relationship in a 2 x 2 table while adjusting for the level of a third factor. For example, is there a relationship between heavy drinking (Y or N) and lung cancer (Y or N) adjusting for smoking status?
 Log-linear model – describes independence, interactions or associations between two, three or more categorical variables.
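For instance, a minimal goodness-of-fit sketch (SciPy assumed; the die-roll counts are made up for illustration):

```python
from scipy import stats

rolls = [18, 22, 16, 25, 20, 19]    # hypothetical counts for the six faces of a die
stat, p = stats.chisquare(rolls)    # expected counts default to uniform (20 each)
# stat = 2.5, df = 5, p ≈ 0.78: no evidence against the hypothesized uniform distribution
```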