Contingency Table Analysis
• contingency tables show frequencies
produced by cross-classifying observations
• e.g., pottery described simultaneously
according to vessel form & surface
decoration
        polished  burnished  matte
bowl          47         28      3
jar           30         42      8
olla           6         45     25
• most statistical tests for tables are designed for two-dimensional tables
– they examine the interaction of only two variables at a time…
• they are most efficient when used with nominal data
– using ratio data means recoding it to a lower scale of measurement (e.g., ordinal)
– which means ignoring some of the information originally available…
• still, you might do this, particularly if you
are interested in association between metric
and non-metric variables
• e.g.: variation in pot size vs. surface
decoration…
• may decide to divide pot size into ordinal
classes…
[figure: pot sizes grouped by rim diameter, first as large / medium / small, then as large / small]

                 rim diameter
slip             small   large
specular             4      13
non-specular        15      18
• other options may let you retain more of the
original information content
[figure: rim diameters of pots with non-specular slip vs. specular slip]
• could use a t-test to test the equality of the mean rim diameters of the two groups
• makes full use of the ratio data…
• why do we work with contingency tables?
• because we think there may be some kind
of interaction between the variables…
• basic question: can the state of one
variable be predicted from the state of
another variable?
• if not, they are independent
expected counts
• a baseline to which observed counts can be
compared
• the counts that would occur by random chance, over the long run, if the variables are independent
• for any cell:
E = (row total * column total) / table total
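A minimal R sketch of this formula, applied to the pottery table above. `expected_counts` is a hypothetical helper name; `chisq.test()` is used only to cross-check the arithmetic.

```r
# Pottery table: vessel form (rows) by surface decoration (columns)
pots <- matrix(c(47, 28,  3,
                 30, 42,  8,
                  6, 45, 25),
               nrow = 3, byrow = TRUE,
               dimnames = list(form    = c("bowl", "jar", "olla"),
                               surface = c("polished", "burnished", "matte")))

# E = (row total * column total) / table total, for every cell at once
expected_counts <- function(x) outer(rowSums(x), colSums(x)) / sum(x)

E <- expected_counts(pots)
print(round(E, 1))

# cross-check against the expected counts stored by chisq.test()
all.equal(unname(E), unname(chisq.test(pots)$expected))
```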
observed counts
          M      F    Total
PP        4      1      5    (45%)
Pot       1      5      6    (55%)
Total     5      6     11
        (45%)  (55%)

expected counts
          M      F    Total
PP       2.3    2.7     5
Pot      2.7    3.3     6
Total     5      6     11
significance
• = the probability of getting, by chance, a table as deviant as or more deviant than the observed table, if the variables are independent
– ‘deviant’ is defined relative to the expected table
• no causality is necessarily implied by the outcome
– but causality may well be the reason for an observed association…
– e.g.: grave goods and sex
Fisher’s Exact Test
• for 2 x 2 tables only
• useful where the chi-square test is inappropriate
• gives the exact probability, among all tables with the same marginal totals, of obtaining a table as deviant as or more deviant than the observed table…
  a  b        4  1
  c  d        1  5

$$P = \frac{(a+b)!\,(a+c)!\,(b+d)!\,(c+d)!}{N!\,a!\,b!\,c!\,d!}$$

$$P = \frac{5!\,5!\,6!\,6!}{11!\,4!\,1!\,1!\,5!} = \frac{5 \cdot 6!\,6!}{11!} = \frac{5 \cdot 6!}{11 \cdot 10 \cdot 9 \cdot 8 \cdot 7} = \frac{3600}{55440} \approx .065$$
P = .065
• use R (or Excel, if the counts aren’t too large):
  > x <- matrix(c(4, 1, 1, 5), nrow = 2, byrow = TRUE)
  > fisher.test(x)
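A sketch of the same arithmetic in R, assuming the table is entered as above. `table_prob` is a hypothetical helper that applies the factorial formula directly; base R's `dhyper()` gives the same hypergeometric probability without the risk of the factorials overflowing for larger counts.

```r
# Probability of one particular 2 x 2 table  a b / c d  with its margins fixed
table_prob <- function(a, b, c, d) {
  n <- a + b + c + d
  factorial(a + b) * factorial(a + c) * factorial(b + d) * factorial(c + d) /
    (factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d))
}

p_formula <- table_prob(4, 1, 1, 5)          # 3600 / 55440

# same value from the hypergeometric density:
# x = a, m = column 1 total (a + c), n = column 2 total (b + d), k = row 1 total (a + b)
p_dhyper <- dhyper(4, m = 5, n = 6, k = 5)

round(c(formula = p_formula, dhyper = p_dhyper), 3)   # both 0.065
```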
all tables with the same margins (row totals 5, 6; column totals 5, 6; N = 11), with their probabilities:

  a    table (row 1 / row 2)     P
  0    0 5 / 5 1               0.013
  1    1 4 / 4 2               0.162
  2    2 3 / 3 3               0.433
  3    3 2 / 2 4               0.325
  4    4 1 / 1 5               0.065   (observed)
  5    5 0 / 0 6               0.002
       2.3 2.7 / 2.7 3.3               (expected)

• 1-tailed test (tables at least as extreme in the observed direction): P = 0.065 + 0.002 = 0.067
• 2-tailed test (adding the table at the other extreme, 0 5 / 5 1, which is no more probable than the observed one): P = 0.067 + 0.013 = 0.080
in R (x holds the observed table):
  > fisher.test(x, alternative = "two.sided")
  > fisher.test(x, alternative = "greater")   [i.e.: H1: odds ratio > 1]
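The enumeration above can be reproduced in R. This sketch assumes the observed table is stored as a matrix; with the margins fixed, every possible table is determined by its upper-left cell, so `dhyper()` gives each table's probability. The two-tailed rule used here (sum every table no more probable than the observed one) is the definition documented for `fisher.test()`, which is used as a check.

```r
# Observed table: grave goods (rows) by sex (columns)
x <- matrix(c(4, 1,
              1, 5), nrow = 2, byrow = TRUE,
            dimnames = list(goods = c("PP", "Pot"), sex = c("M", "F")))

m <- sum(x[, 1])   # column 1 total (M)  = 5
n <- sum(x[, 2])   # column 2 total (F)  = 6
k <- sum(x[1, ])   # row 1 total (PP)    = 5

# each possible table is determined by its top-left cell a
a_vals <- max(0, k - n):min(k, m)
probs  <- dhyper(a_vals, m, n, k)
names(probs) <- a_vals
print(round(probs, 3))

a_obs <- x[1, 1]
p_obs <- probs[as.character(a_obs)]

# one-tailed (H1: odds ratio > 1): tables at least as extreme in the observed direction
p_one <- sum(probs[a_vals >= a_obs])
# two-tailed: all tables no more probable than the observed one
# (the small tolerance guards against floating-point ties)
p_two <- sum(probs[probs <= p_obs * (1 + 1e-7)])

round(c(one_tailed = p_one, two_tailed = p_two), 3)   # 0.067 0.080
```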
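Chi-square Statistic
• an aggregate measure (i.e., based on the entire table):
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
  where $O_i$ and $E_i$ are the observed and expected counts in cell $i$, and $k$ is the number of cells
• because each deviation is squared, the greater the deviation from expected values, the disproportionately (quadratically, not exponentially) larger the chi-square statistic…
• one could devise other statistics that place less emphasis on large deviations, e.g.:
$$\sum_{i=1}^{k} \frac{|O_i - E_i|}{E_i}$$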
• X² approximately follows the χ² probability distribution
• X² probabilities are traditionally found in a table of threshold (critical) values from the cumulative χ² distribution
– you need the degrees of freedom
– df = (r-1)*(c-1)
• just use R…
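As a sketch of the "just use R" route, the chi-square test for the ritual-architecture/status table below can be computed by hand and checked against `chisq.test()`; the per-cell contributions fall out of the same calculation. Variable names are illustrative.

```r
# Ritual architecture (rows) by status (columns)
altar <- matrix(c( 7, 20, 16,
                  18, 22,  8),
                nrow = 2, byrow = TRUE,
                dimnames = list(ritual = c("altar", "no altar"),
                                status = c("low", "intermed.", "high")))

E       <- outer(rowSums(altar), colSums(altar)) / sum(altar)  # expected counts
contrib <- (altar - E)^2 / E                                   # (O - E)^2 / E per cell
X2      <- sum(contrib)
df      <- (nrow(altar) - 1) * (ncol(altar) - 1)
p       <- pchisq(X2, df, lower.tail = FALSE)                  # upper-tail probability

print(round(contrib, 1))
round(c(X2 = X2, df = df, p = p), 3)                           # 7.349 2.000 0.025

# built-in equivalent (no continuity correction is applied to tables larger than 2 x 2)
chisq.test(altar)
```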
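Ritual arch. (rows) by Status (columns)

observed counts
               low   intermed.   high   total
altar            7       20        16      43
no altar        18       22         8      48
total           25       42        24      91

expected counts, e.g. altar/high: (43*24)/91 = 11.3
               low   intermed.   high   total
altar          11.8     19.8      11.3     43
no altar       13.2     22.2      12.7     48
total          25       42        24       91

chi-square contributions, e.g. altar/low: (7-11.8)²/11.8 = 2.0
               low   intermed.   high   total
altar           2.0      0.0       1.9     3.9
no altar        1.8      0.0       1.7     3.5
total           3.7      0.0       3.6     7.3

$X^2 = 7.3$, df = 2, $p \approx .025$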
X² assumptions & problems
• must be based on counts:
– not percentages, ratios or weighted data
• fails to give reliable results if expected counts are too low:

  obs.              exp.
   2    3 |  5      2.27   2.73
   3    3 |  6      2.73   3.27
   5    6 | 11

  X² = 0.11 (1 df), P(X²) = 0.74, but P(Fisher) = 1.0
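A quick R check of this example. It is a sketch that assumes `correct = FALSE`, since the hand calculation uses the plain Pearson statistic without Yates' correction; R itself warns that the chi-square approximation may be incorrect here, so that warning is suppressed.

```r
small <- matrix(c(2, 3,
                  3, 3), nrow = 2, byrow = TRUE)

# plain Pearson X2 (no Yates correction); R warns that the approximation
# may be incorrect with expected counts this small, so the warning is suppressed
chi <- suppressWarnings(chisq.test(small, correct = FALSE))
print(round(chi$expected, 2))

p_chisq  <- chi$p.value
p_fisher <- fisher.test(small)$p.value

round(c(X2 = unname(chi$statistic), p_chisq = p_chisq, p_fisher = p_fisher), 2)  # 0.11 0.74 1.00
```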
rules of thumb
1. no expected counts less than 5
– almost certainly too stringent
2. no expected counts less than 2, and at least 80% of expected counts > 5
– more relaxed (but more realistic)
collapsing tables
• can often combine columns/rows to increase
expected counts that are too low
– may increase or reduce interpretability
– may create or destroy structure in the table
• no clear guidelines
– avoid simply trying to identify the combination
of cells that produces a “significant” result
original 4 x 4 table
obs. counts                       exp. counts
 8   3   6   2 | 19               5.3   4.6   5.8   3.2 | 19
 6   1   6   5 | 18               5.0   4.4   5.5   3.1 | 18
 6   4   5   4 | 19               5.3   4.6   5.8   3.2 | 19
 3  12   8   3 | 26               7.3   6.3   7.9   4.4 | 26
23  20  25  14 | 82              23    20    25    14   | 82

collapsed to 4 x 2 (columns 1+2 and 3+4 combined)
obs. counts          exp. counts
11   8 | 19          10.0   9.0 | 19
 7  11 | 18           9.4   8.6 | 18
10   9 | 19          10.0   9.0 | 19
15  11 | 26          13.6  12.4 | 26
43  39 | 82          43    39   | 82
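A sketch of how the second rule of thumb might be checked before and after collapsing, in R. `rule_of_thumb_ok` and `expected` are hypothetical helpers, and the collapse combines columns 1+2 and 3+4 as in the tables above.

```r
x4 <- matrix(c(8,  3, 6, 2,
               6,  1, 6, 5,
               6,  4, 5, 4,
               3, 12, 8, 3), nrow = 4, byrow = TRUE)

# collapse by combining columns 1 + 2 and columns 3 + 4
x_collapsed <- cbind(x4[, 1] + x4[, 2], x4[, 3] + x4[, 4])

expected <- function(x) outer(rowSums(x), colSums(x)) / sum(x)

# rule of thumb 2: no expected count below 2, and at least 80% of them above 5
rule_of_thumb_ok <- function(x) {
  E <- expected(x)
  min(E) >= 2 && mean(E > 5) >= 0.8
}

print(round(expected(x4), 1))
print(round(expected(x_collapsed), 1))
c(original = rule_of_thumb_ok(x4), collapsed = rule_of_thumb_ok(x_collapsed))  # FALSE TRUE
```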
• chi-square is basically a measure of significance
• it is not a good measure of strength of association
• it can help you decide whether a relationship exists, but not how strong it is; e.g., doubling every count doubles X² without changing the proportions:

  n = 60             n = 120
  17  13             34  26
  13  17             26  34
  X² = 1.07          X² = 2.13
  p = .30            p = .14
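The sample-size effect can be reproduced in R (a sketch, using `correct = FALSE` so the statistics match the uncorrected values above).

```r
t60  <- matrix(c(17, 13,
                 13, 17), nrow = 2, byrow = TRUE)
t120 <- 2 * t60   # same proportions, twice the sample size

r60  <- chisq.test(t60,  correct = FALSE)
r120 <- chisq.test(t120, correct = FALSE)

round(rbind(n60  = c(X2 = unname(r60$statistic),  p = r60$p.value),
            n120 = c(X2 = unname(r120$statistic), p = r120$p.value)), 2)
# X2 doubles and p shrinks, although the strength of association is unchanged
```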
• also, chi-square is a ‘global statistic’
• it says nothing (directly) about which parts of a table may be ‘driving’ a large chi-square statistic
• ‘chi-square contributions’ from individual cells can help; in the altar example, the low- and high-status cells contribute nearly all of X², the intermediate cells almost nothing:

               low   intermed.   high   total
altar           2.0      0.0       1.9     3.9
no altar        1.8      0.0       1.7     3.5
total           3.7      0.0       3.6     7.3
Monte Carlo test of X2 significance
• based on simulated generation of cell-counts
under imposed conditions of independence
• randomly assign counts to cells, e.g.:
   23   14    8 |  45
   15    6   13 |  34
   38   20   21 |  79
• significance is simply the proportion of outcomes that produced an X² statistic >= the observed one
• not based on any assumptions about the
distribution of the X2 statistic
• overcomes the problems associated with
small expected frequencies
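A minimal sketch of a Monte Carlo test in R, applied to the altar/status table. It assumes that random tables are generated with both margins held fixed, which is how base R's `r2dtable()` and `chisq.test(simulate.p.value = TRUE)` work; other simulation schemes are possible. `B` and the seed are arbitrary choices.

```r
set.seed(1)   # arbitrary seed, for reproducibility

altar <- matrix(c( 7, 20, 16,
                  18, 22,  8), nrow = 2, byrow = TRUE)

x2_stat <- function(tab, E) sum((tab - E)^2 / E)

r  <- as.integer(rowSums(altar))
cl <- as.integer(colSums(altar))
E  <- outer(r, cl) / sum(altar)
X2_obs <- x2_stat(altar, E)

# B random tables with the observed margins, generated under independence
B      <- 10000
sims   <- r2dtable(B, r, cl)
X2_sim <- vapply(sims, x2_stat, numeric(1), E = E)

# significance = proportion of simulated X2 values >= the observed X2
# (the tiny tolerance avoids missing exact ties through rounding error)
p_mc <- mean(X2_sim >= X2_obs * (1 - 1e-7))
p_mc

# built-in version; it reports (1 + count) / (B + 1) rather than a plain proportion
chisq.test(altar, simulate.p.value = TRUE, B = B)$p.value
```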
G Test
• a measure of significance for any r x c table
• look up critical values of G2 in an ordinary
chi-square table; figure out degrees of
freedom the same way
• conforms to the chi-square distribution better than the chi-square statistic does
$$G^2 = 2\sum_{i=1}^{k} O_i \ln\left(\frac{O_i}{E_i}\right)$$
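an R function for G²
gsq.test <- function(obs) {
  df <- (nrow(obs) - 1) * (ncol(obs) - 1)            # degrees of freedom, as for X2
  E <- suppressWarnings(chisq.test(obs)$expected)    # expected counts under independence
  nz <- obs > 0                                      # cells with O = 0 add nothing to G2
  G <- 2 * sum(obs[nz] * log(obs[nz] / E[nz]))
  pchisq(G, df, lower.tail = FALSE)                  # upper-tail p-value (not the density)
}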
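A sketch of the G² formula applied directly to the altar/status table, with Pearson's X² alongside for comparison. Cells with O = 0 are dropped because their term is taken to be 0 (this table has none).

```r
altar <- matrix(c( 7, 20, 16,
                  18, 22,  8), nrow = 2, byrow = TRUE)

E  <- outer(rowSums(altar), colSums(altar)) / sum(altar)  # expected counts
nz <- altar > 0                                           # O * log(O / E) taken as 0 when O = 0
G2 <- 2 * sum(altar[nz] * log(altar[nz] / E[nz]))         # log() is the natural log
X2 <- sum((altar - E)^2 / E)                              # Pearson X2, for comparison
df <- (nrow(altar) - 1) * (ncol(altar) - 1)

round(rbind(G2 = c(stat = G2, p = pchisq(G2, df, lower.tail = FALSE)),
            X2 = c(stat = X2, p = pchisq(X2, df, lower.tail = FALSE))), 3)
```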
Measures of Association
Phi-Square (φ²)
• an attempt to remove the effect of sample size, which makes chi-square inappropriate for measuring association
• divide chi-square by n:
  φ² = X²/n
• limits:
  0: variables are independent
  1: perfect association in a 2x2 table;
  in larger tables φ² can exceed 1 (its maximum is min(r-1, c-1))

  n = 60             n = 120
  17  13             34  26
  13  17             26  34
  φ² = 0.018         φ² = 0.018
Cramer’s V
• also a measure of strength of association
• an attempt to standardize phi-square (i.e., to control for the fact that its upper limit depends on table size in tables larger than 2x2)
• $V = \sqrt{\phi^2/m}$, where m = min(r-1, c-1), i.e., the smaller of rows-1 or columns-1
• limits: 0-1 for any size table; 1 = highest possible association
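A sketch computing φ² and Cramer's V in R; `assoc` is a hypothetical helper name, and X² is taken without Yates' correction.

```r
# phi-square and Cramer's V from the uncorrected Pearson X2
assoc <- function(x) {
  X2   <- unname(suppressWarnings(chisq.test(x, correct = FALSE))$statistic)
  phi2 <- X2 / sum(x)
  m    <- min(nrow(x) - 1, ncol(x) - 1)
  c(phi2 = phi2, V = sqrt(phi2 / m))
}

t60   <- matrix(c(17, 13,
                  13, 17), nrow = 2, byrow = TRUE)
altar <- matrix(c( 7, 20, 16,
                  18, 22,  8), nrow = 2, byrow = TRUE)

round(assoc(t60), 3)       # phi2 0.018 ...
round(assoc(2 * t60), 3)   # ... unchanged when every count is doubled
round(assoc(altar), 3)     # 2 x 3 table: m = 1, so V = sqrt(phi2)
```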
Yule’s Q
• for 2x2 tables only
• Q = (ad - bc)/(ad + bc), for a table laid out as
    a  b
    c  d
• often used to assess the strength of presence/absence association
• range is -1 (perfect negative association) to 1 (perfect positive association); values near 0 indicate a lack of association

                   Bone needles
                     +     -
Male burial   +     12    14
              -     16     3

Q = -.72
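A minimal R sketch of Yule's Q (`yules_q` is a hypothetical helper), applied to the bone-needle table.

```r
# Yule's Q for a 2 x 2 table laid out as  a b / c d
yules_q <- function(x) {
  ad <- x[1, 1] * x[2, 2]
  bc <- x[1, 2] * x[2, 1]
  (ad - bc) / (ad + bc)
}

# male burial (rows: +, -) by bone needles (columns: +, -)
needles <- matrix(c(12, 14,
                    16,  3), nrow = 2, byrow = TRUE)
round(yules_q(needles), 2)   # -0.72
```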
• not sensitive to marginal changes (unlike φ²)
• multiplying a row or column by a constant leaves Q unchanged, because the constant cancels out…

            jars   ollas
Source A     19     10
Source B      6     15

            jars   ollas
Source A     19     20
Source B      6     30

(Q = .65 for both tables)
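A sketch contrasting Q with φ² when the ollas column is multiplied by 2. Both helper functions are hypothetical; φ² for a 2 x 2 table is computed as (ad - bc)² divided by the product of the four marginal totals, which equals X²/n.

```r
yules_q <- function(x) {
  ad <- x[1, 1] * x[2, 2]; bc <- x[1, 2] * x[2, 1]
  (ad - bc) / (ad + bc)
}
# phi-square for a 2 x 2 table: (ad - bc)^2 / (product of the four marginal totals)
phi2_2x2 <- function(x) {
  (x[1, 1] * x[2, 2] - x[1, 2] * x[2, 1])^2 / prod(rowSums(x), colSums(x))
}

tab1 <- matrix(c(19, 10,
                  6, 15), nrow = 2, byrow = TRUE)
tab2 <- tab1
tab2[, 2] <- 2 * tab2[, 2]   # ollas column multiplied by 2

round(c(Q1 = yules_q(tab1), Q2 = yules_q(tab2)), 2)            # 0.65 0.65
round(c(phi2_1 = phi2_2x2(tab1), phi2_2 = phi2_2x2(tab2)), 3)  # these differ
```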
• can’t distinguish between different degrees of ‘complete’ association
• can’t distinguish between ‘complete’ and ‘absolute’ association
  (Q = 1 for all three tables below)

         M    F
RHS     60   20
LHS      0   20
n = 100

         M    F
RHS     60   10
LHS      0   30
n = 100

         M    F
RHS     60    0
LHS      0   40
n = 100
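A sketch showing both limitations at once: Q is 1 for all three burial-position tables, while φ² (computed with a small helper defined in the block) separates them and reaches 1 only for the last table. Helper names are hypothetical.

```r
yules_q <- function(x) {
  ad <- x[1, 1] * x[2, 2]; bc <- x[1, 2] * x[2, 1]
  (ad - bc) / (ad + bc)
}
phi2_2x2 <- function(x) {
  (x[1, 1] * x[2, 2] - x[1, 2] * x[2, 1])^2 / prod(rowSums(x), colSums(x))
}

# burial side (rows: RHS, LHS) by sex (columns: M, F)
tabs <- list(
  t1 = matrix(c(60, 20, 0, 20), nrow = 2, byrow = TRUE),
  t2 = matrix(c(60, 10, 0, 30), nrow = 2, byrow = TRUE),
  t3 = matrix(c(60,  0, 0, 40), nrow = 2, byrow = TRUE)
)

# Q is 1 every time; phi-square rises toward 1 for the 'absolute' case
round(sapply(tabs, function(x) c(Q = yules_q(x), phi2 = phi2_2x2(x))), 3)
```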
“odds” ratio
• easiest with 2 x 2 tables
• what are the ‘odds’ of a man being buried on his right side, compared to those of a woman?
• if there is a strong association between sex and burial position, the odds should be quite different…
• for a table laid out as
    a  b
    c  d
$$\text{odds ratio} = \frac{a/c}{b/d} = \frac{ad}{bc}$$

         M    F   total
RHS     29   14     43
LHS     11   33     44
total   40   47     87

men: 29/11 = 2.64
women: 14/33 = 0.42
odds ratio = 2.64/0.42 ≈ 6.21 (from the unrounded odds; equivalently ad/bc = 957/154)
• if there is no association, the odds ratio = 1
• the odds ratio ranges from 0 to infinity:
  >1 = ‘positive association’
  <1 = ‘negative association’
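The same arithmetic in R (a sketch). Note that `fisher.test()` also reports an odds ratio, but its documentation describes it as a conditional maximum-likelihood estimate, so it need not equal the simple ad/bc computed here.

```r
burial <- matrix(c(29, 14,
                   11, 33), nrow = 2, byrow = TRUE,
                 dimnames = list(side = c("RHS", "LHS"), sex = c("M", "F")))

odds_m <- burial["RHS", "M"] / burial["LHS", "M"]   # men:   29/11
odds_f <- burial["RHS", "F"] / burial["LHS", "F"]   # women: 14/33
odds_ratio <- odds_m / odds_f                        # same as ad / bc

round(c(odds_m = odds_m, odds_f = odds_f, odds_ratio = odds_ratio), 2)  # 2.64 0.42 6.21
```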
Goodman and Kruskal’s Tau (τ)
• a “proportional reduction of error” measure
• how much is the probability of correctly assigning cases to one set of categories improved by knowledge of another set of categories?
• limits are 0-1; 1 = perfect association
• same result as φ² for a 2x2 table
• sensitive to differences in the margins
• asymmetric
– predicting row assignments from columns gives a different result from predicting column assignments from rows
• τ = [P(error | rule 1) - P(error | rule 2)] / P(error | rule 1)
– rule 1: cases are assigned at random to the categories of one variable, in proportion to its marginal totals, with no knowledge of the 2nd variable
– rule 2: cases are assigned at random in the same way, but using knowledge of their category on the 2nd variable

rule 1 (only the B totals are known):
        B1   B2
A1       ?    ?
A2       ?    ?
         6   14   20

rule 2 (A is known; here the association is perfect, so τ = 1):
        B1   B2
A1       6    0
A2       0   14
         6   14   20
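A sketch of τ in R following the two assignment rules above. `gk_tau` is a hypothetical helper that predicts the column variable from the row variable (transpose the table to go the other way); the burial-position table is used to check the 2 x 2 equivalence with φ².

```r
gk_tau <- function(x) {
  n <- sum(x)
  # rule 1: assign to column categories at random, in proportion to column totals
  err1 <- 1 - sum((colSums(x) / n)^2)
  # rule 2: within each row, assign in proportion to that row's own distribution
  row_tot <- rowSums(x)
  err2 <- sum(row_tot / n * (1 - rowSums((x / row_tot)^2)))
  (err1 - err2) / err1
}

perfect <- matrix(c(6,  0,
                    0, 14), nrow = 2, byrow = TRUE)
gk_tau(perfect)   # 1: knowing A removes all prediction error

burial <- matrix(c(29, 14,
                   11, 33), nrow = 2, byrow = TRUE)
phi2 <- (29 * 33 - 14 * 11)^2 / prod(rowSums(burial), colSums(burial))
round(c(tau_cols = gk_tau(burial), tau_rows = gk_tau(t(burial)), phi2 = phi2), 4)
# for a 2 x 2 table all three coincide
```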
Table Standardization
• even very large and highly significant X2 (or G2)
statistics don’t necessarily mean that all parts of the
table are equally “deviant” (and therefore interesting)
• usually need to do other things to highlight loci of
association or ‘interaction’
• which cells diverge the most from expected values?
• very difficult to decide when both row and column
totals are unequal…
Percent standardization
• highly intuitive approach, easy to interpret
• often used to control the effects of sample-
size variation
• have to decide if it makes better sense to
standardize based on rows, or on columns
• usually, you want to standardize whatever it
is you want to compare
– i.e., if you want to compare columns, base
percents on column totals
• you may decide to make two tables, one
standardized on rows, the other on
columns…
MNIs
                  Site
Fauna        A     B     C   total
bear         2     1     0      3
moose       15     5    10     30
coyote       2     0     0      2
rabbit      16     8    12     36
dog          2     3     0      5
deer        16     8     7     31
total       53    25    29    107

column percents
                  Site
Fauna        A      B      C
bear        3.8    4.0    0.0
moose      28.3   20.0   34.5
coyote      3.8    0.0    0.0
rabbit     30.2   32.0   41.4
dog         3.8   12.0    0.0
deer       30.2   32.0   24.1
total     100    100    100

row percents
                  Site
Fauna        A      B      C    total
bear       66.7   33.3    0.0    100
moose      50.0   16.7   33.3    100
coyote    100.0    0.0    0.0    100
rabbit     44.4   22.2   33.3    100
dog        40.0   60.0    0.0    100
deer       51.6   25.8   22.6    100
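Both standardizations are one call each in base R with `prop.table()`; a sketch using the MNI table.

```r
# MNI counts: fauna (rows) by site (columns)
mni <- matrix(c( 2, 1,  0,
                15, 5, 10,
                 2, 0,  0,
                16, 8, 12,
                 2, 3,  0,
                16, 8,  7),
              ncol = 3, byrow = TRUE,
              dimnames = list(fauna = c("bear", "moose", "coyote", "rabbit", "dog", "deer"),
                              site  = c("A", "B", "C")))

col_pct <- 100 * prop.table(mni, margin = 2)  # each site sums to 100 (compare sites)
row_pct <- 100 * prop.table(mni, margin = 1)  # each taxon sums to 100 (compare taxa)

round(col_pct, 1)
round(row_pct, 1)
```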
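Binomial Probabilities
• P(n,k,p): “probability of k successes in n trials, with probability p of success in any one trial”
• e.g., for the upper-left cell of this table:

observed          expected
 5   3             3.7   4.3
 1   4             2.3   2.7
 N = 13            N = 13

n = 13
k = 5
p = 3.7/13 (the cell’s expected count divided by N)

• in R:
  > pbinom(k, n, p)                           # P(k or fewer successes)
  > pbinom(k - 1, n, p, lower.tail = FALSE)   # P(k or more successes)
• easy to build into a function…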
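One way the binomial calculation might be built into a function, sketched in R. `cell_binom` is hypothetical; it treats the N cases as N independent trials with p = E/N for each cell, and reports the tail in the direction of the observed deviation (upper tail if the observed count is at least the expected one, lower tail otherwise).

```r
obs <- matrix(c(5, 3,
                1, 4), nrow = 2, byrow = TRUE)

cell_binom <- function(x) {
  n <- sum(x)
  E <- outer(rowSums(x), colSums(x)) / n             # expected counts
  p <- E / n                                         # chance a case falls in each cell under independence
  upper <- pbinom(x - 1, n, p, lower.tail = FALSE)   # P(X >= observed)
  lower <- pbinom(x, n, p)                           # P(X <= observed)
  ifelse(x >= E, upper, lower)                       # keeps the matrix shape of x >= E
}

round(cell_binom(obs), 3)
```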
[figure: percent and cumulative percent plots]
K-S test for cumulative percents
[figure: cumulative percent curves]
• cumulative percents give some useful statistical measures (for ordinal or ratio scales)
• can be misleading when used with nominal data, since the curves depend on the arbitrary order of the categories
• good for comparing data sets
Cumulative Percent Graph

Percentages
                 Sites
Types       A     B     C
1           5     5     5
2          45     0    30
3           5    48     5
4           5     5     5
5           5     5     5
6           5     5     5
7          20     5    35
8           5    22     5
9           5     5     5
total     100   100   100

Cumulative Percents
                 Sites
Types       A     B     C
1           5     5     5
2          50     5    35
3          55    53    40
4          60    58    45
5          65    63    50
6          70    68    55
7          90    73    90
8          95    95    95
9         100   100   100
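A sketch of the cumulative-percent arithmetic, and of the ordering problem with nominal data, in R. `dmax` is a hypothetical helper; the reordering is the type order 1 5 3 4 2 6 7 8 9 used in the last plot.

```r
pct <- cbind(A = c(5, 45,  5, 5, 5, 5, 20,  5, 5),
             B = c(5,  0, 48, 5, 5, 5,  5, 22, 5),
             C = c(5, 30,  5, 5, 5, 5, 35,  5, 5))

cum <- apply(pct, 2, cumsum)   # cumulative percents, one column per site

# largest vertical gap between two cumulative curves
dmax <- function(cum, s1, s2) max(abs(cum[, s1] - cum[, s2]))

dmax(cum, "A", "B")            # 45

ord <- c(1, 5, 3, 4, 2, 6, 7, 8, 9)
cum_reordered <- apply(pct[ord, ], 2, cumsum)
dmax(cum_reordered, "A", "B")  # 43: same data, different order, different answer
```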
[figures: cumulative percents by type for sites A, B, and C (y axis 0 to 120); the first two plots order the types 1 to 9, the third reorders them as 1 5 3 4 2 6 7 8 9]
K-S test
• find Dmax:
– the maximum difference between 2 cumulative proportion distributions
– compare it to the critical value for the chosen significance level:
$$D_{crit} = C\sqrt{\frac{n_1+n_2}{n_1 n_2}}$$
– alpha = .05, C = 1.36
– alpha = .01, C = 1.63
– alpha = .001, C = 1.95
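The tabulated C values can be reproduced from the usual large-sample approximation C(α) = sqrt(-ln(α/2)/2); a sketch in R, with `ks_C` and `ks_crit` as hypothetical helper names.

```r
# Large-sample two-sample K-S critical value:
#   D_crit = C(alpha) * sqrt((n1 + n2) / (n1 * n2)),  C(alpha) = sqrt(-log(alpha / 2) / 2)
ks_C    <- function(alpha) sqrt(-log(alpha / 2) / 2)
ks_crit <- function(n1, n2, alpha = 0.05) ks_C(alpha) * sqrt((n1 + n2) / (n1 * n2))

round(ks_C(c(0.05, 0.01, 0.001)), 2)   # 1.36 1.63 1.95
ks_crit(76, 136)                       # critical Dmax for samples of 76 and 136, alpha = .05
```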
example 2
• mortuary data (Shennan, p. 56+)
• burials classified by 2 wealth categories (rich vs. poor) and 6 age categories (infant to old age)

             Rich   Poor
Infans I        6     23
Infans II       8     21
Juvenilis      11     25
Adultus        29     36
Maturus        19     27
Senilis         3      4
Total          76    136

• burials in the younger age classes appear to be more numerous among the poor
• can this be explained away as random chance?
or
• do poor burials constitute a different population, with respect to age class, than rich burials?
• we can get a visual sense of the problem using a cumulative frequency plot:
[figure: cumulative proportion (0 to 1) of rich and poor burials by age class: Infans I, Infans II, Juvenilis, Adultus, Maturus, Senilis]
• the K-S (Kolmogorov-Smirnov) test assesses the significance of the maximum divergence between two cumulative frequency curves
  H0: dist1 = dist2
• an equation based on the theoretical distribution of differences between cumulative frequency curves provides a critical value for a specific alpha level
• observed differences beyond this value can be regarded as significant at that alpha level
• if alpha = .05, the critical value is
$$1.36\sqrt{\frac{n_1+n_2}{n_1 n_2}} = 1.36\sqrt{\frac{76+136}{76 \times 136}} = 0.195$$
• the observed value is Dmax = 0.178
• 0.178 < 0.195, so don’t reject H0
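A sketch of the same test in R. `ks.test()` assumes continuous distributions, which grouped age classes are not, so it seems simpler to compute Dmax from the cumulative proportions directly, as below.

```r
rich <- c(InfansI = 6,  InfansII = 8,  Juvenilis = 11, Adultus = 29, Maturus = 19, Senilis = 3)
poor <- c(InfansI = 23, InfansII = 21, Juvenilis = 25, Adultus = 36, Maturus = 27, Senilis = 4)

n1 <- sum(rich)                  # 76
n2 <- sum(poor)                  # 136
cum_rich <- cumsum(rich) / n1    # cumulative proportions, youngest to oldest
cum_poor <- cumsum(poor) / n2

D     <- abs(cum_rich - cum_poor)
D_max <- max(D)
crit  <- 1.36 * sqrt((n1 + n2) / (n1 * n2))   # alpha = .05

round(c(D_max = D_max, critical = crit), 3)   # 0.178 0.195
names(which.max(D))                            # "Juvenilis"
D_max > crit                                   # FALSE: don't reject H0
```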
[figure: the same cumulative curves for rich and poor burials, with the maximum divergence marked: Dmax = .178]
statement/question: “Oil exploration should be allowed in coastal California…”

                     age <= 30   age > 30
strongly disagree        8           7
mildly disagree          5           9
disagree                 6           6
no opinion               0           1
agree                    2           2
mildly agree             1           3
strongly agree           2           3
example 2
example 3
• survey data: 100 sites
• broken down by location and time:

            early   late   Total
piedmont      31     19      50
plain         19     31      50
Total         50     50     100
• we can do a chi-square test of independence
of the two variables time and location
• H0:time & location are independent
• alpha = .05
[diagram: time and location under H0 (independent) and under H1 (not independent)]
• X² values reflect accumulated differences between observed and expected cell counts
• expected cell counts are based on the assumptions inherent in the null hypothesis
• if H0 is correct, cell values should reflect an “even” distribution of the marginal totals:

            early   late   Total
piedmont      25     25      50
plain         25     25      50
Total         50     50     100
• chi-square = Σ((o-e)²/e)
• observed chi-square = 4.84 (this value includes Yates’ continuity correction; the uncorrected sum is 5.76)
• we need to compare it to the “critical value” in a chi-square table:
  – critical value (alpha = .05, 1 df) is 3.84
  – observed chi-square (4.84) > 3.84
• we can reject H0
• H1: time & location are not independent
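A sketch of this test in R. By default `chisq.test()` applies Yates' continuity correction to 2 x 2 tables; `correct = FALSE` gives the plain Σ(o-e)²/e.

```r
sites <- matrix(c(31, 19,
                  19, 31), nrow = 2, byrow = TRUE,
                dimnames = list(location = c("piedmont", "plain"),
                                time     = c("early", "late")))

corrected   <- chisq.test(sites)                   # default: Yates' correction for 2 x 2
uncorrected <- chisq.test(sites, correct = FALSE)  # plain sum of (o - e)^2 / e

c(corrected   = unname(corrected$statistic),
  uncorrected = unname(uncorrected$statistic))     # 4.84 5.76
qchisq(0.95, df = 1)                               # critical value at alpha = .05, 1 df
corrected$p.value < 0.05                           # TRUE: reject H0
```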
• what does this mean?

            early   late   Total
piedmont      31     19      50
plain         19     31      50
Total         50     50     100
More Related Content

Similar to contingency tables.ppt

optimizedBell.pptx
optimizedBell.pptxoptimizedBell.pptx
optimizedBell.pptxRichard Gill
 
Admission in India
Admission in IndiaAdmission in India
Admission in IndiaEdhole.com
 
optimizedBell.pptx
optimizedBell.pptxoptimizedBell.pptx
optimizedBell.pptxRichard Gill
 
Statisticsforbiologists colstons
Statisticsforbiologists colstonsStatisticsforbiologists colstons
Statisticsforbiologists colstonsandymartin
 
Module 6: Outlier Detection for Two Sample Case
Module 6: Outlier Detection for Two Sample CaseModule 6: Outlier Detection for Two Sample Case
Module 6: Outlier Detection for Two Sample CaseStats Statswork
 
C2 st lecture 12 the chi squared-test handout
C2 st lecture 12   the chi squared-test handoutC2 st lecture 12   the chi squared-test handout
C2 st lecture 12 the chi squared-test handoutfatima d
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsBurak Mızrak
 
Chi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemarChi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemarAzmi Mohd Tamil
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15Shani729
 
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).pptaibakimito
 
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).pptRAJESHKUMAR428748
 
Lecture 3 Dispersion(1).pptx
Lecture 3 Dispersion(1).pptxLecture 3 Dispersion(1).pptx
Lecture 3 Dispersion(1).pptxssuser378d7c
 
U1.4-RVDistributions.ppt
U1.4-RVDistributions.pptU1.4-RVDistributions.ppt
U1.4-RVDistributions.pptSameeraasif2
 
Probability Distributions
Probability Distributions Probability Distributions
Probability Distributions Anthony J. Evans
 

Similar to contingency tables.ppt (20)

optimizedBell.pptx
optimizedBell.pptxoptimizedBell.pptx
optimizedBell.pptx
 
PCA.ppt
PCA.pptPCA.ppt
PCA.ppt
 
Admission in India
Admission in IndiaAdmission in India
Admission in India
 
Random Error Theory
Random Error TheoryRandom Error Theory
Random Error Theory
 
optimizedBell.pptx
optimizedBell.pptxoptimizedBell.pptx
optimizedBell.pptx
 
Statisticsforbiologists colstons
Statisticsforbiologists colstonsStatisticsforbiologists colstons
Statisticsforbiologists colstons
 
Module 6: Outlier Detection for Two Sample Case
Module 6: Outlier Detection for Two Sample CaseModule 6: Outlier Detection for Two Sample Case
Module 6: Outlier Detection for Two Sample Case
 
week3a.ppt
week3a.pptweek3a.ppt
week3a.ppt
 
Stats chapter 3
Stats chapter 3Stats chapter 3
Stats chapter 3
 
C2 st lecture 12 the chi squared-test handout
C2 st lecture 12   the chi squared-test handoutC2 st lecture 12   the chi squared-test handout
C2 st lecture 12 the chi squared-test handout
 
Chisquare
ChisquareChisquare
Chisquare
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Chi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemarChi-square, Yates, Fisher & McNemar
Chi-square, Yates, Fisher & McNemar
 
Lecture slides week14-15
Lecture slides week14-15Lecture slides week14-15
Lecture slides week14-15
 
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
 
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
 
Lecture 3 Dispersion(1).pptx
Lecture 3 Dispersion(1).pptxLecture 3 Dispersion(1).pptx
Lecture 3 Dispersion(1).pptx
 
U1.4-RVDistributions.ppt
U1.4-RVDistributions.pptU1.4-RVDistributions.ppt
U1.4-RVDistributions.ppt
 
Probability Distributions
Probability Distributions Probability Distributions
Probability Distributions
 
Chisquared test.pptx
Chisquared test.pptxChisquared test.pptx
Chisquared test.pptx
 

Recently uploaded

HAL Financial Performance Analysis and Future Prospects
HAL Financial Performance Analysis and Future ProspectsHAL Financial Performance Analysis and Future Prospects
HAL Financial Performance Analysis and Future ProspectsRajesh Gupta
 
A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...
A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...
A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...prakheeshc
 
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdfProgress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdfHolger Mueller
 
00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![© ر
00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![©  ر00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![©  ر
00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![© رnafizanafzal
 
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...CIO Look Magazine
 
Presentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelledPresentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelledCaitlinCummins3
 
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...drm1699
 
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...BabaJohn3
 
Goal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptx
Goal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptxGoal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptx
Goal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptxNetapsFoundationAdmi
 
Should Law Firms Outsource their Bookkeeping
Should Law Firms Outsource their BookkeepingShould Law Firms Outsource their Bookkeeping
Should Law Firms Outsource their BookkeepingYourLegal Accounting
 
What is paper chromatography, principal, procedure,types, diagram, advantages...
What is paper chromatography, principal, procedure,types, diagram, advantages...What is paper chromatography, principal, procedure,types, diagram, advantages...
What is paper chromatography, principal, procedure,types, diagram, advantages...srcw2322l101
 
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.daisycvs
 
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deckPitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deckHajeJanKamps
 
Powerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metricsPowerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metricsCaitlinCummins3
 
What are the differences between an international company, a global company, ...
What are the differences between an international company, a global company, ...What are the differences between an international company, a global company, ...
What are the differences between an international company, a global company, ...AbhishekSharma823325
 
The Art of Decision-Making: Navigating Complexity and Uncertainty
The Art of Decision-Making: Navigating Complexity and UncertaintyThe Art of Decision-Making: Navigating Complexity and Uncertainty
The Art of Decision-Making: Navigating Complexity and Uncertaintycapivisgroup
 
hyundai capital 2023 consolidated financial statements
hyundai capital 2023 consolidated financial statementshyundai capital 2023 consolidated financial statements
hyundai capital 2023 consolidated financial statementsirhcs
 

Recently uploaded (20)

HAL Financial Performance Analysis and Future Prospects
HAL Financial Performance Analysis and Future ProspectsHAL Financial Performance Analysis and Future Prospects
HAL Financial Performance Analysis and Future Prospects
 
A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...
A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...
A BUSINESS PROPOSAL FOR SLAUGHTER HOUSE WASTE MANAGEMENT IN MYSORE MUNICIPAL ...
 
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di DepokObat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
 
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdfProgress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
Progress Report - UKG Analyst Summit 2024 - A lot to do - Good Progress1-1.pdf
 
00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![© ر
00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![©  ر00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![©  ر
00971508021841 حبوب الإجهاض في دبي | أبوظبي | الشارقة | السطوة |❇ ❈ ((![© ر
 
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
Most Visionary Leaders in Cloud Revolution, Shaping Tech’s Next Era - 2024 (2...
 
Presentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelledPresentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelled
 
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
#Mtp-Kit Prices » Qatar. Doha (+27737758557) Abortion Pills For Sale In Doha,...
 
Contact +971581248768 for 100% original and safe abortion pills available for...
Contact +971581248768 for 100% original and safe abortion pills available for...Contact +971581248768 for 100% original and safe abortion pills available for...
Contact +971581248768 for 100% original and safe abortion pills available for...
 
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
 
Goal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptx
Goal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptxGoal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptx
Goal Presentation_NEW EMPLOYEE_NETAPS FOUNDATION.pptx
 
Should Law Firms Outsource their Bookkeeping
Should Law Firms Outsource their BookkeepingShould Law Firms Outsource their Bookkeeping
Should Law Firms Outsource their Bookkeeping
 
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di SurabayaObat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
 
What is paper chromatography, principal, procedure,types, diagram, advantages...
What is paper chromatography, principal, procedure,types, diagram, advantages...What is paper chromatography, principal, procedure,types, diagram, advantages...
What is paper chromatography, principal, procedure,types, diagram, advantages...
 
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
 
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deckPitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
 
Powerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metricsPowerpoint showing results from tik tok metrics
Powerpoint showing results from tik tok metrics
 
What are the differences between an international company, a global company, ...
What are the differences between an international company, a global company, ...What are the differences between an international company, a global company, ...
What are the differences between an international company, a global company, ...
 
The Art of Decision-Making: Navigating Complexity and Uncertainty
The Art of Decision-Making: Navigating Complexity and UncertaintyThe Art of Decision-Making: Navigating Complexity and Uncertainty
The Art of Decision-Making: Navigating Complexity and Uncertainty
 
hyundai capital 2023 consolidated financial statements
hyundai capital 2023 consolidated financial statementshyundai capital 2023 consolidated financial statements
hyundai capital 2023 consolidated financial statements
 

contingency tables.ppt

  • 2. • contingency tables show frequencies produced by cross-classifying observations • e.g., pottery described simultaneously according to vessel form & surface decoration polished burnished matte bowl 47 28 3 jar 30 42 8 olla 6 45 25
  • 3. • most statistical tests for tables are designed for analyzing 2-dimensions – only examine the interaction of two variables at one time… • most efficient when used with nominal data – using ratio data means recoding data to a lower scale of measurement (ordinal) – means ignoring some of the information originally available…
  • 4. • still, you might do this, particularly if you are interested in association between metric and non-metric variables • e.g.: variation in pot size vs. surface decoration… • may decide to divide pot size into ordinal classes…
  • 6. small large specular 4 13 non-specular 15 18 rim diameter: slip: • other options may let you retain more of the original information content
  • 7. non-specular slip specular slip • could use a “t-test” to test the equality of the means • makes full use of ratio data…
  • 8. • why do we work with contingency tables?? polished burnished matte bowl 47 28 3 jar 30 42 8 olla 6 45 25
  • 9. • because we think there may be some kind of interaction between the variables… • basic question: can the state of one variable be predicted from the state of another variable? • if not, they are independent polished burnished matte bowl 47 28 3 jar 30 42 8 olla 6 45 25
  • 10. expected counts • a baseline to which observed counts can be compared • counts that would occur by random chance if the variables are independent, over the long run • for any cell E = (col total * row total)/table total
  • 11. M F PP 4 1 5 45%  2.3 2.7 5 Pot 1 5 6 55% 2.7 3.3 6 Total 5 6 11 5 6 11 45% 55%
  • 12. significance • = probability of getting, by chance, a table as or more deviant than the observed table, if the variables are independent – ‘deviant’ defined in terms of expected table • no causality is necessarily implied by the outcome – but, causality may well be the reason for observed association… – e.g.: grave goods and sex
  • 13. Fisher’s Exact Test • just for 2 x 2 tables • useful where chi-square test is inappropriate • gives the exact probability of all tables with • the same marginal totals • as or more deviant than the observed table…
  • 14. P = (a+b)!(a+c)!(b+d)!(c+d)! / (N!a!b!c!d!) P = 5!5!6!6! / 11!4!1!1!5! = 5*6!6! / 11! P = 5*6!6! / 11! = 5*6! / 11*10*9*8*7 P = 5*6! / 11*10*9*8*7 = 3600 / 55440 P = .065 a b c d 4 1 1 5
  • 15. P = .065 use R (or Excel) if the counts aren’t too large… > fisher.test(x) a b c d 4 1 1 5
  • 16. 0 5 5 5 1 6 5 6 11 1 4 5 4 2 6 5 6 11 2 3 5 3 3 6 5 6 11 3 2 5 2 4 6 5 6 11 4 1 5 1 5 6 5 6 11 5 0 5 0 6 6 5 6 11
  • 17. 0 5 5 1 1 4 4 2 2 3 3 3 3 2 2 4 4 1 1 5 5 0 0 6 2.3 2.7 2.7 3.3 0.013 0.162 0.433 0.325 0.065 0.002 • P = 0.065+0.002 = 0.067 or • P = 0.067+0.013 = 0.080 (observed) (expected)
  • 18. • 2-tailed test = 0.067+0.013 = 0.080 • 1-tailed test = 0.065+0.002 = 0.067 M F PP 4 1 5 Pot 1 5 6 5 6 11 > fisher.test(x, alt = "two.sided") > fisher.test(x, alt = “greater”) [i.e.: H1: odds ratio > 1] in R:
  • 19. Chi-square Statistic
    • an aggregate measure (i.e., based on the entire table)
    • the greater the deviation from expected values, the larger the chi-square statistic, disproportionately so, since deviations are squared…

    $$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

    • one could devise other statistics that place less emphasis on large deviations, e.g. $\sum_{i=1}^{k} |O_i - E_i| / E_i$
  • 20.
    • X² is distributed approximately in accord with the χ² probability distribution
    • χ² probabilities are traditionally found in a table showing threshold values from a CPD
      – you need the degrees of freedom: df = (r − 1) × (c − 1)
    • just use R…
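  • 21. example: ritual architecture vs. status

    observed:
                    Status
      Ritual arch.  low  intermed.  high
      altar          7      20       16     43
      no altar      18      22        8     48
                    25      42       24     91

    expected:
                    low  intermed.  high
      altar        11.8    19.8     11.3    43
      no altar     13.2    22.2     12.7    48
                    25      42       24     91

    chi-square contributions:
                    low  intermed.  high
      altar         2.0     0.0      1.9    3.9
      no altar      1.8     0.0      1.7    3.5
                    3.7     0.0      3.6    7.3

    e.g., expected count (altar, high) = (43 × 24)/91 = 11.3
          contribution (altar, low) = (7 − 11.8)²/11.8 = 2.0
    X² = 7.3, df = 2, P ≈ .025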
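A Python sketch of the X² calculation for the altar/status table, returning the per-cell contributions along with the statistic (function name hypothetical). One assumption worth flagging: with df = 2 the upper tail of the χ² distribution is exactly exp(−X²/2), so no statistics library is needed here; other df values need something like R's `pchisq()`.

```python
from math import exp


def chi_square(table):
    """Pearson X^2, per-cell contributions, and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = [[(o - r * c / n) ** 2 / (r * c / n)
                      for o, c in zip(row, col_totals)]
                     for row, r in zip(table, row_totals)]
    x2 = sum(map(sum, contributions))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return x2, contributions, df


# rows: altar, no altar; columns: low, intermed., high status
observed = [[7, 20, 16],
            [18, 22, 8]]
x2, contrib, df = chi_square(observed)
# closed-form upper tail, valid only because df == 2 here
p = exp(-x2 / 2)
print(round(x2, 1), df, round(p, 3))  # 7.3 2 0.025
```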
  • 23. X² assumptions & problems
    • must be based on counts:
      – not percentages, ratios or weighted data
    • fails to give reliable results if expected counts are too low:

          obs.          exp.
          2  3    5     2.27  2.73
          3  3    6     2.73  3.27
          5  6   11

      X² = 0.11 (P = 0.74), but Fisher's exact P = 1.0
  • 24. rules of thumb
    1. no expected counts less than 5
       – almost certainly too stringent
    2. no expected counts less than 2, and at least 80% of expected counts greater than 5
       – more relaxed (but more realistic)
  • 25. collapsing tables
    • can often combine columns/rows to increase expected counts that are too low
      – may increase or reduce interpretability
      – may create or destroy structure in the table
    • no clear guidelines
      – avoid simply trying to identify the combination of cells that produces a "significant" result
  • 26. collapsing a 4 × 4 table into a 4 × 2 table (columns 1 + 2 and 3 + 4 combined)

      obs. counts                 exp. counts
       8   3   6   2   19         5.3  4.6  5.8  3.2   19
       6   1   6   5   18         5.0  4.4  5.5  3.1   18
       6   4   5   4   19         5.3  4.6  5.8  3.2   19
       3  12   8   3   26         7.3  6.3  7.9  4.4   26
      23  20  25  14   82          23   20   25   14   82

      obs. counts                 exp. counts
      11   8   19                 10.0   9.0   19
       7  11   18                  9.4   8.6   18
      10   9   19                 10.0   9.0   19
      15  11   26                 13.6  12.4   26
      43  39   82                   43    39   82
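To see how collapsing changes the expected counts, a small Python sketch (function names hypothetical) that merges columns 1 + 2 and 3 + 4 of the table above and reports the smallest expected count before and after.

```python
def expected_counts(table):
    """(row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def collapse_columns(table, groups):
    """Sum columns together; groups is a list of column-index lists."""
    return [[sum(row[j] for j in g) for g in groups] for row in table]


observed = [[8, 3, 6, 2],
            [6, 1, 6, 5],
            [6, 4, 5, 4],
            [3, 12, 8, 3]]
collapsed = collapse_columns(observed, [[0, 1], [2, 3]])
for table in (observed, collapsed):
    smallest = min(min(row) for row in expected_counts(table))
    print(round(smallest, 1))  # 3.1, then 8.6
```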
  • 27.
    • chi-square is basically a measure of significance
    • it is not a good measure of strength of association
    • it can help you decide if a relationship exists, but not how strong it is
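  • 28. the same proportions at two sample sizes:

      17 13                 34 26
      13 17    n = 60       26 34    n = 120
      X² = 1.07, P = .30    X² = 2.13, P = .14

    doubling every count doubles X², even though the strength of association is unchanged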
  • 29.
    • also, chi-square is a 'global statistic'
    • it says nothing (directly) about which parts of a table may be 'driving' a large chi-square statistic
    • 'chi-square contributions' from individual cells can help:

                   low  intermed.  high
      altar        2.0     0.0      1.9    3.9
      no altar     1.8     0.0      1.7    3.5
                   3.7     0.0      3.6    7.3
  • 30. Monte Carlo test of X² significance
    • based on simulated generation of cell counts under imposed conditions of independence
    • randomly assign counts to cells, e.g.:

        23  14   8   45
        15   6  13   34
        38  20  21   79
  • 31.
    • significance is simply the proportion of outcomes that produced an X² statistic ≥ the observed one
    • not based on any assumptions about the distribution of the X² statistic
    • overcomes the problems associated with small expected frequencies
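A sketch of the Monte Carlo idea in Python, under one stated assumption: "imposed conditions of independence" is implemented by keeping both sets of marginal totals fixed and shuffling the column labels of individual cases, which breaks any row–column link. The function names are hypothetical, and the altar/status table from slide 21 stands in as the observed table. R's `chisq.test()` offers a comparable option through `simulate.p.value = TRUE`.

```python
import random


def chi_square_stat(table):
    """Pearson X^2 for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for o, c in zip(row, col_totals))


def monte_carlo_p(table, n_sims=5000, seed=1):
    """Proportion of simulated tables (same margins, labels shuffled so the
    variables are independent) whose X^2 is >= the observed X^2."""
    rng = random.Random(seed)
    observed_x2 = chi_square_stat(table)
    # expand the table into one (row, column) label pair per case
    pairs = [(i, j) for i, row in enumerate(table)
             for j, count in enumerate(row) for _ in range(count)]
    rows = [i for i, _ in pairs]
    cols = [j for _, j in pairs]
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(cols)  # re-pair column labels with rows at random
        sim = [[0] * len(table[0]) for _ in table]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        # small tolerance so ties with the observed X^2 are counted
        if chi_square_stat(sim) >= observed_x2 - 1e-9:
            hits += 1
    return hits / n_sims


altar = [[7, 20, 16], [18, 22, 8]]
print(monte_carlo_p(altar))  # compare with P = .025 from the chi-square distribution
```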
  • 32. G Test
    • a measure of significance for any r × c table
    • look up critical values of G² in an ordinary chi-square table; figure out degrees of freedom the same way
    • conforms to the chi-square distribution better than the chi-square statistic does

    $$G^2 = 2 \sum_{i=1}^{k} O_i \log_e\!\left(\frac{O_i}{E_i}\right)$$
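  • 33. an R function for G²

      gsq.test <- function(obs) {
        # degrees of freedom, as for chi-square
        df <- (nrow(obs) - 1) * (ncol(obs) - 1)
        # expected counts under independence
        expd <- suppressWarnings(chisq.test(obs)$expected)
        # zero cells contribute nothing (0 * log 0 is taken as 0)
        pos <- obs > 0
        G <- 2 * sum(obs[pos] * log(obs[pos] / expd[pos]))
        # upper-tail probability of G under chi-square with df degrees of freedom
        pchisq(G, df, lower.tail = FALSE)
      }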
  • 35. Phi-Square (φ²)
    • an attempt to remove the effect of sample size, which makes chi-square inappropriate for measuring association
    • divide chi-square by n: φ² = X²/n
    • limits:
      – 0: variables are independent
      – 1: perfect association in a 2 × 2 table
      – in larger tables there is no upper limit of 1 (the maximum is min(r − 1, c − 1))
  • 36.
      17 13                 34 26
      13 17    n = 60       26 34    n = 120
      φ² = 0.018            φ² = 0.018
  • 37. Cramér's V
    • also a measure of strength of association
    • an attempt to standardize phi-square (i.e., to control for the fact that its maximum exceeds 1 in tables larger than 2 × 2)
    • $V = \sqrt{\phi^2 / m}$, where m = min(r − 1, c − 1), i.e., the smaller of rows − 1 or columns − 1
    • limits: 0–1 for any size table; 1 = highest possible association
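A Python sketch (hypothetical function names) that puts the sample-size argument and the two standardizations side by side: doubling every count in the 17/13 table doubles X² but leaves φ² unchanged, and Cramér's V is computed as √(φ²/m).

```python
from math import sqrt


def chi_square_stat(table):
    """Pearson X^2 for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for o, c in zip(row, col_totals))


def phi_square(table):
    """phi^2 = X^2 / n"""
    return chi_square_stat(table) / sum(map(sum, table))


def cramers_v(table):
    """V = sqrt(phi^2 / m), with m = min(rows - 1, columns - 1)."""
    m = min(len(table), len(table[0])) - 1
    return sqrt(phi_square(table) / m)


small = [[17, 13], [13, 17]]   # n = 60
large = [[34, 26], [26, 34]]   # every count doubled, n = 120
for t in (small, large):
    print(round(chi_square_stat(t), 2), round(phi_square(t), 3))
# prints 1.07 0.018, then 2.13 0.018

altar = [[7, 20, 16], [18, 22, 8]]  # slide 21
print(round(cramers_v(altar), 2))  # 0.28
```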
  • 38. Yule's Q
    • for 2 × 2 tables only, with cells
        a b
        c d
    • Q = (ad − bc)/(ad + bc)
  • 39. Yule's Q
    • often used to assess the strength of presence/absence association
    • range is −1 (perfect negative association) to 1 (perfect positive association); values near 0 indicate a lack of association

                        bone needles
                          +     −
      male burial  +     12    14
                   −     16     3

      Q = −.72
  • 40. Yule's Q
    • not sensitive to marginal changes (unlike φ²)
    • multiply a row or column by a constant and it cancels out…

                  jars  ollas                 jars  ollas
      Source A     19    10       Source A     19    20
      Source B      6    15       Source B      6    30

      (Q = .65 for both tables)
  • 41. Yule's Q
    • can't distinguish between different degrees of 'complete' association
    • can't distinguish between 'complete' and 'absolute' association

               M   F             M   F             M   F
      RHS     60  20     RHS    60  10     RHS    60   0
      LHS      0  20     LHS     0  30     LHS     0  40
      n = 100            n = 100            n = 100

      (Q = 1 for all three tables)
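Yule's Q is simple enough to check by hand, but a short Python sketch (the function `yules_q` is hypothetical) reproduces the values on slides 39–41, including the insensitivity to doubling a column and Q = 1 for all three burial-side tables. Q is undefined when ad + bc = 0.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)


print(round(yules_q(12, 14, 16, 3), 2))   # -0.72 (bone needles vs. male burial)
print(round(yules_q(19, 10, 6, 15), 2))   # 0.65
print(round(yules_q(19, 20, 6, 30), 2))   # 0.65 (ollas column doubled)
for a, b, c, d in [(60, 20, 0, 20), (60, 10, 0, 30), (60, 0, 0, 40)]:
    print(yules_q(a, b, c, d))            # 1.0 each time
```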
  • 42. "odds" ratio
    • easiest with 2 × 2 tables
    • what are the 'odds' of a man being buried on his right side, compared to those of a woman?
    • if there is a strong level of association between sex and burial position, the odds should be quite different…
  • 44.
               M   F
      RHS     29  14   43
      LHS     11  33   44
              40  47   87

      odds for men:    29/11 = 2.64
      odds for women:  14/33 = 0.42
      odds ratio:      (29/11)/(14/33) = 6.21

    • if there is no association, the odds ratio = 1
    • departures from 1 range between 0 and infinity:
      – >1 = 'positive association'
      – <1 = 'negative association'
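The same arithmetic as a Python sketch; `odds_ratio` is a hypothetical helper, with the table read as [[a, b], [c, d]] = [[RHS men, RHS women], [LHS men, LHS women]].

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for [[a, b], [c, d]]: (a / c) / (b / d) = ad / bc."""
    return (a * d) / (b * c)


men_odds = 29 / 11     # right side vs. left side, men
women_odds = 14 / 33   # right side vs. left side, women
print(round(men_odds, 2), round(women_odds, 2),
      round(odds_ratio(29, 14, 11, 33), 2))  # 2.64 0.42 6.21
```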
  • 45. Goodman and Kruskal's Tau (τ)
    • "proportional reduction of error"
    • how much is the probability of correctly assigning cases to one set of categories improved by knowledge of another set of categories?
  • 46. Goodman and Kruskal's Tau (τ)
    • limits are 0–1; 1 = perfect association
    • same result as φ² with a 2 × 2 table
    • sensitive to margin differences
    • asymmetric
      – predicting row assignments from columns gives a different result than predicting column assignments from rows
  • 47. τ = [P(error | rule 1) − P(error | rule 2)] / P(error | rule 1)
    • rule 1: random assignments to one variable are made with no knowledge of the 2nd variable
    • rule 2: random assignments to one variable are made with knowledge of the 2nd variable

      rule 1 (margins only):        rule 2 (cells known):
              B1   B2                       B1   B2
      A1                            A1       6    0
      A2                            A2       0   14
               6   14   20                   6   14   20
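One way to compute τ in Python, as a sketch: it assumes the usual proportional-prediction reading of rules 1 and 2 (cases assigned at random in proportion to the column totals, either overall or within each row). The function names are hypothetical; transposing the table gives the other direction of prediction, which is where the asymmetry shows up.

```python
def goodman_kruskal_tau(table):
    """tau for predicting the column category from the row category.
    Pass the transposed table to predict rows from columns instead."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # rule 1: assign to columns at random, in proportion to the column totals
    err1 = 1 - sum((c / n) ** 2 for c in col_totals)
    # rule 2: the same, but using the column proportions within each row
    err2 = 1 - sum(o ** 2 / (r * n)
                   for row, r in zip(table, row_totals) for o in row)
    return (err1 - err2) / err1


def transpose(table):
    return [list(col) for col in zip(*table)]


print(round(goodman_kruskal_tau([[6, 0], [0, 14]]), 3))  # perfect association
altar = [[7, 20, 16], [18, 22, 8]]  # slide 21
print(round(goodman_kruskal_tau(altar), 3),
      round(goodman_kruskal_tau(transpose(altar)), 3))  # the two directions differ
```

For a 2 × 2 table such as the bone-needle data, the result matches φ², as slide 46 states.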
  • 48. Table Standardization
    • even very large and highly significant X² (or G²) statistics don't necessarily mean that all parts of the table are equally "deviant" (and therefore interesting)
    • usually need to do other things to highlight loci of association or 'interaction'
    • which cells diverge the most from expected values?
    • very difficult to decide when both row and column totals are unequal…
  • 49. Percent standardization
    • highly intuitive approach, easy to interpret
    • often used to control the effects of sample-size variation
    • have to decide if it makes better sense to standardize based on rows, or on columns
  • 50.
    • usually, you want to standardize whatever it is you want to compare
      – i.e., if you want to compare columns, base percents on column totals
    • you may decide to make two tables, one standardized on rows, the other on columns…
  • 51.
    MNIs
                  Site
      Fauna      A     B     C    Total
      bear       2     1     0      3
      moose     15     5    10     30
      coyote     2     0     0      2
      rabbit    16     8    12     36
      dog        2     3     0      5
      deer      16     8     7     31
      Total     53    25    29    107

    column percents (each site = 100)
                  Site
      Fauna      A      B      C
      bear      3.8    4.0    0.0
      moose    28.3   20.0   34.5
      coyote    3.8    0.0    0.0
      rabbit   30.2   32.0   41.4
      dog       3.8   12.0    0.0
      deer     30.2   32.0   24.1
      Total     100    100    100

    row percents (each taxon = 100)
                  Site
      Fauna      A      B      C     Total
      bear     66.7   33.3    0.0    100
      moose    50.0   16.7   33.3    100
      coyote  100.0    0.0    0.0    100
      rabbit   44.4   22.2   33.3    100
      dog      40.0   60.0    0.0    100
      deer     51.6   25.8   22.6    100
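A Python sketch of both standardizations for the MNI table (hypothetical function names); each function reproduces one of the percent tables above.

```python
def column_percents(table):
    """Each cell as a percent of its column (site) total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * o / c for o, c in zip(row, col_totals)] for row in table]


def row_percents(table):
    """Each cell as a percent of its row (taxon) total."""
    return [[100 * o / sum(row) for o in row] for row in table]


fauna = ["bear", "moose", "coyote", "rabbit", "dog", "deer"]
# MNIs by site A, B, C
mni = [[2, 1, 0], [15, 5, 10], [2, 0, 0], [16, 8, 12], [2, 3, 0], [16, 8, 7]]
for name, col_pct, row_pct in zip(fauna, column_percents(mni), row_percents(mni)):
    print(f"{name:7}", [round(x, 1) for x in col_pct], [round(x, 1) for x in row_pct])
```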
  • 52. Binomial Probabilities
    • P(n, k, p): "probability of k successes in n trials, with probability p of success in any one trial"

      observed     expected
       5   3       3.7   4.3
       1   4       2.3   2.7
              13                13

      n = 13, k = 5, p = 3.7/13
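  • 53. Binomial Probabilities
    • in R:
      > dbinom(k, n, p)                           # exactly k successes
      > pbinom(k, n, p)                           # k or fewer successes
      > pbinom(k - 1, n, p, lower.tail = FALSE)   # k or more successes
    • easy to build into a function…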
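A Python sketch of the binomial calculation, mirroring R's `dbinom` and `pbinom` (the function names here are hypothetical). It follows the slide in taking p as the cell's expected count divided by n.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n trials); cf. R's dbinom(k, n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(k or fewer successes); cf. R's pbinom(k, n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, k = 13, 5
p = 3.7 / 13  # expected count for the cell divided by n, as on the slide
print("exactly k:", round(binom_pmf(k, n, p), 3))
print("k or fewer:", round(binom_cdf(k, n, p), 3))
print("k or more:", round(1 - binom_cdf(k - 1, n, p), 3))
```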
  • 56. Cumulative Percent Graph
    [figure: cumulative percent, 0–100]
    • supports some useful statistical measures (with ordinal or ratio-scale data)
    • can be misleading when used with nominal data
    • good for comparing data sets
  • 57.
    Percentages
              Sites
      Type    A    B    C
      1       5    5    5
      2      45    0   30
      3       5   48    5
      4       5    5    5
      5       5    5    5
      6       5    5    5
      7      20    5   35
      8       5   22    5
      9       5    5    5
      Total 100  100  100

    Cumulative Percents
              Sites
      Type    A    B    C
      1       5    5    5
      2      50    5   35
      3      55   53   40
      4      60   58   45
      5      65   63   50
      6      70   68   55
      7      90   73   90
      8      95   95   95
      9     100  100  100

    [figure: cumulative percent curves for sites A, B and C across types 1–9]
  • 58. [figure: the same cumulative percent curves for sites A, B and C, plotted with the types in order 1–9 and again in the order 1, 5, 3, 4, 2, 6, 7, 8, 9]
  • 59. K-S test
    • find Dmax:
      – the maximum difference between 2 cumulative proportion distributions
      – compare it to the critical value for the chosen significance level:
        $D_{crit} = C \sqrt{(n_1 + n_2)/(n_1 n_2)}$
      – alpha = .05, C = 1.36
      – alpha = .01, C = 1.63
      – alpha = .001, C = 1.95
  • 60. example 2
    • mortuary data (Shennan, p. 56+)
    • burials characterized according to 2 wealth categories (poor vs. wealthy) and 6 age categories (infant to old age)

                    Rich   Poor
      Infans I        6     23
      Infans II       8     21
      Juvenilis      11     25
      Adultus        29     36
      Maturus        19     27
      Senilis         3      4
      Total          76    136
  • 61.
    • burials for younger age classes appear to be more numerous among the poor
    • can this be explained away as an example of random chance? or
    • do poor burials constitute a different population, with respect to age classes, than rich burials?
  • 62.
    • we can get a visual sense of the problem using a cumulative frequency plot:
    [figure: cumulative frequency (0–1) of rich and poor burials across age classes Infans I, Infans II, Juvenilis, Adultus, Maturus, Senilis]
  • 63.
    • the K-S test (Kolmogorov-Smirnov test) assesses the significance of the maximum divergence between two cumulative frequency curves
      H0: dist1 = dist2
    • an equation based on the theoretical distribution of differences between cumulative frequency curves provides a critical value for a specific alpha level
    • observed differences beyond this value can be regarded as significant at that alpha level
  • 64.
    • if alpha = .05, the critical value = $1.36\sqrt{(n_1 + n_2)/(n_1 n_2)}$
      $1.36\sqrt{(76 + 136)/(76 \times 136)} = 0.195$
    • the observed value: Dmax = 0.178
    • 0.178 < 0.195; don't reject H0
    [figure: cumulative frequency curves for rich and poor burials across age classes Infans I to Senilis, with Dmax = .178 marked]
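A Python sketch of the K-S comparison for grouped (ordered-class) data, using the critical-value formula from slide 59. The function `ks_grouped` is hypothetical; R's `ks.test()` assumes continuous data and warns about ties, so it is not a drop-in replacement for tallied age classes like these.

```python
from itertools import accumulate
from math import sqrt


def ks_grouped(counts1, counts2, c=1.36):
    """Dmax between two cumulative proportion curves over ordered classes,
    and the critical value c * sqrt((n1 + n2) / (n1 * n2)).
    c = 1.36, 1.63, 1.95 for alpha = .05, .01, .001."""
    n1, n2 = sum(counts1), sum(counts2)
    cum1 = [x / n1 for x in accumulate(counts1)]
    cum2 = [x / n2 for x in accumulate(counts2)]
    d_max = max(abs(u - v) for u, v in zip(cum1, cum2))
    critical = c * sqrt((n1 + n2) / (n1 * n2))
    return d_max, critical


# Infans I, Infans II, Juvenilis, Adultus, Maturus, Senilis
rich = [6, 8, 11, 29, 19, 3]
poor = [23, 21, 25, 36, 27, 4]
d_max, critical = ks_grouped(rich, poor)
print(round(d_max, 3), round(critical, 3))  # 0.178 0.195
```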
  • 65. example 2
    statement/question: "Oil exploration should be allowed in coastal California…"

                           age ≤ 30   age > 30
      strongly disagree        8          7
      mildly disagree          5          9
      disagree                 6          6
      no opinion               0          1
      agree                    2          2
      mildly agree             1          3
      strongly agree           2          3
  • 66. example 3
    • survey data: 100 sites
    • broken down by location and time:

                   early  late  Total
      piedmont       31    19     50
      plain          19    31     50
      Total          50    50    100
  • 67.
    • we can do a chi-square test of independence of the two variables, time and location
    • H0: time & location are independent
    • alpha = .05
    [diagram: time and location under H0 and under H1]
  • 68.
    • χ² values reflect accumulated differences between observed and expected cell counts
    • expected cell counts are based on the assumptions inherent in the null hypothesis
    • if H0 is correct, cell values should reflect an "even" distribution of marginal totals:

                   early  late  Total
      piedmont       25    25     50
      plain          25    25     50
      Total          50    50    100
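  • 69.
    • chi-square = Σ((o − e)²/e)
    • observed chi-square = 4.84 (this value includes Yates' continuity correction, Σ((|o − e| − 0.5)²/e), which R's chisq.test() applies to 2 × 2 tables by default; the uncorrected statistic is 5.76)
    • we need to compare it to the "critical value" in a chi-square table: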
  • 71.
    • chi-square = Σ((o − e)²/e)
    • observed chi-square = 4.84 (Yates-corrected; 5.76 uncorrected)
    • chi-square table: critical value (alpha = .05, 1 df) is 3.84
      – observed chi-square (4.84) > 3.84
    • we can reject H0
    • H1: time & location are not independent
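A Python sketch (hypothetical function name, 2 × 2 tables only) that reproduces both figures for the survey table: 5.76 from the plain formula and 4.84 with Yates' continuity correction, which subtracts 0.5 from each |o − e| before squaring. Both exceed the critical value of 3.84.

```python
def chi_square_2x2(table, yates=False):
    """Pearson X^2 for a 2 x 2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n
            dev = abs(o - e)
            if yates:
                dev = max(dev - 0.5, 0.0)  # continuity correction
            x2 += dev ** 2 / e
    return x2


survey = [[31, 19], [19, 31]]  # rows piedmont, plain; columns early, late
print(round(chi_square_2x2(survey), 2))              # 5.76
print(round(chi_square_2x2(survey, yates=True), 2))  # 4.84
```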
  • 72.
    • what does this mean?

                   early  late  Total
      piedmont       31    19     50
      plain          19    31     50
      Total          50    50    100