SPSS Commands and Interpreting Statistics

Frequency Distributions
We use frequency distributions to determine the frequency, or number, of people that fall into a certain category. For example, if we classified those running for Senator or governor as Democratic and Republican, a frequency distribution would allow us to determine the percent that were Democrat and Republican. In our data file, the variable we used to list candidates as Republican or Democrat was “party.”
1. Go to Analyze > Descriptive Statistics > Frequencies.
2. Double click on party and then click OK.
Interpreting Frequency Distributions
1. As you can see, the two parties are listed below: Democrat and Republican. The
“Missing” category simply reflects the candidates whose party affiliation we could not
determine.
2. Under the “Frequency” column, we have the number of candidates that were Democrat
(186), Republican (280), or unclassified or missing (5).
3. Finally, we typically use the “Valid Percent” column in determining the frequency
distribution of Democrats and Republicans because it does not take into consideration
those cases where we could not assign a category. In this case, 39.9 percent were
Democrat and 60.1 percent were Republican. Clearly, there is a greater number of
Republican than Democratic candidates.
Political Party

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Democrat     186         39.5      39.9            39.9
         Republican   280         59.4      60.1            100.0
         Total        466         98.9      100.0
Missing  9.00         5           1.1
Total                 471         100.0
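To see where the “Percent” and “Valid Percent” columns come from, the arithmetic can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS output; the function name `frequency_table` and the use of `None` to mark missing cases are assumptions made for this example. The counts are taken from the table above.

```python
from collections import Counter

def frequency_table(values, missing=None):
    """Return {category: (count, percent, valid_percent)}.

    `missing` is the marker for unclassified cases (an assumption of this
    sketch). Missing cases get no valid percent, as in the SPSS table.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = total - counts.get(missing, 0)
    table = {}
    for category, count in counts.items():
        percent = 100.0 * count / total  # share of ALL cases
        # Valid percent leaves the missing cases out of the denominator.
        valid = None if category == missing else 100.0 * count / valid_total
        table[category] = (count, percent, valid)
    return table

# Rebuild the party variable from the counts in the table (None = missing).
party = ["Democrat"] * 186 + ["Republican"] * 280 + [None] * 5
table = frequency_table(party)

for category, (count, percent, valid) in table.items():
    label = "Missing" if category is None else category
    valid_text = "" if valid is None else f"{valid:.1f}"
    print(f"{label:<12}{count:>6}{percent:>8.1f}{valid_text:>8}")
```

Dividing by 471 (all cases) gives the “Percent” column, while dividing by 466 (classified cases only) gives the “Valid Percent” column.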
Chi Square Test

Often we have two nominal level variables (gender, party affiliation, or ethnicity, for example) and we need to determine if a relationship exists between them. For example, we may want to know if ethnicity is related to party affiliation. We suspect that it is, and we hypothesize that minorities are associated with the Democratic Party and whites with the Republican Party. Using a crosstab table and a Chi Square test, we can determine if there is a relationship between two variables that IS NOT DUE TO CHANCE.
1. To do this, go to Analyze > Descriptive Statistics > Crosstabs. We put “Political Party” in the Row box because the dependent variable ALWAYS goes in the “Row” box. We put “Ethnicity” in the Column box because the independent (explanatory) variable ALWAYS goes in the “Column” box.
2. Next, click the “Statistics” button and check “Chi-square”, “Phi and Cramer’s V”, and “Lambda.” Click the “Continue” button.
3. Next, click the “Cells” button. Under “Counts”, check “Observed” and under “Percentages” check “Row”, “Column”, and “Total”. Then click the “Continue” button.
4. Click the “OK” button to run your crosstab.
Interpreting Your Crosstab
1. Reading a crosstabulation can be confusing. Over the years, I have found the following
to be helpful in reading them. First, we always begin with the independent variable, which is
listed in the columns. In this case it is ethnicity, and since we are looking at ethnicity, we
will read the rows labeled “% within Ethnicity (2)”. Here is how we read this
table. If we are interested in what party non-whites support, we say:
“Of those who are non-white, 72.1% are Democrats” and “of those who are
non-white, 27.9% are Republicans.”
If we are interested in the white respondents, we say:
“Of those who are white, 35.5% are Democrat AND 64.5% are Republican.”
If you use this phrase and fill in the blanks, you can interpret this table properly every
time!
“Of those who are _____, ____% are _______ AND _____% are _______.”
Political Party * Ethnicity (2) Crosstabulation

                                                       Ethnicity (2)
                                                       White    Non-White  Total
Political Party  Democrat    Count                     146      31         177
                             % within Political Party  82.5%    17.5%      100.0%
                             % within Ethnicity (2)    35.5%    72.1%      39.0%
                             % of Total                32.2%    6.8%       39.0%
                 Republican  Count                     265      12         277
                             % within Political Party  95.7%    4.3%       100.0%
                             % within Ethnicity (2)    64.5%    27.9%      61.0%
                             % of Total                58.4%    2.6%       61.0%
Total                        Count                     411      43         454
                             % within Political Party  90.5%    9.5%       100.0%
                             % within Ethnicity (2)    100.0%   100.0%     100.0%
                             % of Total                90.5%    9.5%       100.0%
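The “% within Ethnicity (2)” rows are column percentages: each cell count divided by its column total. As a rough check on the reading rule above, the calculation can be sketched in Python. The dictionary layout and the name `column_percentages` are assumptions of this illustration; the counts come from the crosstab.

```python
# Observed counts from the crosstab: rows are parties, columns are ethnicity.
counts = {
    "Democrat": {"White": 146, "Non-White": 31},
    "Republican": {"White": 265, "Non-White": 12},
}

def column_percentages(counts):
    """Percent of each column (ethnicity) that falls in each row (party)."""
    columns = list(next(iter(counts.values())).keys())
    col_totals = {c: sum(row[c] for row in counts.values()) for c in columns}
    return {
        party: {c: 100.0 * row[c] / col_totals[c] for c in columns}
        for party, row in counts.items()
    }

within_ethnicity = column_percentages(counts)

# Fill in the blanks of the reading rule for each ethnic group.
for ethnicity in ("Non-White", "White"):
    dem = within_ethnicity["Democrat"][ethnicity]
    rep = within_ethnicity["Republican"][ethnicity]
    print(f"Of those who are {ethnicity}, {dem:.1f}% are Democrat "
          f"AND {rep:.1f}% are Republican.")
```

Because the percentages are computed within each column, the two numbers in every sentence add up to 100%.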
2. We hypothesized that ethnicity was related to party affiliation: non-whites were more likely to be Democrat and whites more likely to be Republican. As you can see from the table above, this is true. 72% of non-whites called themselves Democrats and 65% of whites called themselves Republicans. So our statistics bear out our hypothesis.
3. However, is there a possibility that the relationship between ethnicity and party affiliation is due to chance? That is to say, is there really no statistically significant reason to believe these variables are related to one another? To answer this question, we use the Pearson Chi-Square test. Look at the table below. The table has three “Sig.” columns. In the Pearson Chi-Square row, read the number under the “Asymp. Sig. (2-sided)” column and disregard the other two columns for the time being. If the number is between .000 and .050, we can say that the independent variable (ethnicity in this case) is significantly related to the dependent variable (party affiliation). This is another way of saying that the relationship is not due to chance and really exists! As you can see below, the significance value for the Chi-Square test is .000 under the “Asymp. Sig. (2-sided)” column. Therefore, ethnicity is significantly related to party affiliation. If the number is .051 or above, the relationship may be due to “chance” and we say that we are not confident that ethnicity and party affiliation are related. In that case, our hypothesis that ethnicity is related to party affiliation is rejected.
Chi-Square Tests

                              Value        df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square            21.886 (a)   1    .000
Continuity Correction (b)     20.375       1    .000
Likelihood Ratio              21.438       1    .000
Fisher's Exact Test                                           .000         .000
Linear-by-Linear Association  21.838       1    .000
N of Valid Cases              454
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 16.76.
b. Computed only for a 2x2 table
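For readers who want to see what stands behind the Pearson Chi-Square row, here is a minimal Python sketch of the standard calculation: compare each observed count with the count expected if the two variables were unrelated. The function name `pearson_chi_square` is an assumption of this example, and the closed-form p-value used here applies only when df = 1, as in this 2x2 table. It is not a reproduction of SPSS internals.

```python
import math

def pearson_chi_square(table):
    """Return (chi2, df, p, min_expected) for a table of observed counts.

    `table` is a list of rows. The p-value uses a closed form that is
    valid only for df == 1; otherwise None is returned for p.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if row and column were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    p = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
    return chi2, df, p, min_expected

# Rows: Democrat, Republican. Columns: White, Non-White.
observed = [[146, 31], [265, 12]]
chi2, df, p, min_expected = pearson_chi_square(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, min expected = {min_expected:.2f}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

The p-value is far below .05, which SPSS displays as .000 after rounding to three decimals.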
4. How strong is the relationship between the independent variable (ethnicity) and the dependent variable (party affiliation)? There are two measures of association, and for our purposes we use Cramer’s V unless SPSS reports only a Phi statistic. Under the “Value” column, a number is listed. The higher the number, the greater the strength of association. Let’s use the following scale:
0-.30 = no relationship (0) to weak relationship
.31-.70 = moderate relationship
.71-1.0 = strong relationship
A strong relationship means that knowing the ethnicity of a person will give us very good reason to guess the political party with which they are affiliated. A weak relationship means that knowing the ethnicity of a person does not give us much confidence in guessing the person’s political party affiliation. In this case, the association is weak (.220). If I guessed a person’s political affiliation based on the person’s apparent race, I could easily be wrong!
Symmetric Measures

                                  Value    Approx. Sig.
Nominal by Nominal   Phi          -.220    .000
                     Cramer's V   .220     .000
N of Valid Cases                  454
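Both measures in this table can be derived from numbers already shown in the crosstab and Chi-Square output. The sketch below uses the standard textbook formulas; the function names and the `strength_label` helper, which simply encodes the scale given above, are assumptions of this illustration.

```python
import math

def phi_2x2(a, b, c, d):
    """Signed Phi for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from the Pearson chi-square value and the number of valid cases."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

def strength_label(value):
    """Apply the handout's scale; the sign is ignored."""
    size = abs(value)
    if size <= 0.30:
        return "none to weak"
    if size <= 0.70:
        return "moderate"
    return "strong"

# Counts from the crosstab (Democrat: 146 White, 31 Non-White;
# Republican: 265 White, 12 Non-White) and values from the Chi-Square table.
phi = phi_2x2(146, 31, 265, 12)
v = cramers_v(21.886, 454, 2, 2)
print(f"Phi = {phi:.3f}, Cramer's V = {v:.3f}, strength: {strength_label(v)}")
```

For a 2x2 table, Cramer's V is just the absolute value of Phi, which is why the two rows differ only in sign.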
Pearson Correlation
A correlation is a powerful way to determine the association between two interval level
variables. An interval level variable is one whose values are an equal distance apart.
Examples include income (dollars), age (years), experience in politics (years),
and percent of the vote (percentages). Male and female are not values of an interval level
variable, because they are not expressed in values an equal distance apart. Gender is a
categorical variable.
For example, we may be interested in determining if political experience as measured by
the number of years a person has served in office is related to campaign funds raised. We
suspect that the longer the incumbent is in office, the more campaign funds s/he will
raise. After all, an incumbent has political power and is likely to be reelected: we would
want to contribute to the incumbent.
1. To do a correlation analysis, go to Analyze > Correlate > Bivariate.
2. Find and double click the variables “Political Experience” and “Money Raised”. This
will put these two variables in the variable window.
3. Click the “OK” button to run your correlation.
Interpreting Your Pearson Correlation
1. A correlation coefficient (number) represents the strength of an association between two
variables. The higher the number, the greater the strength of association. Let’s use the following scale:
0-.30 = no relationship (0) to weak relationship
.31-.70 = moderate relationship
.71-1.0 = strong relationship
2. In this case the correlation between “Political Experience” and “Money Raised” is .331**. This would be a moderate relationship.
3. The “Sig. (2-tailed)” value is important. It tells us if the relationship may be due to chance. If the “Sig. (2-tailed)” number is between .000 and .050, we can say that political experience and money raised are significantly related and that an increase in political experience is associated with an increase in campaign contributions. If the number is .051 or more, we say that we cannot be confident that political experience and money raised are related or associated. In this case, we can say that there is a “moderate, significant relationship” between political experience and money raised.
Correlations

                                                    Political Experience (Years)   Money Raised
Political Experience (Years)   Pearson Correlation  1                              .331**
                               Sig. (2-tailed)                                     .000
                               N                    462                            414
Money Raised                   Pearson Correlation  .331**                         1
                               Sig. (2-tailed)      .000
                               N                    414                            421
**. Correlation is significant at the 0.01 level (2-tailed).
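The coefficient itself follows the usual Pearson formula. The sketch below is a minimal Python illustration using a handful of made-up candidates; the data values and the name `pearson_r` are hypothetical and are not taken from the data file. Cases missing either value are dropped pair by pair, which is consistent with the table showing N = 414 for the pair even though each variable alone has more cases.

```python
import math

def pearson_r(x, y):
    """Return (r, n) using only the cases where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical candidates: years in office and dollars raised (None = missing).
years = [0, 2, 4, 6, 8, 12, None]
money = [50_000, 120_000, 90_000, 300_000, None, 410_000, 75_000]

r, n = pearson_r(years, money)
print(f"r = {r:.3f} based on {n} complete pairs")
```

The result always falls between -1 and 1, so it can be read against the scale above in the same way as the SPSS coefficient.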
Multiple Regression

A very powerful way to analyze data is by using a “multiple regression.” For our purposes, a multiple regression allows us to look at several factors that affect a dependent variable and determine which factors exert a greater influence on the dependent variable. For example, we may suspect that the share of the vote a candidate receives is determined by the quality of the candidate AND the amount of money raised. After all, better Senate candidates will win a greater percentage of the vote than poorer Senate candidates, and candidates with more money will be able to spend more to get elected. With more money to spend, they should get a greater percent of the vote. But which factor is more important: candidate quality or money raised? To answer this question, we do a multiple regression.
1. Go to Analyze > Regression > Linear.
2. Since the dependent variable is the percentage of the vote a candidate received, we put “Vote: Primary or Convention” in the “Dependent” variable box. The two independent variables we expect to influence the dependent variable go in the “Independent(s)” variable box.
[Figure: the Linear Regression dialog with these variables entered]
3. Click the “OK” button.
Interpreting Your Multiple Regression

Your output produces a number of tables. Let’s look at the most important tables.
1. The first table, “Variables Entered/Removed”, tells you what variables were used in the analysis. As you can see, “Money Raised” and “Political Experience” were used. Under the table, you can see that the dependent variable was “Vote: Primary or Convention.”
Variables Entered/Removed (b, c)

Model   Variables Entered                            Variables Removed   Method
1       Money Raised, Political Experience (Years)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: Vote: Primary or Convention
c. Models are based only on cases for which Office = Senate
2. There are two “coefficients” or numbers that are important: the “R” and “R Square.” The “R” is the combined effect of all the independent variables on the dependent variable. In this case there is a moderate, positive association (.662) between the two independent variables taken together (money raised and political experience) and the vote. The “R Square” simply means that these two variables explain 43.8 percent of the variance in the dependent variable: the vote. This is a technical way of saying that there are other factors (variables) that explain the remaining 56.2 percent of the variance. What might they be? How about incumbency or candidate quality?
Model Summary

Model   R (Office = Senate, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .662 (a)                        .438       .432                20.33142
a. Predictors: (Constant), Money Raised, Political Experience (Years)
3. In the ANOVA table, look only at the “Sig.” column. If the number is between .000 and .05 inclusive, then we can say that the relationship between the independent variables (money raised and political experience in this case) and the dependent variable (share of the vote) is not due to chance, which is the case here. This means that we are confident that money raised and political experience influence the vote. If it is greater than .05 (for example .051 or .60 or .154), then the relationship MIGHT BE DUE TO CHANCE and we should say we are not confident that money raised and political experience are linked to the percentage of the vote.
ANOVA (b, c)

Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   62539.414        2     31269.707     75.646   .000 (a)
    Residual     80193.138        194   413.367
    Total        142732.552       196
a. Predictors: (Constant), Money Raised, Political Experience (Years)
b. Dependent Variable: Vote: Primary or Convention
c. Selecting only cases for which Office = Senate
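The ANOVA table and the Model Summary are closely linked: R Square, Adjusted R Square, F, and the standard error of the estimate can all be recomputed from the sums of squares and degrees of freedom shown above. The following Python sketch uses the standard textbook formulas with the values copied from the table; the variable names are assumptions of this illustration.

```python
import math

# Values copied from the ANOVA table.
ss_regression, df_regression = 62539.414, 2
ss_residual, df_residual = 80193.138, 194
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Share of the variance in the vote explained by the two predictors.
r_square = ss_regression / ss_total
r = math.sqrt(r_square)

# Adjusted R Square penalizes the model for the number of predictors.
adjusted_r_square = 1 - (ss_residual / df_residual) / (ss_total / df_total)

# F compares explained variance per predictor with unexplained variance per case.
mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_statistic = mean_square_regression / mean_square_residual

# Typical size of a prediction error, in percentage points of the vote.
std_error_of_estimate = math.sqrt(mean_square_residual)

print(f"R = {r:.3f}, R Square = {r_square:.3f}, "
      f"Adjusted R Square = {adjusted_r_square:.3f}")
print(f"F = {f_statistic:.3f}, Std. Error of the Estimate = "
      f"{std_error_of_estimate:.3f}")
```

The results agree with the Model Summary (.662, .438, .432, 20.331) and with the F value of 75.646 in the ANOVA table.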
4. A very important table is the “Coefficients” table. This table tells us, among other things, how much influence each independent variable exerts on the dependent variable. Note the following columns.
a. Under “Model” are listed the two independent variables: “Political Experience” and “Money Raised.”
b. Really important are the coefficients (numbers) under the column “Standardized Coefficients, Beta”. The higher the number, the more influence the variable exerts on the dependent variable, the percentage of the vote. In this case, you can see that “Political Experience” (.398) is more important than “Money Raised” (.370), but not much more. Thus, we can say that political experience is more important than money in explaining voting for Senate candidates, but not by much! In some cases the Beta coefficient will have a negative sign in front of it. Disregard this sign in interpreting which variable exerts the most influence over the dependent variable. The variable with the larger number, regardless of the sign, exerts more influence.
c. The “Sig.” column simply states whether each independent variable (political experience and money raised) is significantly related to the dependent variable (percent of the vote). If the number is between .000 and .050, we can say that the relationship is NOT due to chance: that there is a significant relationship between this variable and the dependent variable. As you can see, both relationships are significant and we can say that “political experience and money raised are significantly related to the vote.”
Coefficients (a, b)

                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B           Std. Error        Beta                        t       Sig.
1   (Constant)                     16.224      1.698                                         9.556   .000
    Political Experience (Years)   1.077       .166              .398                        6.485   .000
    Money Raised                   2.470E-6    .000              .370                        6.024   .000
a. Dependent Variable: Vote: Primary or Convention
b. Selecting only cases for which Office = Senate
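The “B” column can also be read as a prediction equation: predicted vote = constant + B for experience x years + B for money x dollars. The Python sketch below shows that use and also ranks the predictors by the absolute size of Beta, as described above. The example candidate (10 years in office, $1,000,000 raised) is hypothetical, and the names are assumptions of this illustration.

```python
# Unstandardized coefficients (B) copied from the Coefficients table.
B_CONSTANT = 16.224
B_EXPERIENCE = 1.077   # percentage points of the vote per year in office
B_MONEY = 2.470e-6     # percentage points of the vote per dollar raised

def predicted_vote(years_experience, money_raised):
    """Predicted percent of the vote for a Senate candidate."""
    return B_CONSTANT + B_EXPERIENCE * years_experience + B_MONEY * money_raised

# Hypothetical candidate: 10 years in office and $1,000,000 raised.
example = predicted_vote(10, 1_000_000)
print(f"Predicted vote: {example:.1f} percent")

# Standardized coefficients (Beta) from the same table; rank by absolute
# size, because the sign does not matter when judging influence.
betas = {"Political Experience (Years)": 0.398, "Money Raised": 0.370}
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
print("Most influential first:", ", ".join(ranked))
```

The B for money looks tiny only because it is expressed per dollar. That is why the handout compares the Beta values, which put both variables on a common scale, rather than the B values.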