8 Statistical Significance
OK, measures of association are one important thing to examine in data. There is another important thing to consider: if you find associations, do they reflect associations that actually exist in the population, or are they simply a result of sampling error? Tests of statistical significance estimate the chances that the associations in your sample reflect the population rather than sampling error.
Chi-Square Tests
Chi-Square tests are appropriate for nominal and ordinal variables. When you calculate Chi-Square, you estimate the probability that the association in your sample would appear even if no association existed in the population. So, a probability of .05 (p = .05) means that the association that you found in your analysis would occur only 5 out of 100 times if there actually were no association in the population. If you had p = .001, this means that out of 1,000 samples, you would find the association simply as a result of sampling error 1 time. Convention treats probabilities of .05, .01, and .001 as marking different levels of statistical significance for your conclusions.
Let’s go back to our variables SEX and HAPPY. You actually
do the same thing that you did when you calculated lambda,
except you also check chi-square in the statistics box. Here are
the exact steps:
· Analyze > Descriptive Statistics > Crosstabs
· Dependent variable as the Row variable (Mnemonic suggestion: remember DR, dependent belongs on the row)
· Independent variable as the column variable
· Statistics: Lambda or Gamma AND Chi-Square
· Continue > OK
Most of your output will look the same as when we calculated lambda. There is one new box:
Look at the Asymp. Sig. (2-sided) value: .883. This means that even if there were no association in the population, 883 out of 1,000 samples would produce a result like this purely through sampling error! Is that statistically significant? Well, not for scientific research. People play the lottery with far worse odds than this, but remember: for a result to be statistically significant in social science research, the probability must be .05 or less.
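Behind that significance value is the chi-square statistic itself, which is easy to sketch by hand. This is an illustration of the formula, not SPSS's implementation; the 3.841 cutoff is the standard critical value for one degree of freedom at p = .05.

```python
def chi_square(table):
    """Chi-square statistic for a crosstab given as a list of rows of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 crosstab (df = 1); a chi-square above 3.841 means p < .05
observed = [[10, 20], [20, 10]]
print(round(chi_square(observed), 2))  # prints 6.67, so p < .05 here
```

With a perfectly even table the statistic is 0, mirroring the "no association" case above.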
Let’s think about our query about the relationship between
gender and happiness. We have discovered that there is no
association between the two and there is no statistical
significance. What does that mean? Well, I guess that is a good thing for men and women. It is not a good finding, however, if you were expecting to find an association between the variables!
T Tests
T tests are used to determine the statistical significance of scale (ratio/interval) variables. If you try to examine associations involving scale variables through crosstabs, you may quickly be overwhelmed with data. If you want to test the statistical significance of scale variables, I would suggest that you use an independent-samples t test. You can also use your output for Pearson's r, and it will tell you your level of significance.
Here is the output for our analysis of the association between
AGE and SIBS:
SPSS has calculated your t-test for you: Sig. 2-tailed = .019.
What does this tell you? It tells you that you would find this association only 19 times out of 1,000 as a result of sampling error. Since the significance is less than .05, your association is statistically significant.
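The significance of a correlation can be checked with the textbook t statistic for Pearson's r. A minimal sketch, assuming roughly 1,500 cases (the GSS file used in this guide is gss08_1500cases.sav); in large samples, |t| above about 1.96 corresponds to p < .05, two-tailed.

```python
import math

def t_for_r(r, n):
    # t statistic testing whether a Pearson correlation of r,
    # computed from a sample of n cases, differs from zero
    return r * math.sqrt((n - 2) / (1 - r * r))

# r = .061 from the AGE/SIBS output; n is assumed to be about 1500 cases
t = t_for_r(0.061, 1500)
print(round(t, 2))  # about 2.37, consistent with the reported Sig. of .019
```

Because 2.37 exceeds 1.96, the association clears the .05 threshold, matching SPSS's output.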
Step 7
We can use Pearson’s r to calculate the association between two
scale (interval or ratio) variables. The interpretation is the same
as gamma. Here are the directions for using SPSS to calculate
this statistic:
Pearson’s r
Analyze, Correlate, Bivariate
Highlight variable name, transfer to Variables field
Highlight 2nd variable name, transfer to Variables field
Pearson is the default choice, leave it that way.
Two-tailed is the default choice, leave it that way.
OK
Let’s try an example. Remember, we need to have 2 scale variables. Let’s pick respondents’ age (AGE) and number of brothers and sisters (SIBS). If you follow the directions above, your screen should look like this:
Here is what your output will look like:
What does Pearson’s r tell us? It tells us that there is a perfect correlation (r = 1) when you compare age to age and sibs to sibs (that’s pretty obvious, right?). Now, let’s look at the other statistic: .061. What does that mean? There is a weak direct association between the two variables; that is, as age increases, the number of siblings tends to increase!
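Pearson's r itself is straightforward to compute by hand. Here is a sketch of the standard formula (covariance divided by the product of the variables' deviations), using made-up data, not the GSS file:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # numerator: co-variation of the two variables around their means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # denominator: square root of the product of the sums of squared deviations
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: a perfect direct and a perfect indirect association
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))   # -1.0
```

The sign carries the direction (direct vs. indirect) and the magnitude carries the strength, exactly as with gamma.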
So, how can you calculate lambda and gamma, using SPSS?
Here are the instructions:
Lambda and Gamma:
Analyze, Descriptive Statistics, Crosstabs
Dependent variable as the Row variable (Mnemonic suggestion: remember DR, dependent belongs on the row)
Independent variable as the column variable
Statistics: Lambda or Gamma,
Continue, OK
Now, let’s try one of each of these. Let’s go back to our original
comparison of gender and happiness. If you go back after you
take the first 3 steps, your screen should look like this:
Now we want to calculate the measure of association statistic.
SEX is a nominal variable and HAPPY is an ordinal variable.
Since you always use the statistic that is associated with the lowest level of measurement, we will use lambda. Click on Statistics and then check Lambda. Your statistics screen should look like this:
Now click continue and then OK. Your screen should look like
this:
How do you interpret this output? You’ve already seen the
crosstab. Now let’s look at the directional measures. Go to the
Value of lambda, with General Happiness as dependent variable.
The value of lambda is .000. What does that mean? There is
absolutely no association between gender and happiness!
Go back and check out the association between number of
children and happiness. You follow the same procedures as
above but instead of selecting lambda, select gamma. Your
output should look like this:
Look at the value of gamma: -.061. What does this mean? It means that there is a very weak association between the number of children and happiness. What does the negative sign mean? It means that there is an indirect association: as the value of one variable goes up, the other goes down. Let’s think about this. Happiness is coded Very happy = 1, Pretty happy = 2, and Not too happy = 3. So as the number of children increases, the happiness code decreases, and a lower code means a happier response (1 is very happy, 3 is not too happy). Does this make sense? This weak association suggests that people with more children report being slightly happier!
5 Bivariate Analysis: Measures of Association
Measures of Association
You can look at the totals and determine something about the relationship between the two variables. However, if your variables have multiple categories, it becomes challenging to understand associations simply by examining percentages in crosstabs. SPSS makes additional analyses very simple. You should consider two factors:
· Measures of association - Is there an association? How
strong/weak is the association? What direction?
· Statistical significance - Does the association that occurs
between two variables in the sample actually occur in the
population or is it due to chance or sampling error?
Associations can run in two directions: direct and indirect. For example, most people assume that there is a direct relationship between the amount of time spent studying and GPA. This is a positive relationship: if your study time ↑, your GPA ↑. There is frequently an indirect relationship between party time and GPA: as your party time ↑, your GPA ↓. This is a negative association. If you are examining ordinal or scale variables, you can determine the direction of association. If you are examining nominal variables, you can only determine whether there is an association, not its direction.
How do you Measure Association?
Lambda is a measure of association for nominal variables.
Lambda ranges from 0.00 to 1.00. A lambda of 0.00 reflects no
association between variables (perhaps you wondered if there is
a relationship between a respondent having a dog as a child and
his/her grade point average). A Lambda of 1.00 is a perfect
association (perhaps you questioned the relationship between
gender and pregnancy). Lambda does not give you a direction of
association: it simply suggests an association between two
variables and its strength.
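Lambda measures how much knowing the independent variable reduces your errors in guessing the dependent one. Here is a sketch of the standard Goodman-Kruskal formula, an illustration rather than SPSS's own code:

```python
def goodman_kruskal_lambda(table):
    """table: rows are dependent-variable categories,
    columns are independent-variable categories (observed counts)."""
    n = sum(sum(row) for row in table)
    # errors when guessing the overall modal category, ignoring the independent variable
    e1 = n - max(sum(row) for row in table)
    # errors when guessing the modal cell within each independent-variable category
    e2 = n - sum(max(col) for col in zip(*table))
    return (e1 - e2) / e1

# Hypothetical crosstab where the column variable strongly predicts the row variable
print(goodman_kruskal_lambda([[10, 30], [30, 10]]))  # 0.5
```

A lambda of 0.5 says knowing the independent variable cuts your guessing errors in half; 0.00 means it does not help at all.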
Gamma is a measure of association for ordinal variables.
Gamma ranges from -1.00 to 1.00. Again, a Gamma of 0.00
reflects no association; a Gamma of 1.00 reflects a positive
perfect relationship between variables; a Gamma of -1.00
reflects a negative perfect relationship between those variables.
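Gamma compares concordant pairs of cases (ranked the same way on both variables) with discordant pairs (ranked opposite ways). A brute-force sketch for small tables, assuming rows and columns are both listed in increasing order of their codes:

```python
def gamma(table):
    """Goodman-Kruskal gamma for an ordinal crosstab of observed counts."""
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for l in range(cols):
                    if l > j:       # both variables rank the pair the same way
                        concordant += table[i][j] * table[k][l]
                    elif l < j:     # the variables rank the pair opposite ways
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical 2x2 ordinal crosstab with a strong positive association
print(gamma([[30, 10], [10, 30]]))  # 0.8
```

Flipping the table reverses the sign, which is exactly what the direction of gamma reports.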
Pearson’s r is a measure of association for scale (interval/ratio)
variables. Like Gamma, Pearson’s r ranges from -1.00 to 1.00.
Which do I use if I am examining different levels of
measurement?
Always use the measure of association of the lowest level of
measurement. For example, if you are analyzing a nominal and
ordinal variable, use lambda. If you are examining an ordinal
and scale pair, use gamma.
What do these values mean?
Here are guidelines for interpreting the strength of association
for Lambda, Gamma and Pearson’s r:
Value (of Lambda, Gamma, Pearson's r)    Strength of Association
0.00                                     None
±.01 to ±.09                             Weak, uninteresting association
±.10 to ±.29                             Moderate, worth noting
±.30 to ±.99                             Evidence of strong association, extremely interesting
±1.00                                    Perfect association, strongest possible
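These guidelines amount to a simple lookup on the absolute value of the statistic. One way to express them (the cut points are this guide's conventions, not a universal standard):

```python
def strength(value):
    """Map a lambda, gamma, or Pearson's r to the guide's strength labels."""
    v = abs(value)
    if v == 0:
        return "none"
    if v < 0.10:
        return "weak"
    if v < 0.30:
        return "moderate"
    if v < 1.00:
        return "strong"
    return "perfect"

print(strength(-0.061))  # "weak", like the gamma for children and happiness
print(strength(1.0))     # "perfect"
```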
BIVARIATE ANALYSIS:
Constructing and interpreting crosstabs
Univariate analysis provides you information about individual
variables. Bivariate analysis explores relationships between
two variables. Running crosstabulations or “crosstabs” is one
way to do this. A crosstab is a matrix that shows the
distribution of one variable for each category of a second
variable. Let’s construct a crosstab for 2 variables found in the
General Social Survey (GSS): gender (SEX) and general
happiness.
To run a crosstab on SPSS, go to: Analyze, Descriptive
Statistics, Crosstabs
Highlight variable name, click on arrow pointing toward Row
box
Highlight variable name, click on arrow pointing toward
Column box
Your screen should look like this:
Click OK. Your screen should look like this:
What does this crosstab mean? This is relatively easy to interpret. Approximately 28% of the males are very happy, and about 29% of the females have identified themselves as very happy. But what about a crosstab that has many attributes? For example, what if you wanted to analyze the relationship between the number of children that you have and your general happiness? This is the crosstab for that analysis:
This is far more complicated to analyze with a crosstab. That is
why we will consider measures of association in the next
content guide.
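The column percentages SPSS shows in a crosstab can be sketched in plain Python: count each (dependent, independent) pair and divide by that column's total. The response values below are hypothetical stand-ins for GSS codes.

```python
from collections import Counter

def column_percentages(dependent, independent):
    """Percentage of each dependent category within each independent category."""
    cell_counts = Counter(zip(dependent, independent))
    column_totals = Counter(independent)
    return {cell: 100 * count / column_totals[cell[1]]
            for cell, count in cell_counts.items()}

# Hypothetical responses: happiness (rows) by sex (columns)
happy = ["very", "very", "pretty", "very"]
sex = ["male", "male", "male", "female"]
pcts = column_percentages(happy, sex)
print(round(pcts[("very", "male")], 1))  # 66.7: 2 of the 3 males are very happy
```

Percentaging within the independent variable's categories is what lets you compare groups, just as in the SEX and HAPPY table above.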
Presenting Data Graphically
The following tells you how to construct various graphs and the
types of graphs that are appropriate for different levels of
measurement.
For our example, let’s construct a bar chart (bar charts may be
used for nominal and ordinal variables) for our variable
‘happy’.
Go to Graphs, Legacy Dialogs, select Bar, and then select the type (if you have only one variable, choose Simple), then Define. This box gives you a number of things to determine. Select your variable by highlighting it and clicking on the Category Axis arrow (this will put your variable on the “x” axis of the graph). You will determine the format of the “y” axis by selecting whether you want the chart to reflect the number or percentage of cases. Let’s decide that we want our graph to be based on numbers, not percentages (that is the default). Your screen should look like this:
If you want to create a title for your graph, click on the Title button and then Continue.
Your output will look like this:
You can use the following directions to create a number of
different types of graphs:
Construction of Pie Charts: (nominal, ordinal)
Go to Graphs, Pie. You will select your variable and click the “Define Slices by” arrow. Again, you will determine whether you want your pie slices to be measured by N or a percentage. Once again, click on the Options button to make sure that missing values are removed. You can use Titles for pie charts as well.
Constructing Histograms: (interval, ratio)
Histograms have an x axis (variable categories) and a y axis
(frequency or percentage). However, unlike bar charts,
histograms have contiguous bars. Go to Graphs, Histogram and enter the variable. You don’t have to worry about eliminating missing data for histograms; SPSS does that automatically.
Constructing Line Charts: (interval, ratio)
Go to Graphs, Line, select your variable and determine if your y
axis will be frequency or percentage. You must check that
missing data is eliminated here in the Options button.
Editing your charts:
If you want to change colors or the fill for your chart, double
click on your chart in the output viewer. You will get a chart
editor box. You can either click on the Properties button or go
to Edit and Properties to change color, background, and other
aspects of your graph.
Here is some additional information about how you may
use SPSS for individual variables.
SPSS and Descriptive Statistics
Descriptive statistics are statistics that describe a variable’s central tendency (the ‘middle’ or expected value) and dispersion (the spread of the variable’s responses). Be aware
that SPSS will calculate statistics even if the measure of central
tendency and dispersion are not appropriate. What do I mean by
inappropriate descriptive statistics? Let's think about the
variable GENDER. Most often, you will find two alternatives
for this variable: male or female. (Whether or not this is
exhaustive is another discussion.) Assume you have 25 people in your dataset: 15 identified as male and 10 identified as female. Does it make sense to calculate a mean for this variable? No, not at all. A nominal variable simply names a category (an identity or a yes/no response), and an average of its codes makes no logical sense.
As a reminder, I’ve included the basic statistics that are
appropriate for different types of variables:
Discrete variables: Mode, Median
Continuous variables: Median, Mean, Range, Interquartile Range, Variance, Standard Deviation
Nominal: Mode
Ordinal: Median, Range, Interquartile Range
Interval or Ratio: Mean, Range, Interquartile Range, Variance, Standard Deviation
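Outside SPSS, Python's standard statistics module covers most of these. A quick sketch using hypothetical ordinal codes for happy:

```python
import statistics

# Hypothetical ordinal codes: 1 = very happy, 2 = pretty happy, 3 = not too happy
happy = [1, 2, 2, 2, 3, 1, 2]

print(statistics.median(happy))   # median: appropriate for ordinal variables
print(max(happy) - min(happy))    # range: highest code minus lowest code
print(statistics.mode(happy))     # mode: the only statistic safe for nominal data
```

Note that Python, like SPSS, will happily compute a mean of these codes too; it is up to you to choose the statistic that matches the level of measurement.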
Let’s use the variable ‘happy’ again. Begin just like you did when you were going to calculate a frequency (this is in the previous content guide). Once you have selected Analyze, then Descriptive Statistics and Frequencies, you will have this screen again:
Before selecting ‘OK’, click on the “statistics” button. This is
the place that you can select the appropriate statistics. Since
happy is an ordinal variable, let’s select median and range.
Your screen should look like this:
Now click Continue and then OK. Your screen will look like
this:
What does this output mean? You already know how to read the frequency chart. The only change is in the statistics chart: the median is 2 and the range is 2.
HINT: Make sure that you always check your statistics when
you change your variable. SPSS will not make the change for
you!!
UNIVARIATE ANALYSIS
STEP 1
Univariate analysis is the examination of a single variable in a
dataset and its characteristics. For example, GSS has a variable
happy. Respondents were asked "How happy are you
generally?". The potential responses were:
1) Very happy
2) Pretty happy
3) Not too happy
SPSS will give you information about how the respondents
answered this question. It can tell you how many people
responded to each alternative (the frequency distribution) as
well as provide you with information about the overall response
to this question (the descriptive statistics). SPSS can also
construct a graphic about your variable.
Frequency Distributions
To run a frequency distribution, select Analyze, then
Descriptive Statistics and Frequencies. Highlight the variable in
the left window and click on the arrow; this will transfer your
variable from the variable list to the Variable window. If you
want to remove a variable from the Variable window, just
highlight it and click the arrow; this will move it back to the
variable list. You may run frequencies on more than one
variable at a time. Simply go back to the variable window,
highlight an additional variable and click the arrow.
Here is what your screen should look like once you have clicked
analyze, descriptive statistics and frequencies:
After clicking OK, your data will appear in the SPSS Viewer
and will look like this:
The statistics box tells you the number of people in your survey
(N) and the number of missing cases. The next box gives you
the frequency for your variable. The frequency is the number of
respondents who selected each value. The next two columns are the percentage and valid percentage; the difference is that the valid percentage eliminates missing cases from the calculation. The final column is the cumulative percentage, which adds up the valid percentages.
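The logic of that output can be sketched in plain Python. The missing-value codes (8 and 9) are an assumption for illustration; valid percent divides by the non-missing total, and the cumulative percent sums the valid percents.

```python
from collections import Counter

def frequency_table(responses, missing=(8, 9)):
    """Rows of (value, frequency, percent, valid percent, cumulative percent);
    missing codes get None for the valid and cumulative columns."""
    counts = Counter(responses)
    n = len(responses)
    n_valid = sum(c for v, c in counts.items() if v not in missing)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / n
        if value in missing:
            rows.append((value, freq, round(percent, 1), None, None))
        else:
            valid = 100 * freq / n_valid
            cumulative += valid
            rows.append((value, freq, round(percent, 1),
                         round(valid, 1), round(cumulative, 1)))
    return rows

# Hypothetical responses to happy; 9 = no answer (treated as missing)
for row in frequency_table([1, 1, 2, 2, 2, 3, 9]):
    print(row)
```

Comparing the percent and valid-percent columns shows exactly how excluding missing cases changes the picture.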
SPSS is a statistical software package that was first released in 1968 and continues to be a premier package for the analysis of quantitative data. Originally run on large mainframe computers with punch cards, SPSS (Statistical Package for the Social Sciences) revolutionized quantitative data analysis. SPSS gives the researcher tools for describing variable statistics, analyzing two variables together, predicting numerical outcomes, and identifying groups. We will learn how to use SPSS in this course to describe variable statistics and to perform bivariate analysis.
SPSS: Getting Started
Load SPSS onto your computer before beginning this segment. Put your GSS disk in your computer. Open the GSS program and click on the first icon that says “gss08_1500cases.sav”. As long as you open this on the computer, you should have a screen that looks like this:
When you open SPSS, you open the Data Editor. The Data
Editor is composed of the Data View and the Variable
View. Note the tabs at the bottom of the screen. The Data View
is where your data is held; the Variable View allows you to
establish value labels and describes the attributes of each
variable in your data file. Let's examine the variable view first.
Click on that tab.
Variable View holds ten columns of information about each
variable.
Name: abbreviated name of the variable
Data Type: you will most likely use "numeric" for your
projects
Width: the number of digits or characters in the variable
view
Decimals: the number of decimal places that the
variable requires
Labels: the description or label for your variable
Values: the values you have assigned to the labels (for example, 1=yes, 2=no, 8=DK, 9=NA)
Missing: values designated as missing
Columns: Width of Column in Data View
Align: alignment of data in Data View: right, left or
center
Measure: level of measurement: Nominal, Ordinal,
Scale
(SPSS designates both Interval and Ratio
measures as Scale)
Now, let's examine the data view. Click on the tab at the bottom
of the screen.
Once you click on the button that says data view, your screen
should look like this:
Data View:
Each row of the Data View represents a respondent or case. If
you have 350 respondents, you will note that you have 350 rows
(identified by the record numbers on the left). Each column of
the Data View represents a variable. You can identify the
variables by clicking to the variable view. You can also find out
information about the variables in the Data View window:
1. Variables Dialog Box: go to Utilities and click on Variables. The left portion of the window lists all variables; highlight the variable that you would like to examine and you will see the information about that variable on the right side of the window.
2. Numeric Values and Value Labels: to determine what a
numeric value represents, go to View and click on Value
Labels. This will change numeric values to value labels. To
convert back to Numeric values again, just click on Value
Labels again.
3. “Value Labels” button on tool bar: find the button that
looks like a price tag: click it and you will see value labels;
another click reverts the data to numeric values.
Important Information! Setting options:
1. In order to make your data analysis easier, you will want
to have your variables listed alphabetically. Open a data file and
go to Edit, then Options and General. You will see the variable
lists option. Choose Display names and Alphabetical and click
OK. This will tell SPSS to display your variables alphabetically
whenever it lists the variables.
2. In order to set the options for your output window, go to Edit, Options, and click on Output Labels. You will see an area that says Pivot Table Labeling. Click on the arrow for “variables in labels shown as” and select Names and Labels. Then click on the arrow for “variable values in labels shown as”, select Values and Labels, and then click OK.
The SPSS Viewer is the window that provides your output. There are two panels in the viewer:
1. Outline panel: provides a complete listing of everything
that SPSS has done in that session.
2. Contents panel: results (charts, tables, graphs, etc.) are
displayed here.
You will see the SPSS viewer when you perform an analysis
(we will do this in the next content guide). When you close the
SPSS viewer window, SPSS reverts back to the data editor.