For more assistance, visit www.alukosayoenoch.wix.com/selfcoding
BRAINLINK EDUCATIONAL SERVICES
IN CONJUNCTION WITH
BRIGHT AND DELIGHT GLOBAL CONSULT
PRESENT
(SPSS 16 WORKSHOP TRAINING)
www.alukosayoenoch.wix.com/selfcoding
Version 1.0 winter 2015
Table of Contents
Introduction – Part 1
Downloading the Data Files
Starting PASW Statistics
The PASW Statistics Window
Data View
Variable View
Creating a Data File
Defining Variables
Data Entry
Descriptive Statistics
Frequency Analysis
Crosstabs
Data Manipulation
Select Cases
Splitting a File
Find and Replace
Reporting
Appendix
Introduction – Part 2
Downloading the Data Files
Null Hypothesis
Statistical Tests
Tests of Significance
Correlations
Paired-Samples T Test
Independent-Samples T Test
Multiple Response Sets
Multiple Response Frequencies
Multiple Response Crosstabs
Data Manipulation
Copying and Pasting Variable Properties
Inserting Variables and Cases
Deleting Variables and Cases
Merging Data Files
Creating the Data File for Merging
Inputting the Data in Variable View
Merging the Data Files
Appendix
Introduction – Part 3
Downloading the Data Files
Simple Regression
Scatter Plot
Predicting Values of Dependent Variables
Predicting This Year’s Sales with Simple Regression Model
Multiple Regression
Predicting Values of Dependent Variables
Predicting This Year’s Sales with Multiple Regression Model
Data Transformation
Computing
Polynomial Regression
Regression Analysis
Analyzing the Results
Chart Editing
Adding a Line to the Scatter Plot
Manipulating the Scales on X- and Y-axes
Adding a Title to the Chart
Adding Colors to the Chart
Filling a Background Color
Introduction – Part 4
Downloading the Data Files
Chi-Square
Chi-Square Test for Goodness-of-Fit
With Fixed Expected Values
With Fixed Expected Values and within a Contiguous Subset of Values
With Customized Expected Values
One-Way Analysis of Variance
Post Hoc Tests
Two-Way Analysis of Variance
Importing/Exporting Microsoft Excel and PowerPoint
Using Scripting for Redundant Statistical Analyses
Introduction – Part 1
PASW stands for Predictive Analytics Software. This program can be used to analyze data
collected from surveys, tests, observations, etc. It can perform a variety of data analyses and
presentation functions, including statistical analysis and graphical presentation of data. Among
its features are modules for statistical data analysis. These include 1) descriptive statistics, such
as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and
multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,
cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for
survey research, though by no means is it limited to just this topic of exploration.
This handout (Descriptive Statistics) introduces basic skills necessary to run PASW Statistics. It
includes how to create a data file and run descriptive statistics. It is especially tailored to answer
three research questions formulated in the sample survey questionnaire, eventually giving users
an overview of how PASW Statistics can be used for survey research. The three research
questions formulated in the sample survey are as follows:
1. What kind of computer do people prefer to own?
2. What color do people prefer for their computer?
3. Is computer color preference different between genders?
Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are
stored in a self-extracting archive. The archive must be downloaded and executed in order to
extract the data files.
• The data files used with this handout are available for download at
www.alukosayoenoch.wix.com/selfcoding
• Instructions on how to download and extract the data files are available at
www.alukosayoenoch.wix.com/selfcoding
Starting PASW Statistics
The following steps are for starting PASW
Statistics 17 using the computers in the Open
Access Labs (OALs). The steps for starting
the program at home or on other computers
may be slightly different.
To start PASW Statistics 17:
1. Click the Start button, point to All
Programs, point to Course Work,
point to SPSS Inc, point to PASW
Statistics 17, and select PASW
Statistics 17. The PASW Statistics 17
dialog box opens (see Figure 1).
2. Click the Cancel button to create a
new data file.
Figure 1 - PASW Statistics 17 Dialog Box
The PASW Statistics Window
The Data Editor window opens with two view tabs: Data View and Variable View. The Data
View is used for data input, and the Variable View is used for adding variables and defining
variable properties (e.g., modifying attributes of variables). As displayed in Figure 2, the Data
Editor window includes several components. The Title bar displays the name of the current file
and the application. The Menu bar allows you to access various commands that are grouped
according to function. The Toolbar provides shortcuts to commonly used menu commands.
Figure 2 - PASW Statistics Data Editor Window
DATA VIEW
When PASW Statistics is launched, the Data Editor window opens in Data View, which looks
similar to a Microsoft Excel spreadsheet (which is just an array of rows and columns). The
difference is that the rows and columns in Data View are referred to as cases and variables,
respectively (see Table 1).
Table 1 - Elements in Data View
Element Description
Variable Each column represents a variable. Any survey questionnaire item or test
item can be a variable. Commonly defined variable types are numeric or
string. When defining variables as numeric, users need to specify decimal
places. Variable names can be up to 64 characters long and must start
with a letter. Make variable names meaningful and easily recognizable.
Case Each row represents a case. The participants in the study can be cases. For
example, if 100 participants are involved in your study, then 100 cases (or
rows) of information should be generated. Responses to the question
items should be entered consistently from left to right for each participant.
Cell A cell is an intersection between cases and variables. Each response to a
survey question should be entered in a cell for each participant according
to the defined variable data types.
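As a rough illustration outside PASW Statistics, the Data View layout can be mimicked in plain Python: each case is one row, each variable is one column, and a cell is the value where the two meet. The variable names follow this handout's example; the three records and the `cell` helper are made-up placeholders, not the workshop data or any PASW feature.

```python
# Hypothetical records: each dict is one case (row); each key is one variable (column).
cases = [
    {"Name": "Respondent A", "Gender": 1, "GPA": 3.25, "Age": 2},
    {"Name": "Respondent B", "Gender": 2, "GPA": 2.90, "Age": 1},
    {"Name": "Respondent C", "Gender": 1, "GPA": 3.60, "Age": 3},
]

variables = list(cases[0].keys())  # the column headings
n_cases = len(cases)               # the number of rows


def cell(case_number, variable):
    """Return the cell at a 1-based case number and a variable name."""
    return cases[case_number - 1][variable]


print(variables)
print(n_cases)
print(cell(2, "GPA"))
```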
VARIABLE VIEW
Variable View is where variables are defined by assigning variable names and specifying the
attributes, such as data type (“String,” “Date,” “Numeric,” etc.), value labels, and measurement
scales (“Nominal,” “Ordinal,” or “Scale”). Users can think of Variable View as the backbone
structure for the Data View; data cannot be entered nor viewed without first defining variables in
Variable View (see Table 2).
Table 2 - Elements in Variable View
Element Description
Variable Name PASW Statistics will initially give a default variable name (var00001) that
users can change. It is recommended to assign a brief and meaningful
name to variables (e.g., “Name,” “Gender,” and “GPA”).
Variable Type The variable type determines how the cases are entered. Generally, text-
based characters are of “String” type and number-based characters are of
“Numeric” type. For example, if a user has a variable called “Name,”
then its variable type should be “String.” Similarly, a variable named
“GPA” should be a “Numeric” type with (normally two) decimal places.
Value Labels Value labels allow users to describe what each coded value of a variable
stands for. For example, if “Gender” is stored as 1 and 2, others may not
know which code means “female.” To avoid misinterpretation, assign a
value label to each code. The separate Label column describes what a
variable name itself (e.g., “Fav”) stands for.
Creating a Data File
Creating a new PASW Statistics data file consists of two stages: (1) defining variables and (2)
entering the data. Defining the variables involves multiple processes and requires careful
planning. Once the variables have been defined, the data can then be added.
DEFINING VARIABLES
First, variable names based on your research questionnaire need to be assigned. If variable names
are not assigned, PASW Statistics will assign default names that may not be recognizable.
Second, the Type attribute should be specified for each variable. If necessary, assign labels to
values to help all users of the file understand the data better.
To define variables (example):
1. Click the Variable View tab at the lower left corner of the Data Editor window (see
Figure 3).
2. Type [Name] in the first cell under the Name column and press the [Enter] key.
3. Under the Type column, click the ellipsis button. The Variable Type dialog box opens
(see Figure 4).
4. Select the String option.
5. Click the OK button.
Figure 3 - Variable View Tab
Figure 4 - Variable Type Dialog Box
6. Type [Gender] in row two under the Name column.
7. Activate the cell in row two under the Decimals column and change the entry to “0”
using the spin box.
8. Type [What is your gender?] in row two under the Label column.
9. Click the ellipsis button in row two under the Values column. The Value Labels dialog
box opens (see Figure 5).
10. Type [1] in the Value: box.
11. Type [female] in the Label: box.
12. Click the Add button.
13. Repeat steps 10-12 using a value of [2] and a label of [male].
Figure 5 - Value Labels Dialog Box (Gender)
14. Click the OK button.
15. Type [GPA] in row three under the Name column and press the [Enter] key.
16. Type [Age] in row four under the Name column.
17. Click row four under the Decimals column and change the entry to “0” using the spin
box.
18. Type [What is your age?] in row four under the Label column.
19. In row four under the Values column, click the ellipsis button. The Value Labels dialog
box opens (see Figure 6).
20. Type [1] in the Value: box.
21. Type [19 or younger] in the Label: box.
22. Click the Add button.
23. Repeat steps 20-22 for values [2] through [5] and label them as shown in Table 3 (you
may also refer back to the sample questionnaire). See Figure 6 for the results.
24. Click the OK button.
Table 3 - Value Labels
Value Label
2 20-23
3 24-27
4 28-31
5 32 or over
Figure 6 - Value Labels Dialog Box (Age)
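The labeling steps above can be sketched in plain Python as two lookup tables. This is only an analogy for what the Label and Values columns store, not how PASW Statistics implements them; the dictionary layout and the `decode` helper are assumptions made for illustration, while the codes and labels come from the steps and Table 3.

```python
# Value labels from the Gender steps and from Table 3 (Age).
value_labels = {
    "Gender": {1: "female", 2: "male"},
    "Age": {1: "19 or younger", 2: "20-23", 3: "24-27", 4: "28-31", 5: "32 or over"},
}

# Variable labels typed into the Label column.
variable_labels = {
    "Gender": "What is your gender?",
    "Age": "What is your age?",
}


def decode(variable, code):
    """Translate a stored code into its value label; return the code itself if unlabeled."""
    return value_labels.get(variable, {}).get(code, code)


print(variable_labels["Age"], "->", decode("Age", 4))
print(decode("Gender", 2))
print(decode("GPA", 3.5))  # GPA has no value labels, so the number comes back unchanged
```

Storing short numeric codes and translating them only for display keeps data entry fast while the output stays readable.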
DATA ENTRY
After defining the variables, users can enter data for each case. If variables are defined as having
a “Numeric” data type, then numeric data should be entered. PASW Statistics will only accept
numeric entries (the digits 0-9, optionally with a sign or decimal point) for a “Numeric” data
type. If variables are defined as “String” data, any keyboard character can be entered.
To enter data:
1. Click the Data View tab at the lower left corner of the Data Editor window (see Figure
7).
2. Click in a cell and type the corresponding data. The entry will also appear in the Cell
Editor (see Figure 8).
Figure 7 - Data View Tab
Figure 8 - Data Entry
Descriptive Statistics
After data has been entered, users may begin analyzing the data by using descriptive statistics.
Descriptive statistics are the most commonly used statistics for summarizing data frequency or
measures of central tendency (mean, median, and mode).
Research Question # 1
What kind of computer do people prefer to own?
FREQUENCY ANALYSIS
We can use frequency analysis to answer the first research question. Frequency analysis is a
descriptive statistical method that shows the number of occurrences of each response chosen by
the respondents. When using frequency analysis, PASW Statistics can also calculate the mean,
median, and mode to help users analyze the results and draw conclusions. The following
example will use a frequency analysis to answer “Research Question # 1: What kind of computer
do people prefer to own?” using the data collected from our sample survey (see Appendix).
To perform frequency analysis:
1. Click the Open button on the Data Editor toolbar. The Open Data dialog box
opens.
2. Locate and open the “Part 1.sav” file.
3. Click the Analyze menu, point to Descriptive Statistics, and select Frequencies… (see
Figure 9). The Frequencies dialog box opens (see Figure 10).
4. Select the variable(s) desired to be analyzed. In this case, select the variable “Computer
Owned” from the list box on the left.
5. Click the transfer arrow button. The selected variable is moved to the Variable(s): list
box.
6. Select the Display frequency tables check box if necessary.
Figure 9 - Frequency Analysis from Analyze Menu
Figure 10 - Frequencies Dialog Box
7. Click the Statistics… button. The Frequencies: Statistics dialog box opens (see Figure
11).
8. Select the Mean, Median, and Mode check boxes in the Central Tendency section; select
the Std. deviation check box in the Dispersion section.
Figure 11 - Frequencies: Statistics Dialog Box
9. Click the Continue button. This returns you to the Frequencies dialog box.
10. Click the OK button. An Output Viewer window opens and displays the statistics and
frequency table (see Figure 12). The columns of the table “Computer Owned” display the
“Frequency,” “Percent,” “Valid Percent,” and “Cumulative Percent” for each different
type of computer owned.
Figure 12 - Frequencies Output
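To make the four output columns concrete, here is a minimal Python sketch that builds the same kind of table by hand. The response codes are invented for illustration (they are not the contents of “Part 1.sav”), and the `frequency_table` function is a hypothetical helper. “Percent” is taken over all cases, while “Valid Percent” and “Cumulative Percent” ignore missing answers.

```python
from collections import Counter

# Hypothetical coded answers to "What kind of computer do you own?"; None marks a missing answer.
responses = [3, 3, 2, 1, 3, None, 2, 3, 4, 3]


def frequency_table(values):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / total             # share of all cases, missing included
        valid_percent = 100 * freq / len(valid)  # share of non-missing cases only
        cumulative += valid_percent
        rows.append((value, freq, round(percent, 1), round(valid_percent, 1), round(cumulative, 1)))
    return rows


table = frequency_table(responses)
for row in table:
    print(row)

# The mode is the most frequent non-missing value.
mode = Counter(v for v in responses if v is not None).most_common(1)[0][0]
print("Mode:", mode)
```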
The measures of central tendency (mean, median, and mode) can be used to summarize various
types of data. The mode can be used for nominal data, such as computer type, computer color,
ethnicity, etc. The mean or median can be used for interval/ratio data, such as test scores, age, etc.
The median is the better choice for data with a skewed distribution, because extreme values
pull the mean toward them.
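The sketch below compares the three measures on small made-up samples using Python's standard `statistics` module. The income figures contain one extreme value on purpose, so the gap between the mean and the median is easy to see; none of these numbers come from the workshop data.

```python
import statistics

# Hypothetical monthly incomes; one very large value skews the distribution to the right.
incomes = [1200, 1300, 1400, 1500, 9000]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # the middle value, unaffected by the extreme value

# Hypothetical nominal codes (computer color); only the mode is meaningful for such data.
colors = [1, 2, 2, 5, 2, 3]
mode_color = statistics.mode(colors)

print("mean:", mean_income)
print("median:", median_income)
print("mode of color codes:", mode_color)
```

Here the median stays close to what most respondents earn, while the mean is more than double that amount.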
Answer to Research Question # 1
What kind of computer do people prefer to own?
Answer: IBM or Compatible
Explanation: Look at question # 7 in the Sample Survey. Notice that option # 3 is “IBM or
Compatible.” In the output “Statistics” table, the mode for “Computer Owned” is “3,” which is
“IBM or Compatible.” In addition, the frequency analysis results for “Computer Owned”
indicate that 49 out of 80 people own an “IBM or Compatible” computer. This can be
considered their preference.
Research Question # 2
What color do people prefer for their computer?
CROSSTABS
Crosstabs are used to examine the relationship between two variables. To answer the second
research question, users will need to analyze two variables: “Computer Owned” and “Color”
(which indicates color preference). Using crosstabs will show the intersection between these two
variables and reveal the computer type and color preferred by most people.
To perform a crosstabs analysis:
1. In Data View, click the Analyze menu, point to Descriptive Statistics, and select
Crosstabs… (see Figure 13). The Crosstabs dialog box opens.
2. Select the variable “Computer Owned” from the list box on the left.
3. Click the transfer arrow button to move it to the Row(s): list box.
4. Select the variable “color” (see Figure 14).
5. Click the transfer arrow button to move it to the Column(s): list box.
6. Click the OK button. An Output Viewer window opens and displays two tables: “Case
Processing Summary” and the “Crosstabulation” matrix (see Figure 15).
Figure 13 - Crosstab Analysis from Analyze Menu
Figure 14 - Crosstabs Dialog Box
Figure 15 - Crosstabs Output
Answer to Research Question # 2
What color do people prefer for their computer?
Answer: IBM or Compatible in beige color
Explanation: As shown in the “Crosstabulation” matrix above, “IBM or Compatible” is the
most preferred computer type from the row variable (“Computer Owned”). From the column
variable (“color”), “beige” is shown as the most preferred color. Therefore, you can conclude
that most people prefer “IBM or Compatible” computers that are in “beige” color.
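For readers who want to see what a crosstabulation computes, the following Python sketch counts every row/column combination and adds the row and column totals. The code pairs are invented for illustration and use the survey's numbering (3 = IBM or Compatible, 1 = beige); the variable names are assumptions, not PASW terms.

```python
from collections import Counter

# Hypothetical (computer owned, color) code pairs, one pair per respondent.
pairs = [(3, 1), (3, 1), (3, 2), (2, 4), (1, 2), (3, 5), (2, 1), (3, 1)]

cell_counts = Counter(pairs)  # how often each row/column combination occurs
row_values = sorted({r for r, _ in pairs})
col_values = sorted({c for _, c in pairs})

# Build the matrix: one row per computer type, one column per color.
crosstab = {r: {c: cell_counts.get((r, c), 0) for c in col_values} for r in row_values}

row_totals = {r: sum(crosstab[r].values()) for r in row_values}
col_totals = {c: sum(crosstab[r][c] for r in row_values) for c in col_values}

print("row totals:", row_totals)
print("column totals:", col_totals)
print("largest cell:", cell_counts.most_common(1)[0])
```

Checking the largest single cell, and not only the largest row and column totals, confirms which combination of computer type and color is actually the most common.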
Data Manipulation
Data files are not always ideally organized in a form to meet specific needs. For example, users
may wish to select a specific subject or split the data file into separate groups for analysis.
SELECT CASES
If you have two or more subject groups in your data and you want to analyze one group in
isolation, you can use the select cases option. For example, the data we are currently analyzing
has both male and female participants. If you wish to analyze only the female cases, select
cases based on the “Gender” variable and set the condition so that only female cases are included.
To select cases for analysis:
1. Click the Data menu and select Select Cases… (see Figure 16). The Select Cases dialog
box opens (see Figure 17).
2. Click the If condition is satisfied option.
3. Click the If… button. The Select Cases: If dialog box opens.
4. Select the variable “Gender” in the left list box.
5. Click the transfer arrow button to move it to the right text box.
6. Click the = button.
7. Click the 1 button.
8. Click the Continue button. This takes you back to the Select Cases dialog box.
9. Click the OK button. This takes you back to Data View. All males will be excluded from
the statistical analysis.
10. Rerun the crosstabs analysis by following steps 1-6 of the Crosstabs section of this
handout.
11. Click the OK button. The Output Viewer window updates (see Figure 18).
Figure 16 - Select Cases from Data Menu
Figure 17 - Select Cases Dialog Box
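The condition built in the steps above (Gender = 1) amounts to a simple filter. The Python sketch below shows the idea on a few invented cases; `select_cases` is a hypothetical helper, and the original list is left untouched, much as unselected cases remain in the data file while being excluded from analysis.

```python
# Hypothetical cases; Gender codes follow the value labels defined earlier (1 = female, 2 = male).
cases = [
    {"Gender": 1, "Computer": 3, "Color": 5},
    {"Gender": 2, "Computer": 3, "Color": 2},
    {"Gender": 1, "Computer": 2, "Color": 1},
    {"Gender": 2, "Computer": 1, "Color": 2},
]


def select_cases(rows, condition):
    """Keep only the rows for which the condition holds, like "If condition is satisfied"."""
    return [row for row in rows if condition(row)]


females = select_cases(cases, lambda row: row["Gender"] == 1)
print(len(females), "of", len(cases), "cases selected")
```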
From the cross tabulation in the Output Viewer window in Figure 18 below, look at the column
for the most preferred color and the row for the computer types. Since we selected only female
cases, what is the computer color most preferred by women? Ten women chose “IBM or
Compatible” with color option “5.” Thus, you may conclude that most female participants prefer
the color “5” for “IBM or Compatible” computers. However, what does “5” represent? This
problem arose because the variable value “5” was not labeled as “Other.” Moreover, even if it were
labeled “Other,” it does not indicate any particular color, making it difficult to draw a
conclusion. In order to avoid such problems, it is suggested that you provide a blank space where
participants can specify “Other” color preferences besides the ones specified in the survey
questionnaire.
Figure 18 - Select Cases Output
Example:
What kind of color do you like to have for your computer?
1. Beige 2.Black 3.Gray 4.White 5.Other __________
Research Question # 3
Is computer color preference different between genders?
SPLITTING A FILE
To answer the third research question, we need to split the file. You can analyze one particular
group of subjects using the select cases option. However, if you wish to compare the response or
performance differences by groups within one variable, it is best to use the split files option.
To split a file for analysis:
1. Turn off the select cases option.
2. Click the Data menu and select Select Cases…. The Select Cases dialog box opens.
3. Select the All cases option.
4. Click the OK button. Notice that the male cases that were excluded are now all included
in the data file.
5. Click the Data menu and select Split File… (see Figure 19). The Split File dialog box
opens (see Figure 20).
Figure 19 - Split File from Data Menu
Figure 20 - Split File Dialog Box
6. Select the variable “Gender” from the left list box.
7. Select the Compare groups option.
8. Click the transfer arrow button to move the variable “Gender” to the Groups Based
on: list box.
9. Click the OK button.
10. Rerun the crosstabs analysis by following steps 1-6 of the Crosstabs section of this
handout.
11. Click the OK button. The Output Viewer window crosstabulation table opens (see
Figure 21).
For additional SPSS help, visit http://www.youtube.com/mycsula.
Figure 21 - Split File Output Data
Answer to Research Question # 3
Is computer color preference different between genders?
Answer: Yes
Explanation: There is a computer color preference difference based on gender. From the
crosstabulation output, females prefer “IBM or Compatible” of “Other” color over the colors
beige, black, gray, or white. The male group prefers “IBM or Compatible” of “black” color.
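Splitting a file and rerunning crosstabs is comparable to building one crosstabulation per group. The Python sketch below does this for a handful of invented cases; the codes follow the sample survey (computer 3 = IBM or Compatible, color 2 = black, color 5 = other), and the data are chosen only to mirror the pattern described above, not copied from the workshop file.

```python
from collections import Counter, defaultdict

# Hypothetical (gender, computer owned, color) codes, one tuple per respondent.
cases = [(1, 3, 5), (1, 3, 5), (1, 2, 1), (2, 3, 2), (2, 3, 2), (2, 1, 4)]
gender_labels = {1: "female", 2: "male"}

# One table of (computer, color) counts per gender, like "Compare groups".
tables = defaultdict(Counter)
for gender, computer, color in cases:
    tables[gender][(computer, color)] += 1

# The most frequent (computer, color) combination within each group.
top_choice = {gender_labels[g]: t.most_common(1)[0][0] for g, t in tables.items()}
print(top_choice)
```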
FIND AND REPLACE
The Find and Replace function in PASW Statistics is an efficient way to locate and change
data values. Both Find and Replace are available in Data View. However, only the Find
function is available in Variable View.
To use the Find and Replace function:
1. Click the Edit menu and select Find…. The Find and Replace dialog box opens (see
Figure 22).
2. In the Find: box, type [Clinton].
3. Select the Replace check box to replace ‘Clinton’ with another word.
4. Click in the Replace with: box, and type the name [Cliff].
5. Click the Show Options button.
6. Under Match to, select the Entire cell option.
7. Click the Replace All button.
Figure 22 - Find and Replace Dialog Box (Data View)
NOTE: Under the Match to section of the Find and Replace dialog box (see Figure 22),
Contains means PASW Statistics will find each instance of the word/phrase/number appearing in
a cell, whether or not it is the only information in the cell. The Entire cell option will find only
cells whose entire contents match the word/phrase/number. The Begins with and Ends
with options find cells that begin or end with the characters entered by the user.
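The four Match to options can be compared side by side with a short Python sketch. The names in the list are invented, both functions are hypothetical helpers, and the matching here is case-sensitive by assumption; the sketch only illustrates the matching rules and is not PASW's own search code.

```python
def matches(cell, text, mode="contains"):
    """Check one cell against the search text using one of the four match modes."""
    cell = str(cell)
    if mode == "contains":
        return text in cell
    if mode == "entire cell":
        return cell == text
    if mode == "begins with":
        return cell.startswith(text)
    if mode == "ends with":
        return cell.endswith(text)
    raise ValueError(f"unknown match mode: {mode}")


def replace_entire_cells(column, find, replacement):
    """Return a new column in which cells that match as a whole are replaced."""
    return [replacement if matches(c, find, "entire cell") else c for c in column]


# Hypothetical "Name" column.
names = ["Clinton", "Clinton Jr.", "McClinton", "Cliff"]

for mode in ("contains", "entire cell", "begins with", "ends with"):
    print(mode, [n for n in names if matches(n, "Clinton", mode)])

print(replace_entire_cells(names, "Clinton", "Cliff"))
```

With the Entire cell option, only the first name is replaced; “Clinton Jr.” and “McClinton” are left alone because they merely contain the search text.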
Reporting
Once the statistical analysis is complete, the final step is to create a report. In the report, you may
include PASW Statistics output (e.g., graphs and tables) for supporting your analysis. Using the
Copy and Paste functions, the tables/graphs generated in PASW Statistics can be copied from the
Output Viewer window and pasted into a Microsoft Word document without having to create
new tables or graphs.
To create a report using Microsoft Word:
1. In the Output Viewer window, right-click a table. A box appears around the table and a
red arrow appears to its left (which means the table is selected).
2. Select Copy from the shortcut menu.
3. Open Microsoft Word.
4. Right-click in the Word document and select Paste from the shortcut menu. The table is
pasted into the Word document.
Appendix
SAMPLE SURVEY
Research Questions
1. What kind of computer do people prefer to own?
2. What color do people prefer for their computer?
3. Is computer color preference different between genders?
Survey Questions
1. What is your name? ____________________________
2. What is your gender? ____________________________
3. What is your G.P.A.? ____________________________
4. What is your age?
1. 19 or younger 2. 20-23 3. 24-27 4. 28-31 5. 32 or over
5. How much do you make in a month?
1. Less than $1000 2. $1000–$1499 3. $1500–$1999 4. $2000–$2499 5. Over $2500
6. What is your class standing?
1. Freshman 2. Sophomore 3. Junior 4. Senior 5. Graduate
7. What kind of computer do you own?
1. Toshiba 2. Apple 3. IBM or Compatible 4. Other 5. None
8. What kind of computer have you used?
1. IBM or Compatible 2. Apple 3. Toshiba 4. Other 5. None
9. What color do you like to have for your computer?
1. Beige 2. Black 3. Gray 4. White 5. Other
Introduction – Part 2
PASW stands for Predictive Analytics Software. This program can be used to analyze data
collected from surveys, tests, observations, etc. It can perform a variety of data analyses and
presentation functions, including statistical analysis and graphical presentation of data. Among
its features are modules for statistical data analysis. These include 1) descriptive statistics, such
as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and
multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,
cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for
survey research, though by no means is it limited to just this topic of exploration.
This handout (Test of Significance) introduces 1) several data entry and data manipulation
techniques that help you save time, 2) basic skills to perform tests of significance, such as
correlations and t tests, and 3) an introduction to multiple response sets. The step-by-step
instructions will help you understand how to interpret the output of your tests from data supplied
by your research question(s). Follow the steps carefully to get appropriate results. Please note
that a slightly different process might yield unexpected and complicated results. This is a
continuation of the PASW Statistics Descriptive Statistics handout.
Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are
stored in a self-extracting archive. The archive must be downloaded and executed in order to
extract the data files.
• The data files used with this handout are available for download at
www.alukosayoenoch.wix.com/selfcoding.
• Instructions on how to download and extract the data files are available at
www.alukosayoenoch.wix.com/selfcoding.
Null Hypothesis
The null hypothesis (H0) represents a theory that has been presented, either because it is believed
to be true or because it is to be used as a basis for an argument. It is a statement that has not been
proven. It is also important to realize that the null hypothesis is the statement of no difference.
For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is
no better, on average, than the current drug (in other words, the new drug exhibits the same
behavior as the old drug). The null hypothesis (H0) and the alternative hypothesis (H1) can be
stated as:
H0: There is no difference between the two drugs.
H1: There is a significant difference between the two drugs.
Special consideration is given to the null hypothesis. This is due to the fact that the null
hypothesis relates to the statement being tested, whereas the alternative hypothesis relates to the
statement to be accepted if and when the null is rejected.
The final conclusion, once the test has been carried out, is always given in terms of the null
hypothesis. The result is either "Reject H0 in favor of H1" or "Do not reject H0"; the conclusion is
never "Reject H1" or "Accept H1."
If the conclusion is "Do not reject H0," this does not necessarily mean that the null hypothesis is
true. It only suggests that there is not sufficient evidence against H0 in favor of H1. Rejecting the
null hypothesis suggests that the alternative hypothesis may be true.
NOTE: The null hypothesis essentially states that the given cases or items under consideration are
statistically the same or exhibit the same behavior without any significant difference. The alternative
hypothesis states that the given cases exhibit different behavior or that they have a statistically significant
difference.
Statistical Tests
Statistics is a set of mathematical techniques used to summarize research data and determine
whether the data supports a proposed hypothesis. PASW Statistics includes tools that can be used
to analyze variables and determine the strength and nature of the relationship between two
variables and whether the means (averages) of two data sets (samples) are statistically the same
or different.
Tests of Significance
The following examples are sample research questions that can be answered using PASW
Statistics analytical methods.
CORRELATIONS
A correlation is a statistical device that measures the strength or degree of a supposed linear
association between two or more variables. One of the more common measures used is the
Pearson correlation, which estimates the linear relationship between two interval variables.
Research Question # 1
Is there a relationship between academic performance and Internet access?
H0: There is no relationship between academic performance and Internet access.
H1: There is a significant relationship between academic performance and Internet
access.
To run a correlation analysis:
1. Locate and open the “Part 2.sav” file.
2. Click the Analyze menu, point to Correlate, and select Bivariate…. The Bivariate
Correlations dialog box opens (see Figure 1).
3. Select the variables “active,” “posttest,” and “gpa” in the list box on the left.
4. Click the transfer arrow button to move them to the Variables: list box.
5. Select the Pearson check box and the Two-tailed option if necessary.
6. Click the OK button. The Output Viewer window opens with a “Correlations” table
(see Figure 2).
Figure 1 - Bivariate Correlations Dialog Box
Figure 2 - Bivariate Correlations Output Table
The Answer to Research Question # 1
Is there a relationship between academic performance and Internet access?
Answer: Yes
Explanation: As shown in Figure 2 above, the correlation coefficient for the relationship between “active”
and “posttest” is 0.476, which is between 0.4 and 0.7. The correlation coefficient for the relationship between
“active” and “gpa” is 0.448, which is also between 0.4 and 0.7. These results indicate that
there is a moderate, positive relationship between academic performance and Internet access.
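The calculation behind the Correlations table can also be sketched outside PASW Statistics. The Python example below is a minimal illustration, not the program's own procedure. The function names and the sample scores are hypothetical (they are not the values in “Part 2.sav”), and the "weak" and "strong" labels are assumptions added around the 0.4 to 0.7 "moderate" band used in the explanation above.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Label |r|; only the 0.4-0.7 'moderate' band comes from the handout."""
    size = abs(r)
    if size < 0.4:
        return "weak"        # assumption, not stated in the handout
    if size <= 0.7:
        return "moderate"
    return "strong"          # assumption, not stated in the handout


# Hypothetical scores for illustration only
active = [1, 2, 2, 3, 3, 4, 4, 4]
posttest = [55, 60, 72, 58, 70, 66, 81, 75]

r = pearson_r(active, posttest)
print(round(r, 3), describe_strength(r))
```

A positive value of r means that higher values of one variable tend to go with higher values of the other; the sign and size are read the same way as in the Correlations table.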
PAIRED-SAMPLES T TEST
A Paired-Samples T Test is used to test if an observed difference between two means is
statistically significant. To run a t test, the following assumptions should be met: the data 1) is
approximately normally distributed (strictly, the differences between the paired scores), 2) comes from a
reasonably large data set, and 3) has no outliers. If any of these assumptions are
not met, then a nonparametric test should be used.
Research Question # 2
Is there an instructional effect taking place in the computer class?
H0: There is no instructional effect in this class (the mean “pretest” and “posttest” scores are the same).
H1: There is an instructional effect in this class (the mean “pretest” and “posttest” scores differ).
The null hypothesis is that instruction in the computer class does not change academic
achievement. The variables that reflect academic achievement are “pretest” and “posttest.”
To run a Paired-Samples T Test:
1. Click the Analyze menu, point to Compare Means, and select Paired-Samples T
Test…. The Paired-Samples T Test dialog box opens (see Figure 3).
2. Select the variables “pretest” and “posttest” in the list box on the left.
3. Click the transfer arrow button to move them to the Paired Variables: list box.
4. Click the OK button. The Output Viewer window opens (see Figure 4).
Figure 3 - Paired-Samples T Test Dialog Box
The Answer to Research Question # 2
Is there an instructional effect taking place in the computer class?
Figure 4 - Paired-Samples T Test Output Table
Answer: Yes
Explanation: The observed mean difference is -4.5172. The value of t is -3.820 with a Sig. value of 0.001,
which is less than 0.05, so the mean difference (-4.5172) between “pretest” and “posttest” is statistically
significant and the null hypothesis is rejected. Therefore, it
can be inferred that there was an instructional effect taking place in the computer class.
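The t value in the Paired-Samples T Test output is the mean of the paired differences divided by its standard error. The Python sketch below shows that arithmetic under stated assumptions: the function name and the scores are hypothetical (not the “pretest” and “posttest” values in the data file), and the difference is taken as first variable minus second variable, which is why the mean difference is negative when the second score is higher.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("both lists must have the same length (at least 2)")
    # Difference for each pair: first score minus second score
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    std_error = sd_diff / math.sqrt(n)       # standard error of the mean difference
    return mean_diff, mean_diff / std_error, n - 1


# Hypothetical scores for illustration only
pretest = [60, 72, 55, 80, 68, 74]
posttest = [66, 75, 61, 82, 75, 73]

mean_diff, t, df = paired_t(pretest, posttest)
print(f"mean difference = {mean_diff:.4f}, t = {t:.3f}, df = {df}")
```

The significance value reported as Sig. (2-tailed) comes from comparing t with the t distribution for the given degrees of freedom; that lookup is left to PASW Statistics here.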
INDEPENDENT-SAMPLES T TEST
An Independent-Samples T Test is used to determine the likelihood that two independent data
samples came from populations that have identical means. If this were true, then the difference
between the means should be equal to zero. The null hypothesis in this case would be that the
two means are equal.
Two variables are required in the data set. One variable is the measured parameter. Examples
include weight, height, or frequency. The second variable divides the data set into two groups.
In the example below, light and dark are the groups whose means will be compared.
Research Question # 3
Is there a difference in the average number of seedlings grown in the light
and those grown in the dark?
In this example, 20 Petri dishes each contained 10 celery seeds. Ten of the dishes were kept in
the dark for one week; the other 10 were placed under a grow light for the same amount of time.
At the end of the week, the number of seeds that sprouted was counted in each dish.
H0: Variance (light) = variance (dark).
H1: Variance (light) ≠ variance (dark).
H0: There is no difference between seedlings under the light and in the dark (μ(light) = μ(dark)).
H1: There is a significant difference between seedlings under the light and in the dark (μ(light) ≠ μ(dark)).
NOTE: The first set of hypotheses tests the variances, while the second set tests the means.
The result of the variance test determines how the test of the means is read.
NOTE: Variance: The average of the squared deviations from the mean, which is essentially used
to see how far the single observations are from the mean. We need to check whether the variances are equal before
we can determine if the means are equal. If the variances are equal, users can read the “Equal variances assumed” row of
the Independent Samples Test table. If the variances are not equal, users should read the “Equal variances not assumed” row instead.
To run the Independent-Samples T Test:
1. Locate and open the “Seedlings.sav” file.
2. In Data View, click the Analyze menu, point to Compare Means, and select
Independent-Samples T Test…. The Independent-Samples T Test dialog box opens (see
Figure 5).
3. Select the “Seedlings” variable in the list box on the left.
4. Click the transfer arrow button to move the variable to the Test Variable(s): list box.
5. Select the “Treatment” variable in the list box on the left.
6. Click the transfer arrow button to move the variable to the Grouping Variable: list box.
7. Click the Define Groups… button. The Define Groups dialog box opens (see Figure 6).
8. Enter [0] in the Group 1: box, enter [1] in the Group 2: box, and then click the Continue
button.
9. Click the OK button. The Output Viewer window opens with several tables, including
an Independent-Samples Test table (see Figure 7).
Figure 5 - Independent-Samples T Test Dialog Box
Figure 6 - Define Groups Dialog Box
The Answer to Research Question # 3
Is there a difference in the average number of seedlings grown in the light
and those grown in the dark?
Figure 7 - Independent-Samples T Test Output
Answer: Yes
Explanation: The mean difference in seedlings sprouted between the two treatments (light and
dark) was -2.900. The value of t, which is -3.179, was statistically significant (p=0.005).
Therefore, the null hypothesis is rejected.
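The t values in the Independent-Samples Test table can be reproduced by hand. The Python sketch below is an illustration under stated assumptions: the function name and the seed counts are hypothetical (not the values in “Seedlings.sav”), and the groups are simply labeled by their codes 0 and 1 as entered in the Define Groups dialog box. It computes the t statistic both with a pooled variance (equal variances assumed) and without pooling (equal variances not assumed).

```python
import math
import statistics


def independent_t(group1, group2):
    """Return the mean difference and t statistics for two independent samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances

    # Equal variances assumed: pool the two sample variances
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: each group keeps its own variance
    t_unpooled = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

    return {
        "mean_diff": m1 - m2,
        "t_equal_var": t_pooled,
        "df_equal_var": n1 + n2 - 2,
        "t_unequal_var": t_unpooled,
    }


# Hypothetical sprouted-seed counts for ten dishes per group (illustration only)
treatment_0 = [4, 5, 3, 6, 5, 4, 6, 5, 4, 5]
treatment_1 = [7, 8, 6, 9, 7, 8, 6, 9, 8, 7]

result = independent_t(treatment_0, treatment_1)
print({name: round(value, 3) for name, value in result.items()})
```

The mean difference is Group 1 minus Group 2, so a negative value means that the second group has the higher mean, as in the -2.900 reported above.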
Multiple Response Sets
Very often, a survey will contain questions where the respondent is allowed to select more than
one answer. Managing such questions in PASW Statistics can produce some difficulty. Each
response in a multiple response question should be coded as a separate variable and then grouped
under a multiple response set of variables. The multiple response set can then be analyzed using
frequency counts or crosstabs.
To define a multiple response set of variables:
1. Locate and open the “Airlines.sav” file.
2. In Data View, click the Analyze menu, point to Multiple Response, and select Define
Variable Sets… (see Figure 8). The Define Multiple Response Sets dialog box opens (see
Figure 9).
For additional SPSS help, visit http://www.youtube.com/mycsula.
Figure 8 - Define Variable Sets from Analyze Menu
Figure 9 - Define Multiple Response Sets Dialog Box
3. Select the “American,” “TWA,” “United,” “USAir,” and “Other” airline variables and
move them to the Variables in Set: list box.
4. Make sure the Dichotomies option is selected and enter [1] in the Counted value: box.
5. Type [Airlines] in the Name: box.
6. Type [Airline frequency of response] in the Label: box.
7. Click the Add button. The set is created as “$Airlines” and listed in the Multiple
Response Sets: list box.
8. Click the Close button.
MULTIPLE RESPONSE FREQUENCIES
It is possible to obtain the answer by running a frequency analysis for each of the airline
variables. The result of such an analysis will only provide an overall raw frequency for each
response and will not allow percentage comparisons between the different airlines. A frequency
analysis that uses a multiple response set will provide an appropriate response with concise
output.
Research Question # 4
In a survey of airline passengers, which airline was selected as having been
flown most often in the previous six months?
To analyze the frequency of response for each variable in a multiple response set:
1. Click the Analyze menu, point to Multiple Response, and select Frequencies…. The
Multiple Response Frequencies dialog box opens (see Figure 10).
2. Select the multiple response set labeled “$Airlines” and move it to the Table(s) for: list
box.
3. Click the OK button. An Output Viewer window opens with the frequency analysis (see
Figure 11).
Figure 10 - Multiple Response Frequencies Dialog Box
The Answer to Research Question # 4
In a survey of airline passengers, which airline was selected as having been
flown most often in the previous six months?
Figure 11 - Airline Frequency Analysis Output
Answer: United
Explanation: As seen in the Output Viewer window, there were 18 people surveyed and 44 total
responses generated. Of the 44 total responses, United was selected most often with 12 responses
(representing 27.3% – the largest portion of the total responses).
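The counting that a dichotomy-coded multiple response set performs can be sketched in a few lines. The Python example below is a minimal illustration: the function name and the three respondents are hypothetical (not the 18 cases in “Airlines.sav”), and each respondent is assumed to be coded 1 for every airline selected, matching the Counted value of 1 entered above.

```python
def multiple_response_frequencies(cases, set_variables, counted_value=1):
    """Tabulate a dichotomy-coded multiple response set.

    Returns (table, total_responses, valid_cases). The table maps each
    variable to its count, percent of responses, and percent of cases.
    """
    counts = {
        name: sum(1 for case in cases if case.get(name) == counted_value)
        for name in set_variables
    }
    total_responses = sum(counts.values())
    # A case is valid when it selected at least one option in the set
    valid_cases = sum(
        1 for case in cases
        if any(case.get(name) == counted_value for name in set_variables)
    )
    table = {}
    for name, count in counts.items():
        table[name] = {
            "count": count,
            "pct_of_responses": 100 * count / total_responses if total_responses else 0.0,
            "pct_of_cases": 100 * count / valid_cases if valid_cases else 0.0,
        }
    return table, total_responses, valid_cases


airlines = ["American", "TWA", "United", "USAir", "Other"]

# Hypothetical respondents for illustration only
respondents = [
    {"American": 1, "TWA": 0, "United": 1, "USAir": 0, "Other": 0},
    {"American": 0, "TWA": 0, "United": 1, "USAir": 1, "Other": 0},
    {"American": 0, "TWA": 1, "United": 0, "USAir": 0, "Other": 0},
]

table, total, valid = multiple_response_frequencies(respondents, airlines)
most_selected = max(table, key=lambda name: table[name]["count"])
print(most_selected, total, valid)

# The percentage quoted in the explanation above: 12 of 44 responses
print(round(100 * 12 / 44, 1))  # 27.3
```

Percent of responses uses the total number of selections as the base, while percent of cases uses the number of respondents, so the percent-of-cases column can add up to more than 100.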
MULTIPLE RESPONSE CROSSTABS
Without the use of a multiple response set, each airline would have to be analyzed against the
variable that the passengers used to identify themselves as being afraid of flying. This would
require the use of a crosstab analysis. However, the overall results would not allow for easy
comparison between each of the airlines. The best way to answer the question would be to
include the multiple response set into a crosstab analysis.
Research Question # 5
In a survey of airline passengers, which airline was selected most often by
those passengers who identified themselves as afraid to fly?
To incorporate a multiple response set into a crosstab analysis:
1. Click the Analyze menu, point to Multiple Response, and select Crosstabs…. The
Multiple Response Crosstabs dialog box opens (see Figure 12).
Figure 12 - Multiple Response Crosstabs Dialog Box
2. Select the “FearFactor” variable as the Row(s): variable and the “$Airlines” multiple
response set as the Column(s): variable.
3. Select the “FearFactor” variable after it is designated as the Row(s): variable. The
Define Ranges… button becomes active.
4. Click the Define Ranges… button. The Multiple Response Crosstabs: Define Variable
Ranges dialog box opens (see Figure 13).
Figure 13 - Multiple Response Crosstabs: Define Variable Ranges Dialog Box
5. Enter [0] in the Minimum: box and [1] in the Maximum: box for the “FearFactor”
variable.
6. Click the Continue button.
7. Click the Options… button. The Multiple Response Crosstabs: Options dialog box opens
(see Figure 14).
8. Select the Cases option and then click the Continue button.
9. Click the OK button. The Output Viewer window opens with the crosstab results (see
Figure 15).
Figure 14 - Multiple Response Crosstabs: Options Dialog Box
The Answer to Research Question # 5
In a survey of airline passengers, which airline was selected most often by
those passengers who identified themselves as afraid to fly?
Figure 15 - Multiple Response Crosstabs Output
Answer: USAir
Explanation: Of the 18 people surveyed, ten identified themselves as being afraid to fly. Within
that group of survey respondents, USAir was the airline selected most often (seven times).
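A multiple response crosstab counts the selections in the set separately for each value of the row variable. The Python sketch below illustrates the idea under stated assumptions: the function name and the respondents are hypothetical, and the coding 1 = afraid to fly, 0 = not afraid is an assumption made for the example (the handout only gives the range 0 to 1 for “FearFactor”).

```python
def multiple_response_crosstab(cases, row_variable, set_variables, counted_value=1):
    """Count selections in a multiple response set for each row-variable value."""
    table = {}
    for case in cases:
        # Create the row for this value of the row variable on first use
        row = table.setdefault(case[row_variable], {name: 0 for name in set_variables})
        for name in set_variables:
            if case.get(name) == counted_value:
                row[name] += 1
    return table


airlines = ["American", "TWA", "United", "USAir", "Other"]

# Hypothetical respondents; FearFactor 1 = afraid to fly (assumed coding)
respondents = [
    {"FearFactor": 1, "American": 0, "TWA": 0, "United": 1, "USAir": 1, "Other": 0},
    {"FearFactor": 1, "American": 1, "TWA": 0, "United": 0, "USAir": 1, "Other": 0},
    {"FearFactor": 0, "American": 0, "TWA": 1, "United": 1, "USAir": 0, "Other": 0},
]

crosstab = multiple_response_crosstab(respondents, "FearFactor", airlines)
afraid_row = crosstab[1]
print(max(afraid_row, key=afraid_row.get), afraid_row)
```

Reading one row of the result corresponds to reading one row of the crosstab output: the airline with the largest count in the "afraid" row answers the research question.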
Data Manipulation
PASW Statistics also provides tools to make data manipulation a simple task.
COPYING AND PASTING VARIABLE PROPERTIES
Copying and pasting is very useful when the same properties need to be given to different
variables.
To copy and paste variable properties:
1. Click the File menu, point to New, and select Data.
2. Click the Variable View tab at the lower left corner of the Data Editor window (see
Figure 16).
Figure 16 - Variable View Tab
3. Type [active] in the first cell under the Name column and press the [Enter] key.
4. Click in the first cell under the Decimals column and decrease the entry to “0.”
5. Click in the first cell under the Values column and click the Ellipses button. The
Value Labels dialog box opens (see Figure 17).
6. Type [1] in the Value: box.
7. Type [Strongly Disagree] in the Label: box.
8. Click the Add button.
9. Assign [2], [3], and [4] for [Disagree], [Agree], and [Strongly Agree], respectively, by
repeating steps 6-8 for each value added (see Figure 17).
Figure 17 - Value Labels Dialog Box
10. Click the OK button.
11. Switch back to Data View (see Figure 18).
12. Click the “active” variable heading to highlight the column.
13. Click the Edit menu and select Copy to copy the properties of the variable “active.”
14. Highlight the variables that should receive the same properties by clicking
the header of the first variable and dragging the pointer across to the last header (see
Figure 19 and Figure 20).
15. Click the Edit menu and select Paste. The copied properties of the variable “active” will
be applied to the target variables, and the Data View and Variable View will change (see
Figure 21 and Figure 22).
Figure 18 - Data View Tab
Figure 19 - Selected Variable
Figure 20 - Selecting Target Variables
Figure 21 - Data View Showing New Variables
Figure 22 - Variable View Showing New Variables
INSERTING VARIABLES AND CASES
By using Insert Variable and Insert Cases, variables and cases can be added into any location
of the data file in a simple, straightforward manner. Assume that one wants to insert a new
variable named “midterm” between “pretest” and “posttest” and use it for test score data. The
following instructions describe how to insert a new variable and make it available for “Numeric”
data type.
To insert a variable:
1. Switch to Data View.
2. Click the “posttest” variable heading to highlight the column.
3. Click the Edit menu and select Insert Variable. A new variable is inserted to the left of
the highlighted variable (“posttest”).
NOTE: The new variable is created with a default name “VAR00001” which can be changed
later.
4. To define the properties of the new variable, double-click the variable heading. The
Variable View is activated for the new variable.
5. Type [midterm] in the Name column of the new variable.
6. Change the variable type if desired.
In the same manner, it is possible to insert cases in a particular location in Data View. For
instance, assume that a case should be inserted between case “10” and “11” for a particular
student’s record. By following the instructions below, one case will be inserted after the 10th
case.
To insert cases (example):
1. Switch to Data View.
2. Click row number “11” to highlight the case.
3. Click the Edit menu and select Insert Cases. A new case is inserted above case “11.”
DELETING VARIABLES AND CASES
Variables and cases can be deleted by using the Clear command on the Edit menu.
To delete a variable or case:
1. In Data View, click the variable heading or the case number to highlight what will be
deleted.
2. Click the Edit menu and select Clear. The variable or case is deleted.
Merging Data Files
The merging data files function is useful for users who store each of their topics in separate files
and eventually need or want to combine them together. This allows users to import data from one
file into another as long as both sets of data (from each file) contain a common identifier for each
of the cases that the user wishes to combine.
An identifier has no meaning other than to distinguish each case from one another, and to
identify the correlating cases from the additional data files. This identifier can be a unique value,
number, or letter combination to be applied to each case.
NOTE: The variables do not have to be the same across data files.
CREATING THE DATA FILE FOR MERGING
Scenario: A psychological focus group on campus needs to create a file for a longitudinal study
of ten students on campus. Over the five-year span of the study, the ten students will be
asked twelve questions each year (one a month), and the same questions will be asked each year.
The answers are stored in three files. Each file covers the same ten students but holds four of the
twelve monthly questions. At the end of the year, the three files will be combined into an annual
questionnaire file to be properly analyzed.
The merging data files function can be used to satisfy this requirement.
Inputting the Data in Variable View
Files must be created first before being merged.
To create a data file for merging:
1. Click the File menu, point to New, and select Data.
2. Once the new file has been created, select the Variable View tab.
3. For the first variable, name it [ID] to be your identifier variable, and press the [Enter]
key.
4. Change the Type attribute by clicking the ellipses button and selecting the String option
from the Variable Type dialog box.
5. Change the width to [10] and click the OK button.
6. Click in the second variable cell, type [January], and press the [Enter] key.
7. Change the Type attribute to String.
8. In the Label attribute, type [What pet would you like to own?] (see Figure 23).
9. Repeat steps 6 through 8 to enter the data in Table 1.
Figure 23 - Define Variables in Variable View
Table 1 - Variables for Case Study
Month Attribute   Type     Length   Label Attribute
February          String   10       What is your favorite shape?
March             String   12       It is 1:30pm, what are you eating?
April             String   12       What is your preferred beverage?
10. Once this information has been defined in Variable View, switch by clicking the Data
View tab to enter the corresponding case information.
11. Enter [Alfred] in case 1 of the ID variable, [Bethel] in case 2 of the ID variable, down to
[Jessie] in case 10 of the ID variable. Enter the corresponding information according to
Table 2. See Figure 24 for the results.
Table 2 - Input Case Information
Case  ID         January    February   March         April
1     Alfred     Dog        Star       Pizza         Water
2     Bethel     Cat        Square     Fruit         Soda Pop
3     Chris      Cat        Triangle   Veggies       Grape Juice
4     Dante      Dog        Rectangle  Sandwich      Orange Juice
5     Erica      Tiger      Oval       Chips         Aloe Water
6     Fernando   Tarantula  Circle     Calzon        Beer
7     Grenadine  Dog        Octagon    Salad         White Wine
8     Harold     Bees       Polygon    Soup          Naked Juices
9     Isadora    Turtle     Rhombus    PandaExpress  V8 Juice
10    Jessie     Hamster    Oval       Egg Salad     Lemonade
Figure 24 - Input Case Information
12. Save the file by clicking the File menu and selecting Save. The Save Data As dialog box
opens.
13. Select the Desktop as the destination and type [Merge 1] in the File name: text box.
14. Click the Save button.
15. Close the Output Viewer window.
MERGING THE DATA FILES
To merge data files, all files must have a common variable. The common variable in this case is
ID.
To merge data files: (First, make sure the files have the same IDs.)
1. Open the files “Merge 2” and “Merge 3” and check for consistency across all of the IDs.
2. Minimize the “Merge 2” and “Merge 3” data files.
3. Once back in the “Merge 1” file, click the Data menu, point to Merge Files, and select
Add Variables… (see Figure 25).
Figure 25 - Data Menu When Selecting Add Variables
4. The Add Variables to Merge 1.sav dialog box opens. Select the An external PASW
Statistics data file option and click the Browse… button (see Figure 26).
Figure 26 - Add Variables to Merge 1.sav Dialog Box
5. Locate and select the “Merge 2” data file and click the Open button.
6. Click the Continue button. The Add Variables from Merge 2.sav dialog box opens (see
Figure 27).
7. Select the Match cases on key variables in sorted files check box.
8. From the Excluded Variables: list box, select “ID>(+)” (see Figure 27), and using the
transfer arrow button, move it to the Key Variables: box.
Figure 27 - Add Variables from Merge 2.sav Dialog Box
9. Click the OK button. A warning message dialog box opens (see Figure 28).
Figure 28 - Sorting Warning Dialog Box
10. Click the OK button to close the warning message. The finished product should look like
Figure 29.
Figure 29 - Merged 1 and 2 Files
11. Repeat steps 3-10 for the “Merge 3” file.
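The effect of matching cases on a key variable can be sketched as follows. This Python example is a simplified illustration, not the PASW Statistics procedure itself: the function name is hypothetical, the “May” variable and its values are invented for the example, and the sketch keeps only the cases that are present in the active file.

```python
def add_variables(active_file, external_file, key="ID"):
    """Add the external file's variables to the active file, matching on a key."""
    # Index the external file by key and refuse duplicate key values
    lookup = {}
    for row in external_file:
        if row[key] in lookup:
            raise ValueError(f"duplicate key value: {row[key]!r}")
        lookup[row[key]] = row

    merged = []
    # Sort by the key, mirroring the sorted-files requirement in the dialog box
    for row in sorted(active_file, key=lambda r: r[key]):
        combined = dict(row)
        for name, value in lookup.get(row[key], {}).items():
            if name not in combined:      # keep the active file's own values
                combined[name] = value
        merged.append(combined)
    return merged


# Two cases from Table 2; the "May" answers are hypothetical
merge_1 = [
    {"ID": "Bethel", "January": "Cat"},
    {"ID": "Alfred", "January": "Dog"},
]
merge_2 = [
    {"ID": "Alfred", "May": "Blue"},
    {"ID": "Bethel", "May": "Green"},
]

for case in add_variables(merge_1, merge_2):
    print(case)
```

Because the match is made on the ID value rather than on row position, the two files do not need to list the students in the same order, but every ID must be spelled identically in both files.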
Appendix
QUESTIONNAIRE
This survey is designed to investigate relationships between Internet access and academic
success. It consists of three parts: questions related to the background information of the
respondent, questions about Internet use patterns, and several open-ended questions. Please
select appropriate answers that best describe your activities on the Internet as truthfully as
possible. The results of this study will be used anonymously for the PASW Statistics Part 2: Test
of Significance workshop.
Background Information
1. Age: ____________________________
2. Major: ___________________________
3. G.P.A.: __________________________
4. Monthly Income: __________________
Internet Access
5. Do you have a computer at home?
1. Yes 2. No
6. Where do you surf on the Internet? (You can circle more than one option for this question.)
1. At school 2. At home 3. At work 4. Other
____________
7. How long do you stay online per day?
1. Less than 30 minutes 2. 1-2 hours 3. More than two hours
Questions 8 through 19 are designed to investigate the frequency and types of activities on
the Internet. These questions have a 4 point Likert-scale ranging from strongly disagree to
strongly agree. Please circle the option that best describes your activities on the Internet.
SD: Strongly Disagree
D: Disagree
A: Agree
SA: Strongly Agree
SD D A SA
8. I am a very active Internet surfer. 1 2 3 4
9. I surf the Internet to look for articles for research
papers. 1 2 3 4
10. I surf the Internet to read current news. 1 2 3 4
11. I use the Internet only to e-mail my friends,
family, and professors. 1 2 3 4
12. I surf the Internet to check movie schedules. 1 2 3 4
13. I surf the Internet to look for personal
information (e.g., yellow pages). 1 2 3 4
14. I surf the Internet to look for job openings. 1 2 3 4
15. I use the Internet to play games. 1 2 3 4
16. I use the Internet to download forms and files
(e.g., income tax forms). 1 2 3 4
17. I surf the Internet to improve my computer skills. 1 2 3 4
18. I surf the Internet to purchase books. 1 2 3 4
19. I surf the Internet to purchase other merchandise
(e.g., video tapes, clothes, computers). 1 2 3 4
Question 20 is an open-ended question.
20. Are there any other Internet activities that are not included in this survey? If so, please
describe them below.
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
Introduction – Part 3
PASW stands for Predictive Analytics Software. This program can be used to analyze data
collected from surveys, tests, observations, etc. It can perform a variety of data analyses and
presentation functions, including statistical analysis and graphical presentation of data. Among
its features are modules for statistical data analysis. These include 1) descriptive statistics, such
as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and
multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,
cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for
survey research, though by no means is it limited to just this topic of exploration.
This handout (Regression Analysis) provides basic instructions on how to answer research
questions and test hypotheses through the use of linear regression (a technique which examines
the relationship between a dependent variable and a set of independent variables). The value of
the dependent variable (e.g., salesperson’s total annual sales) can be predicted based on its
relationship to the independent variables used in the analysis (e.g., age, education, and years of
experience). The two research questions proposed for this workshop are as follows:
1. How much will each salesperson make this year?
2. Who will qualify for a $1,000 bonus?
Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are
stored in a self-extracting archive. The archive must be downloaded and executed in order to
extract the data files.
• The data files used with this handout are available for download at
www.alukosayoenoch.wix.com/selfcoding.
• Instructions on how to download and extract the data files are available at
www.alukosayoenoch.wix.com/selfcoding.
Simple Regression
Simple regression estimates how the value of one dependent variable (Y) can be predicted based
on the value of one independent variable (X). The linear equation for simple regression is as
follows:
Y = aX + b
Simple regression can answer the following research question:
Research Question # 1
How much will each salesperson make this year?
SCATTER PLOT
A scatter plot displays the nature of the relationship between two variables. It is recommended to
run a scatter plot before performing a regression analysis to determine if there is a linear
relationship between the variables. If there is no linear relationship (i.e., the points on the graph are
not clustered around a straight line), there is no need to run a simple regression.
To run a scatter plot:
1. Start PASW Statistics 17.
2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
3. Locate and open the “Regression.sav” file.
4. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot… (see Figure
1). The Scatter/Dot dialog box opens (see Figure 2).
NOTE: To estimate the relationship between two variables, select the Simple Scatter plot.
Figure 1 - Graphs Menu When Selecting Scatter/Dot
Figure 2 - Scatter/Dot Dialog Box
5. If necessary, select the Simple Scatter option, and then click the Define button (see
Figure 2). The Simple Scatterplot dialog box opens (see Figure 3).
Figure 3 - Simple Scatterplot Dialog Box
6. Select the variable “Last year sales [lastsale]” from the list box on the left.
7. Click the first transfer arrow button to move the variable to the Y Axis: box.
8. Select the variable “Years of experience [yearexpe]” from the list box on the left.
9. Click the second transfer arrow button to move the variable to the X Axis: box.
10. Click the OK button. The Output Viewer window opens with a scatter plot of the
variables (see Figure 4).
NOTE: A graph similar to Figure 4 will be displayed in the Output Viewer window. This scatter
plot indicates that there is a linear relationship between the variables “Last year sales” and “Years
of experience.”
The next step is to find a line that best accommodates the pattern of points in this scatter plot.
The steps on how to enhance graph appearance are included in the last section of this handout.
Figure 4 - Scatter Plot
PREDICTING VALUES OF DEPENDENT VARIABLES
Since it is known that a linear relationship exists between the two variables, the regression
analysis can be performed to predict this year’s sales.
To run a simple regression analysis:
1. Switch to the Data Editor window.
2. Click the Analyze menu, point to Regression, and select Linear… (see Figure 5). The
Linear Regression dialog box opens.
Figure 5 - Analyze Menu When Selecting Linear
3. Select the variable “Last year sales [lastsale]” from the variable list box on the left and
move it to the Dependent: box by clicking the first transfer arrow button (see Figure 6).
Figure 6 - Linear Regression Dialog Box
4. Select the variable “Years of experience [yearexpe]” from the variable list box on the
left and move it to the Independent(s): box by clicking the second transfer arrow button.
5. Click the OK button.
The following tables present the results of a simple regression. “R Square” (.918) indicates that
this model accounts for almost 92% of the total variation in last year’s sales (see Figure 7).
Figure 7 - Model Summary Output
Figure 8 - Coefficients Output
The slope and the y-intercept as seen in Figure 8 should be substituted in the following linear
equation to predict this year’s sales: Y = aX + b. In this case, the values of a, b, X, and Y are
as follows:
a = 1954.658 (slope)
b = 440.987 (y-intercept)
X = Years of experience (values of independent variable)
Y = Last year sales (values of dependent variable)
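For readers who want to see where the slope, the y-intercept, and “R Square” come from, the following Python sketch fits a least-squares line by hand. It is only an illustration: the data values are made up and are not taken from “Regression.sav,” and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple regression, R Square is the squared correlation of x and y
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: years of experience and last year's sales
years = [1, 2, 3, 4, 5]
sales = [2500, 4300, 6400, 8200, 10300]

slope, intercept, r_squared = fit_line(years, sales)
print("a (slope) =", slope)
print("b (y-intercept) =", intercept)
print("R Square =", round(r_squared, 3))
```

With real data, the values of a and b printed here would correspond to the “Coefficients” table, and R Square to the “Model Summary” table.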
PREDICTING THIS YEAR’S SALES WITH SIMPLE REGRESSION MODEL
To predict this year’s sales for each salesman, the values of a and b should be substituted in the
following linear equation:
Y = aX + b
Last year sales = (a * yearexpe) + b
This year sales = (1954.658 * yearexp2) + 440.987
a = 1954.658
b = 440.987
X = Years of experience [yearexp2]
Y = This year sales
NOTE: The new independent variable, “yearexp2” is used instead of “yearexpe” in order to predict
this year’s sales.
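The substitution above can be sketched in Python as follows. The coefficients are the ones reported in Figure 8; the sample “yearexp2” values and the function name are assumptions made for illustration only.

```python
SLOPE = 1954.658      # a, from the Coefficients output (Figure 8)
INTERCEPT = 440.987   # b, from the Coefficients output (Figure 8)


def predict_this_year_sales(yearexp2):
    """Apply Y = aX + b with the new independent variable yearexp2."""
    return SLOPE * yearexp2 + INTERCEPT


# Hypothetical years-of-experience values for three salespeople
for years in [2, 5, 10]:
    print(years, "years ->", round(predict_this_year_sales(years), 3))
```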
To predict this year’s sales using the computing function:
1. Switch to the Data Editor window.
2. Click the Transform menu and select Compute Variable…. The Compute Variable
dialog box opens (see Figure 9).
3. In the Target Variable: box, type [Simple].
Figure 9 - Compute Variable Dialog Box
4. In the Numeric Expression: box, enter the following equation by typing or selecting
from the dialog box keypad:
[1954.658 * yearexp2 + 440.987]
NOTE: It is recommended to select the variable “yearexp2” directly from the variable list box
on the left of the Compute Variable dialog box to prevent typing mistakes.
5. Click the OK button. The results will be displayed in the Simple column in Data View
(see Figure 10).
Figure 10 - Simple Regression Results
To change the data type for the new variable “Simple”:
1. Click the Variable View tab at the lower left corner of the Data Editor window (see
Figure 11).
Figure 11 - Variable View Tab
2. Locate the variable “Simple” and click the Ellipses button under the Type column.
The Variable Type dialog box opens (see Figure 12).
3. Select the Dollar option, and then select the $###,###,### format (12 digits width with 0
decimal places).
Figure 12 - Variable Type Dialog Box
4. Click the OK button, and then click the Data View tab.
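As a rough analogy to the Dollar format chosen above, the Python sketch below displays a number with a dollar sign, thousands separators, and no decimal places. It changes only how the value is shown, not the value itself, which is also how a display format is generally understood to behave; the function name as_dollars is hypothetical.

```python
def as_dollars(value):
    """Format a number like the $###,###,### display format (0 decimal places)."""
    return f"${value:,.0f}"


predicted = 1954.658 * 10 + 440.987  # prediction for 10 years of experience
print(as_dollars(predicted))
```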
Figure 13 - Simple Regression Prediction
NOTE: The predictions of this year’s sales for each salesperson are computed under the new
variable named “Simple,” as shown in Figure 13.
Multiple Regression
Multiple regression estimates the coefficients of the linear equation when there is more than one
independent variable that best predicts the value of the dependent variable. For example, it is
possible to predict a salesperson’s total annual sales (the dependent variable) based on
independent variables such as age, education, and years of experience. The linear equation for
multiple regression is as follows:
Z = aX + bY + c
PREDICTING VALUES OF DEPENDENT VARIABLES
The previous section demonstrated how to predict this year’s sales (the dependent variable)
based on one independent variable (number of years of experience) by using simple regression
analysis. Similarly, this year’s sales (the dependent variable) can be predicted from more than
one independent variable, such as “Years of experience” and “Years of education,” by using
multiple regression analysis.
To run multiple regression analysis:
1. Click the Analyze menu, point to Regression, and select Linear…. The Linear
Regression dialog box opens (see Figure 14).
2. From the variable list box, select “Last year sales [lastsale]” as a dependent variable and
move it to the Dependent: box by clicking the first transfer arrow button.
3. From the variable list box, select “Years of experience [yearexpe]” and “Years of
education [educatio]” and move them to the Independent(s): box by clicking the second
transfer arrow button.
4. Click the OK button.
NOTE: If there are variables in the Independent(s): or Dependent: boxes, click the Reset button
before performing steps 2 and 3 above.
Figure 14 - Linear Regression Dialog Box
Figure 15 - Model Summary Output for Multiple Regression
NOTE: The table should look similar to Figure 15. “R Square” (.976) indicates that this model
accounts for almost 98% of the total variation in the data.
Figure 16 - Multiple Regression Output
The slopes and the y-intercept shown in Figure 16 should be substituted into the following linear
equation to predict this year’s sales: Z = aX + bY + c
In this case, the values of a, b, c, X, Y, and Z will be as follows:
a = 1874.5
b = 609.391
c = (-8510.838)
X = Years of experience (independent variable)
Y = Years of education (independent variable)
Z = This year sales (dependent variable)
As indicated in the output table, the coefficient for “Years of experience” is “1874.5” and the
coefficient for “Years of education” is “609.391.”
PREDICTING THIS YEAR’S SALES WITH MULTIPLE REGRESSION MODEL
To predict this year’s sales for each salesman, the values of a, b, and c should be substituted in
the following linear equation: Z = aX + bY + c
This year sales = 1874.5 * Years of experience + 609.391 * Years of education + (-8510.838)
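The same substitution can be sketched in Python. The coefficients come from Figure 16; the input values for experience and education, and the function name, are hypothetical.

```python
A = 1874.5       # coefficient for years of experience (Figure 16)
B = 609.391      # coefficient for years of education (Figure 16)
C = -8510.838    # constant (y-intercept) (Figure 16)


def predict_multiple(yearexp2, educatio):
    """Apply Z = aX + bY + c to one salesperson."""
    return A * yearexp2 + B * educatio + C


# Hypothetical salesperson: 10 years of experience, 16 years of education
print(round(predict_multiple(10, 16), 3))
```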
To predict this year’s sales by multiple regression analysis:
1. Switch to the Data Editor window.
2. Click the Transform menu and select Compute Variable…. The Compute Variable
dialog box opens (see Figure 17).
3. Click the Reset button.
4. In the Target Variable: box, type [multiple].
5. In the Numeric Expression: box, enter the following equation by typing or selecting
from the dialog box keypad:
[1874.5 * yearexp2 + 609.391 * educatio - 8510.838]
Figure 17 - Compute Variable Dialog Box
6. Click the OK button. The results will be displayed in the multiple column in Data View
(see Figure 18).
Figure 18 - Multiple Regression Results
NOTE: The predictions of sales for each salesperson using two independent variables are listed under the
new variable named “multiple.”
Data Transformation
Situations may arise where data transformation is useful. Most data transformations can be done
with the Compute… command. Using this command, the data file can be manipulated to suit
various statistical procedures.
Research Question # 2
Who will earn a $1,000 bonus?
COMPUTING
Since each person’s yearly sales were already predicted, those whose actual sales were at least
$2,000 above the predicted values obtained via multiple regression analysis will receive a $1,000
bonus. Using the Compute… command, the salespeople who meet this criterion can be easily
located by comparing this year’s actual sales with the predictions from the multiple
regression analysis computed in the previous lesson.
The first step in predicting who will receive a bonus is to calculate the difference between this
year’s actual sales and the prediction of this year’s sales from the multiple regression analysis.
To predict who will qualify for the bonus:
1. Open the “Bonus.sav” file.
2. If the Save As dialog box opens, click the No button.
3. Click the Transform menu and select Compute Variable…. The Compute Variable
dialog box opens (see Figure 19).
4. In the Target Variable: box, type [bonus].
5. In the Numeric Expression: box, type [1000].
Figure 19 - Compute Variable Dialog Box
6. Click the If… button. The Compute Variable: If Cases dialog box opens (see Figure 20).
7. Select the Include if case satisfies condition: option.
8. Enter the following expression by typing or selecting from the dialog box keypad:
[thissale - multiple >= 2000]
Figure 20 - Compute Variable: If Cases Dialog Box
NOTE: It is recommended that you select the variables and the >= sign directly from the variable
list box and keypad provided in the dialog box to prevent mistakes.
9. Click the Continue button, and then click the OK button.
NOTE: Salespersons #49 “Jason” and #44 “Ivett” are two of the sales personnel who qualify for a
$1,000 bonus because their actual sales were at least $2,000 above the sales predicted in the
previous lesson (see Figure 21).
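The conditional compute used in this lesson can be sketched in Python as follows. The sales figures are invented for illustration. Returning None for cases that do not meet the condition is meant to mirror the way a newly computed variable is left empty (system-missing) for those cases; treat that as an assumption about the dialog’s behavior rather than a statement from this handout.

```python
BONUS_AMOUNT = 1000   # value typed in the Numeric Expression: box
THRESHOLD = 2000      # from the condition thissale - multiple >= 2000


def bonus(thissale, multiple):
    """Return 1000 if actual sales beat the prediction by at least 2000, else None."""
    if thissale - multiple >= THRESHOLD:
        return BONUS_AMOUNT
    return None  # assumption: cases failing the condition stay empty


# Hypothetical (actual, predicted) sales pairs
for actual, predicted in [(22500, 19984), (20500, 19984)]:
    print(actual, predicted, bonus(actual, predicted))
```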
Figure 21 - Bonus Results
Polynomial Regression
This type of regression involves fitting a dependent variable (Yi) to a polynomial function of a
single independent variable (Xi). The regression model is as follows (see Table 1 for the meaning
of the variables):
Y_i = a + b_1 X_i + b_2 X_i^2 + b_3 X_i^3 + … + b_k X_i^k + e_i
Table 1 - Breakdown of the Variables
Variable Meaning
a Constant
bj The coefficient for the independent variable to the j’th power
ei Random error term
REGRESSION ANALYSIS
To look at the growth relationship between weight and age:
1. Open the “Growth.sav” file.
2. Click the Analyze menu, point to Regression, and select Curve Estimation…. The
Curve Estimation dialog box opens to define the parameters of the analysis (see Figure
22).
3. Transfer the “wght” variable to the Dependent(s): box and the “age” variable to the
Independent Variable: box.
NOTE: The weight (dependent) variable is what is being predicted using the age (independent)
variable.
4. Deselect the Plot models check box.
5. Select the Display ANOVA table check box.
6. Under Models, deselect the Linear check box and select the Cubic check box.
7. Click the OK button.
Figure 22 - Curve Estimation Dialog Box
Analyzing the Results
This cubic model has an R² of 99.567% (see Figure 23). The F-ratio indicates a highly
significant fit. The best-fitting cubic polynomial is given by the following equation
(where Yi is weight and Xi is age):
Y_i = 0.052 - 0.017 X_i + 0.010 X_i^2 - 0.001 X_i^3 + e_i
Multiple regression can be used to fit polynomials of higher order. If X is the independent variable,
use the Transform and Compute options of the Data Editor (as discussed earlier in this lesson)
to create new variables X2 = X*X, X3 = X*X2, X4 = X*X3, etc., and then use these variables
(X, X2, X3, X4, etc.) as the set of independent variables for a multiple regression analysis.
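The sketch below illustrates, in Python, how the power variables X2 and X3 are built by repeated multiplication and how the fitted cubic is evaluated. The ages are hypothetical, and the coefficients are the rounded values quoted above, so the predicted weights are only approximate.

```python
def power_columns(xs, degree):
    """Build the columns X, X2, X3, ... where each new column is X times the previous one."""
    columns = [list(xs)]
    for _ in range(degree - 1):
        previous = columns[-1]
        columns.append([x * p for x, p in zip(xs, previous)])
    return columns


def cubic_weight(age):
    """Evaluate the fitted cubic (rounded coefficients, error term omitted)."""
    return 0.052 - 0.017 * age + 0.010 * age ** 2 - 0.001 * age ** 3


ages = [1, 2, 3]  # hypothetical ages
x, x2, x3 = power_columns(ages, 3)
print("X  =", x)
print("X2 =", x2)
print("X3 =", x3)
print("Predicted weight at age 2:", round(cubic_weight(2), 3))
```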
Figure 23 - Polynomial Regression Summary Results
Chart Editing
During the final stage of research, enhancing the appearance of charts and figures can help
readers understand statistics that might otherwise seem confusing. Editing a chart directly in
PASW Statistics also saves the time and effort of copying and pasting it into another program to
modify its features. The following steps explain some useful methods for enhancing the appearance of a chart.
ADDING A LINE TO THE SCATTER PLOT
Adding a straight line to fit the scattered pattern of a data chart can help emphasize the linear
relationship among the data.
To add a line to the scatter plot:
1. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot….
2. Select the Simple Scatter option, and then click the Define button.
3. Transfer the “age” variable to the X Axis: box and the “wght” variable to the Y Axis:
box, and then click the OK button. A chart appears in the Output Viewer window.
4. Double-click the chart in the Output Viewer window to modify it. The Chart Editor
window opens (see Figure 24).
5. Right-click a chart marker (see Figure 25) and select Add Fit Line at Total from the
shortcut menu.
6. Under Fit Method, select the Cubic option, and then click the Apply button.
7. Close the Chart Editor window.
NOTE: Notice that the default straight fit line added by Add Fit Line at Total does not capture
the way the data curves, but the cubic fit is almost perfect (see Figure 26).
Figure 24 - Chart Editor Window
Figure 25 - Chart Markers
Figure 26 - Adding a Fit Line to the Scatter Plot
MANIPULATING THE SCALES ON X- AND Y-AXES
The X-axis and Y-axis can be adjusted to enhance the overall appearance and readability of the
chart. Various elements of the axes can be manipulated, such as scale, ticks and grids, number
format, and axis label.
To manipulate the scales on the X-axis:
1. If necessary, open the “Regression.sav” file.
2. Run the scatter plot where the Y-axis is “Last year sales” and the X-axis is “Years of
experience.”
3. Double-click the chart to open the Chart Editor window.
4. Click the Select the X axis button on the Standard toolbar to manipulate the X-axis.
The Properties dialog box opens.
5. Select the Scale tab (see Figure 27).
6. Change the value in the Lower margin (%): box to 0.
7. Select the Labels & Ticks tab (see Figure 28).
8. In the Major Ticks section, select the Display ticks check box.
9. Click the Style arrow and select Inside from the list.
Figure 27 - X-axis Properties Dialog Box: Scale Tab
Figure 28 - X-axis Properties Dialog Box: Labels & Ticks Tab
10. Click the Show Grid Lines button on the Standard toolbar to show the Properties
dialog box.
11. Select the Grid Lines tab, select the Major ticks only option, click the Apply button, and
then click the Close button (see Figure 29).
12. Click the Select the Y axis button on the Standard toolbar to manipulate the Y-axis.
The Properties dialog box opens.
13. Select the Scale tab (see Figure 30).
Figure 29 - Properties Dialog Box: Grid Lines Tab
Figure 30 - Y-axis Properties Dialog Box: Scale Tab
14. Change the value in the Lower margin (%): box to 0.
15. Click the Apply button, and then click the Close button.
Figure 31 - Before Manipulating the X-axis
Figure 32 - After Manipulating the X-axis
ADDING A TITLE TO THE CHART
Adding a title to the chart is a simple process that enhances the chart’s appearance.
To add a title to a chart:
1. In the Chart Editor window, click in a blank area outside the first chart to select the
whole chart, then move the mouse pointer to one of the selection handles until it becomes
a two-headed arrow.
2. Drag the mouse pointer to reduce the chart size.
3. Click the Insert a text box button on the Standard toolbar. The text box appears
above the chart and the Properties dialog box opens.
4. Type “Relationship Between Last Year Sales and Years of Experience” in the text box.
5. Click the border of the text box to select it.
6. Select the Text Style tab in the Properties dialog box, select a color for the title text, click
the Apply button, and then click the Close button.
7. Click the Bold button on the Standard toolbar, and change the Font Size to “12.”
8. Resize the text box to fit the text.
9. If necessary, resize the chart to display the title at the top of the chart (see Figure 33).
Figure 33 - Adding a Title to the Chart
ADDING COLORS TO THE CHART
All elements on the chart can be colored differently to add emphasis or distinguish between
elements.
To add colors to a chart:
1. In the Chart Editor window, select the chart element to change or add color to, such as
one of the plots (see Figure 34).
2. Click the Show Properties Window button on the Standard toolbar. The Properties
dialog box opens (see Figure 35).
3. Select the Marker tab, and then select a color from the color palette.
4. To change the marker type, click the Type arrow in the Marker section and select a
symbol from the menu (see Figure 35).
5. View the changes in the Preview section.
6. Click the Apply button, and then click the Close button.
Figure 34 - Adding Color to the Chart
Figure 35 - Properties Dialog Box
FILLING A BACKGROUND COLOR
A background color can also be applied to make the chart stand out.
To fill in a background color:
1. Click inside a blank area of the chart to select the entire chart area (see Figure 36).
2. Click the Show Properties Window button on the Standard toolbar. The Properties
dialog box opens.
3. Select the Fill swatch.
4. Click the Pattern arrow and select a background pattern.
5. Click the Apply button, and then click the Close button.
Figure 36 - Filling a Background Color
Introduction – Part 4
PASW stands for Predictive Analytics Software. This program can be used to analyze data
collected from surveys, tests, observations, etc. It can perform a variety of data analyses and
presentation functions, including statistical analysis and graphical presentation of data. Among
its features are modules for statistical data analysis. These include 1) descriptive statistics, such
as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and
multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis,
cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for
survey research, though by no means is it limited to just this topic of exploration.
This handout (Chi-Square and ANOVA) introduces basic skills for performing hypothesis tests
utilizing Chi-Square test for Goodness-of-Fit and generalized pooled t tests, such as ANOVA.
The step-by-step instructions will guide the user in performing “tests of significance” using
PASW Statistics and help the user understand how to interpret the output for research questions.
Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are
stored in a self-extracting archive. The archive must be downloaded and executed in order to
extract the data files.
• The data files used with this handout are available for download at
www.alukosayoenoch.wix.com/selfcoding
• Instructions on how to download and extract the data files are also available at
www.alukosayoenoch.wix.com/selfcoding
Chi-Square
The Chi-Square (χ²) test is a statistical tool used to examine differences between nominal or
categorical variables. The Chi-Square test is used in two similar but distinct circumstances:
• To estimate how closely an observed distribution matches an expected distribution (also
known as the Goodness-of-Fit test).
• To determine whether two random variables are independent.
CHI-SQUARE TEST FOR GOODNESS-OF-FIT
This procedure can be used to perform a hypothesis test about the distribution of a qualitative
(categorical) variable or a discrete quantitative variable having only finite possible values. It
analyzes whether the observed frequency distribution of a categorical or nominal variable is
consistent with the expected frequency distribution.
With Fixed Expected Values
Research Question # 1
Can the hospital schedule discharge support staff evenly throughout the week?
A large hospital schedules discharge support staff assuming that patients leave the hospital at a
fairly constant rate throughout the week. However, because of increasing complaints of staff
shortages, the hospital administration wants to determine whether the number of discharges
varies by the day of the week.
H0: Patients leave the hospital at a constant rate (there is no difference between the discharge
rates for each day of the week).
To perform the analysis:
1. Start PASW Statistics 17.
2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
3. Navigate to the data files folder, select the “chi-hospital.sav” file, and then click the
Open button.
Before the Chi-Square test is run, the observed values need to be declared.
To declare the observed values:
1. Click the Data menu and select Weight Cases…. The Weight Cases dialog box opens
(see Figure 1).
Figure 1 - Weight Cases Dialog Box
2. Select the Weight cases by option.
3. Select the “Average Daily Discharges [discharge]” variable and transfer it to the
Frequency Variable: box.
4. Click the OK button.
To perform the analysis:
1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square…. The
Chi-Square Test dialog box opens (see Figure 2).
Figure 2 - Chi-Square Test Dialog Box
2. Select the “Day of the Week [dow]” variable and transfer it to the Test Variable List: box
(see Figure 2).
3. Click the OK button. The Output Viewer window opens (see Figure 3).
Figure 3 - Chi-Square Frequencies Output Table
Figure 4 - Chi-Square Test Statistics Output Table
Reporting the analysis results:
H0: Rejected in favor of H1.
H1: Patients do not leave the hospital at a constant rate.
Explanation: Figure 4 indicates that the calculated χ² statistic, with six degrees of freedom, is
29.389. Additionally, it indicates that the significance value (displayed as 0.000, meaning
p < 0.001) is less than the usual threshold value of 0.05. This suggests that the null hypothesis,
H0 (patients leave the hospital at a constant rate), can be rejected in favor of the alternative
hypothesis, H1 (patients leave the hospital at different rates during the week).
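To make the Goodness-of-Fit calculation concrete, here is a minimal Python sketch of the statistic when every category has the same expected value. The discharge counts are invented for illustration (they are not the values in “chi-hospital.sav”), and 12.592 is the standard table value of the 0.05 critical point for six degrees of freedom.

```python
def chi_square_uniform(observed):
    """Return (statistic, degrees_of_freedom) when all categories share one expected value."""
    expected = sum(observed) / len(observed)  # total divided by number of categories
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1


# Hypothetical average daily discharges, Sunday through Saturday
discharges = [40, 60, 70, 70, 70, 90, 90]

statistic, df = chi_square_uniform(discharges)
CRITICAL_VALUE_DF6 = 12.592  # 0.05 critical value for 6 degrees of freedom
print("Chi-Square =", round(statistic, 3), "df =", df)
print("Reject H0:", statistic > CRITICAL_VALUE_DF6)
```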
With Fixed Expected Values and within a Contiguous Subset of Values
By default, the Chi-Square test procedure builds frequencies and calculates an expected value
based on all valid values of the test variable in the data file. However, it may be desirable to
restrict the range of the test to a contiguous subset of the available values, such as weekdays only
(Monday through Friday).
Research Question # 2
The hospital requests a follow-up analysis: can staff be scheduled assuming that
patients discharged on weekdays only (Monday through Friday) leave at a constant
daily rate?
H0: Patients discharged on weekdays only (Monday through Friday) leave at a constant daily
rate.
To run the analysis:
1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square…. The
Chi-Square Test dialog box opens.
2. Select the Use specified range option (see Figure 2).
3. Enter [2] in the Lower: box and [6] in the Upper: box.
4. Click the OK button. The Output Viewer window opens (see Figure 5 and Figure 6).
Notice that the test range is restricted to Monday through Friday.
Figure 5 - Chi-Square (Subset) Frequencies Output Table
Figure 6 - Test Statistics Output Table
NOTE: The expected values are equal to the sum of the observed values divided by the number of
rows, while the observed values are the actual numbers of patients discharged.
Reporting the analysis results:
H0: Do not reject. Patients discharged on weekdays only (Monday through Friday) leave at a
constant daily rate.
Explanation: Figure 5 indicates that on average, about 92 patients were discharged from the
hospital each weekday. The rate for Mondays was below average and the rate for Fridays was
greater than average. Figure 6 indicates that the calculated value of the Chi-Square statistic was
5.822 at four degrees of freedom. Because the significance level (0.213) is greater than the
rejection threshold of 0.05, H0 (patients were discharged at a constant rate on weekdays) could
not be rejected.
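The significance values reported for the two hospital tests can be checked by hand. The Python sketch below uses a closed-form expression for the upper-tail probability of the Chi-Square distribution that holds only when the degrees of freedom are even (as they are here: six and four). The function name is hypothetical, and this is a check on the reported numbers, not a description of how PASW Statistics computes them.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability for even df: exp(-x/2) * sum of (x/2)^i / i! for i < df/2."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2
    term = 1.0   # (x/2)^i / i!, starting at i = 0
    total = 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


p_weekdays = chi_square_p_value_even_df(5.822, 4)    # weekdays-only test
p_full_week = chi_square_p_value_even_df(29.389, 6)  # full-week test
print("Weekdays only: p =", round(p_weekdays, 3))
print("Full week: p < 0.001 is", p_full_week < 0.001)
```

The first value rounds to the 0.213 shown in Figure 6, and the second is small enough to be displayed as 0.000.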
Using the Chi-Square test procedure, it was determined that the rate at which patients were
discharged from the hospital was not constant over the course of an average week. This was
primarily due to a greater number of discharges on Fridays and fewer discharges on Sundays.
When the range of the test was restricted to weekdays, the discharge rates appeared to be more
uniform. Staff shortages could be corrected by adopting separate weekday and weekend staff
schedules.
With Customized Expected Values
Research Question # 3
Does first-class mailing provide quicker response time than bulk mail?
A manufacturer tries first-class postage for direct mailings, hoping for faster responses than with
bulk mail. Order takers record how many weeks each order takes after mailing.
H0: First-class and bulk mailings do not result in different customer response times.
Before the Chi-Square test is run, the cases must be weighted. Because this example compares
two different methods, one method must be selected to provide the expected values for the test
and the other will provide the observed values.
To weight the cases:
1. Open the “chi-mail.sav” file.
2. Click the Data menu and select Weight Cases…. The Weight Cases dialog box opens.
3. Select the Weight cases by option.
4. Select the “First Class Mail [fcmail]” variable and transfer it to the Frequency Variable:
box.
5. Click the OK button.
To run the analysis:
1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square…. The
Chi-Square Test dialog box opens.
2. Select the “Week of Response [week]” variable and transfer it to the Test Variable List:
box.
3. Select the Values: option in the Expected Values section.
4. Enter [6] in the Values: box.
5. Click the Add button.
6. Repeat steps 4 and 5, adding the values [15.1], [18], [12], [11.5], [9.8], [7], [6.1], [5.5],
[3.9], [2.1], and [2] (in that order).
7. Click the OK button. The Output Viewer window opens.
NOTE: The expected frequencies in this example are the response percentages that the firm has
historically obtained with bulk mail.
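The twelve values entered above sum to 99 rather than 100, which is harmless if the entered values are treated as relative proportions. The Python sketch below shows that rescaling, under that assumption; the total number of first-class responses (200) is hypothetical.

```python
def expected_counts(values, total_observed):
    """Rescale the entered expected values so they sum to the observed total."""
    value_sum = sum(values)
    return [total_observed * v / value_sum for v in values]


# Bulk mail response percentages for weeks 1 through 12, as entered in the steps above
bulk_mail_percentages = [6, 15.1, 18, 12, 11.5, 9.8, 7, 6.1, 5.5, 3.9, 2.1, 2]

expected = expected_counts(bulk_mail_percentages, 200)  # 200 responses is hypothetical
degrees_of_freedom = len(bulk_mail_percentages) - 1

print("Sum of entered values:", round(sum(bulk_mail_percentages), 1))
print("Expected count for week 1:", round(expected[0], 2))
print("Degrees of freedom:", degrees_of_freedom)
```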
Figure 7 - First-Class/Bulk Mail Week of Response
Figure 8 - Week of Response Test Statistics
Reporting the analysis results:
H0: Do not reject. There was no statistical difference between customer response times using
first-class mailing and customer response times using bulk mailing.
Explanation: The manufacturer hoped that first-class mail would result in quicker customer
response. As indicated in Figure 7, the response rates for the first two weeks differed by four
and seven percentage points, respectively. The question was whether the overall differences
between the two distributions were statistically significant.
The Chi-Square statistic was calculated to be 12.249 at eleven degrees of freedom (see Figure 8).
The significance value (p) associated with the data was 0.345, which was greater than the
threshold value of 0.05. Hence, H0 was not rejected because there was no significant difference
between first-class and bulk mailings. The first-class mail promotion did not result in response
times that were statistically different from standard bulk mail. Therefore, bulk postage was more
economical for direct mailings.
One-Way Analysis of Variance
One-way analysis of variance (One-Way ANOVA) procedures produce an analysis for a
quantitative dependent variable affected by a single factor (independent variable). Analysis of
variance is used to test the hypothesis that several means are equal. This technique is an
extension of the two-sample t test. It can be thought of as a generalization of the pooled t test.
Instead of two populations (as in the case of a t test), there are more than two populations or
treatments.
Research Question # 4
Which of the alloys tested would be appropriate for creating an underwater sensor array?
To create an underwater sensor array, four different alloys are tested for corrosion resistance.
Five plates of the same size of each alloy are placed underwater for 60 days. After 60 days, the
number of corrosion pits on each plate is measured.
H0: The four alloys exhibit the same kind of behavior and are not different from one another.
To run One-Way ANOVA:
1. Open the “alloy.sav” file.
NOTE: Each case within the One-Way ANOVA data file represents one of the 20 metal plates
(five plates of four different alloys) and is characterized by two variables. One variable assigns a
numeric value to the alloy. The other variable is used to quantify the number of pits on the plate
after being underwater for 60 days (see Figure 9).
Figure 9 - Alloy Data File
2. In Data View, click the Analyze menu, point to Compare Means, and select One-Way
ANOVA…. The One-Way ANOVA dialog box opens (Figure 10).
Figure 10 - One-Way ANOVA Dialog Box
3. Select the “pits” variable from the box on the left and transfer it to the Dependent List:
box (see Figure 10).
4. Select the “Alloy [alloy]” variable from the box on the left and transfer it to the Factor:
box (see Figure 10).
5. Click the Options… button. The One-Way ANOVA: Options dialog box opens (see
Figure 11).
Figure 11 - One-Way ANOVA: Options Dialog Box
6. Select the Descriptive, Homogeneity of variance test, and Means plot check boxes.
7. Click the Continue button.
8. Click the OK button. The Output Viewer window opens.
Figure 12 - ANOVA Descriptive Output
Figure 13 - Output for Test of Homogeneity of Variances
Figure 14 - ANOVA Output
Reporting the analysis results:
H0: Reject in favor of H1.
H1: The four alloys do not exhibit the same kind of behavior. They are statistically different from
one another.
Explanation: Figure 12 lists the means, standard deviations, and individual sample sizes of each
alloy. Figure 13 provides the degrees of freedom and the significance level for the test of
homogeneity of variances; “df1” is one less than the number of sample alloys (4-1=3) and “df2”
is the difference between the total sample size and the number of sample alloys (20-4=16).
Figure 14 lists the sums of squares, the degrees of freedom, and the mean squares for the
variation between and within the alloy groups. In Figure 14, the “Between Groups” variation
“6026.200” is due to differences between the sample means of the groups. If the sample means are
close to each other, this value is small. The “Within Groups” variation “335.600” is due to
differences within individual samples. The “Mean Square” values are calculated by dividing each
“Sum of Squares” value by its respective degrees of freedom (“df”). The table also lists the
F statistic “95.768,” which is calculated by dividing the “Between Groups Mean Square” by the
“Within Groups Mean Square.” The significance level of “0.000” (p < 0.001) is less than the
threshold value of 0.05 and indicates that the null hypothesis can be rejected, leading to the
conclusion that the alloys are not all the same.
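The arithmetic behind the ANOVA table can be sketched in Python. The function below computes the sums of squares, degrees of freedom, and F statistic from raw groups; the small data set is made up so the result is easy to verify by hand, and the last lines recompute F from the sums of squares reported in Figure 14.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_statistic)."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    # Between Groups: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within Groups: spread of each value around its own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_statistic


# Hypothetical data: three groups of three measurements
toy_result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print("Toy example:", toy_result)

# Recomputing F from the values reported in Figure 14
reported_f = (6026.200 / 3) / (335.600 / 16)
print("F from Figure 14 values:", round(reported_f, 3))
```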
Post Hoc Tests
In ANOVA, if the null hypothesis is rejected, then it is concluded that there are differences
between the means (μ1, μ2,…, μa). It is useful to know specifically where these differences exist.
Post hoc testing identifies these differences. Multiple comparison procedures look at all possible
pairs of means and determine if each individual pairing is the same or statistically different. In an
ANOVA with a treatments, there will be a*(a-1)/2 possible unique pairings, which could mean a
large number of comparisons.
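As a quick illustration of the pairing count, the Python sketch below enumerates the unique pairs for the four alloys in this lesson and checks the result against the a*(a-1)/2 formula.

```python
from itertools import combinations

alloys = [1, 2, 3, 4]   # the four alloys coded in "alloy.sav"
a = len(alloys)

pairs = list(combinations(alloys, 2))   # every unique unordered pair
formula_count = a * (a - 1) // 2

print("Pairs:", pairs)
print("Count:", len(pairs), "formula:", formula_count)
```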
Research Question # 5
Is the mean difference between alloy sets statistically significant?
The previous null hypothesis was rejected, leading to the conclusion that not all of the alloys
exhibit the same behavior. The next part of the analysis is to determine whether the mean difference
between individual alloy sets is statistically significant.
H0: μ1 = μ2 = … = μa
H1: μi ≠ μj for at least one pair of alloys (i ≠ j)
To run post hoc tests:
1. In Data View, click the Analyze menu, point to Compare Means, and select One-Way
ANOVA…. The One-Way ANOVA dialog box opens (see Figure 15).
Figure 15 - One-Way ANOVA Dialog Box
2. Click the Post Hoc… button. The One-Way ANOVA: Post Hoc Multiple Comparisons
dialog box opens (see Figure 16).
3. Select the LSD check box, click the Continue button, and then click the OK button. The
Output Viewer window opens.
NOTE: LSD stands for Least Significant Difference, a procedure that compares the means one pair at a time.
Figure 16 - One-Way ANOVA: Post Hoc Multiple Comparisons Dialog Box
Figure 17 - Multiple Comparisons Output
Figure 18 - Means Plot
Reporting the analysis results:
H0: Reject in favor of H1.
H1: At least one of the means is different.
Explanation: Figure 17 shows the results of comparing pairs of means between different alloy
sets. Each row indicates the difference between the two corresponding treatments. Alloys “1”
and “4” have a mean difference of “2.4” (a relatively small value). Also, the significance level of
“0.420” indicates that the null hypothesis cannot be rejected for the comparison of alloys “1” and
“4”; there is no statistically significant difference between them. Alloy pairs “1” and “2,” “1” and
“3,” “2” and “3,” “2” and “4,” and “3” and “4” have large mean differences with significance
values of “0.000.” In these cases, the null hypothesis can be rejected, leading to the conclusion
that they are statistically different. Also, the means plot (see Figure 18) shows that alloys
“1” and “4” have mean numbers of pits that are very close to each other. Because alloys “1” and
“4” have the lowest mean number of corrosion pits, they are the best candidates for the array.
Depending on the relative costs of the two alloys, the one that is more cost effective can be
selected to construct the array.
Two-Way Analysis of Variance
Two-way analysis of variance (Two-Way ANOVA) is an extension of the one-way analysis of
variance. The difference is that instead of running the test with a single independent variable,
two independent variables (factors) are used, so the test can examine the effect of each factor
and of their interaction. A two-variable design has several advantages over a one-variable
design: it is more efficient, and it can increase the statistical power of the result.
Research Question # 6
Will typing ability and test method affect student test scores?
To answer the question, an essay final is given to the class. Two test methods are used – half the
students are assigned to write the final with a blue-book and the other half with notebook
computers. In addition, the students are partitioned into three groups, namely: no typing ability,
some typing ability, and highly skilled at typing. After evaluating the final, the mean score of
each group is examined.
H0: Typing ability and test method do not affect student test scores.
H1: Typing ability and test method do affect student test scores.
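The arithmetic behind a two-way ANOVA table can be sketched as follows. This is a minimal illustration, assuming a balanced design (the same number of students in every cell); the scores are invented and are not the contents of the “Two-Way-ANOVA.sav” file, and the function name two_way_anova is hypothetical. The sketch stops at the F ratios and does not compute significance values.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA. cells maps (level_a, level_b) -> list of scores.
    Returns {effect: (sum of squares, degrees of freedom, F ratio)}."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))        # scores per cell, assumed equal
    mean = lambda xs: sum(xs) / len(xs)

    grand = mean([x for v in cells.values() for x in v])
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean([cell_mean[(a, b)] for b in b_levels]) for a in a_levels}
    b_mean = {b: mean([cell_mean[(a, b)] for a in a_levels]) for b in b_levels}

    # Sums of squares for each main effect, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab, df_err = df_a * df_b, len(cells) * (n - 1)
    mse = ss_err / df_err
    return {
        "A": (ss_a, df_a, (ss_a / df_a) / mse),
        "B": (ss_b, df_b, (ss_b / df_b) / mse),
        "A x B": (ss_ab, df_ab, (ss_ab / df_ab) / mse),
        "error": (ss_err, df_err, None),
    }

# Hypothetical essay scores: factor A = test method, factor B = typing ability.
scores = {
    ("blue-book", "none"): [69, 71],
    ("blue-book", "some"): [73, 75],
    ("blue-book", "high"): [77, 79],
    ("notebook", "none"): [61, 63],
    ("notebook", "some"): [75, 77],
    ("notebook", "high"): [89, 91],
}
result = two_way_anova(scores)
for effect, (ss, df, f) in result.items():
    print(effect, ss, df, f)
```

Each F ratio would then be compared with the F distribution for its degrees of freedom, which is the significance value the program reports for test method, typing ability, and their interaction.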
To run Two-Way ANOVA:
1. Open the “Two-Way-ANOVA.sav” file (see Figure 19).
SPSS Workshop Training Guide for Descriptive Statistics and Data Analysis
  • 4. For more assistance, visit www.alukosayoenoch.wix.com/selfcoding 4 Introduction – Part 1 PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, etc. It can perform a variety of data analyses and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for survey research, though by no means is it limited to just this topic of exploration. This handout (Descriptive Statistics) introduces basic skills necessary to run PASW Statistics. It includes how to create a data file and run descriptive statistics. It is especially tailored to answer three research questions formulated in the sample survey questionnaire, eventually giving users an overview of how PASW Statistics can be used for survey research. The three research questions formulated in the sample survey are as follows: 1. What kind of computer do people prefer to own? 2. What color do people prefer for their computer? 3. Is computer color preference different between genders? Downloading the Data Files This handout includes sample data files that can be used for hands-on practice. The data files are stored in a self-extracting archive. The archive must be downloaded and executed in order to extract the data files. 
 The data files used with this handout are available for download at www.alukosayoenoch.wix.com/selfcoding  Instructions on how to download and extract the data files are available at www.alukosayoenoch.wix.com/selfcoding Starting PASW Statistics The following steps are for starting PASW Statistics 17 using the computers in the Open Access Labs (OALs). The steps for starting the program at home or on other computers may be slightly different. To start PASW Statistics 17: 1. Click the Start button, point to All Programs, point to Course Work, point to SPSS Inc, point to PASW Statistics 17, and select PASW Statistics 17. The PASW Statistics 17 dialog box opens (see Figure 1). 2. Click the Cancel button to create a new data file. Figure 1 - PASW Statistics 17 Dialog Box
  • 5. The PASW Statistics Window
The Data Editor window opens with two view tabs: Data View and Variable View. Data View is used for data input, and Variable View is used for adding variables and defining variable properties (e.g., modifying attributes of variables). As displayed in Figure 2, the Data Editor window includes several components. The Title bar displays the name of the current file and the application. The Menu bar allows you to access various commands that are grouped according to function. The Toolbar provides shortcuts to commonly used menu commands.
Figure 2 - PASW Statistics Data Editor Window

DATA VIEW
When PASW Statistics is launched, the Data Editor window opens in Data View, which looks similar to a Microsoft Excel spreadsheet (an array of rows and columns). The difference is that the rows and columns in Data View are referred to as cases and variables, respectively (see Table 1).
Table 1 - Elements in Data View
Element: Description
Variable: Each column represents a variable. Any survey questionnaire item or test item can be a variable. Commonly defined variable types are numeric or string. When defining variables as numeric, users need to specify decimal places. Variable names can be up to 64 characters (bytes) long and must start with a letter. Make variable names meaningful and easily recognizable.
Case: Each row represents a case. The participants in the study can be cases. For example, if 100 participants are involved in your study, then 100 cases (or rows) of information should be generated. Responses to the question items should be entered consistently from left to right for each participant.
  • 6. Cell: A cell is an intersection between cases and variables. Each response to a survey question should be entered in a cell for each participant according to the defined variable data types.

VARIABLE VIEW
Variable View is where variables are defined by assigning variable names and specifying the attributes, such as data type (“String,” “Date,” “Numeric,” etc.), value labels, and measurement scales (“Nominal,” “Ordinal,” or “Scale”). Users can think of Variable View as the backbone structure for the Data View; variables should be defined in Variable View before data is entered or viewed (see Table 2).
Table 2 - Elements in Variable View
Element: Description
Variable Name: PASW Statistics will initially give a default variable name (var00001) that users can change. It is recommended to assign a brief and meaningful name to variables (e.g., “Name,” “Gender,” and “GPA”).
Variable Type: The variable type determines how the cases are entered. Generally, text-based characters are of “String” type and number-based characters are of “Numeric” type. For example, if a user has a variable called “Name,” then its variable type should be “String.” Similarly, a variable named “GPA” should be a “Numeric” type with (normally two) decimal places.
Value Labels: Value labels allow users to describe what each coded value of a variable stands for. For example, if the variable “Gender” stores the codes 1 and 2, others may not know which code means which; value labels such as 1 = “female” and 2 = “male” prevent misinterpretation. A cryptic variable name (e.g., “Fav”) is explained separately, with a descriptive entry in the Label column.

Creating a Data File
Creating a new PASW Statistics data file consists of two stages: (1) defining variables and (2) entering the data. Defining the variables involves multiple processes and requires careful planning. Once the variables have been defined, the data can then be added.

DEFINING VARIABLES
First, variable names based on your research questionnaire need to be assigned.
If variable names are not assigned, PASW Statistics will assign default names that may not be recognizable. Second, the Type attribute should be specified for each variable. If necessary, assign labels to values to help all users of the file understand the data better.
To define variables (example):
1. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 3).
2. Type [Name] in the first cell under the Name column and press the [Enter] key.
3. Under the Type column, click the ellipsis button. The Variable Type dialog box opens (see Figure 4).
4. Select the String option.
5. Click the OK button.
  • 7. Figure 3 - Variable View Tab
Figure 4 - Variable Type Dialog Box
6. Type [Gender] in row two under the Name column.
7. Activate the cell in row two under the Decimals column and change the entry to “0” using the spin box.
8. Type [What is your gender?] in row two under the Label column.
9. Click the ellipsis button in row two under the Values column. The Value Labels dialog box opens (see Figure 5).
10. Type [1] in the Value: box.
11. Type [female] in the Label: box.
12. Click the Add button.
13. Repeat steps 10-12 using a value of [2] and a label of [male].
Figure 5 - Value Labels Dialog Box (Gender)
14. Click the OK button.
15. Type [GPA] in row three under the Name column and press the [Enter] key.
16. Type [Age] in row four under the Name column.
17. Click row four under the Decimals column and change the entry to “0” using the spin box.
18. Type [What is your age?] in row four under the Label column.
19. In row four under the Values column, click the ellipsis button. The Value Labels dialog box opens (see Figure 6).
20. Type [1] in the Value: box.
21. Type [19 or younger] in the Label: box.
22. Click the Add button.
  • 8. 23. Repeat steps 20-22 for values [2] through [5] and label them as shown in Table 3 (you may also refer back to the sample questionnaire). See Figure 6 for the results.
24. Click the OK button.
Table 3 - Value Labels
Value: Label
2: 20-23
3: 24-27
4: 28-31
5: 32 or over
Figure 6 - Value Labels Dialog Box (Age)

DATA ENTRY
After defining the variables, users can enter data for each case. If variables are defined as having a “Numeric” data type, then numeric data should be entered. PASW Statistics will only accept numeric input (the digits 0-9, plus a decimal point where needed) for a “Numeric” data type. If variables are defined as “String” data, any keyboard character can be entered.
To enter data:
1. Click the Data View tab at the lower left corner of the Data Editor window (see Figure 7).
2. Click in a cell and type the corresponding data. The entry will also appear in the Cell Editor (see Figure 8).
Figure 7 - Data View Tab
Figure 8 - Data Entry (with the Cell Editor)
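The variable definitions above boil down to a lookup from stored codes to readable labels. The following Python sketch illustrates that idea only; it is not PASW Statistics syntax, and the names `VALUE_LABELS` and `label_case`, as well as the sample case, are hypothetical. The codes and labels themselves are the ones entered in steps 10-13 and 20-23.

```python
# Hypothetical illustration of value labels; not part of PASW Statistics.
# Codes and labels follow the Gender and Age definitions in the steps above.
VALUE_LABELS = {
    "Gender": {1: "female", 2: "male"},
    "Age": {1: "19 or younger", 2: "20-23", 3: "24-27", 4: "28-31", 5: "32 or over"},
}


def label_case(case):
    """Return a copy of one case (row) with coded values replaced by labels.

    Variables without value labels (e.g., Name, GPA) and codes that have
    no label defined are passed through unchanged.
    """
    labeled = {}
    for variable, value in case.items():
        labels = VALUE_LABELS.get(variable, {})
        labeled[variable] = labels.get(value, value)
    return labeled


# One made-up case: a string Name, a coded Gender, a numeric GPA, a coded Age.
example_case = {"Name": "Pat", "Gender": 1, "GPA": 3.25, "Age": 2}
print(label_case(example_case))
```

Storing the numeric code and attaching the label separately is what lets the same column be counted, filtered, and still displayed in readable form.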
  • 9. Descriptive Statistics
After data has been entered, users may begin analyzing the data by using descriptive statistics. Descriptive statistics are the most commonly used statistics for summarizing data frequency or measures of central tendency (mean, median, and mode).

Research Question # 1
What kind of computer do people prefer to own?

FREQUENCY ANALYSIS
We can use frequency analysis to answer the first research question. Frequency analysis is a descriptive statistical method that shows the number of occurrences of each response chosen by the respondents. When using frequency analysis, PASW Statistics can also calculate the mean, median, and mode to help users analyze the results and draw conclusions. The following example will use a frequency analysis to answer “Research Question # 1: What kind of computer do people prefer to own?” using the data collected from our sample survey (see Appendix).
To perform frequency analysis:
1. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
2. Locate and open the “Part 1.sav” file.
3. Click the Analyze menu, point to Descriptive Statistics, and select Frequencies… (see Figure 9). The Frequencies dialog box opens (see Figure 10).
4. Select the variable(s) desired to be analyzed. In this case, select the variable “Computer Owned” from the list box on the left.
5. Click the transfer arrow button. The selected variable is moved to the Variable(s): list box.
6. Select the Display frequency tables check box if necessary.
Figure 9 - Frequency Analysis from Analyze Menu
Figure 10 - Frequencies Dialog Box
7. Click the Statistics… button. The Frequencies: Statistics dialog box opens (see Figure 11).
8. Select the Mean, Median, and Mode check boxes in the Central Tendency section; select the Std. deviation check box in the Dispersion section.
  • 10. Figure 11 - Frequencies: Statistics Dialog Box
9. Click the Continue button. This returns you to the Frequencies dialog box.
10. Click the OK button. An Output Viewer window opens and displays the statistics and frequency table (see Figure 12). The columns of the table “Computer Owned” display the “Frequency,” “Percent,” “Valid Percent,” and “Cumulative Percent” for each different type of computer owned.
Figure 12 - Frequencies Output
The measures of central tendency (mean, median, and mode) can be used to summarize various types of data. Mode can be used for nominal data, such as computer type, computer color, ethnicity, etc. Mean or median can be used for interval/ratio data, such as test scores, age, etc. The median is the better choice for data with a skewed distribution, because extreme values pull the mean toward the tail.

Answer to Research Question # 1
What kind of computer do people prefer to own?
  • 11. Answer: IBM or Compatible
Explanation: Look at question # 7 in the Sample Survey. Notice that option # 3 is “IBM or Compatible.” In the output “Statistics” table, the mode for “Computer Owned” is “3,” which is “IBM or Compatible.” In addition, the frequency analysis results for “Computer Owned” indicate that 49 out of 80 people own an “IBM or Compatible” computer. This can be considered their preference.

Research Question # 2
What color do people prefer for their computer?

CROSSTABS
Crosstabs are used to examine the relationship between two variables. To answer the second research question, users will need to analyze two variables: “Computer Owned” and “Color” (which indicates color preference). Using crosstabs will show the intersection between these two variables and reveal the computer type and color preferred by most people.
To perform a crosstabs analysis:
1. In Data View, click the Analyze menu, point to Descriptive Statistics, and select Crosstabs… (see Figure 13). The Crosstabs dialog box opens.
2. Select the variable “Computer Owned” from the list box on the left.
3. Click the transfer arrow button to move it to the Row(s): list box.
4. Select the variable “color” (see Figure 14).
5. Click the transfer arrow button to move it to the Column(s): list box.
6. Click the OK button. An Output Viewer window opens and displays two tables: “Case Processing Summary” and the “Crosstabulation” matrix (see Figure 15).
Figure 13 - Crosstab Analysis from Analyze Menu
Figure 14 - Crosstabs Dialog Box
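Both the frequency analysis and the crosstabulation described above amount to counting: one counts single responses, the other counts pairs of responses. A minimal Python sketch of that counting follows. The function names and the six coded responses are made up for illustration (they are not the “Part 1.sav” data), and the sketch does not reproduce the “Valid Percent” or “Cumulative Percent” columns.

```python
from collections import Counter


def frequency_table(values):
    """Return ([(value, count, percent), ...] sorted by value, mode)."""
    counts = Counter(values)
    total = len(values)
    rows = [(value, count, round(100 * count / total, 1))
            for value, count in sorted(counts.items())]
    # most_common(1) gives the single most frequent value, i.e. the mode.
    mode = counts.most_common(1)[0][0]
    return rows, mode


def crosstab(row_values, col_values):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(zip(row_values, col_values))


# Made-up coded responses (3 = "IBM or Compatible", 1 = "Beige", and so on).
computer_owned = [3, 3, 2, 1, 3, 2]
color = [1, 2, 1, 1, 1, 5]

rows, mode = frequency_table(computer_owned)
print(rows)
print("mode:", mode)
print("cell (3, 1):", crosstab(computer_owned, color)[(3, 1)])
```

Each key of the crosstab counter corresponds to one cell of the “Crosstabulation” matrix, so the largest count identifies the most common combination of computer type and color.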
  • 12. Figure 15 - Crosstabs Output

Answer to Research Question # 2
What color do people prefer for their computer?
Answer: IBM or Compatible in beige color
Explanation: As shown in the “Crosstabulation” matrix above, “IBM or Compatible” is the most preferred computer type from the row variable (“Computer Owned”). From the column variable (“color”), “beige” is shown as the most preferred color. Therefore, you can conclude that most people prefer “IBM or Compatible” computers that are in “beige” color.

Data Manipulation
Data files are not always ideally organized in a form to meet specific needs. For example, users may wish to select a specific subject or split the data file into separate groups for analysis.

SELECT CASES
If you have two or more subject groups in your data and you want to analyze each subject in isolation, you can use the select cases option. For example, the data we are currently analyzing has both male and female participants. However, if you wish to analyze only female cases, then you select “Gender” cases and set the condition for female cases only.
To select cases for analysis:
1. Click the Data menu and select Select Cases… (see Figure 16). The Select Cases dialog box opens (see Figure 17).
2. Click the If condition is satisfied option.
3. Click the If… button. The Select Cases: If dialog box opens.
4. Select the variable “Gender” in the left list box.
5. Click the transfer arrow button to move it to the right text box.
6. Click the = button.
7. Click the 1 button.
8. Click the Continue button. This takes you back to the Select Cases dialog box.
9. Click the OK button. This takes you back to Data View. All males will be excluded from the statistical analysis.
10. Rerun the crosstabs analysis by following steps 1-6 of the Crosstabs section of this handout.
11. Click the OK button. The Output Viewer window updates (see Figure 18).
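The select cases steps above apply a condition (“Gender = 1”) and keep only the cases that satisfy it for later analyses. As a rough Python analogy, with a hypothetical `select_cases` helper and three made-up cases, the same idea looks like this:

```python
def select_cases(cases, condition):
    """Keep only the cases (rows) for which condition(case) is true."""
    return [case for case in cases if condition(case)]


# Made-up cases; Gender is coded 1 = female, 2 = male as in the handout.
cases = [
    {"Gender": 1, "Computer": 3, "Color": 5},
    {"Gender": 2, "Computer": 3, "Color": 2},
    {"Gender": 1, "Computer": 2, "Color": 1},
]

# Equivalent of the condition built in steps 4-7: Gender = 1.
female_cases = select_cases(cases, lambda case: case["Gender"] == 1)
print(len(female_cases), "of", len(cases), "cases selected")
```

Note that the original list is left untouched, which mirrors the fact that the excluded cases can be brought back later by choosing the All cases option.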
  • 13. Figure 16 - Select Cases from Data Menu
Figure 17 - Select Cases Dialog Box
From the crosstabulation in the Output Viewer window in Figure 18 below, look at the column for the most preferred color and the row for the computer types. Since we selected only female cases, what is the computer color most preferred by women? Ten women chose “IBM or Compatible” with color option “5.” Thus, you may conclude that most female participants prefer the color “5” for “IBM or Compatible” computers. However, what does “5” represent? This problem arose from not labeling the variable value “5” as “Other.” Moreover, even if it were labeled “Other,” it does not indicate any particular color, making it difficult to draw a conclusion. In order to avoid such problems, it is suggested that you provide a blank space where participants can specify “Other” color preferences besides the ones specified in the survey questionnaire.
Figure 18 - Select Cases Output
Example: What color do you like to have for your computer?
1. Beige 2. Black 3. Gray 4. White 5. Other __________

Research Question # 3
  • 14. Is computer color preference different between genders?

SPLITTING A FILE
To answer the third research question, we need to split the file. You can analyze one particular group of subjects using the select cases option. However, if you wish to compare the response or performance differences by groups within one variable, it is best to use the split files option.
To split a file for analysis:
1. Turn off the select cases option.
2. Click the Data menu and select Select Cases…. The Select Cases dialog box opens.
3. Select the All cases option.
4. Click the OK button. Notice that the male cases that were excluded are now all included in the data file.
5. Click the Data menu and select Split File… (see Figure 19). The Split File dialog box opens (see Figure 20).
Figure 19 - Split File from Data Menu
Figure 20 - Split File Dialog Box
6. Select the variable “Gender” from the left list box.
7. Select the Compare groups option.
8. Click the transfer arrow button to move the variable “Gender” to the Groups Based on: list box.
9. Click the OK button.
10. Rerun the crosstabs analysis by following steps 1-6 of the Crosstabs section of this handout.
11. Click the OK button. The Output Viewer window crosstabulation table opens (see Figure 21).
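Splitting a file, as in the steps above, means running the same analysis once per group instead of once for everyone. The Python sketch below shows the idea for a crosstab split by gender; `split_crosstab` is a hypothetical helper and the four cases are made up, so the counts say nothing about the handout's survey data.

```python
from collections import Counter, defaultdict


def split_crosstab(cases, group_var, row_var, col_var):
    """Build one crosstab (a Counter of (row, column) pairs) per group value."""
    tables = defaultdict(Counter)
    for case in cases:
        tables[case[group_var]][(case[row_var], case[col_var])] += 1
    return dict(tables)


# Made-up cases; Gender 1 = female, 2 = male.
cases = [
    {"Gender": 1, "Computer": 3, "Color": 5},
    {"Gender": 1, "Computer": 3, "Color": 5},
    {"Gender": 2, "Computer": 3, "Color": 2},
    {"Gender": 2, "Computer": 2, "Color": 2},
]

for gender, table in sorted(split_crosstab(cases, "Gender", "Computer", "Color").items()):
    # The most common (computer, color) pair within each gender group.
    print(gender, table.most_common(1))
```

Comparing the most frequent cell in each group's table is the same comparison the handout makes when it reads the split crosstabulation output.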
  • 15. For additional SPSS help, visit http://www.youtube.com/mycsula.
Figure 21 - Split File Output Data

Answer to Research Question # 3
Is computer color preference different between genders?
Answer: Yes
Explanation: There is a computer color preference difference based on gender. From the crosstabulation output, females prefer “IBM or Compatible” of “Other” color over the colors beige, black, gray, or white. The male group prefers “IBM or Compatible” of “black” color.

FIND AND REPLACE
PASW Statistics provides a Find and Replace function for locating and changing values quickly. Both Find and Replace are available in Data View; only the Find function is available in Variable View.
To use the Find and Replace function:
1. Click the Edit menu and select Find…. The Find and Replace dialog box opens (see Figure 22).
2. In the Find: box, type [Clinton].
3. Select the Replace check box to replace “Clinton” with another word.
4. Click in the Replace with: box, and type the name [Cliff].
5. Click the Show Options button.
6. Under Match to, select the Entire cell option.
7. Click the Replace All button.
Figure 22 - Find and Replace Dialog Box (Data View)
  • 16. NOTE: Under the Match to section of the Find and Replace dialog box (see Figure 22), Contains means PASW Statistics will find each instance of the word/phrase/number appearing in a cell, whether or not it is the only information in the cell. The Entire cell option will find only cells whose whole content matches the word/phrase/number. The Begins with and Ends with options will find cells that begin or end with the characters entered by the user.

Reporting
Once the statistical analysis is complete, the final step is to create a report. In the report, you may include PASW Statistics output (e.g., graphs and tables) to support your analysis. Using the Copy and Paste functions, the tables/graphs generated in PASW Statistics can be copied from the Output Viewer window and pasted into a Microsoft Word document without having to create new tables or graphs.
To create a report using Microsoft Word:
1. In the Output Viewer window, right-click a table. A box appears around the table and a red arrow appears to the left of the table (which means it is selected).
2. Select Copy from the shortcut menu.
3. Open Microsoft Word.
4. Right-click in the Word document and select Paste from the shortcut menu. The table is copied into the Word document.
  • 17. Appendix

SAMPLE SURVEY
Research Questions
1. What kind of computer do people prefer to own?
2. What color do people prefer for their computer?
3. Is computer color preference different between genders?

Survey Questions
1. What is your name? ____________________________
2. What is your gender? ____________________________
3. What is your G.P.A.? ____________________________
4. What is your age?
   1. 19 or younger 2. 20-23 3. 24-27 4. 28-31 5. 32 or over
5. How much do you make in a month?
   1. Less than $1000 2. $1000–$1499 3. $1500–$1999 4. $2000–$2499 5. Over $2500
6. What is your class standing?
   1. Freshman 2. Sophomore 3. Junior 4. Senior 5. Graduate
7. What kind of computer do you own?
   1. Toshiba 2. Apple 3. IBM or Compatible 4. Other 5. None
8. What kind of computer have you used?
   1. IBM or Compatible 2. Apple 3. Toshiba 4. Other 5. None
9. What color do you like to have for your computer?
   1. Beige 2. Black 3. Gray 4. White 5. Other
  • 18. Introduction – Part 2
PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, etc. It can perform a variety of data analyses and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well suited for survey research, though it is by no means limited to this area.
This handout (Test of Significance) introduces 1) several data entry and data manipulation techniques that help you save time, 2) basic skills to perform tests of significance, such as correlations and t tests, and 3) an introduction to multiple response sets. The step-by-step instructions will help you understand how to interpret the output of your tests from data supplied by your research question(s). Follow the steps carefully to get appropriate results; even a slightly different process might yield unexpected and complicated results. This is a continuation of the PASW Statistics Descriptive Statistics handout.

Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are stored in a self-extracting archive. The archive must be downloaded and executed in order to extract the data files.
- The data files used with this handout are available for download at www.alukosayoenoch.wix.com/selfcoding.
- Instructions on how to download and extract the data files are available at www.alukosayoenoch.wix.com/selfcoding.

Null Hypothesis
The null hypothesis (H0) represents a theory that has been presented, either because it is believed to be true or because it is to be used as a basis for an argument. It is a statement that has not been proven. It is also important to realize that the null hypothesis is the statement of no difference. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug (in other words, the new drug exhibits the same behavior as the old drug). The null hypothesis (H0) and the alternative hypothesis (H1) can be stated as:
H0: There is no difference between the two drugs.
H1: There is a significant difference between the two drugs.
Special consideration is given to the null hypothesis. This is because the null hypothesis is the statement being tested, whereas the alternative hypothesis is the statement to be accepted if and when the null is rejected.
  • 19. The final conclusion, once the test has been carried out, is always given in terms of the null hypothesis. The result is either "Reject H0 in favor of H1" or "Do not reject H0"; the conclusion is never "Reject H1" or "Accept H1." If the conclusion is "Do not reject H0," this does not necessarily mean that the null hypothesis is true. It only suggests that there is not sufficient evidence against H0 in favor of H1. Rejecting the null hypothesis then suggests that the alternative hypothesis may be true.
NOTE: The null hypothesis essentially states that the given cases or items under consideration are statistically the same or exhibit the same behavior without any significant difference. The alternative hypothesis states that the given cases exhibit different behavior or that they have a statistically significant difference.

Statistical Tests
Statistics is a set of mathematical techniques used to summarize research data and determine whether the data supports a proposed hypothesis. PASW Statistics includes tools that can be used to analyze variables and determine the strength and nature of the relationship between two variables and whether the means (averages) of two data sets (samples) are statistically the same or different.

Tests of Significance
The following examples are sample research questions that can be answered using PASW Statistics analytical methods.

CORRELATIONS
A correlation is a statistical measure of the strength or degree of a supposed linear association between two or more variables. One of the more common measures used is the Pearson correlation, which estimates a relationship between two interval variables.

Research Question # 1
Is there a relationship between academic performance and Internet access?
H0: There is no relationship between academic performance and Internet access.
H1: There is a significant relationship between academic performance and Internet access.
To run a correlation analysis:
1. Locate and open the “Part 2.sav” file.
2. Click the Analyze menu, point to Correlate, and select Bivariate…. The Bivariate Correlations dialog box opens (see Figure 1).
3. Select the variables “active,” “posttest,” and “gpa” in the list box on the left.
4. Click the transfer arrow button to move them to the Variables: list box.
5. Select the Pearson check box and the Two-tailed option if necessary.
6. Click the OK button. The Output Viewer window opens with a “Correlations” table (see Figure 2).
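For readers who want to see what the Pearson option computes, the sketch below implements the standard formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations) in plain Python. The function names are hypothetical, the sketch does not compute the significance value shown in the “Correlations” table, and the 0.4-0.7 “moderate” band is simply the rule of thumb this handout uses when reading the output.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant variable (sxx or syy equal to 0) has no defined r.
    return sxy / math.sqrt(sxx * syy)


def is_moderate(r):
    """True if |r| falls in the 0.4-0.7 band the handout calls moderate."""
    return 0.4 <= abs(r) <= 0.7


# Made-up scores, only to show the call.
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
print(is_moderate(0.476), is_moderate(0.448))
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, which is why both coefficients reported in this handout are read as moderate and positive.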
  • 20. Figure 1 - Bivariate Correlations Dialog Box
Figure 2 - Bivariate Correlations Output Table

The Answer to Research Question # 1
Is there a relationship between academic performance and Internet access?
Answer: Yes
Explanation: As shown in Figure 2 above, the correlation coefficient for the relationship between “active” and “posttest” is 0.476, which is between 0.4 and 0.7. The correlation coefficient for the relationship between “active” and “gpa” is 0.448, which is also between 0.4 and 0.7. The results from these analyses indicate that there is a moderate, positive relationship between academic performance and Internet access.

PAIRED-SAMPLES T TEST
A Paired-Samples T Test is used to test whether an observed difference between the means of two related measurements (for example, a pretest and a posttest taken by the same participants) is statistically significant. To run a t test, the following assumptions should be met: the data 1) has normal distribution, 2) is a large data set, and 3) has no outliers. If any of these assumptions are not met, then a nonparametric test should be used.
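The t statistic behind a paired-samples test is the mean of the pairwise differences divided by its standard error. The Python sketch below shows that calculation under stated assumptions: `paired_t` is a hypothetical helper, the scores are made up, and the p value (the “Sig.” column in the output) is not computed here because it requires the t distribution.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom).

    Differences are taken as first - second, so a negative mean difference
    means the second measurement was higher on average.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sample SD divided by sqrt(n).
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_error, n - 1


# Made-up pretest and posttest scores for four participants.
pretest = [10, 12, 9, 11]
posttest = [12, 15, 10, 14]
print(paired_t(pretest, posttest))
```

Because each participant serves as their own control, only the differences enter the calculation; the spread of the raw scores between participants does not affect t.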
  • 21. Research Question # 2
Is there an instructional effect taking place in the computer class?
H0: There is no influence of using the Internet on academic achievement for this class.
H1: There is an influence of using the Internet on academic achievement for this class.
The hypothesis is that Internet familiarity cannot influence the academic achievement in the computer class. The variables that reflect academic achievement are “pretest” and “posttest.”
To run a Paired-Samples T Test:
1. Click the Analyze menu, point to Compare Means, and select Paired-Samples T Test…. The Paired-Samples T Test dialog box opens (see Figure 3).
2. Select the variables “pretest” and “posttest” in the list box on the left.
3. Click the transfer arrow button to move them to the Paired Variables: list box.
4. Click the OK button. The Output Viewer window opens (see Figure 4).
Figure 3 - Paired-Samples T Test Dialog Box

The Answer to Research Question # 2
Is there an instructional effect taking place in the computer class?
Figure 4 - Paired-Samples T Test Output Table
Answer: Yes
Explanation: The observed mean difference is -4.5172. Since the value of t is -3.820 at p < .001, the mean difference (-4.5172) between “pretest” and “posttest” is statistically significant.
  • 22. According to the Sig. of 0.001 (which is less than 0.05), the null hypothesis is rejected. Therefore, it can be inferred that there was an instructional effect taking place in the computer class.

INDEPENDENT-SAMPLES T TEST
An Independent-Samples T Test is used to determine the likelihood that two independent data samples came from populations that have identical means. If this were true, then the difference between the means should be equal to zero. The null hypothesis in this case would be that the two means are equal. Two variables are required in the data set. One variable is the measured parameter. Examples include weight, height, or frequency. The second variable divides the data set into two groups. In the example below, Light and Dark are the groups whose means will be compared.

Research Question # 3
Is there a difference in the average number of seedlings grown in the light and those grown in the dark?
In this example, 20 Petri dishes each contained 10 celery seeds. Ten of the dishes were kept in the dark for one week; the other 10 were placed under a grow light for the same amount of time. At the end of the week, the number of seeds that sprouted was counted in each dish.
H0: Variance (light) = variance (dark).
H1: Variance (light) ≠ variance (dark).
H0: There is no difference between seedlings under the light and in the dark (μ(light) = μ(dark)).
H1: There is a significant difference between seedlings under the light and in the dark (μ(light) ≠ μ(dark)).
NOTE: The first set of hypotheses tests the variances, while the second set tests the means. The variances have to be equal before we can determine if the means are equal.
NOTE: Variance: The arithmetic mean of the squared deviations from the mean, which is essentially used to see how far the single samples are from the mean. We need to make sure the variances are equal before we can determine if the means are equal.
If the variances are equal, users will be able to move to the T Test. If the variances are not equal, users will have to do more testing.
To run the Independent-Samples T Test:
1. Locate and open the “Seedlings.sav” file.
2. In Data View, click the Analyze menu, point to Compare Means, and select Independent-Samples T Test…. The Independent-Samples T Test dialog box opens (see Figure 5).
3. Select the “Seedlings” variable in the list box on the left.
4. Click the transfer arrow button to move the variable to the Test Variable(s): list box.
5. Select the “Treatment” variable in the list box on the left.
6. Click the transfer arrow button to move the variable to the Grouping Variable: list box.
7. Click the Define Groups… button. The Define Groups dialog box opens (see Figure 6).
8. Enter [0] in the Group 1: box, enter [1] in the Group 2: box, and then click the Continue button.
  • 23. 9. Click the OK button. The Output Viewer window opens with several tables, including an Independent-Samples Test table (see Figure 7).
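When the two variances can be treated as equal, the independent-samples t statistic is the difference between the group means divided by a standard error built from the pooled variance. The Python sketch below shows that equal-variance calculation only; `independent_t` is a hypothetical helper, the counts are made up rather than taken from “Seedlings.sav,” and neither the variance test nor the p value is computed.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (mean difference, t statistic, degrees of freedom).

    Assumes equal population variances (pooled-variance t test).
    The mean difference is mean(group1) - mean(group2).
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff, mean_diff / std_error, n1 + n2 - 2


# Made-up sprout counts for two small groups of dishes.
group_0 = [1, 2, 3]
group_1 = [4, 5, 6]
print(independent_t(group_0, group_1))
```

With two groups of 10 dishes, as in the seedlings example, the same formula gives 10 + 10 - 2 = 18 degrees of freedom.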
  • 24. Figure 5 - Independent-Samples T Test Dialog Box
Figure 6 - Define Groups Dialog Box

The Answer to Research Question # 3
Is there a difference in the average number of seedlings grown in the light and those grown in the dark?
Figure 7 - Independent-Samples T Test Output
Answer: Yes
Explanation: The mean difference in seedlings sprouted between the two treatments (light and dark) was -2.900. The value of t, which is -3.179, was statistically significant (p=0.005). Therefore, the null hypothesis is rejected.

Multiple Response Sets
Very often, a survey will contain questions where the respondent is allowed to select more than one answer. Managing such questions in PASW Statistics can be difficult. Each response in a multiple response question should be coded as a separate variable and then grouped under a multiple response set of variables. The multiple response set can then be analyzed using frequency counts or crosstabs.
To define a multiple response set of variables:
1. Locate and open the “Airlines.sav” file.
2. In Data View, click the Analyze menu, point to Multiple Response, and select Define Variable Sets… (see Figure 8). The Define Multiple Response Sets dialog box opens (see Figure 9).
  • 25. Figure 8 - Define Variable Sets from Analyze Menu
Figure 9 - Define Multiple Response Sets Dialog Box
3. Select the “American,” “TWA,” “United,” “USAir,” and “Other” airline variables and move them to the Variables in Set: list box.
4. Make sure the Dichotomies option is selected and enter [1] in the Counted value: box.
5. Type [Airlines] in the Name: box.
6. Type [Airline frequency of response] in the Label: box.
7. Click the Add button. The set is created as “$Airlines” and listed in the Multiple Response Sets: list box.
8. Click the Close button.

MULTIPLE RESPONSE FREQUENCIES
It is possible to obtain the answer by running a frequency analysis for each of the airline variables. The result of such an analysis will only provide an overall raw frequency for each response and will not allow percentage comparisons between the different airlines. A frequency analysis that uses a multiple response set will provide an appropriate response with concise output.

Research Question # 4
In a survey of airline passengers, which airline was selected as having been flown most often in the previous six months?
To analyze the frequency of response for each variable in a multiple response set:
1. Click the Analyze menu, point to Multiple Response, and select Frequencies…. The Multiple Response Frequencies dialog box opens (see Figure 10).
2. Select the multiple response set labeled “$Airlines” and move it to the Table(s) for: list box.
3. Click the OK button. An Output Viewer window opens with the frequency analysis (see Figure 11).
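A multiple response set of dichotomies, as defined above, is a group of 0/1 variables in which the counted value (1) marks a selected answer. The Python sketch below tallies such a set and expresses each count as a percentage of all responses and of all responding cases. The helper name and the three cases are made up for illustration (they are not the “Airlines.sav” data), and the layout only approximates the frequency table the steps above produce.

```python
def multiple_response_frequencies(cases, variables, counted_value=1):
    """Tally a dichotomy set.

    Returns ({variable: (count, percent of responses, percent of cases)},
    total number of responses).
    """
    counts = {var: sum(1 for case in cases if case.get(var) == counted_value)
              for var in variables}
    total_responses = sum(counts.values())
    # A case counts as responding if it selected at least one option.
    responding_cases = sum(
        1 for case in cases
        if any(case.get(var) == counted_value for var in variables))
    table = {}
    for var in variables:
        pct_responses = round(100 * counts[var] / total_responses, 1) if total_responses else 0.0
        pct_cases = round(100 * counts[var] / responding_cases, 1) if responding_cases else 0.0
        table[var] = (counts[var], pct_responses, pct_cases)
    return table, total_responses


# Made-up answers from three passengers (1 = airline flown, 0 or missing = not).
cases = [
    {"American": 1, "United": 1},
    {"United": 1},
    {"TWA": 1, "United": 0},
]
table, total = multiple_response_frequencies(cases, ["American", "TWA", "United"])
print(total, table)
```

Because one respondent can give several answers, percentages of responses sum to 100 while percentages of cases can sum to more than 100.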
  • 26. Figure 10 - Multiple Response Frequencies Dialog Box
The Answer to Research Question # 4
In a survey of airline passengers, which airline was selected as having been flown most often in the previous six months?
Figure 11 - Airline Frequency Analysis Output
Answer: United
Explanation: As seen in the Output Viewer window, there were 18 people surveyed and 44 total responses generated. Of the 44 total responses, United was selected most often with 12 responses (27.3%, the largest share of the total responses).
MULTIPLE RESPONSE CROSSTABS
Without a multiple response set, each airline would have to be analyzed separately, in its own crosstab, against the variable that the passengers used to identify themselves as being afraid of flying. The separate results would not allow for easy comparison between the airlines. The best way to answer the question is to include the multiple response set in a single crosstab analysis.
Research Question # 5
In a survey of airline passengers, which airline was selected most often by those passengers who identified themselves as afraid to fly?
  • 27. To incorporate a multiple response set into a crosstab analysis:
1. Click the Analyze menu, point to Multiple Response, and select Crosstabs…. The Multiple Response Crosstabs dialog box opens (see Figure 12).
Figure 12 - Multiple Response Crosstabs Dialog Box
2. Select the “FearFactor” variable as the Row(s): variable and the “$Airlines” multiple response set as the Column(s): variable.
3. Select the “FearFactor” variable after it is designated as the Row(s): variable. The Define Ranges… button becomes active.
4. Click the Define Ranges… button. The Multiple Response Crosstabs: Define Variable Ranges dialog box opens (see Figure 13).
Figure 13 - Multiple Response Crosstabs: Define Variable Ranges Dialog Box
5. Enter [0] in the Minimum: box and [1] in the Maximum: box for the “FearFactor” variable.
6. Click the Continue button.
7. Click the Options… button. The Multiple Response Crosstabs: Options dialog box opens (see Figure 14).
8. Select the Cases option and then click the Continue button.
9. Click the OK button. The Output Viewer window opens with the crosstab results (see Figure 15).
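The tabulation requested in the steps above can be sketched in Python as follows. The rows and the function name are hypothetical (invented for illustration, not taken from “Airlines.sav”), and the sketch assumes “FearFactor” is coded 1 for passengers who are afraid to fly and 0 otherwise, matching the 0 to 1 range entered in step 5.

```python
# Hypothetical rows (not the Airlines.sav data).
# Assumption: FearFactor 1 = afraid to fly, 0 = not afraid.
rows = [
    {"FearFactor": 1, "American": 0, "United": 1, "USAir": 1},
    {"FearFactor": 1, "American": 1, "United": 0, "USAir": 1},
    {"FearFactor": 0, "American": 1, "United": 1, "USAir": 0},
    {"FearFactor": 0, "American": 0, "United": 1, "USAir": 0},
]


def multiple_response_crosstab(rows, row_var, set_vars, minimum, maximum,
                               counted_value=1):
    """Count counted responses per set variable within each row-variable value."""
    table = {level: {v: 0 for v in set_vars}
             for level in range(minimum, maximum + 1)}
    for r in rows:
        level = r[row_var]
        if level not in table:
            # Values outside the defined range are left out of the table.
            continue
        for v in set_vars:
            if r[v] == counted_value:
                table[level][v] += 1
    return table


crosstab = multiple_response_crosstab(
    rows, "FearFactor", ["American", "United", "USAir"], minimum=0, maximum=1
)
afraid = crosstab[1]
most_selected = max(afraid, key=afraid.get)
print(crosstab)
print("Most selected among passengers afraid to fly:", most_selected)
```

Reading across the row for FearFactor = 1 gives the comparison between airlines that separate per-airline crosstabs would not show in one place.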
  • 28. Figure 14 - Multiple Response Crosstabs: Options Dialog Box
The Answer to Research Question # 5
In a survey of airline passengers, which airline was selected most often by those passengers who identified themselves as afraid to fly?
Figure 15 - Multiple Response Crosstabs Output
Answer: USAir
Explanation: Of the 18 people surveyed, ten identified themselves as being afraid to fly. Within that group of survey respondents, USAir was the airline selected most often (seven times).
Data Manipulation
PASW Statistics also provides tools to make data manipulation a simple task.
COPYING AND PASTING VARIABLE PROPERTIES
Copying and pasting is very useful when the same properties need to be given to different variables.
To copy and paste variable properties:
1. Click the File menu, point to New, and select Data.
2. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 16).
Figure 16 - Variable View Tab
  • 29. 3. Type [active] in the first cell under the Name column and press the [Enter] key.
4. Click in the first cell under the Decimals column and decrease the entry to “0.”
5. Click in the first cell under the Values column and click the Ellipses button. The Value Labels dialog box opens (see Figure 17).
6. Type [1] in the Value: box.
7. Type [Strongly Disagree] in the Label: box.
8. Click the Add button.
9. Assign [2], [3], and [4] for [Disagree], [Agree], and [Strongly Agree], respectively, by repeating steps 6-8 for each value added (see Figure 17).
Figure 17 - Value Labels Dialog Box
10. Click the OK button.
11. Switch back to Data View (see Figure 18).
12. Click the “active” variable heading to highlight the column.
13. Click the Edit menu and select Copy to copy the properties of the variable “active.”
14. Highlight the variables that should receive the same properties by clicking the header of the first target variable and dragging the pointer across to the last header (see Figure 19 and Figure 20).
15. Click the Edit menu and select Paste. The copied properties of the variable “active” will be applied to the target variables, and the Data View and Variable View will change (see Figure 21 and Figure 22).
Figure 18 - Data View Tab
Figure 19 - Selected Variable
  • 30. Figure 20 - Selecting Target Variables
Figure 21 - Data View Showing New Variables
Figure 22 - Variable View Showing New Variables
INSERTING VARIABLES AND CASES
By using Insert Variable and Insert Cases, variables and cases can be added at any location in the data file in a simple, straightforward manner. Assume that one wants to insert a new variable named “midterm” between “pretest” and “posttest” and use it for test score data. The following instructions describe how to insert a new variable and set it up for the “Numeric” data type.
To insert a variable:
1. Switch to Data View.
2. Click the “posttest” variable heading to highlight the column.
3. Click the Edit menu and select Insert Variable. A new variable is inserted to the left of the highlighted variable (“posttest”).
NOTE: The new variable is created with a default name “VAR00001” which can be changed later.
4. To define the properties of the new variable, double-click the variable heading. The Variable View is activated for the new variable.
5. Type [midterm] in the Name column of the new variable.
6. Change the variable type if desired.
In the same manner, it is possible to insert cases at a particular location in Data View. For instance, assume that a case should be inserted between cases “10” and “11” for a particular student’s record. By following the instructions below, one case will be inserted after the 10th case.
To insert cases (example):
1. Switch to Data View.
2. Click row number “11” to highlight the case.
3. Click the Edit menu and select Insert Cases. A new case is inserted above case “11.”
  • 31. DELETING VARIABLES AND CASES
Variables and cases can be deleted by using the Clear command.
To delete a variable or case:
1. In Data View, click the variable heading or the case number to highlight what will be deleted.
2. Click the Edit menu and select Clear. The variable or case is deleted.
Merging Data Files
The merging data files function is useful for users who store each of their topics in separate files and eventually need to combine them. It allows users to import data from one file into another as long as both files contain a common identifier for each of the cases to be combined. An identifier has no meaning other than to distinguish the cases from one another and to identify the corresponding cases in the additional data files. The identifier can be a unique value, number, or letter combination applied to each case.
NOTE: The variables do not have to be the same across data files.
CREATING THE DATA FILE FOR MERGING
Scenario: A psychological focus group on campus needs to create a file for a longitudinal study for ten students on campus. Each file will have the same students, but four different focal points of study pertaining to each question. Over the five-year span of the study, the ten students will be asked twelve questions each year (one a month), and the same questions will be asked each year. At the end of the year, the three files will be combined in an annual questionnaire file to be properly analyzed. The merging data files function can be used to satisfy this requirement.
Inputting the Data in Variable View
Files must be created before they can be merged.
To create a data file for merging:
1. Click the File menu, point to New, and select Data.
2. Once the new file has been created, select the Variable View tab.
3. Name the first variable [ID] to make it the identifier variable, and press the [Enter] key.
4. Change the Type attribute by clicking the ellipses button and selecting the String option from the Variable Type dialog box.
5. Change the width to [10] and click the OK button.
6. Click in the second variable cell, type [January], and press the [Enter] key.
7. Change the Type attribute to String.
8. In the Label attribute, type [What pet would you like to own?] (see Figure 23).
9. Repeat steps 6 through 8 to enter the data in Table 1.
  • 32. Figure 23 - Define Variables in Variable View
Table 1 - Variables for Case Study
Month      Type    Length  Label
February   String  10      What is your favorite shape?
March      String  12      It is 1:30pm, what are you eating?
April      String  12      What is your preferred beverage?
10. Once this information has been defined in Variable View, switch by clicking the Data View tab to enter the corresponding case information.
11. Enter [Alfred] in case 1 of the ID variable, [Bethel] in case 2 of the ID variable, down to [Jessie] in case 10 of the ID variable. Enter the corresponding information according to Table 2. See Figure 24 for the results.
Table 2 - Input Case Information
Case  ID         January    February   March         April
1     Alfred     Dog        Star       Pizza         Water
2     Bethel     Cat        Square     Fruit         Soda Pop
3     Chris      Cat        Triangle   Veggies       Grape Juice
4     Dante      Dog        Rectangle  Sandwich      Orange Juice
5     Erica      Tiger      Oval       Chips         Aloe Water
6     Fernando   Tarantula  Circle     Calzon        Beer
7     Grenadine  Dog        Octagon    Salad         White Wine
8     Harold     Bees       Polygon    Soup          Naked Juices
9     Isadora    Turtle     Rhombus    PandaExpress  V8 Juice
10    Jessie     Hamster    Oval       Egg Salad     Lemonade
  • 33. Figure 24 - Input Case Information
12. Save the file by clicking the File menu and selecting Save. The Save Data As dialog box opens.
13. Select the Desktop as the destination and type [Merge 1] in the File name: text box.
14. Click the Save button.
15. Close the Output Viewer window.
MERGING THE DATA FILES
To merge data files, all files must have a common variable. The common variable in this case is ID.
To merge data files: (First, make sure the files have the same IDs.)
1. Open the files “Merge 2” and “Merge 3” and check for consistency across all of the IDs.
2. Minimize the “Merge 2” and “Merge 3” data files.
3. Once back in the “Merge 1” file, click the Data menu, point to Merge Files, and select Add Variables… (see Figure 25).
Figure 25 - Data Menu When Selecting Add Variables
  • 34. 4. The Add Variables to Merge 1.sav dialog box opens. Select the An external PASW Statistics data file option and click the Browse… button (see Figure 26).
Figure 26 - Add Variables to Merge 1.sav Dialog Box
5. Locate and select the “Merge 2” data file and click the Open button.
6. Click the Continue button. The Add Variables from Merge 2.sav dialog box opens (see Figure 27).
7. Select the Match cases on key variables in sorted files check box.
8. From the Excluded Variables: list box, select “ID>(+)” (see Figure 27), and using the transfer arrow button, move it to the Key Variables: box.
Figure 27 - Add Variable from Merge 2.sav Dialog Box
9. Click the OK button. A warning message dialog box opens (see Figure 28).
  • 35. Figure 28 - Sorting Warning Dialog Box
10. Click the OK button to close the warning message. The finished product should look like Figure 29.
Figure 29 - Merged 1 and 2 Files
11. Repeat steps 3-10 for the “Merge 3” file.
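The idea behind matching cases on a key variable can be sketched in Python. This is a simplified illustration rather than the PASW Statistics implementation: the first list reuses three rows from Table 2, while the second list and its “May” variable are invented stand-ins for whatever “Merge 2” actually contains, and the function name is hypothetical.

```python
# Three cases from Table 2; the "May" column below is a made-up stand-in
# for the variables that the second file actually holds.
merge_1 = [
    {"ID": "Alfred", "January": "Dog"},
    {"ID": "Bethel", "January": "Cat"},
    {"ID": "Chris", "January": "Cat"},
]
merge_2 = [
    {"ID": "Chris", "May": "Blue"},
    {"ID": "Alfred", "May": "Red"},
    {"ID": "Bethel", "May": "Green"},
]


def add_variables(active, external, key="ID"):
    """Match cases on a key and add the external file's variables to each case."""
    active_by_key = {row[key]: row for row in active}
    external_by_key = {row[key]: row for row in external}
    merged = []
    # Sorting on the key mirrors the sorted-files requirement in step 7.
    for k in sorted(set(active_by_key) | set(external_by_key)):
        combined = {key: k}
        combined.update(active_by_key.get(k, {}))
        combined.update(external_by_key.get(k, {}))
        merged.append(combined)
    return merged


merged = add_variables(merge_1, merge_2)
for case in merged:
    print(case)
```

In this sketch, a case whose ID appears in only one file simply keeps the variables it has, which is one reason to check the IDs for consistency before merging.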
  • 36. Appendix
QUESTIONNAIRE
This survey is designed to investigate relationships between Internet access and academic success. It consists of three parts: questions related to the background information of the respondent, questions about Internet use patterns, and several open-ended questions. Please select appropriate answers that best describe your activities on the Internet as truthfully as possible. The results of this study will be used anonymously for the PASW Statistics Part 2: Test of Significance workshop.
Background Information
1. Age: ____________________________
2. Major: ___________________________
3. G.P.A.: __________________________
4. Monthly Income: __________________
Internet Access
5. Do you have a computer at home?
   1. Yes   2. No
6. Where do you surf on the Internet? (You can circle more than one option for this question.)
   1. At school   2. At home   3. At work   4. Other ____________
7. How long do you stay online per day?
   1. Less than 30 minutes   2. 1-2 hours   3. More than two hours
Questions 8 through 19 are designed to investigate the frequency and types of activities on the Internet. These questions use a 4-point Likert scale ranging from strongly disagree to strongly agree. Please circle the option that best describes your activities on the Internet.
SD: Strongly Disagree   D: Disagree   A: Agree   SA: Strongly Agree
                                                                      SD D A SA
8. I am a very active Internet surfer.                                1  2 3 4
9. I surf the Internet to look for articles for research papers.      1  2 3 4
  • 37.                                                               SD D A SA
10. I surf the Internet to read current news.                         1  2 3 4
11. I use the Internet only to e-mail my friends, family, and professors.   1  2 3 4
12. I surf the Internet to check movie schedules.                     1  2 3 4
13. I surf the Internet to look for personal information (e.g., yellow pages).   1  2 3 4
14. I surf the Internet to look for job openings.                     1  2 3 4
15. I use the Internet to play games.                                 1  2 3 4
16. I use the Internet to download forms and files (e.g., income tax forms).   1  2 3 4
17. I surf the Internet to improve my computer skills.                1  2 3 4
18. I surf the Internet to purchase books.                            1  2 3 4
19. I surf the Internet to purchase other merchandise (e.g., video tapes, clothes, computers).   1  2 3 4
Question 20 is an open-ended question.
20. Are there any other Internet activities that are not included in this survey? If so, please describe them below.
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
  • 38. Introduction – Part 3
PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, etc. It can perform a variety of data analyses and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for survey research, though by no means is it limited to just this topic of exploration.
This handout (Regression Analysis) provides basic instructions on how to answer research questions and test hypotheses through the use of linear regression (a technique which examines the relationship between a dependent variable and a set of independent variables). The value of the dependent variable (e.g., salesperson’s total annual sales) can be predicted based on its relationship to the independent variables used in the analysis (e.g., age, education, and years of experience). The two research questions proposed for this workshop are as follows:
1. How much will each salesperson make this year?
2. Who will qualify for a $1,000 bonus?
Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are stored in a self-extracting archive. The archive must be downloaded and executed in order to extract the data files. The data files used with this handout, along with instructions on how to download and extract them, are available at www.alukosayoenoch.wix.com/selfcoding.
Simple Regression
Simple regression estimates how the value of one dependent variable (Y) can be predicted based on the value of one independent variable (X). The linear equation for simple regression is as follows, where a is the slope and b is the y-intercept:
Y = aX + b
Simple regression can answer the following research question:
Research Question # 1
How much will each salesperson make this year?
SCATTER PLOT
A scatter plot displays the nature of the relationship between two variables. It is recommended to run a scatter plot before performing a regression analysis to determine if there is a linear relationship between the variables. If there is no linear relationship (i.e., points on a graph are not clustered in a straight line), there is no need to run a simple regression.
  • 39. To run a scatter plot:
1. Start PASW Statistics 17.
2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
3. Locate and open the “Regression.sav” file.
4. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot… (see Figure 1). The Scatter/Dot dialog box opens (see Figure 2).
NOTE: To estimate the relationship between two variables, select the Simple Scatter plot.
Figure 1 - Graphs Menu When Selecting Scatter/Dot
Figure 2 - Scatter/Dot Dialog Box
5. If necessary, select the Simple Scatter option, and then click the Define button (see Figure 2). The Simple Scatterplot dialog box opens (see Figure 3).
Figure 3 - Simple Scatterplot Dialog Box
6. Select the variable “Last year sales [lastsale]” from the list box on the left.
7. Click the first transfer arrow button to move the variable to the Y Axis: box.
8. Select the variable “Years of experience [yearexpe]” from the list box on the left.
9. Click the second transfer arrow button to move the variable to the X Axis: box.
  • 40. 10. Click the OK button. The Output Viewer window opens with a scatter plot of the variables (see Figure 4).
NOTE: A graph similar to Figure 4 will be displayed in the Output Viewer window. This scatter plot indicates that there is a linear relationship between the variables “Last year sales” and “Years of experience.” The next step is to find a line that best accommodates the pattern of points in this scatter plot. The steps on how to enhance graph appearance are included in the last section of this handout.
Figure 4 - Scatter Plot
PREDICTING VALUES OF DEPENDENT VARIABLES
Since it is known that a linear relationship exists between the two variables, the regression analysis can be performed to predict this year’s sales.
To run a simple regression analysis:
1. Switch to the Data Editor window.
2. Click the Analyze menu, point to Regression, and select Linear… (see Figure 5). The Linear Regression dialog box opens.
Figure 5 - Analyze Menu When Selecting Linear
3. Select the variable “Last year sales [lastsale]” from the variable list box on the left and move it to the Dependent: box by clicking the first transfer arrow button (see Figure 6).
  • 41. Figure 6 - Linear Regression Dialog Box
4. Select the variable “Years of experience [yearexpe]” from the variable list box on the left and move it to the Independent(s): box by clicking the second transfer arrow button.
5. Click the OK button.
The following tables present the results of a simple regression. “R Square” (.918) indicates that this model accounts for almost 92% of the total variation in the data (see Figure 7).
Figure 7 - Model Summary Output
Figure 8 - Coefficients Output
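For readers who want to see where the slope, the y-intercept, and “R Square” come from, the following Python sketch computes them with the ordinary least-squares formulas. The five data points and the function name are hypothetical (they are not taken from “Regression.sav”), so the numbers it prints will not match Figures 7 and 8.

```python
def fit_simple_regression(x, y):
    """Return (slope, intercept, r_square) for the least-squares line Y = aX + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R Square = 1 - (unexplained variation / total variation)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - ss_res / ss_tot
    return slope, intercept, r_square


# Hypothetical data: years of experience and last year's sales.
years = [1, 2, 3, 4, 5]
sales = [2500, 4300, 6400, 8100, 10300]

slope, intercept, r_square = fit_simple_regression(years, sales)
print(f"a (slope) = {slope:.3f}, b (intercept) = {intercept:.3f}, "
      f"R Square = {r_square:.3f}")
```

An “R Square” close to 1 means the points lie close to the fitted line, which is what the scatter plot should already have suggested.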
  • 42. The slope and the y-intercept as seen in Figure 8 should be substituted in the following linear equation to predict this year’s sales: Y = aX + b. In this case, the values of a, b, X, and Y will be as follows:
a = 1954.658
b = 440.987
X = Years of experience (values of independent variable)
Y = Last year sales (values of dependent variable)
PREDICTING THIS YEAR’S SALES WITH SIMPLE REGRESSION MODEL
To predict this year’s sales for each salesperson, the values of a and b should be substituted in the following linear equation:
Y = aX + b
Last year sales = (a * yearexpe) + b
This year sales = (1954.658 * yearexp2) + 440.987
a = 1954.658
b = 440.987
X = Years of experience [yearexp2]
Y = This year sales
NOTE: The new independent variable, “yearexp2,” is used instead of “yearexpe” in order to predict this year’s sales.
To predict this year’s sales using the computing function:
1. Switch to the Data Editor window.
2. Click the Transform menu and select Compute Variable…. The Compute Variable dialog box opens (see Figure 9).
3. In the Target Variable: box, type [Simple].
Figure 9 - Compute Variable Dialog Box
  • 43. 4. In the Numeric Expression: box, enter the following equation by typing or selecting from the dialog box keypad: [1954.658 * yearexp2 + 440.987]
NOTE: It is recommended to select the variable “yearexp2” directly from the variable list box on the left of the Compute Variable dialog box to prevent typing mistakes.
5. Click the OK button. The results will be displayed in the Simple column in Data View (see Figure 10).
Figure 10 - Simple Regression Results
To change the data type for the new variable “Simple”:
1. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 11).
Figure 11 - Variable View Tab
2. Locate the variable “Simple” and click the Ellipses button under the Type column. The Variable Type dialog box opens (see Figure 12).
3. Select the Dollar option, and then select the $###,###,### format (12 digits width with 0 decimal places).
Figure 12 - Variable Type Dialog Box
4. Click the OK button, and then click the Data View tab.
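The Compute Variable expression and the dollar formatting can be mirrored in a short Python sketch. The coefficients are the ones reported above; the function names and the sample “yearexp2” values are hypothetical and chosen only for illustration.

```python
A = 1954.658  # slope reported in the Coefficients output
B = 440.987   # y-intercept reported in the Coefficients output


def predict_simple(yearexp2):
    """Same arithmetic as the expression 1954.658 * yearexp2 + 440.987."""
    return A * yearexp2 + B


def as_dollars(value):
    """Format a number as whole dollars with thousands separators."""
    return f"${value:,.0f}"


# Hypothetical years of experience, not values from Regression.sav.
for years in (1, 5, 10):
    print(years, as_dollars(predict_simple(years)))
```

Each additional year of experience raises the predicted sales by the slope, 1954.658, regardless of the starting point.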
  • 44. Figure 13 - Simple Regression Prediction
NOTE: The predictions of this year’s sales for each salesperson are computed under the new variable named “Simple” as shown in Figure 13.
Multiple Regression
Multiple regression estimates the coefficients of the linear equation that best predicts the value of the dependent variable when there is more than one independent variable. For example, it is possible to predict a salesperson’s total annual sales (the dependent variable) based on independent variables such as age, education, and years of experience. The linear equation for multiple regression with two independent variables is as follows:
Z = aX + bY + c
PREDICTING VALUES OF DEPENDENT VARIABLES
The previous section demonstrated how to predict this year’s sales (the dependent variable) based on one independent variable (number of years of experience) by using simple regression analysis. Similarly, this year’s sales (the dependent variable) can be predicted from more than one independent variable, such as “Years of experience” and “Years of education,” by using multiple regression analysis.
To run multiple regression analysis:
1. Click the Analyze menu, point to Regression, and select Linear…. The Linear Regression dialog box opens (see Figure 14).
2. From the variable list box, select “Last year sales [lastsale]” as a dependent variable and move it to the Dependent: box by clicking the first transfer arrow button.
3. From the variable list box, select “Years of experience [yearexpe]” and “Years of education [educatio]” and move them to the Independent(s): box by clicking the second transfer arrow button.
4. Click the OK button.
NOTE: If there are variables in the Independent(s): or Dependent: boxes, click the Reset button before performing steps 2 and 3 above.
  • 45. Figure 14 - Linear Regression Dialog Box
Figure 15 - Model Summary Output for Multiple Regression
NOTE: The table should look similar to Figure 15. “R Square” = “.976” indicates that this model accounts for almost 98% of the total variation in the data.
Figure 16 - Multiple Regression Output
The slopes and the y-intercept as seen in Figure 16 should be substituted in the following linear equation to predict this year’s sales: Z = aX + bY + c
In this case, the values of a, b, c, X, Y, and Z will be as follows:
a = 1874.5
b = 609.391
c = (-8510.838)
X = Years of experience (independent variable)
Y = Years of education (independent variable)
Z = This year sales (dependent variable)
As indicated in the output table, the coefficient for “Years of experience” is “1874.5” and the coefficient for “Years of education” is “609.391.”
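As a rough picture of how the two slopes and the constant are estimated, the Python sketch below solves the least-squares normal equations for Z = aX + bY + c. The data are hypothetical and constructed to lie exactly on a plane, so the known coefficients are recovered; the function names are illustrative, and this is a teaching sketch rather than the numerical method PASW Statistics uses.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * solution = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_multiple_regression(x, y, z):
    """Least-squares estimates of a, b, c in Z = aX + bY + c."""
    design = [[xi, yi, 1.0] for xi, yi in zip(x, y)]
    # Normal equations: (D^T D) * [a, b, c] = D^T z
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    dtz = [sum(row[i] * zi for row, zi in zip(design, z)) for i in range(3)]
    return solve_linear_system(dtd, dtz)


# Hypothetical data generated from Z = 1000*X + 500*Y - 2000.
experience = [1, 2, 3, 4, 5]
education = [12, 16, 12, 18, 14]
sales = [1000 * x + 500 * y - 2000 for x, y in zip(experience, education)]

a, b, c = fit_multiple_regression(experience, education, sales)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

With real survey data the points do not lie exactly on a plane, so the estimates are the values that minimize the squared prediction errors instead of reproducing the data exactly.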
  • 46. PREDICTING THIS YEAR’S SALES WITH MULTIPLE REGRESSION MODEL
To predict this year’s sales for each salesperson, the values of a, b, and c should be substituted in the following linear equation:
Z = aX + bY + c
This year sales = 1874.5 * Years of experience + 609.391 * Years of education + (-8510.838)
To predict this year’s sales by multiple regression analysis:
1. Switch to the Data Editor window.
2. Click the Transform menu and select Compute Variable…. The Compute Variable dialog box opens (see Figure 17).
3. Click the Reset button.
4. In the Target Variable: box, type [multiple].
5. In the Numeric Expression: box, enter the following equation by typing or selecting from the dialog box keypad: [1874.5 * yearexp2 + 609.391 * educatio - 8510.838]
Figure 17 - Compute Variable Dialog Box
6. Click the OK button. The results will be displayed in the multiple column in Data View (see Figure 18).
Figure 18 - Multiple Regression Results
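The effect of the Compute Variable step, which adds a new “multiple” value to every case, can be sketched in Python as follows. The coefficients are the ones reported above; the two salespeople and their values are hypothetical and are not cases from the workshop data file.

```python
A = 1874.5     # coefficient for years of experience
B = 609.391    # coefficient for years of education
C = -8510.838  # constant (y-intercept)

# Hypothetical cases; the names and values are invented for illustration.
salespeople = [
    {"name": "P1", "yearexp2": 10, "educatio": 16},
    {"name": "P2", "yearexp2": 4, "educatio": 12},
]

for person in salespeople:
    # Same arithmetic as 1874.5 * yearexp2 + 609.391 * educatio - 8510.838
    person["multiple"] = A * person["yearexp2"] + B * person["educatio"] + C

for person in salespeople:
    print(person["name"], round(person["multiple"], 3))
```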
  • 47. NOTE: The predictions of sales for each salesperson using two independent variables are listed under the new variable named “multiple.”
Data Transformation
Situations may arise where data transformation is useful. Most data transformations can be done with the Compute… command. Using this command, the data file can be manipulated to fit various statistical procedures.
Research Question # 2
Who will earn a $1,000 bonus?
COMPUTING
Since each person’s yearly sales were already predicted, those whose actual sales were at least $2,000 above the values predicted by the multiple regression analysis will receive $1,000 as a bonus. Using the Compute… command, the salespeople who meet this criterion can be easily located by comparing the values of this year’s actual sales with the predictions from the multiple regression analysis computed in the previous lesson. The first step in determining who will receive a bonus is to calculate the difference between this year’s actual sales and the prediction of this year’s sales from the multiple regression analysis.
To predict who will qualify for the bonus:
1. Open the “Bonus.sav” file.
2. If the Save As dialog box opens, click the No button.
3. Click the Transform menu and select Compute Variable…. The Compute Variable dialog box opens (see Figure 19).
4. In the Target Variable: box, type [bonus].
5. In the Numeric Expression: box, type [1000].
Figure 19 - Compute Variable Dialog Box
6. Click the If… button. The Compute Variable: If Cases dialog box opens (see Figure 20).
7. Select the Include if case satisfies condition: option.
  • 48. 8. Enter the following expression by typing or selecting from the dialog box keypad: [thissale - multiple >= 2000]
Figure 20 - Compute Variable: If Cases Dialog Box
NOTE: It is recommended that you select the variables and the >= sign directly from the variable list box and keypad provided in the dialog box to prevent mistakes.
9. Click the Continue button, and then click the OK button.
NOTE: Salespersons #49 “Jason” and #44 “Ivett” are two of the salespeople who qualify for a $1,000 bonus because their actual sales were at least $2,000 above the sales predicted in the previous lesson (see Figure 21).
Figure 21 - Bonus Results
Polynomial Regression
This type of regression involves fitting a dependent variable ($Y_i$) to a polynomial function of a single independent variable ($X_i$). The regression model is as follows (see Table 1 for the meaning of the variables):
$Y_i = a + b_1 X_i + b_2 X_i^2 + b_3 X_i^3 + \cdots + b_k X_i^k + e_i$
Table 1 - Breakdown of the Variables
Variable  Meaning
$a$       Constant
$b_j$     The coefficient for the independent variable to the j’th power
$e_i$     Random error term
  • 49. REGRESSION ANALYSIS
To look at the growth relationship between weight and age:
1. Open the “Growth.sav” file.
2. Click the Analyze menu, point to Regression, and select Curve Estimation…. The Curve Estimation dialog box opens to define the parameters of the analysis (see Figure 22).
3. Transfer the “wght” variable to the Dependent(s): box and the “age” variable to the Independent Variable: box.
NOTE: The weight (dependent) variable is what is being predicted using the age (independent) variable.
4. Deselect the Plot models check box.
5. Select the Display ANOVA table check box.
6. Under Models, deselect the Linear check box and select the Cubic check box.
7. Click the OK button.
Figure 22 - Curve Estimation Dialog Box
Analyzing the Results
This cubic model has an $R^2$ of 99.567% (see Figure 23). The F-ratio indicates a highly significant fit. The best fitting cubic polynomial is given by the following equation, where $Y_i$ is weight and $X_i$ is age:
$Y_i = 0.052 - 0.017 X_i + 0.010 X_i^2 - 0.001 X_i^3 + e_i$
Multiple regression can be used to fit polynomials of higher order. If X is the independent variable, use the Transform and Compute options of the Data Editor (as discussed earlier in this lesson) to create new variables X2 = X*X, X3 = X*X2, X4 = X*X3, etc., then use these new variables (X, X2, X3, X4, etc.) as a set of independent variables for a multiple regression analysis.
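The power variables described above and the fitted cubic can be illustrated with a short Python sketch. The coefficients are the rounded values reported in the equation above; the function names and the sample ages are hypothetical, and the random error term is left out because it is unknown for a new observation.

```python
# Rounded coefficients of the fitted cubic reported above
# (weight as a function of age).
A, B1, B2, B3 = 0.052, -0.017, 0.010, -0.001


def power_terms(x):
    """Build X, X2 = X*X, and X3 = X*X2, as the Compute step would."""
    x2 = x * x
    x3 = x * x2
    return x, x2, x3


def predict_weight(age):
    """Evaluate the fitted cubic without the error term."""
    x, x2, x3 = power_terms(age)
    return A + B1 * x + B2 * x2 + B3 * x3


# Hypothetical ages, chosen only to show the calculation.
for age in (1, 2, 3):
    print(age, power_terms(age), round(predict_weight(age), 3))
```

Once X2 and X3 exist as ordinary variables, the cubic is just a multiple regression with three independent variables, which is why the same Linear Regression dialog box can fit it.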
  • 50. Figure 23 - Polynomial Regression Summary Results
Chart Editing
During the final stage of research, enhancing the appearance of charts and figures can help readers understand statistics that might otherwise seem confusing. Editing a chart directly in PASW Statistics saves the time and effort of copying the object into another program to modify its features. The following steps explain some useful methods to enhance the appearance of a chart.
ADDING A LINE TO THE SCATTER PLOT
Adding a fit line to the scattered pattern of a data chart can help emphasize the relationship among the data.
To add a line to the scatter plot:
1. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot….
2. Select the Simple Scatter option, and then click the Define button.
3. Transfer the “age” variable to the X Axis: box and the “wght” variable to the Y Axis: box, and then click the OK button. A chart appears in the Output Viewer window.
4. Double-click the chart in the Output Viewer window to modify it. The Chart Editor window opens (see Figure 24).
5. Right-click a chart marker (see Figure 25) and select Add Fit Line at Total from the shortcut menu.
6. Under Fit Method, select the Cubic option, and then click the Apply button.
7. Close the Chart Editor window.
NOTE: Notice that the default linear fit line added by Add Fit Line at Total does not capture the way the data curves, but the cubic fit is almost perfect (see Figure 26).
  • 51. Figure 24 - Chart Editor Window
Figure 25 - Chart Markers
Figure 26 - Adding a Fit Line to the Scatter Plot
MANIPULATING THE SCALES ON X- AND Y-AXES
The X-axis and Y-axis can be adjusted to enhance the overall appearance and readability of the chart. Various elements of the axes can be manipulated, such as scale, ticks and grids, number format, and axis label.
To manipulate the scales on the X-axis:
1. If necessary, open the “Regression.sav” file.
2. Run the scatter plot where the Y-axis is “Last year sales” and the X-axis is “Years of experience.”
3. Double-click the chart to open the Chart Editor window.
4. Click the Select the X axis button on the Standard toolbar to manipulate the X-axis. The Properties dialog box opens.
5. Select the Scale tab (see Figure 27).
6. Change the value in the Lower margin (%): box to 0.
7. Select the Labels & Ticks tab (see Figure 28).
8. In the Major Ticks section, select the Display ticks check box.
9. Click the Style arrow and select Inside from the list.
  • 52. Figure 27 - X-axis Properties Dialog Box: Scale Tab
Figure 28 - X-axis Properties Dialog Box: Labels & Ticks Tab
10. Click the Show Grid Lines button on the Standard toolbar to show the Properties dialog box.
11. Select the Grid Lines tab, select the Major ticks only option, click the Apply button, and then click the Close button (see Figure 29).
12. Click the Select the Y axis button on the Standard toolbar to manipulate the Y-axis. The Properties dialog box opens.
13. Select the Scale tab (see Figure 30).
Figure 29 - Properties Dialog Box: Grid Lines Tab
  • 53. Figure 30 - Y-axis Properties Dialog Box: Scale Tab
14. Change the value in the Lower margin (%): box to 0.
15. Click the Apply button, and then click the Close button.
Figure 31 - Before Manipulating the X-axis
Figure 32 - After Manipulating the X-axis
ADDING A TITLE TO THE CHART
Adding a title to the chart is a simple process that enhances the chart’s appearance.
To add a title to a chart:
1. In the Chart Editor window, click in a blank area outside the first chart to select the whole chart, then move the mouse pointer to one of the selection handles until it becomes a two-headed arrow.
2. Drag the mouse pointer to reduce the chart size.
3. Click the Insert a text box button on the Standard toolbar. The text box appears above the chart and the Properties dialog box opens.
4. Type “Relationship Between Last Year Sales and Years of Experience” in the text box.
5. Click the border of the text box to select it.
  • 54. 6. Select the Text Style tab in the Properties dialog box, select a color for the title text, click the Apply button, and then click the Close button.
7. Click the Bold button on the Standard toolbar, and change the Font Size to “12.”
8. Resize the text box to fit the text.
9. If necessary, resize the chart to display the title at the top of the chart (see Figure 33).
Figure 33 - Adding a Title to the Chart
ADDING COLORS TO THE CHART
All elements on the chart can be colored differently to add emphasis or distinguish between elements.
To add colors to a chart:
1. In the Chart Editor window, select the chart element to change or add color to, such as one of the plots (see Figure 34).
2. Click the Show Properties Window button on the Standard toolbar. The Properties dialog box opens (see Figure 35).
3. Select the Marker tab, and then select a color from the color palette.
4. To change the marker type, click the Type arrow in the Marker section and select a symbol from the menu (see Figure 35).
5. View the changes in the Preview section.
6. Click the Apply button, and then click the Close button.
Figure 34 - Adding Color to the Chart
  • 55. Figure 35 - Properties Dialog Box
FILLING A BACKGROUND COLOR
The background can also be filled with a color to make the chart stand out.
To fill in a background color:
1. Click inside a blank area of the chart to select the entire chart area (see Figure 36).
2. Click the Show Properties Window button on the Standard toolbar. The Properties dialog box opens.
3. Select the Fill swatch.
4. Click the Pattern arrow and select a background pattern.
5. Click the Apply button, and then click the Close button.
Figure 36 - Filling a Background Color
  • 56. Introduction – Part 4
PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, etc. It can perform a variety of data analyses and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for survey research, though it is by no means limited to this area.
This handout (Chi-Square and ANOVA) introduces basic skills for performing hypothesis tests using the Chi-Square test for Goodness-of-Fit and generalizations of the pooled t test, such as ANOVA. The step-by-step instructions guide the user in performing “tests of significance” using PASW Statistics and help the user understand how to interpret the output for research questions.
Downloading the Data Files
This handout includes sample data files that can be used for hands-on practice. The data files are stored in a self-extracting archive. The archive must be downloaded and executed in order to extract the data files.
• The data files used with this handout are available for download at www.alukosayoenoch.wix.com/selfcoding
• Instructions on how to download and extract the data files are available at www.alukosayoenoch.wix.com/selfcoding
Chi-Square
The Chi-Square (χ²) test is a statistical tool used to examine differences between nominal or categorical variables.
The Chi-Square test is used in two similar but distinct circumstances:
• To estimate how closely an observed distribution matches an expected distribution, also known as the Goodness-of-Fit test.
• To determine whether two random variables are independent.
CHI-SQUARE TEST FOR GOODNESS-OF-FIT
This procedure can be used to perform a hypothesis test about the distribution of a qualitative (categorical) variable or a discrete quantitative variable having only a finite number of possible values. It analyzes whether the observed frequency distribution of a categorical or nominal variable is consistent with the expected frequency distribution.
With Fixed Expected Values
Research Question # 1
Can the hospital schedule discharge support staff evenly throughout the week?
A large hospital schedules discharge support staff assuming that patients leave the hospital at a fairly constant rate throughout the week. However, because of increasing complaints of staff shortages, the hospital administration wants to determine whether the number of discharges varies by the day of the week.
  • 57. H0: Patients leave the hospital at a constant rate (there is no difference between the discharge rates for each day of the week).
To perform the analysis:
1. Start PASW Statistics 17.
2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
3. Navigate to the data files folder, select the “chi-hospital.sav” file, and then click the Open button.
Before the Chi-Square test is run, the observed values need to be declared.
To declare the observed values:
1. Click the Data menu and select Weight Cases…. The Weight Cases dialog box opens (see Figure 1).
Figure 1 - Weight Cases Dialog Box
2. Select the Weight cases by option.
3. Select the “Average Daily Discharges [discharge]” variable and transfer it to the Frequency Variable: box.
4. Click the OK button.
To perform the analysis:
1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square…. The Chi-Square Test dialog box opens (see Figure 2).
  • 58. Figure 2 - Chi-Square Test Dialog Box
2. Select the “Day of the Week [dow]” variable and transfer it to the Test Variable List: box (see Figure 2).
3. Click the OK button. The Output Viewer window opens (see Figure 3).
Figure 3 - Chi-Square Frequencies Output Table
Figure 4 - Chi-Square Test Statistics Output Table
  • 59. Reporting the analysis results:
H0: Rejected in favor of H1.
H1: Patients do not leave the hospital at a constant rate.
Explanation: Figure 4 indicates that the calculated χ² statistic, with six degrees of freedom, is 29.389. The significance value (reported as 0.000, that is, less than 0.0005) is below the usual threshold of 0.05. This suggests that the null hypothesis, H0 (patients leave the hospital at a constant rate), can be rejected in favor of the alternative hypothesis, H1 (patients leave the hospital at different rates during the week).
With Fixed Expected Values and within a Contiguous Subset of Values
By default, the Chi-Square test procedure builds frequencies and calculates an expected value based on all valid values of the test variable in the data file. However, it may be desirable to restrict the range of the test to a contiguous subset of the available values, such as weekdays only (Monday through Friday).
Research Question # 2
The hospital requests a follow-up analysis: can staff be scheduled assuming that patients discharged on weekdays only (Monday through Friday) leave at a constant daily rate?
H0: Patients discharged on weekdays only (Monday through Friday) leave at a constant daily rate.
To run the analysis:
1. Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square…. The Chi-Square Test dialog box opens.
2. Select the Use specified range option (see Figure 2).
3. Enter [2] in the Lower: box and [6] in the Upper: box.
4. Click the OK button. The Output Viewer window opens (see Figure 5 and Figure 6). Notice that the test range is restricted to Monday through Friday.
Figure 5 - Chi-Square (Subset) Frequencies Output Table
Figure 6 - Test Statistics Output Table
NOTE: Each expected value is equal to the sum of the observed values divided by the number of rows, while the observed values are the actual numbers of patients discharged.
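The NOTE above and the reported test statistic can be illustrated with a short sketch. This is an assumption-laden illustration, not PASW's implementation: the counts are hypothetical (they are not the values in “chi-hospital.sav”), the function names are invented for this example, and the p-value formula used here is a closed form that is valid only when the degrees of freedom are even.

```python
import math


def chi_square_equal_expected(observed):
    """Goodness-of-fit statistic when all categories share one expected count."""
    # Expected value = sum of the observed values / number of rows (see NOTE).
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return statistic, df, expected


def p_value_even_df(statistic, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts for five categories (not the hospital data).
counts = [44, 50, 52, 48, 56]
stat, df, expected = chi_square_equal_expected(counts)
print(stat, df, expected, p_value_even_df(stat, df))

# The statistic reported in Figure 4 (29.389 with six degrees of freedom)
# gives a p-value far below 0.0005, which is why it is displayed as 0.000.
print(p_value_even_df(29.389, 6))
```

For odd degrees of freedom a statistics library or a chi-square table would be needed instead of `p_value_even_df`.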
Reporting the analysis results: H0: Do not reject. Patients discharged on weekdays only (Monday through Friday) leave at a constant daily rate.
  • 60. Explanation: Figure 5 indicates that on average, about 92 patients were discharged from the hospital each weekday. The rate for Mondays was below average and the rate for Fridays was greater than average. Figure 6 indicates that the calculated value of the Chi-Square statistic was 5.822 at four degrees of freedom. Because the significance level (0.213) is greater than the rejection threshold of 0.05, H0 (patients were discharged at a constant rate on weekdays) could not be rejected.
Using the Chi-Square test procedure, it was determined that the rate at which patients were discharged from the hospital was not constant over the course of an average week. This was primarily due to a greater number of discharges on Fridays and fewer discharges on Sundays. When the range of the test was restricted to weekdays, the discharge rates appeared to be more uniform. Staff shortages could be corrected by adopting separate weekday and weekend staff schedules.
With Customized Expected Values
Research Question # 3
Does first-class mailing provide quicker response time than bulk mail?
A manufacturer tries first-class postage for direct mailings, hoping for faster responses than with bulk mail. Order takers record how many weeks each order takes after mailing.
H0: First-class and bulk mailings do not result in different customer response times.
Before the Chi-Square test is run, the cases must be weighted. Because this example compares two different methods, one method must be selected to provide the expected values for the test and the other will provide the observed values.
To weight the cases:
1. Open the “chi-mail.sav” file.
2. Click the Data menu and select Weight Cases…. The Weight Cases dialog box opens.
3. Select the Weight cases by option.
4. Select the “First Class Mail [fcmail]” variable and transfer it to the Frequency Variable: box.
5. Click the OK button.
To run the analysis:
1.
Click the Analyze menu, point to Nonparametric Tests, and select Chi-Square…. The Chi-Square Test dialog box opens.
2. Select the “Week of Response [week]” variable and transfer it to the Test Variable List: box.
3. Select the Values: option in the Expected Values section.
4. Enter [6] in the Values: box.
5. Click the Add button.
6. Repeat steps 4 and 5, adding the values [15.1], [18], [12], [11.5], [9.8], [7], [6.1], [5.5], [3.9], [2.1], and [2] (in that order).
7. Click the OK button. The Output Viewer window opens.
NOTE: The expected frequencies in this example are the response percentages that the firm has historically obtained with bulk mail.
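The role of the customized expected values can be sketched as follows. The sketch assumes that the values entered in the Values: box are treated as relative proportions and rescaled to the weighted number of cases; the twelve values listed in steps 4 to 6 sum to 99 rather than 100, so some rescaling is needed before they can be compared with counts. The observed counts and the function names below are hypothetical and are not taken from “chi-mail.sav”.

```python
def expected_counts(expected_values, total_observed):
    """Rescale the entered values so that they sum to the observed total."""
    scale = total_observed / sum(expected_values)
    return [v * scale for v in expected_values]


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Values entered in steps 4 to 6: bulk-mail response percentages by week.
bulk_mail_percent = [6, 15.1, 18, 12, 11.5, 9.8, 7, 6.1, 5.5, 3.9, 2.1, 2]

# Hypothetical first-class response counts, one per week (illustration only).
first_class_counts = [10, 18, 17, 12, 11, 9, 7, 6, 4, 3, 2, 1]

expected = expected_counts(bulk_mail_percent, sum(first_class_counts))
statistic = chi_square(first_class_counts, expected)
df = len(bulk_mail_percent) - 1  # twelve weeks give eleven degrees of freedom

print(round(sum(bulk_mail_percent), 1), df, statistic)
```

With twelve response weeks the test has eleven degrees of freedom, whatever observed counts are used.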
  • 61. Figure 7 - First-Class/Bulk Mail Week of Response
Figure 8 - Week of Response Test Statistics
Reporting the analysis results:
H0: Do not reject. There was no statistical difference between customer response times using first-class mailing and customer response times using bulk mailing.
Explanation: The manufacturer hoped that first-class mail would result in quicker customer response. As indicated in Figure 7, the responses in the first two weeks differed from the expected values by four and seven percentage points, respectively. The question was whether the overall differences between the two distributions were statistically significant. The Chi-Square statistic was calculated to be 12.249 at eleven degrees of freedom (see Figure 8). The significance value (p) associated with the data was 0.345, which was greater than the threshold value of 0.05. Hence, H0 was not rejected because there was no significant difference between first-class and bulk mailings.
The first-class mail promotion did not result in response times that were statistically different from standard bulk mail. Therefore, bulk postage was more economical for direct mailings.
One-Way Analysis of Variance
One-way analysis of variance (One-Way ANOVA) procedures produce an analysis for a quantitative dependent variable affected by a single factor (independent variable). Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test and can be thought of as a generalization of the pooled t test: instead of two populations, there are more than two populations or treatments.
Research Question # 4
Which of the alloys tested would be appropriate for creating an underwater sensor array?
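As a rough illustration of what a one-way ANOVA computes, the sketch below splits the total variation into a between-groups part and a within-groups part and forms the F ratio from their mean squares. The function name and the three small groups are hypothetical; they are not the alloy data, and this is not PASW's own code.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of group means around grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups sum of squares: spread of values around their group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # Each mean square is a sum of squares divided by its degrees of freedom.
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical measurements for three treatments (illustration only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, df_b, df_w, f = one_way_anova(groups)
print(ss_b, ss_w, df_b, df_w, f)
```

A large F ratio means the group means differ by more than the scatter within groups would suggest; the significance value then comes from the F distribution with the two degrees of freedom shown.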
  • 62. To create an underwater sensor array, four different alloys are tested for corrosion resistance. Five plates of the same size of each alloy are placed underwater for 60 days. After 60 days, the number of corrosion pits on each plate is measured.
H0: The four alloys exhibit the same kind of behavior and are not different from one another.
To run One-Way ANOVA:
1. Open the “alloy.sav” file.
NOTE: Each case within the One-Way ANOVA data file represents one of the 20 metal plates (five plates of four different alloys) and is characterized by two variables. One variable assigns a numeric value to the alloy. The other variable is used to quantify the number of pits on the plate after being underwater for 60 days (see Figure 9).
Figure 9 - Alloy Data File
2. In Data View, click the Analyze menu, point to Compare Means, and select One-Way ANOVA…. The One-Way ANOVA dialog box opens (see Figure 10).
Figure 10 - One-Way ANOVA Dialog Box
3. Select the “pits” variable from the box on the left and transfer it to the Dependent List: box (see Figure 10).
4. Select the “Alloy [alloy]” variable from the box on the left and transfer it to the Factor: box (see Figure 10).
5. Click the Options… button. The One-Way ANOVA: Options dialog box opens (see Figure 11).
  • 63. Figure 11 - One-Way ANOVA: Options Dialog Box
6. Select the Descriptive, Homogeneity of variance test, and Means plot check boxes.
7. Click the Continue button.
8. Click the OK button. The Output Viewer window opens.
Figure 12 - ANOVA Descriptive Output
Figure 13 - Output for Test of Homogeneity of Variances
Figure 14 - ANOVA Output
Reporting the analysis results:
H0: Reject in favor of H1.
H1: The four alloys do not exhibit the same kind of behavior. They are statistically different from one another.
Explanation: Figure 12 lists the means, standard deviations, and individual sample sizes of each alloy. Figure 13 provides the degrees of freedom and the significance level of the test of homogeneity of variances; “df1” is one less than the number of sample alloys (4-1=3) and “df2” is the difference between
  • 64. the total sample size and the number of sample alloys (20-4=16). Figure 14 lists the sums of squares, degrees of freedom, and mean squares for the variation between and within the alloy groups. In Figure 14, the “Between Groups” variation “6026.200” reflects differences between the group means. If the sample means are close to each other, this value is small. The “Within Groups” variation “335.600” is due to differences within individual samples. The “Mean Square” values are calculated by dividing each “Sum of Squares” value by its respective degrees of freedom (“df”). The table also lists the F statistic “95.768,” which is calculated by dividing the “Between Groups Mean Square” by the “Within Groups Mean Square.” The significance level of “0.000” is less than the threshold value of 0.05 and indicates that the null hypothesis can be rejected, leading to the conclusion that the alloys are not all the same.
Post Hoc Tests
In ANOVA, if the null hypothesis is rejected, then it is concluded that there are differences between the means (μ1, μ2, …, μa). It is useful to know specifically where these differences exist. Post hoc testing identifies these differences. Multiple comparison procedures look at all possible pairs of means and determine whether each individual pairing is the same or statistically different. In an ANOVA with a treatments, there are a*(a-1)/2 possible unique pairings (six for the four alloys here), which can mean a large number of comparisons.
Research Question # 5
Is the mean difference between alloy sets statistically significant?
The previous null hypothesis was rejected, leading to the conclusion that not all the alloys exhibit the same behavior. The next part of the analysis is to determine whether the mean difference between individual alloy sets is statistically significant.
H0: μ1 = μ2 = … = μa
H1: At least one of the means is different.
To run post hoc tests:
1.
In Data View, click the Analyze menu, point to Compare Means, and select One-Way ANOVA…. The One-Way ANOVA dialog box opens (see Figure 15).
Figure 15 - One-Way ANOVA Dialog Box
2. Click the Post Hoc… button. The One-Way ANOVA: Post Hoc Multiple Comparisons dialog box opens (see Figure 16).
  • 65. 3. Select the LSD check box, click the Continue button, and then click the OK button. The Output Viewer window opens.
NOTE: LSD stands for Least Significant Difference, a procedure that compares the means pair by pair.
Figure 16 - One-Way ANOVA: Post Hoc Multiple Comparisons Dialog Box
Figure 17 - Multiple Comparisons Output
Figure 18 - Means Plot
Reporting the analysis results:
  • 66. H0: Reject in favor of H1.
H1: At least one of the means is different.
Explanation: Figure 17 shows the results of comparing pairs of means between different alloy sets. Each row indicates the difference between the two corresponding treatments. Alloys “1” and “4” have a mean difference of “2.4” (a relatively small value). Also, the significance level of “0.420” indicates that the null hypothesis cannot be rejected for the comparison of alloys “1” and “4.” There is no statistically significant difference between them. Alloy pairs “1” and “2,” “1” and “3,” “2” and “3,” “2” and “4,” and “3” and “4” have large mean differences with significance values of “0.000.” In these cases, the null hypothesis can be rejected, leading to the conclusion that they are statistically different. Also, the means plot (see Figure 18) shows that alloys “1” and “4” have mean numbers of pits very close to each other. Because alloys “1” and “4” have the lowest mean number of corrosion pits, they are the best candidates for the array. Depending on the relative costs of the two alloys, the one that is more cost effective can be selected to construct the array.
Two-Way Analysis of Variance
Two-way analysis of variance (Two-Way ANOVA) is an extension of the one-way analysis of variance. The difference is that the test is run with two independent variables (factors) instead of a single one. A two-variable design has several advantages over a one-variable design: it is more efficient, and it helps increase the statistical power of the result.
Research Question # 6
Will typing ability and test method affect student test scores?
To answer the question, an essay final is given to the class.
Two test methods are used: half the students are assigned to write the final with a blue book and the other half with notebook computers. In addition, the students are partitioned into three groups, namely: no typing ability, some typing ability, and highly skilled at typing. After evaluating the final, the mean score of each group is examined.
H0: Typing ability and test method do not affect student test scores.
H1: Typing ability and test method do affect student test scores.
To run Two-Way ANOVA:
1. Open the “Two-Way-ANOVA.sav” file (see Figure 19).