Upcoming SlideShare
×

Like this document? Why not share!

# Spss step by-step tutorail

## on Dec 02, 2011

• 5,567 views

### Views

Total Views
5,567
Views on SlideShare
5,567
Embed Views
0

Likes
4
618
0

No embeds

### Report content

• Comment goes here.
Are you sure you want to

## Spss step by-step tutorailDocument Transcript

• SPSS Step-by-Step Tutorial: Part 1 For SPSS Version 11.5
• Table of Contents1 SPSS Step-by-Step 5 Introduction 5 Installing the Data 6 Installing files from the Internet 6 Installing files from the diskette 6 Introducing the interface 6 The data view 7 The variable view 7 The output view 7 The draft view 10 The syntax view 10 What the heck is a crosstab? 122 Entering and modifying data 13 Creating the data definitions: the variable view 13 Variable types 13SPSS Step-by-Step 3
• Variable names and labels 15 Missing values 15 Non-numeric numbers, or when is a number not a number? 15 Binary variables 15 Creating a new data set 16 Getting help in creating data sets and defining variables 22 Creating primary reference lists 24 Frequencies 24 Descriptive statistics: descriptives (univariate) 25 Recodes and Transformations 26 Backup the original file 26 Recoding existing variables 27 Recode income data 27 Recoding variables revisited 37 The one exception in recoding variables 37 The other exception 37 3 Charting your data 39 Using the automated chart function 40 Using the Interactive Chart function 42 Creating a chart from scratch 454 SPSS Step-by-Step
• 1 SPSS Step-by-Step Introduction SPSS (Statistical Package for the Social Sciences) has now been in development for more than thirty years. Originally developed as a programming language for con- ducting statistical analysis, it has grown into a complex and powerful application with now uses both a graphical and a syntactical interface and provides dozens of functions for managing, analyzing, and presenting data. Its statistical capabilities alone range from simple perentages to complex analyses of variance, multiple regressions, and general linear models. You can use data ranging from simple inte- gers or binary variables to multiple response or logrithmic variables. SPSS also provides extensive data management functions, along with a complex and powerful programming language. In fact, a search at Amazon.com for SPSS books returns 2,034 listings as of March 15, 2004. In these two sessions, you won’t become an SPSS or data analysis guru, but you will learn your way around the program, exploring the various functions for manag- ing your data, conducting statistical analyses, creating tables and charts, and pre- paring your output for incorporation into external files such as spreadsheets and word processors. Most importantly, you’ll learn how to learn more about SPSS. SPSS Step-by-Step 5
• Installing the Data Installing the Data The data for this tutorial is available on floppy disk (if you received this tutorial as part of a class) and on the Internet. Use one of the following procedures to install the data on your computer. Installing files from the Internet Before you begin to download the files, create a new folder on your computer’s hard disk named SPSSTutorialData. When you download the files, you’ll save them in this directory. 1. Go to ftp.datastep.com. Do not type “www” or “http://”. 2. When prompted for a user name, enter: SPSS 3. For the password, enter: tutorial. 4. To download each file, click it once, press Ctrl-C or select Edit > Copy from the menu. 5. Switch to a window for your computer and save the file in the directory named SPSSTutorialData. To do so, open the folder and press Ctrl-V or select Edit > Paste from the menu. 6. Once you have downloaded all the files, close the Internet browser. Installing files from the diskette 1. With the diskette in the floppy drive, open a window for the C drive on your computer (or any other drive where you want to save the data files). 2. Create a new folder named SPSSTutorialData. 3. Copy the files from the floppy disk to the SPSSTutorialData folder by copying and pasting or by dragging. Introducing the interface When you use SPSS, you work in one of several windows: the data view, the vari- able view, the output view, the draft output view, and the script view. Eventually you’ll also use the syntax editor (think: code) to save or refine your queries.6 SPSS Step-by-Step
• Introducing the interfaceThe data viewThe data view displays your actual data and any new variables you have created(we’ll discuss creating new variables later on in this session).1. From the menu, select File > Open > Data.2. In the Open File window, navigate to C:SPSSTutorialDataEmployee data.sav and open it by double-clicking. SPSS opens a window that looks like a standard spreadsheet. In SPSS, columns are used for variables, while rows are used for cases (also called records).3. Press Ctrl-Home to move to the first cell of the data view.4. Press Ctrl-End to move to the last cell of the data view.5. Press Ctrl-Home again to move back to the first cell.The variable view‘At the bottom of the data window, you’ll notice a tab labeled Variable View. Thevariable view window contains the definitions of each variable in your data set,including its name, type, label, size, alignment, and other information.1. Click the Variable View tab.2. Review the information in the rows for each variable.Note: While the variables are listed as columns in the Data View, they are listed as rows in the Variable View. In the Variable View, each column is a kind of variable itself, containing a specific type of information.3. Click the Data View tab to return to the data.4. Double-click the label id at the top of the id column. Notice that double-clicking the name of a variable in the data view opens the variable view window to the definition of that variable.5. Click the data view tab again.The output viewThe output window is where you see the results of your various queries such as fre-quency distributions, cross-tabs, statistical tests, and charts. If you’ve worked withExcel, you’re probably used to seeing all your work on one page, charts, data, andcalculations. In SPSS, each window handles a separate task. The output window iswhere you see your results.SPSS Step-by-Step 7
• Introducing the interface Try it: 1. From the menu, select Analyze > Descriptive Statistics> Crosstabs. 2. Click once on Employment, then click the small right arrow next to Rows to move the variable to the Rows pane (Figure 1). FIGURE 1. Moving a variable to the Rows pane 3. Click Gender, then click the small right arrow next to Columns to move the Gender variable to the Columns pane. (Figure 2).8 SPSS Step-by-Step
• Introducing the interface FIGURE 2. Selecting a column variable4. Click Display clustered bar charts (Figure 3). FIGURE 3. Selecting clustered bar charts click here to display the clustered bar chartsSPSS Step-by-Step 9
• Introducing the interface 5. Click OK. SPSS brings the output window to the front displaying two tables and the clustered bar chart you requested. Take a moment to review the contents of the tables and the chart. Notice that the red arrow next to the title Crosstabs corresponds to the Crosstabs icon in the left pane of the window. The left pane displays the contents of the right pane and is a convenient method of moving around among the various output you’ll be generating. Note: In some cases, you may see asterisks instead of numbers in a table cell. Asterisks indicate that the current column is too narrow to display the com- plete number. Widen the column by dragging its margin to the right. The draft view The draft view is where you can look at output as it is generated for printing. The draft view does not contain the contents pane or some of the notations present in the output pane. Try it: 1. From the menu, select File > New > Draft Output. SPSS opens a Draft Output window that contains its own menu. 2. From the menu, select Analyze > Descriptive Statistics > Crosstabs. Notice that the dialog box opens with your previous selections. 3. Click OK. From here you can select charts or tables, copy them, and paste them into other applications like spreadsheets or word processors. Note: If you want to maintain the correct spacing of the tables, use a non-propor- tional font like Courier New. The syntax view SPSS has never lost its roots as a programming language. Although most of your daily work will be done using the graphical interface, from time to time you’ll want to make sure that you can exactly reproduce the steps involved in arriving at certain conclusions. In other words, you’ll want to replicate your analysis. The best method of preserving the exact steps of a particular analysis is the syntax view. In the syn- tax view, you’ll preserve the code used to generate any set of tables or charts. Syntax is basically the actual computer code that produces a specific output. It looks like this:10 SPSS Step-by-Step
• Introducing the interface CROSSTABS /TABLES=jobcat BY gender /FORMAT= AVALUE TABLES /CELLS= COUNT /BARCHART .In the code shown above, SPSS is instructed to create crosstabs, using the variablejobcat, sorting the crosstabs by gender using a specific format, to put a count intoeach cell, and then to create a corresponding barchart.Note: Preserving the steps you take in arriving at a conclusion is especially impor- tant if you are writing for publication, peer review, or any other situation in which others might want to test your conclusions.In the next steps, you’ll create a simple chart and frequency distribution, save thesyntax and then recreate the chart and frequency distribution by running the savedsyntax.Try it:1. From the menu, select Analyze > Descriptive Statistics > Crosstabs. Notice that your previous selections are still present.2. This time, instead of clicking OK, click Paste.3. SPSS opens the Syntax Editor with the code you just pasted. When you select Paste instead of OK in dialog boxes like the Crosstabs box, the code generated by the function is pasted into the Syntax Editor. If you wanted, you could generate all your output from the syntax window alone. Gen- erally, however, you will probably work with your data and output until they are just the way you want them, then repeat the steps you took and paste the code into the syntax editor. You can then run the syntax at any time to recreate the output.4. If the cursor is not already located somewhere in the syntax, click anywhere in the word CROSSTABS, then click the small right arrow on the toolbar or select from the menu Run > Current. SPSS opens the draft output window with the same chart you created using the menu commands. This time, however, you cre- ated the output by running the syntax (code) you created with the Paste func- tion. Notice that you can scroll up to the previous output you created using the menu commands.SPSS Step-by-Step 11
• What the heck is a crosstab? What the heck is a crosstab? A crosstab (short for cross tabulation) is a summary table, with the emphasis on summary. Here’s an example: Employment Category * Gender Crosstabulation Count Gender Female Male Total Employment Clerical 206 157 363 Category Custodial 0 27 27 Manager 10 74 84 Total 216 258 474 Notice that the rows contain one set of categories (employment category) while the columns contain another (gender). In this crosstab, the cells contain counts, but in others you can use percentages, means, standard deviations, and the like. Here’s the important part: crosstabs are used for only categorical (discrete) data, that is, groups like employment categories or gender. You can’t use a crosstab for continuous data like temperature or dosage or income. BUT, you can change data like temperature or dosage or income into categories by creating groups, like income less than \$25,000, income between 25000 and 49999, income 50000 or higher. We’ll discuss these data conversions known as transformations or recodes later. For now, you just need to understand that crosstabs deal with groups or cate- gories. Now that you’ve seen the various windows you’ll be using, we’ll move on to the techniques you’ll use in SPSS for managing your data files.12 SPSS Step-by-Step
• 2 Entering and modifying data In this section, you’ll learn how to define variables and create a data set from scratch. Creating the data definitions: the variable view It’s impossible to talk about SPSS (or any analysis program) without talking about data and types of data. So here goes. Each particular type of information (such as income or gender or temperature or dosage) is called a variable. You can have vari- ous types of variables such as numeric variables (any number that you can use in a calculation), string variables (text or numbers that you can’t use in calculations), currency (numbers with two and only two decimal places) and variables with spe- cific formats. Variable types SPSS uses (and insists upon) what are called strongly typed variables. Strongly typed means that you must define your variables according to the type of data they will contain. You can use any of the following types, as defined by the SPSS Help file. SPSS Step-by-Step 13
• Creating the data definitions: the variable view • Numeric. A variable whose values are numbers. Values are displayed in stan- dard numeric format. The Data Editor accepts numeric values in standard for- mat or in scientific notation. • Comma. A numeric variable whose values are displayed with commas delimit- ing every three places, and with the period as a decimal delimiter. The Data Edi- tor accepts numeric values for comma variables with or without commas; or in scientific notation. • Dot. A numeric variable whose values are displayed with periods delimiting every three places, and with the comma as a decimal delimiter. The Data Editor accepts numeric values for dot variables with or without dots; or in scientific notation. (Sometimes known as European notation.) • Scientific notation. A numeric variable whose values are displayed with an embedded E and a signed power-of-ten exponent. The Data Editor accepts numeric values for such variables with or without an exponent. The exponent can be preceded either by E or D with an optional sign, or by the sign alone--for example, 123, 1.23E2, 1.23D2, 1.23E+2, and even 1.23+2. • Date. A numeric variable whose values are displayed in one of several calendar- date or clock-time formats. Select a format from the list. You can enter dates with slashes, hyphens, periods, commas, or blank spaces as delimiters. The cen- tury range for 2-digit year values is determined by your Options settings (from the Edit menu, choose Options and click the Data tab). • Custom currency. A numeric variable whose values are displayed in one of the custom currency formats that you have defined in the Currency tab of the Options dialog box. Defined custom currency characters cannot be used in data entry but are displayed in the Data Editor. • String. Values of a string variable are not numeric, and hence not used in calcu- lations. They can contain any characters up to the defined length. Uppercase and lowercase letters are considered distinct. Also known as an alphanumeric vari- able.1 Because SPSS uses strongly typed variables, you have to make sure that all the data in any field (variable) is consistent. 1. SPSS Base System Version 11.5 Help.14 SPSS Step-by-Step
• Creating the data definitions: the variable viewVariable names and labelsForget all the nice conveniences you find in Windows and the Mac for handlinglong, interesting file names, or even the leniency that some database applicationsprovide in naming fields. In SPSS, variable names are eight characters, period. Andno funny characters like spaces or hyphens. Here are the rules:• Names must begin with a letter.• Names must not end with a period.• Names must be no longer than eight characters.• Names cannot contain blanks or special characters.• Names must be unique.• Names are not case sensitive. It doesn’t matter if you call your variable CLI- ENT, client, or CliENt. It’s all client to SPSS.Missing valuesIf you do not enter any data in a field, it will be considered as missing and SPSSwill enter a period for you.Non-numeric numbers, or when is a number not a number?Some numbers aren’t really numbers. That is, they’re numbers but you can’t usethem in mathematical calculations. Take a phone number, for example, or anaccount number or a zip code. You can sort them, but you can’t add or subtract ormultiply them. Well, you could, but the result would be meaningless. In essence,these numbers are actually just text which happens to be numeric. A good exampleis an address, which contains both numbers and text. We refer to these variables asstring or text variables.Binary variablesBinary variables are a special subgroup of numeric variables. Sometimes you treatthem as strings and sometimes you treat them as numeric. For example, yes/no,male/female, and 0/1 are all binary variables. That is, they have two and only twopossible values. Obviously, you can’t do a calculation on yes/no or male/female,BUT and this is a very big and very important BUT, you can recode these variablesinto numeric values, like assigning a value of 0 to female and 1 to male.SPSS Step-by-Step 15
• Creating a new data set Recoding binary variables is a critically important part of data analysis. Suppose, for example, that all you want to know is whether a specific event occurred. For example, suppose you want to know whether a client ever applied for welfare. As it happens, the data set you’re using contains the date when each person applied for welfare for the first time, if they ever did; if they didn’t the field is blank. Using this date, you can create a new variable called, say, welfapp that contains a 1 if there is any value in the first welfare application field and a 0 if there isn’t. Now you have a simple value (0 or 1) that you can use for further calculations or subsetting. Let’s get back to the male/female issue for a moment. Say you have recoded the variable into a 0 for female and a 1 for male. If you calculate an average (mean) for this variable, what you now have is a proportion. Say the average of your new vari- able is .45. You now know that there are somewhat more men than women in your population. In other words, you have calculated a proportion. Creating a new data set If you’re doing original research or, in our case, creating databases for clients, there comes a time when you have to create your own data file from scratch. In this task, you’ll create a new data set, define a set of variables, and then enter some data in the variables. You’ll also create some automatic data entry constraints to improve the accuracy of your data entry. In this task, you will create four types of variables: numeric, date, string, and binary. 1. From the menu, select File > New > Data. If you’re asked to save the contents of the current file, click No. 2. When the new file opens in the Data View, click the Variable View tab at the bottom of the window. 3. With the cursor in the Name column on the first row (referring to the name of the variable) type: clientid 4. In the Type column, click the build button (“build button” is actually a Microsoft term, but since SPSS’s documentation doesn’t give the button a name, we’ll use “build”) to open the Variable Type dialog box. (Figure 4)16 SPSS Step-by-Step
• Creating a new data set FIGURE 4. Variable type dialog box5. Select (click) String. Notice that you can now define the length of the variable (Figure 5). FIGURE 5. Defining the length of a string variable6. Select all the text in the Characters field and type: 147. Click OK. The dialog box closes and the variable is now set to a length of 14 with no decimal places.8. Press tab or Enter three times to move to the label column.9. Type: Client ID This is the label that will appear on all output and in dialog boxes like those you used in crosstabs and charts.SPSS Step-by-Step 17
• Creating a new data set 10. Press tab or Enter three times to move to the “columns” column. “Columns” defines the width of the display of the variable, not its actual contents. The dis- play width affects how the column will be displayed in output like crosstabs and pivot tables. 11. Select all the text in the “columns” column and type: 14 12. Leave the remaining columns as they are, with left alignment and “nominal” as the measure. 13. On the next row, click in the name column and type: gender 14. Press tab or Enter to move to the next column. 15. Click the build button to open the Variable Type dialog box. 16. Select String and click OK to accept the width. 17. Click in the Label column for gender and type: Gender Notice that in the variable labels, you can use upper and lower case as well as spaces and punctuation. 18. Press tab or Enter to move to the Values column. 19. Click the build button in the Values column to open the Value Labels window (Figure 6). FIGURE 6. Value labels window Note: Variable labels determine how the name of the variable is displayed in out- put. Value labels determine how each value is displayed. Thus, setting a18 SPSS Step-by-Step
• Creating a new data set label of “Female” for “f” in the gender variable instructs SPSS to display “Female” as a column heading for all cases with a value of f in gender.20. In the Value field, type: f21. In the Value Label field, type: Female22. Click Add.23. In the Value field, type: m24. In the Value Label field, type: Male25. Click Add.26. The Value Labels window should now look like Figure 7. FIGURE 7. Completed Value Labels window27. Click OK.28. On the next row, click in the Name column and type: employed29. Press tab or Enter or click in the Type field.30. Click the build button to open the Variable Type window.31. Employed is going to be a numeric, binary variable, so leave numeric selected, but change Width to 1 and Decimal Places to 0.SPSS Step-by-Step 19
• Creating a new data set 32. Click OK. 33. Tab to or click in the Label field and type: Employed year-end 34. Press tab or Enter to move to the Values field and click the Build Button. 35. In the Value field, type: 1 36. In the Value Label field, type: Yes 37. Click Add. 38. In the Value field, type: 0 39. In the Value Label field, type: No 40. Click Add. 41. Click OK. 42. On the next row, click in the Name field and type: nextelig 43. Press tab or Enter or click in the Type field and click the build button to open the Variable Type window. 44. Select Date by clicking it. 45. In the pane to the right, select the date format mm/dd/yyyy as in Figure 8. FIGURE 8. Selecting a date format20 SPSS Step-by-Step
• Creating a new data set46. Click OK.47. Tab to or click in the Label field and type: Next eligibility date48. Tab to or click in the columns field and set the column with to 12.49. From the menu, select File > Save As.50. Navigate to the A: drive and name the file TestData.sav.51. Click OK.52. Click the Data View tab. Notice that you now have four columns in which you’ll enter data for each record.53. On the first row, click in the clientid column and type: 483920954. In the gender column, type: f55. Press tab. Notice that when you leave the field, SPSS updates the field to the value label you assigned for f.56. In the Employed field, click the drop-down arrow and select Yes.57. In the nextelig field, type 4/1/2004.58. Use the following table to complete the data entry for this file. clientid gender employed nextelig 5533990 m No 6/1/2004 5938209 f Yes 4/15/2004 9583902 m Yes 6/1/2004 You have now seen how you can define variables, assign labels to both vari- ables and values, and define constraints that will control the type of data that can be entered. In future research, you’ll probably receive a file that has already been entered in another application such as Access, Excel, or some other appli- cation. If you’re doing your own data entry in SPSS, however, you should be aware of its data entry capabilities.SPSS Step-by-Step 21
• Getting help in creating data sets and defining variables Getting help in creating data sets and defining variables Now that you have had some practice in creating a data set, let’s review the kind of help SPSS provides to answer your questions. 1. From the menu, select Help > Topics. 2. On the Index tab, click in the field named: Type in the keyword to find: 3. Type: variable attri 4. Double-click the highlighted text in the list (Variable attributes) to display the topics found. 5. SPSS opens the Topics Found window with Applying Variable Definition Attributes highlighted. 6. Click Display. 7. The Help window is updated to this topic (Figure 9).22 SPSS Step-by-Step
• Getting help in creating data sets and defining variables FIGURE 9. Help window for Applying Variable Definition Attributes click one of the highlighted topics for more information8. Click the highlighted item Displaying or Defining Variable Attributes.9. When the window is updated, click Show Me. SPSS now opens the tutorial window that is included as part of the application. Any time you see Show Me in a help window, you can click it to see that specific section of the tutorial.10. Click the Next arrow a few times to see how the steps are illustrated.11. Close the Show Me window (the Internet Explorer window) to return to the standard help window.12. Under Related Topics, click Value Labels. As you can see, you can follow the links in the help file, start a new search, or select any of the topics listed on the left to move around in the SPSS Help system.13. Close the help window.14. From the menu, select File > Open > Data.15. Navigate to the Employee data.sav file on the floppy disk and open it.SPSS Step-by-Step 23
• Creating primary reference lists Creating primary reference lists There is one set of outputs you’ll create that is more important than anything else, and that is the set of primary references. Primary references describe your overall data set. In other words, how many in all? How many in each category? What are the maximums and minimums? Means? Standard deviations? Here’s our rule: list out the summaries and put them somewhere where you refer to them quickly. You may not always want to print out all the details of your data set. For example, printing out every single income for a data set of one million people, would not be useful, economical, or nice to either your printer or the trees. So here are the basic rules: print frequencies for categorical variables and descriptive (also called univariate) statistics for continuous variables. In this exercise, we’ll use the sample Employee dat.sav file. Frequencies 1. If it’s not already open, open the Employee dat.sav file by selecting File > Open and navigating to C:SPSSTutorialDataEmployee data.sav. 2. From the menu, select File > New > Draft Output. 3. From the menu, select Analyze > Descriptive Statistics > Frequencies. 4. Double-click Gender, Employment Category, and Minority Classification to move them to the Variables list. 5. Click the check box labeled Display frequency tables. 6. Click Statistics. 7. Make sure all the check boxes are cleared (not checked). 8. Click Continue. 9. Click Charts. 10. If it is not already selected, select None by clicking it. 11. Click Continue. 12. Click OK. 13. From the menu, select File > Save As. 14. Navigate to C:SPSSTutorialData and save the file as AllFreqs. When you’re working with your own file, follow these steps, then print the output so you’ll have it handy.24 SPSS Step-by-Step
• Creating primary reference listsDescriptive statistics: descriptives (univariate)The next step is to print the descriptive or univariate statistics for the continuousvariables.1. From the menu, select File > New > Draft Output.2. From the menu, select Analyze > Descriptive Statistics > Descriptives.3. Click Reset to clear any previous selections.4. Double-click Current Salary, Beginning Salary, Months since Hire, and Previ- ous Experience to move them to the Variables list.5. Click Options.6. In the Descriptives: Options window, click Mean, Std. deviation, Variance, Range, Minimum, Maximum, Kurtosis, and Skewness (Figure 10). FIGURE 10. Selected measures for Options7. Click Continue.8. Click OK.9. When the resulting table is displayed, notice that the variables you selected are listed as rows, while the statistics are listed in columns.10. From the menu, select File > Save As.11. Navigate to C:SPSSTutorialData and save the file as AllDescriptives.12. Notice that the statistic Range displays the distance between the minimum and maximum.SPSS Step-by-Step 25
• Recodes and Transformations When you are working with your own file, be sure to print this output and save it someplace handy (we use binders for, well, just about everything). These two print- outs, frequencies and descriptives give you a picture of the overall shape of your data. Recodes and Transformations Sooner or later, no matter how carefully you planned your data design, you’ll prob- ably want to work with some variables in different forms. If you collected income or age data, for example, you might want to group the continuous variables into cat- egories. Or you might want to create a variable that combines various conditions, say, all minority managers by gender. This type of data manipulation is called transforming or recoding. In this exercise, you’ll create several new variables, some that indicate multiple conditions and some that recode continuous variables into categorical variables. Backup the original file The first step before making any changes to your data file is: BACK UP YOUR DATA. And the easiest way to back up your data is to save it under another name. 1. If you don’t have the data view open, select it from the menu by selecting Win- dow > Employee data.sav - SPSS Data Editor. 2. From the menu, select File > Save As, then navigate to C:SPSSTutorialData. 3. Name the new file EmployeeData01 (Figure 11).26 SPSS Step-by-Step
• Recodes and Transformations FIGURE 11. Naming the new file Enter new name4. Click Save. Notice that the title bar of your window now identifies the file as EmployeeData01.sav. Now you can begin your transformations.Recoding existing variablesRecoding refers to assigning codes (or different codes) to an existing variable.NEVER ever, ever, EVER recode your variables into the same variable name (withone exception). That way lies madness. And chaos. For one thing it deletes yourexisting data. And for another it destroys the history of the data. Always create anew variable to contain the new codes.Recode income data1. From the menu, select Transform > Recode > Into Different Variables to open the Recode window (Figure 12).SPSS Step-by-Step 27
• Recodes and Transformations FIGURE 12. The SPSS Recode window 2. Double-click Current Salary to move it to the Input Variable --> Output Vari- ables pane. 3. Click Old and New Values to open window where you’ll create the new codes (Figure 13). FIGURE 13. Old and New Values window 4. Click the second Range radio button (lowest through) to activate its field (Fig- ure 14).28 SPSS Step-by-Step
• Recodes and Transformations FIGURE 14. Entering lowest income range click here to activate field enter maximum value for lowest range here5. Click in the Lowest through range and type: 249996. Click in the Value field under New Value and type: 1 In the new field, any incomes less than 25000 will have a value of 1. (Figure 15) FIGURE 15. Defining the new value for the lowest income rangeSPSS Step-by-Step 29
• Recodes and Transformations 7. Click Add. Notice that the complete definition of old and new values appears in the Old --> New pane (Figure 16). In the next steps you’ll define three more income ranges. FIGURE 16. Recode window with Old --> New values displayed old and new values displayed here 8. Under Old Value, click the first Range radio button to active the minimum and maximum values (Figure 17). FIGURE 17. Activating minimum and maximum range values click here to activate range fields 9. In the first range field, type: 2500030 SPSS Step-by-Step
• Recodes and Transformations10. In the second range field, type: 4999911. Under New Value, type: 2 Your window should now look like Figure 18. FIGURE 18. Recode window with minimum and maximum range values12. Click Add. Notice that the new definition is added to the Old --> New pane.13. Under Old Value, click the third range radio button to activate the Range ... through Highest field (Figure 19). FIGURE 19. Activating the range through highest field click here to activate Range . . . through Highest fieldSPSS Step-by-Step 31
• Recodes and Transformations 14. In the field to the left of through Highest type: 50000 15. Under New Value, type: 3 16. Click Add. Again, the new definition is added to the Old --> New pane. In the next steps, you’ll create a value to accommodate any odd values that might have been entered into the file. Even if you’re sure there aren’t any, check anyhow. 17. Under Old Value, click the radio button for All other values. Notice that there is no range to enter. 18. Under New Value, type: 4 19. Click Add. Now all possible definitions are entered under the Old --> New pane (Figure 20). FIGURE 20. Completed definitions for income ranges complete list of income range definitions 20. Click Continue. The Old and New Values window closes and the original Recode into Different Variables window is displayed. 21. In the field for Output Variable Name type: incrange 22. In the field for Output Variable Label type: Income Range32 SPSS Step-by-Step
• Recodes and Transformations The label is the name that will be displayed on all output, so you’ll want to make sure it’s informative and correctly formatted.23. Click Change. The new name is now listed in the Numeric Variable --> Output Variable pane (Figure 21). FIGURE 21. Recode window with output variable defined original name and output name displayed here24. Click OK. The Recode window closes and the data view is displayed. Notice that there is now a new column on the right containing the new range codes. (Figure 22) FIGURE 22. Data view showing new variable new variable25. Noticed that the codes are displayed with two decimal places. These should be simple integer codes, so in the next step you’ll change the format of the vari- able.SPSS Step-by-Step 33
• Recodes and Transformations 26. Double-click the name incrange to open the Variable View with that variable selected (Figure 23). FIGURE 23. Variable view with incrange selected 27. Double-click the 2 in the Decimals field and type: 0 28. Press tab or Enter to leave the field. 29. Click the Data View tab to see the corrected data. In the next step, you’ll sort the cases in ascending and descending order of incrange to see how the values were applied and to see if there are any values that were not included. 30. Click anywhere in the incrange column. 31. From the menu select Date > Sort Cases to open the sort window (Figure 24). FIGURE 24. Sort Cases window 32. Scroll down to Income Range and double-click it to move it to the Sort by pane. 33. Click OK. The cases are now sorted according to the lowest income range value. 34. From the menu select Data > Sort cases. Notice that your previous choices are still selected. 35. Click once on Income Range in the Sort By pane. 36. Click Descending under Sort Order.34 SPSS Step-by-Step
• Recodes and Transformations37. Click OK. The cases are now sorted with the highest income range value listed first. If there had been any entries that did not fall into the expected categories, they would be listed first, having a value of 4.Now let’s put the new variable to work and display the distribution of cases byincome range.1. From the menu, select New > Draft Output.2. From the menu, select Analyze > Descriptive Statistics > Crosstabs.3. In the Crosstabs window, click once on Gender, then click the right arrow to move it into the Rows pane.4. Click once on Income Range, then click the right arrow to move it into the Col- umns pane.5. Click Cells to open the Crosstabs: Cell Display window (Figure 25). FIGURE 25. Crosstabs: Cell Display window6. In the Percentages pane, click the check box for Row. In this case, row percent- ages will display the percent within gender in each income range. For counts, make sure Observe is checked.7. Click Continue.8. Click OK. The Crosstabs window will close and the new crosstabs will be dis- played in the draft output window. Notice that the income ranges are listed as 1, 2, and 3. Not very informative. So let’s go back and assign value labels for the new variable.9. Close the draft output window without saving.10. In the Data View, double-click the column heading for incrange to open the Variable View for that variable.SPSS Step-by-Step 35
• Recodes and Transformations 11. In the Values column, click the build button to open the Value Labels dialog box. 12. In the Value field, type: 1 13. In the Value Label field, type: < \$25,000 14. Click Add. 15. In the Value field, type: 2 16. In the Value Label field, type: \$25,000 - \$49,999 17. Click Add. 18. In the Value field, type: 3 19. In the Value Label field, type: \$50,000 or more 20. Click Add. 21. In the Value field, type: 4 22. In the Value Label field, type: Other 23. Click Add. 24. Click OK. 25. From the menu, select File > New > Draft Output. 26. From the menu, select Analyze > Descriptive Statistics > Crosstabs. 27. Your previous selections are still available, so click OK. In this exercise. you have worked through the entire process of recoding values into a new target variable and practiced sorting the cases to find the minimum and max- imum values of the new variable. Finally, you used the newly defined income36 SPSS Step-by-Step
• Recoding variables revisitedranges to create a crosstab report showing the number and percent of employeeswithin each income category by gender.Recoding variables revisitedEarlier, we mentioned that there is an exception regarding recoding into the samevariables.The one exception in recoding variablesIn fact, there is a case where you might want to recode a value into the same vari-able, and that is the case of missing or unknown values.BACK UP YOUR DATA FIRST!In some cases, you might receive a data set which has missing or just plain odd val-ues in one or more variables. (This will, of course, be a data set you received fromsomeone else, as you would never do anything so silly.) What you can then do is,after identifying the odd values, recode them to a standard value that you will use torepresent missing data. You can use any value you want that does not already occuras a valid value in the existing data. For example, suppose you have a field that wassupposed to be numeric but someone entered hyphens to indicate that they saw thefield but didn’t have any data to enter. (Don’t laugh. We’re working on a projectwith just that problem.) You might then review all the data, find a value that doesnot already exist as a valid value and change the hyphens to that value. For exam-ple, if you work with legacy data sets, you’ll probably find a value of 9 or 99 usedto represent missing data.The other exceptionThere are cases (we hope they’re rare) where you’ll want to recode unacceptablevalues to a flag value that you can use in subsetting your data. For example, sup-pose you’re surveying income and you find that values range from 0 to 55000,except for ten values that are all greater than ten million. As it happens, you can’tgo back to confirm the odd values, but you don’t want to use them in your calcula-tions. In that case, you might want to recode anything over, say, 60000 to a flagvalue like 99999. In your calculations, you then have a standard value you can useto exclude any outliers or suspicious data from your analyses. As usual, however,be sure to back up your original file before recoding any variables.SPSS Step-by-Step 37
• Recoding variables revisited On the whole, however, we prefer to recode into different variables. BE SURE TO RECORD ALL CHANGES TO THE DATA SET.38 SPSS Step-by-Step
• 3 Charting your data Okay, you’ve done all the hard work, created the data collection instrument, con- ducted the interviews or surveys, entered the data, cleaned the data and made any necessary recodes or transformations, and now it’s time to find out what it all means. One of the best ways to get an idea of what your data looks like (literally) is to generate charts. Charts provide visual displays of comparisons and relationships. They can also be extremely misleading in the hands of a bad analyst, so be careful. If you’re going to be working with charts, two excellent sources of information and guidelines are Edward Tufte’s work on visual representations and Gerald Jones’s book How to Lie with Charts.1 A word on charts: keep them simple. The purpose of charts is to illuminate relation- ships and comparisons, not to obscure them. Eschew what Tufte calls chart junk and go for simple, clean, and clear designs. But that’s a whole other course. SPSS excels at charts. Many of its functions exceed even those of Microsoft Excel and Microsoft Access, which are pretty good on their own. SPSS provides three methods of creating charts: you can use the automated chart function, you can use the interactive chart function, or you can start with the blank chart and build it from 1. Jones, Gerald E., How to Lie with Charts, Sybex 1995. Tufte, Edward R., The Visual Dis- play of Quantitative Information, Graphics Press, 1983, Envisioning Information, Graph- ics Press, 1990, Visual Explanations, Graphics Press, 1997. SPSS Step-by-Step 39
• Using the automated chart function scratch. For the charts you’ll be creating here, you’ll use all three methods to create charts that illustrate gender distribution within job categories. Using the automated chart function 1. If it not already open, open the data file named Employee Data.sav in the folder named SPSSTutorialData. 2. From the menu, select File > New > Draft Output. 3. From the menu, select Graphs > Bar. (Figure 26) FIGURE 26. Creating a simple bar chart 4. Make sure Simple is selected. 5. Click Define. SPSS then displays the Define Simple Bar window (Figure 27) where you will begin to select the variables to be displayed.40 SPSS Step-by-Step
• Using the automated chart function FIGURE 27. Define simple bar window6. Click once on Employment Category, then click the right arrow to move it to the Category Axis field.7. Click OK. SPSS displays the completed chart. Not too interesting, right? Let’s make it a little more interesting.8. From the menu, select Graphs > Bar.9. This time, select (click) Clustered, then click Define.10. Click once on Employment Category, then click the right arrow to move it to the category axis.11. Click once on Gender, then click the right arrow to move it to Define Clusters By.12. Click OK. Much more interesting. In the next step, you’ll improve your chart with explanatory titles.13. From the menu, select Graphs > Bar.14. This time, select (click) Clustered, then click Define.15. Click once on Employment Category, then click the right arrow to move it to the category axis.16. Click once on Gender, then click the right arrow to move it to Define Clusters By.17. Click Titles.SPSS Step-by-Step 41
• Using the automated chart function 18. On the first line, type: Job Category Distribution by Gender 19. On the first line under Footnote, type: Print date: 3/16/2004 20. On the second line under Footnote, type: Data source: SPSS sample data file, Employee dat.sav 21. Click Continue. 22. Click OK. Using the Interactive Chart function While the automated chart function is handy, it limits the degree to which you can customize your chart. The interactive function provides more flexibility and power. In the interactive chart window, you’ll drag variables to the axis where you want them displayed. 1. From the menu, select Graphs > Interactive > Bar. 2. Click once on Employment Category and move it to the horizontal axis, as in Figure 28.42 SPSS Step-by-Step
• Using the automated chart function FIGURE 28. Moving a variable to the horizontal axis dragging a variable to the horizontal axis3. Click OK. Now you have a simple bar chart, more or less like the first one you created. So let’s add some information.4. From the menu, select Graphs > Interactive > Bar. Notice that when the window opens, it still contains the information from your last chart.5. This time, drag Gender to the field under Legend Variables called Color (Figure 29).SPSS Step-by-Step 43
• Using the automated chart function FIGURE 29. Dragging a variable to the legend color field drag a variable to the legend color field to add a grouping variable to the chart 6. Click OK. Now you have a chart with employment category broken out by gen- der. Now you could add several more grouping variables to the chart, but it would begin to look pretty cluttered. To solve this problem, Tufte came up with the idea of small multiples, the practice of displaying smaller charts with each chart displaying only one of the grouping variables. In SPSS, such charts are called panel charts. 7. From the menu, select Graphs > Interactive > Bar. 8. Notice that your previous choices are still displayed. This time, however, drag Gender from the Legend Variables field down to the field for Panel Variables as in Figure 30.44 SPSS Step-by-Step
• Using the automated chart function FIGURE 30. Dragging a variable to the Panel Variables field drag one or more variables to the Panel Variables pane to create a separate chart for each grouping variable9. Click OK. The Draft Output window now displays two charts, one with employment category distributions for women, the other with the same informa- tion for men.Creating a chart from scratchIn the Output window (not the Draft Output window) you can create a chart fromscratch, with complete control over every aspect of the graph.1. From the menu, select File > New > Output.2. From the menu in the Output window, select Insert > Interactive 2-D Graph. SPSS displays the new, empty graph. (Figure 31)SPSS Step-by-Step 45
• Using the automated chart function FIGURE 31. Blank interactive graph 3. Move your cursor slowly over the various icons displayed on the graph. Notice that SPSS displays a description of each icon. 4. Click the Assign Graph Variables icon (Figure 32).46 SPSS Step-by-Step
• Using the automated chart function FIGURE 32. Selecting the Assign Graph Variables tool click here to assign variables to the graph5. SPSS opens the Assign Graph Variables window in Figure 33. FIGURE 33. Assign Graph Variables windowSPSS Step-by-Step 47
• Using the automated chart function 6. From the left panel, drag Count [\$count] to the field for the y axis (Figure 34). FIGURE 34. Dragging a variable to the y axis 7. Now drag Employment Category to the x axis (Figure 35). FIGURE 35. Dragging a variable to the x axis. 8. Finally, drag Gender to the field for Legend Variables Color (Figure 36).48 SPSS Step-by-Step
• Using the automated chart function FIGURE 36. Dragging a variable to the Legend Variables Color field9. Click the X close button. Notice that the chart has the axis variables assigned but still has no data.10. From the menu, select Insert > Summary > Bar. Your initial chart is now com- plete.In fact, SPSS graphing functions go far beyond what we have explored here. Youcan customize your chart by creating different types of charts (cloud, scatterplots,line graphs, etc.) and by adding elements such as value labels, titles, notes, andmuch more. Now that you have a general feeling for how graphs work in SPSS,take some time to explore other functions. If you have any questions, be sure torefer to the Help menu.SPSS Step-by-Step 49
• SPSS Step-by-Step 50
• SPSS Step-by-Step Tutorial: Part 2 For SPSS Version 11.5
• 1 Transformations and recoding revisited 5 Introduction 5 Value labels 5 SPSS Tutorial and Help 6 Using online help 6 Using the Syntax Guide 7 Using the statistics coach 8 Moving around the output window 10 Sorting Revisited: Sorting by multiple variables 11 Utilities: variable and file information 12 Utilities > Variables 12 Utilities > File Info 13 Data Transformations 14 Computing new variables 14Performing calculations with a variable and a function 14Creating expressions with more than one variable 16 Conditional expressions 18 Creating subsets 20 Deeper into crosstabs 23 Crosstab Statistics 23 Crosstab cells 24 Adding layers to crosstabs 26 When to include zeros in a mean 27 Gender, geography, and exercise: the universal variables 28 Summary 282 Statistical procedures 31 Introduction 31 Measuring association 31 Bivariate correlations 32 Partial correlation 35
• Multiple correlation (multiple regression) 36 Crosstabs 37 Measuring differences 39 T-Tests 39 ANOVA 42One-Way ANOVA 43 Summary 47
• 1 Transformations and recoding revisited Introduction In the first session, we’ll explored the SPSS interface, some elimentary data man- agement and recodes, and some basic charting. In this second session, we’ll explore work with more complex data transformations like combining variables and subset- ting populations and work with some of the primary statistical functions. We’ll also look more closely at the online help and tutorial provided by SPSS. . But first, a small clean-up task from last week: displaying and hiding value labels. Value labels 1. Open SPSS. 2. Open the data file by selecting File > Open > Data and finding the file Employee data.sav in the folder named SPSSTutorialData. 3. Make sure you’re in the Data View of any data file. 4. From the menu, select View > Value Labels. If Value Labels is checked, the value labels will be displayed for variables for which you have defined value labels. If it is not checked, the actual values will be displayed. SPSS Step-by-Step 5
• SPSS Tutorial and Help 5. Select View from the menu again and make sure that Value Labels is checked. If it isn’t, click it once to select it. You can turn value labels on or off at any time during an SPSS session. SPSS Tutorial and Help SPSS provides extensive assistance through its online help, tutorial, syntax guide, and statistics coach. Using online help 1. From the menu, select Help > Topics. 2. In the keyword field, type: crosstabs Notice that SPSS begins matching topics as soon as you begin typing. 3. From the list below the keyword field, double-click assumptions. 4. From the keyword list, double-click formats. Notice that the resulting help window informs you that you can “arrange rows in ascending or descending order of the values of the row variable.” What’s wrong with this statement? Hint: In the rest of the computer world, “format” applies to how you display text or numbers. Arranging in ascending or descending order is called sorting. Lesson: If a standard already exists, use it; don’t confuse people by making it up as you go along. 5. From the Related topics list, click once on -Related procedures. The help win- dow now displays information about modeling relationships between two or more categorical variables. 6. From the Related topics list, click once on Model Selection Loglinear Analy- sis Data Considerations. The help window now displays more information about this technique. 7. In the keyword field, type: chi-square 8. From the keyword list, double-click Chi-square test. 9. In the next window, click Display. Notice that the help window now displays general information about the Chi-square test. In addition to the list of related topics, the window also contains a Show Me link.6 SPSS Step-by-Step
• SPSS Tutorial and Help10. Click Show Me. SPSS now opens the tutorial to the chi-square topic in the form of an Internet page.11. Click Next. In addition to an example of how to use a chi-square test, the win- dow also identifies the sample data file you can use to follow the example for yourself.12. Click Next.13. Read the text on the right side of the screen. Here is where the tutorial explains each step. And yes, that is a typo in “you must first be weight the cases …” Ignore the “be.”14. Click Next and read the steps.15. Click Next and read the steps.16. Click Next.17. Click Next. SPSS now displays the sample output.18. Close the tutorial window.19. Close the Help window.As with most help systems, you can use links to investigate topics related to thekeyword you selected. In SPSS, however, you can also open the online tutorial toget more information about using a specific procedure.Using the Syntax GuideIf you need information about SPSS syntax, you can open the online Syntax Guide.This guide explains each command and provides examples of its use. The SyntaxGuide is in Adobe Acrobat format. In this format, you can search the guide for spe-cific text or use the Bookmarks pane to find a specific command.1. From the menu, select Help > Syntax Guide > Base.2. Look at the bottom of the window where Acrobat lists the page count. Yes, that is 1490 pages. In other words, if you want, you can print the entire Syntax Guide.3. In the bookmarks pane, click the “+” to the left of UNIVERSALS. The topic opens to display its subtopics.4. Click the “+” to the left of Commands. (The Commands under Universals, that is, not the Commands further down the list.)5. Click Syntax Diagrams. This topic provides information about the basic struc- ture of SPSS syntax.SPSS Step-by-Step 7
• SPSS Tutorial and Help 6. Now click the “+” to the left of the top-level COMMANDS topic. The window opens to display the subtopics for COMMANDS. 7. Scroll down until you can see the CROSSTABS entry. 8. Click the “+” to the left of CROSSTABS. 9. Click the word CROSSTABS. The Crosstabs page is now displayed and pro- vides information about the complete format of the Crosstabs command. 10. Close the Syntax Guide window. Using the statistics coach One of the most useful functions in SPSS is the statistics coach, particularly when you’re just starting to work with the program. The statistics coach provides prompts at which you can select what you want to do, the kind of data you’re using, and the kind of output you want. 1. From the menu, select Help > Statistics Coach. SPSS opens the first Statistics Coach window (Figure 1). FIGURE 1. Statistics Coach, opening window8 SPSS Step-by-Step
• SPSS Tutorial and Help2. Take a moment to review the choices offered in this window.3. Click More Examples a few times and notice that different types output avail- able to you.4. For the moment, we’ll use the default task, Summarize, describe, or present data, so click Next.5. In this case, we want to create a summary of gender by job category. Both vari- ables are categorical, so click Next.6. This time we’ll change the output from the default, so select Charts and graphs by clicking its radio button.7. We want a simple two-dimensional chart, so click Next.8. Click Next.9. We want a bar chart, so click Finish.10. SPSS now opens the correct window for creating a bar chart with the type of data we have selected.11. Drag Employment Category to the horizontal axis.12. Drag Gender to the Legend Variables > Color field.13. Close the remaining help window.14. Click OK. The chart appears in the output window. Next, we’ll use the statistics coach for a more complicated task.15. From the menu, select Window > Employee Data.sav - SPSS Editor to get back to the Data window.16. From the menu select Help > Statistics Coach.17. Select Compare groups for significant differences.18. Click Next.19. We’re using categorical data, so click Finish. The How To window appears to guide you through the steps along with the Crosstabs window. (You might have to move them around a bit so you can see both windows.20. Select Gender and move it to the Rows pane.21. Select Employment Category and move it to the Columns pane.22. Click Statistics.23. In the Crosstabs: Statistics window, select Chi-square.24. Click Continue.25. In the How To window, click Tell Me More. SPSS now displays the Data Requirements window that applies to using Chi-square.26. Click Next. Surprise! There is no next topic. Ha, ha, SPSS. Very funny.SPSS Step-by-Step 9
• Moving around the output window 27. Click OK. 28. Click Back. Oh look! There is no Back. 29. Close the Data Requirements window. 30. In the Crosstabs window, click OK. The output is now displayed in the Output window. 31. Close any remaining Help windows. Moving around the output window Now that you have created some charts, crosstabs, and statistics results, it’s a good time to take a closer look at moving around the output window. 1. From the menu, select Window > Output1 - SPSS Viewer. 2. Move your cursor over the border between the panels, hold down the mouse button, and move the border to the right until you can read all the titles in the icons in the contents pane as in Figure 2. FIGURE 2. Changing pane width in the output window hold and drag here to change the pane size in the output window 3. Notice the icons on the left arranged in outline format. 4. Click the icon named Interactive Graph. The output displays moves to the first graph you created in this session.10 SPSS Step-by-Step
• Sorting Revisited: Sorting by multiple variables5. Notice the “-” to the left of the Output icon. The “-” indicates that a topic is fully expanded.6. Click the “-” next to Output. The “-” changes to a “+” and all the output is now hidden. If you ever “lose” output on the window, check to see if the output is hidden.7. Click “+” next to Output to expand the items again.8. Click the icon named “Crosstabs.”9. Holding down the left mouse button, drag “Crosstabs” up above “Interactive Chart.” You can use the drag function to arrange your output in any order you like.10. Below Crosstabs, click the icon for Title.11. Click the icon for Chi-square tests. Notice that the red arrow next to the icon corresponds to the red arrow in the actual output window.12. Under Interactive Graph, click Bar Chart.13. Above the actual chart, double-click the title (“Interactive Graph”).14. Select all the text and type: Distribution of job category by gender15. Click anywhere outside the title to apply the change.16. In the navigation pane, click Title under Interactive Graph.17. Wait about one second, then click Title again to activate the text.18. With the text selected, type: Distribution of job category by gender19. Press Enter or click anywhere outside the title to apply the change. Use the title function in the navigation pane to indicate what each piece of out- put it contains. “Distribution of job category by gender” is a lot more informa- tive than “Title.”Sorting Revisited: Sorting by multiple variablesSPSS provides sophisticated sorting functions. You can sort by multiple variables,and you can set the sort order for each variable. For example, you could sort inorder of increasing income, decreasing birthdate, and increasing expenditures. Inthe following task, you’ll sort the employee data set by gender (increasing) and cur-rent salary (decreasing).SPSS Step-by-Step 11
• Utilities: variable and file information 1. Switch back to the data view by selecting Window > Employee data.sav - SPSS Data Editor. 2. From the menu, select Data > Sort Cases. 3. Clear any criteria that might already be in the Sort by pane by double-clicking them. 4. Double-click gender and current salary to move them to the Sort by pane. 5. Click once on gender. 6. If it is not already selected, select Ascending. 7. Click once on current salary. 8. Click Descending. 9. Click OK. Notice that all female employees are now listed first, in descending salary order. Utilities: variable and file information Tucked quietly under the Utilities menu are two especially useful functions: Vari- ables and File Info. You can use these functions to get a snapshot of each variable in the file (Variables) and all variables together (File Info). Utilities > Variables The Variables function provides all the information about each variable in your data file, including any categorical codes and their value labels. 1. From the menu, select Utilities > Variables. (Figure 3)12 SPSS Step-by-Step
• Utilities: variable and file information FIGURE 3. Variables window2. In the variable list, click jobcat. Notice that the Variable Information pane dis- plays the variable name, label, defined missing value, measurement level, val- ues, and value labels.3. Click salary. This value is a scale variable (continuous) and so has no value labels.4. The Go To button takes you to the specific variable within a selected case or to the variable in the first case if no case is selected.5. Click Close.Utilities > File InfoThe File Info window provides information about all variables in the file. This isextremely useful information. We recommend that you print out the file definitionregularly and keep it close at hand.1. From the menu, select Utilities > File Info.2. Scroll through the output to see how each variable is described. Because this information goes to the output window, be sure that you print only the File Info output.3. In the navigation pane, click once on the File Information icon.4. Press Ctrl-P to open a print dialog window. Notice that under Print Range you can select All Visible Output or Selection. When you print the File Info, be sure to select Selection.5. Click Cancel.SPSS Step-by-Step 13
• Data Transformations Data Transformations SPSS provides a number of funtions you can use in computing new variables, including: • Tarithmetic funtions • statistical functions • string functions • date and time functions • distribution functions • random variable functions • missing value functions In this session, we’ll be looking at only the arithmetic functions. Computing new variables Performing calculations with a variable and a function In some cases, you might want to calculate new variables based on values in exist- ing variables and some arithmetic function like multiplying or dividing. For exam- ple, if you have a variable that contains an annual salary, you might want to calcuate a monthly salary. To create the new variable, you use the Compute func- tion. 1. In the Data window, select from the menu Transform > Compute. (Figure 4)14 SPSS Step-by-Step
• Data Transformations FIGURE 4. Compute window2. In the Target Variable field, type: salmonth3. Click Type & Label. (Figure 5) FIGURE 5. Compute: Type and label window4. In the Label field, type: Average monthly salary5. Click Continue.6. In the Compute Variable window, select Current Salary and move it to the Numeric Expression pane by clicking the right arrow.7. In the Numeric Expression pane, click the cursor after salary and type: /12SPSS Step-by-Step 15
• Data Transformations Your window should now look likeFigure 6 FIGURE 6. Entering a compute formula 8. Click OK. The Compute Variable window closes and the new variable is dis- played in the Data window. You can now use the new variable in procedures such as crosstabs or in further calculations. For example, you could create a new variable for monthly withholding that calculates withholding as a percentage of monthly salary. You could then subtrack the new withholding variable from the monthly salary to create still another variable for monthly net. Creating expressions with more than one variable Let’s use the previous example of calculating withholding and net to compute vari- ables based on more than one variable. First you’ll compute the withholding vari- able, then you’ll compute the net variable. 1. From the menu, select Transform > Compute. 2. In the Target Variable field, type: withhold 3. Click Type & Label. 4. In the Label field, type: Monthly withholding 5. Click Continue. 6. Select all the text in the Numeric Expression field and delete it.16 SPSS Step-by-Step
• Data Transformations7. Move the new variable, Average Monthly Salary, to the Numeric Expression field.8. Click after salmonth and type: * .05Note: If you haven’t worked with computer programs before to make calculations, the asterisk denotes multiplication. A double asterisk (**) denotes exponen- tiation. In SPSS, a vertical bar (|) denotes “OR”, and the ampersand (&) denotes “AND”.9. Click OK. The new variable appears in the data view. In the next step, you’ll use two variables to calculate a third.10. From the menu, select Transform > Compute.11. In the Target Variable field, type: netmonth12. Click Type & Label.13. In the Label field, type: Monthly net14. Click Continue.15. Select all the text in the Numeric Expression field and delete it.16. From the list of variables, select Average Monthly Salary and move it to the Numeric Expression field.17. Click after salmonth in the Numeric Expression field.18. Using the keypad in the Compute Variable window, click “-”.19. From the list of variables, select the new variable Monthly Withholding.20. Click the right arrow to move it to the Numeric Expression pane. Your Compute Variable window should now look like Figure 7.SPSS Step-by-Step 17
• Data Transformations FIGURE 7. Complete Compute Variable window for monthly net 21. Click OK. The new variable appears in the data view. Conditional expressions In some cases, you might want to look at only a specific subset of your data. Say you want to send a monthly newsletter to only female clerical staff. To identify these staff, you’ll calculate a new binary variable (one that has only two values) using the IF statement to set the condition. 1. From the menu, select Transform > Compute. 2. In the Target Variable field type: femclerk 3. Click Type & Label. 4. In the Label field type: Female Clerical 5. Click Continue. 6. Select all the text in the Numeric Expression field and delete it. 7. In the Numeric Expression field type: 1 8. Click If to open the Compute Variable: If Cases window (Figure 8).18 SPSS Step-by-Step
• Data Transformations FIGURE 8. Compute Variable: If Cases window9. Select Include if case satistifes condition.10. Double-click Gender to move it to conditions field.11. Click after Gender in the conditions field and type: = “f”Note: Whenever you create a condition, you must use the actual values in the vari- able, not their labels. Thus, setting a condition to gender = “Female” would not select any cases.12. Click after “f” and type a space.13. Using the keypad in the Compute Variable window, click &. You use the ampersand to add a second condition.14. From the field list, double-click Employment category to move it to the calcu- lation pane.15. In the calculation pane, type: =1 Note that you don’t use quotation marks this time because is a numeric variable.16. Click Continue.SPSS Step-by-Step 19
• Data Transformations 17. Click OK. The new variable appears in the Data window. Scroll through the records to see how the values in the new variable. Notice that cases where gen- der is not female and job category is not manager have only a period, indicating a missing value. Only those cases where gender is female and jobcat is manager contain a 1 in the new variable. Creating subsets In some instances, you might want to use only part of the file in an analysis. For example, you might want to look at changes in income among single working mothers. Or you might want to consider only staff born before a specific date. To select a subset of the cases in your file, 1. From the menu, select Data > Select Cases (Figure 9). FIGURE 9. Select cases window 2. Select If condition is satisfied by clicking its radio button. 3. Click If.... Notice that the Select Cases: If window looks exactly like the If win- dow you used in the earlier compute procedures. 4. From the variable list, double-click Date of Birth. 5. Click the cursor anywhere after bdate in the calculation pane.20 SPSS Step-by-Step
• Data Transformations6. Type (or select from the keypad): <7. Scroll through the Function menu and double-click DATE.MDY(month,day,year). (Figure 10) FIGURE 10. Selecting a function scroll here to select a function In the next step, you’ll set the date criterion. SPSS adds the function to the cal- culation pane, substituting question marks to indicate that you need to specify the values.8. Select the first question mark and type: 19. Select the second question mark and type: 110. Select the third question mark and type: 1940 Your completed window should look like Figure 11.SPSS Step-by-Step 21
• Data Transformations FIGURE 11. Completed Select Cases: If window 11. Click Continue. 12. Click OK. Notice that many of the records are marked with a diagonal line through the record number. These cases are excluded from any further calcula- tions until you specifically include them again. 13. To see the effect of the subset selection, right click the heading for bdate. 14. From the pop-up menu, select Sort Ascending. Notice that all employees born before 1940 are selected, except for the person with the missing date of birth. In the next step, you’ll instruct SPSS to include all cases until otherwise instructed. 15. From the menu select Data > Select Cases. 16. Select All Cases by clicking its radio button (Figure 12).22 SPSS Step-by-Step
• Deeper into crosstabs FIGURE 12. Selecting all cases click here to include all cases17. Click OK. The diagonal lines appear to be gone, but to be sure, right-click the heading of bdate again and select Sort Descending, so that the youngest employees are listed first. Notice that they are no longer excluded.Deeper into crosstabsCrosstab StatisticsWhen you the various statistical techniques, SPSS will frequently tell you whatkind of statistical tests are available for that procedure. For example, if you ask forcrosstabs, SPSS offers a number of statistics based on the type of data you’re using.(Figure 13)SPSS Step-by-Step 23
• Deeper into crosstabs FIGURE 13. Statistics indicated by data type in Crosstabs: Statistics window data type available tests For example, when you select a statistic like Chi-square, SPSS indiates the particu- lar Chi-square technique that should be used based on the type of data. If you are using nominal or ordinal data, SPSS provides a number of methods you can include in your output. To see how SPSS makes the selection of techniques and to see the description of the data types and corresponding statistics, try the following. 1. From the menu, select Help > Topics. 2. Click the Index tab and enter: crosstabs 3. From the list below your entry, select Statistics. SPSS Help displays a descrip- tion of the types of statistics to select based on the type of data you’re using. 4. Close the Help window. Crosstab cells Using the Crosstab cell display window (Figure 14) you can determine what data will be displayed in each cell of the crosstab.24 SPSS Step-by-Step
• Deeper into crosstabs FIGURE 14. Crosstab cell displayTry it:1. From the menu, select Analyze > Descriptive Statistics > Crosstabs.2. Move Gender to the Rows list.3. Move Employment Category to the columns list.4. Click Cells.5. Under Percentages, select Row, Column, and Total.6. Click Continue.7. Click OK. SPSS displays the completed crosstab in the output window. Gender * Employment Category Crosstabulation Employment Category Clerical Custodial Manager Total Gender Female Count 206 0 10 216 Row % 95.4% .0% 4.6% 100.0% Column % 56.7% .0% 11.9% 45.6% Total % 43.5% .0% 2.1% 45.6% Male Count 157 27 74 258 Row % 60.9% 10.5% 28.7% 100.0% Column % 43.3% 100.0% 88.1% 54.4% Total % 33.1% 5.7% 15.6% 54.4% Total Count 363 27 84 474 Row % 76.6% 5.7% 17.7% 100.0% Column % 100.0% 100.0% 100.0% 100.0% Total % 76.6% 5.7% 17.7% 100.0%SPSS Step-by-Step 25
• Deeper into crosstabs Notice that each cell contains the count, the percent of the row, the percent of the column, and the percent of the total. Crosstabs like this are useful for both a general overview and closer study of your data. For publication, however, you may want to simplify the output by including only a column or row percentage, depending on the issue you’re addressing. Suppose you only want to know the distribution of job categories within gender. In the next task, you’ll create a cross-tab that includes only row percentages. 8. From the menu, select Analyze > Descriptive Statistics > Crosstabs. 9. Move Gender to the Rows pane. 10. Move Employment to the Columns pane. 11. Click Cells. 12. Under percentages, make sure only Row is selected. Clear any others that are already selected. 13. Click Continue. 14. Click OK. Your new output will look like Figure 15. If you wanted to know per- centages of each job category across gender, you would select only column per- centages. FIGURE 15. Crosstab showing only row percentages Gender * Employment Category Crosstabulation Employment Category Clerical Custodial Manager Total Gender Female Count 206 0 10 216 Row % 95.4% .0% 4.6% 100.0% Male Count 157 27 74 258 Row % 60.9% 10.5% 28.7% 100.0% Total Count 363 27 84 474 Row % 76.6% 5.7% 17.7% 100.0% Adding layers to crosstabs So far, we have worked with just a single variable for rows. You can make your crosstabs much more specific, however, by adding multiple row variables or by adding layers. In this exercise, you’ll create layered crosstabs that provide a more detailed breakdown of the data. 1. From the menu, select Analyze > Descriptive Statistics > Crosstabs. Notice that the variables from the previous crosstab are still selected.26 SPSS Step-by-Step
• When to include zeros in a mean2. From the variable list, move Minority Classification to the Layer pane.3. Click OK. Notice that you’re still getting row percentages only because we did not reset the cells information. Notice also that the Layers variable becomes the uppermost level, followed by gender. In the next task, you’ll reverse that order.4. From the menu, select Analyze > Descriptive Statistics > Crosstabs. Notice that the variables from the previous crosstab are still selected.5. In the Rows pane, double-click Gender to move it back to the variable list.6. In the Layers pane, double-click Minority Classification to move it back to the variable list.7. Move Minority Classification to the Rows pane.8. Move Gender to the Layers pane.9. Click Cells.10. Under percentages, select Row, Column, and Total.11. Click Continue.12. Click OK. In the new output, the crosstab displays Minority Classification within Gender.You can continue to add layers and multiple rows to your crosstabs. RememberRobin’s rule, however: ALWAYS GET THE UNIVERSE FIRST. That is, print outthe most general crosstabs before getting into the detail.When to include zeros in a meanWe were once working on a project looking into some of the economic issues ofchild care, and one of the researchers asked if we should include in the calculationof the average price the number of people who didn’t use child care. The answer is:it depends. If you want to look at, say, an average price paid, then, no, you wouldnot include people who didn’t buy the product. If, on the other hand, you want toknow the per capita cost, then, yes, you would include everyone.The question involves the issue of whether to include zero values in calculating sta-tistics such as means or standard deviations or conducting statistical tests such as t-tests and ANOVAs. And to some degree the answer lies in the question itself. Ifyou say, “How much did people pay for child care?” (or gasoline or televisions orclothes) then you want to look at the actual purchase price among those who actu-ally purchased the product. If, for example, I tell you that the average per capitacost of gasoline is thirty cents a gallon, that’s not going to tell you what to expectSPSS Step-by-Step 27
• Gender, geography, and exercise: the universal variables the next time you drive up to the pump. What you really want to know is, what’s the average price today in this particular area. If, on the other hand, you’re working for the Council of Economic Advisors and you want to know how the cost of gaso- line factors into the overall expenditures of an average family, you do want to use a per capita cost. The other question to consider is whether zero is a valid value in your data set. In medical research, for example, particularly in dose-response research or lab values, zero is obviously a valid value. In other cases, such as a five-point Likert scale beginning with 1, zero is not a valid value and should be treated as missing on an error. In other words, the decision about whether to include zero in a particular test depends on whether it’s a valid value and on the particular question you’re asking. Gender, geography, and exercise: the universal variables There are certain variables that will affect nearly any statistic or test you use. Our particular favorites are gender, geography, and exercise. These are variables whose effect is so pervasive that failing to take them into account can seriously affect the validity of your research. There are others, of course, that will affect whatever data you work with to varying degrees. Age, of course, is certainly one, along with diet and ethnicity. If you’re conducting medical research, for example, you must always take into account age, gender, and ethnicity. More and more, however, the level of exercise is being included as a concomitant variable‘. We once had a research psy- chologist tell us that “Exercise is implicated in every variable we look at.” In social science research, age, gender, ethnicity, and education are critical factors. The moral: when you are designing a research project, make sure you have accounted for all the variables that might affect the outcomes, not just the ones of immediate interest. Summary If you think we have spent an awful lot of time dealing with data management and crosstabs, you’re right. You’ll spend about eighty percent (a very rough guess) of your time on these two tasks. Then, finally, when you have the data exactly the way you need it, you run a couple of quick statistics and then . . . you start all over again. Remember that data analysis is iterative, so get ready now to do the same types of tasks over and over and over.28 SPSS Step-by-Step
• SummaryAnd over.SPSS Step-by-Step 29
• Summary30 SPSS Step-by-Step
• 2 Statistical procedures Introduction This tutorial is not a replacement for a course in basic statistics or for a textbook on that subject. The primary emphasis here is on the use of SPSS to explore data and to answer some of the statistical questions you might ask about your data. In this chapter we will review some of the SPSS procedures for evaluating the asso- ciation among two or more variables, and procedures for measuring differences among groups. Measuring association Typically the association between two variables is evaluated by using a bivariate correlation procedure. If the two variables are continuous and you want to predict one variable using the value of the other, a simple linear regression or some method of curve estimation can be used. If there are more than two variables and they are continuous, use a partial or a mul- tiple correlation procedure. If you want to predict one of the variables using the val- ues of the other variables, a multiple regression can be used. SPSS Step-by-Step 31
• Measuring association If you have frequency distributions based upon one or more categorical variables, you should consider crosstabulation or Chi-square. A categorical variable could be one that naturally exists, such as eye color, or could be derived by “categorizing” a continuous variable. Bivariate correlations Bivariate correlations measure the degree of association between two variables. If the two variables are continuous, the Pearson product moment correlation is an appropriate measure. If they are not continuous (that is, if they are discrete or cate- gorical), it would be more appropriate to use Spearman’s rho or Kendall’s tau-b. The correlation coefficient, which ranges from -1 to +1 is both a measure of the strength of the relationship and the direction of the relationship. A correlation coef- ficient of 1 describes a perfect relationship in which every change of +1 in one vari- able is associated with a change of +1 in the other variable. A correlation of -1 describes a perfect relationship in which every change of +1 in one variable is asso- ciated with a change of -1 in the other variable. A correlation of 0 describes a situa- tion in which a change in one variable is not associated with any particular change in the other variable. In other words, knowing the value of one of the variables gives you no information about the value of the other. The correlation squared is another measure of the strength of the relationship. In fact, the correlation squared is the percent of the variance in the dependent vari- able that is accounted for or predicted by the independent variable. You can also determine the statistical significance of the correlation coefficient. If the direction of the association is hypothesized in advance, you can use a one-tailed test to determine whether the correlation is statistically significantly different from zero, otherwise use a two-tailed test. Correlation is not causation. If we looked at the correlation of the time the paper boy delivers the morning paper and the time of the sunrise, we would find a very strong positive correlation. And yet, we would be reluctant to claim that the news- paper boy causes the sun to rise. How to: In this exercise, you’ll calculate a Pearson correlation between current salary and months since employment, 1. In the data window, open the file named Employee data.sav in the folder named SPSSTutorialData.32 SPSS Step-by-Step
• Measuring association2. In the data window, from the menu select Analyze > Correlate > Bivariate (Fig- ure 16). FIGURE 16. Bivariate correlations window3. Double-click Current Salary to move it to the Variables list.4. Double-click Beginning Salary to move it to the Variables list.5. Click Options. (Figure 17) FIGURE 17. Bivariate correlations: options window6. In the Statistics pane, select Means and standard deviations by clicking its check box.7. Click Continue.8. Click OK. The output is displayed in the Output window. (Figure 18)SPSS Step-by-Step 33
• Measuring association FIGURE 18. Pearson correlation for current salary x beginning salary Correlations Beginning Current Salary Salary Current Salary Pearson Correlation Sig. (2-tailed) N Beginning Salary Pearson Correlation .880** Sig. (2-tailed) .000 N 474 **. Correlation is significant at the 0.01 level (2-tailed). 9. Notice that the correlation is particularly high (.880). The footnote to the table indiates that correlation is significant at the .01 level. (And, no, we can’t find any documentation on why SPSS has highlighted the particularly high correla- tion. Just another one of those moments of cryptic helpfulness.) 10. On the contents pane of the output window, click the Correlations icon indicated by the red arrow (you may have to scroll down a bit to see it), then click it again to allow you to change the name. 11. Type: Pearson: Curr Sal X Beg Sal 12. Press Enter or click anywhere away from the title to apply the change. Try to get in the practice of naming each portion of output so that you can quickly find what you’re looking for. Homework: The SPSS tutorial has an excellent section on using correlations. To find it, open the Help menu and enter correlations, bivariate as the search term. In the Help window, click Show Me.34 SPSS Step-by-Step
• Measuring associationPartial correlationPartial correlation is used to measure the association of two continuous variablesafter controlling for the association of other variables. Conceptually, what is beingdone is to first calculate the variance in the dependent variable that can beexplained or accounted for by all of the control variables. The variance accountedfor by the control variables is then removed from the dependent variable. Finally,the degree of association is measured between the variance remaining in the depen-dent variable and the non-controlled variable.How to: In this exercise, you’ll compare current salary to beginning salary, after controlling for previous experience.1. From the menu, select Analyze > Correlate > Partial. (Figure 19). FIGURE 19. Partial correlations window2. Notice that this time, in addition to selecting the variables to be compared, you can also select Controlling for.3. Select Current Salary and Beginning Salary and move them to the Variables pane.4. Select Previous Experience and move it to the Controlling For pane.5. Click Options to select the statistics you want displayed.6. Select Means and standard deviations by clicking its check box.7. Click Continue.8. Click OK. The new output is displayed in the output window.SPSS Step-by-Step 35
• Measuring association 9. In the contents pane of the output window, click Partial Corr, then click it again to activate it. At the end of the text, type: :Curr Sal X Beg Sal CF Prev Exp. 10. Click anywhere away from the title to apply the change. Note: During your analysis, you’re likely to generate a great number of charts, tables, and other output. Try to come up with a consistent abbreviation scheme that will help you figure out at a glance what you’re looking at. In the example above, the X stands for “by” and “CF” stands for “Controlling For.” Multiple correlation (multiple regression) Multiple correlation looks at the association between one continuous variable (often called the dependent variable) with a group of two or more continuous vari- ables (usually called predictors). One use for a multiple correlation is to find out if there is a relationship between an independent variable and a dependent variable after controlling for a subset of all other variables. In this sense the multiple correlation or multiple regression is used as a more sophisticated method of exploring partial correlations. When you run a step-wise multiple regression, SPSS will find the one variable in the group of predictors which has the highest correlation with the dependent vari- able. It will then statistically remove that variance from the dependent variable that the predictor variable accounts for. The procedure will then go to the list of remain- ing predictors and select the variable which has the highest correlation with the remaining variance in the dependent variable, remove that variance, then select the next predictor and so on until some criterion is met. Typical criteria that you can specify are the amount of additional variance accounted, the level of statistical sig- nificance for the change in variance accounted for, and the maximum number of predictors that can be selected. Example: In a study of the effectiveness of entitlement programs, you want to find out which set of variables can best predict client’s income once they are no longer receiving benefits. All entitlement data are quantitative, including time receiving benefits, individual benefit values, length of job training, and family size. A single categorical variable — minority/non-minority — is included in the calculations as a binary variable. In this example, post-eligibility income is the dependent variable,36 SPSS Step-by-Step
• Measuring associationwhile the independent (or predictor) variables include value of benefits, length ofeligibility, and the like.CrosstabsTo quote the famous purple book, “A crosstabulation is a joint frequency distribu-tion of cases according to two or more classificatory variables. The display of thedistribution of cases by their position on two or more variables is the chief compo-nent of contingency table analysis and is indeed the most commonly used analyticmethod in the social sciences.”1 [Emphasis added.]The Chi-square test can be used to determine whether the frequency distributions ofone or more categorical variables are statistically independent. The crosstab can beused to provide measures of the associations of categorical variables. Some of themeasures of association are the contingency coefficient, phi, tau, gamma, etc..These measures describe the degree to which the values of one variable predict orvary with those of another.Data requirements: Crosstabs require categorical data or continuous data recodedinto categories, such as income or age ranges. The frequencies for each variable inthe population should be approximately normal.Suppose we had a group of 300 people for whom we knew hair color and eye color.We could create a contingency table similar to the displayed in Table 1. In the table,it appears that non-blue eyes and brown hair tend to go together, but because thetotal number of people is different in each cell, it’s hard to arrive at a valid conclu-sion.2 TABLE 1. Sample crosstab Hair color: Eye color Blond Brown Total Blue 75 40 115 Non-blue 25 160 185 Total 100 200 3001. SPSS: Statistical Package for the Social Sciences, Second Edition, by Norman H. Nie, et al., McGraw Hill 1975, p. 218.2. This example is based on the example given in the Purple book, ibid., p. 219.SPSS Step-by-Step 37
• Measuring association To help us determine if the counts of people in these four cells are random, we can calculate the number of people we would expect in each cell if the distribution of people with a particular eye color were independent of the distribution of people with a particular hair color. The expected frequency for any cell in a contingency table is the product of the row frequency times the column frequency for that cell, divided by the total for the table. Thus, the expected frequency for blond hair/blue eyes would be 115 X 100/ 300 or 38.3 people. In fact, the observed frequency for this group is 75, indicating that the distributions are not independent. Something is going on here. TABLE 2. Observed/Expected Crosstab values Hair color: Eye color Blond Brown Total Blue Observed 75 40 115 Expected 38.8 76.7 Non-blue Observed 25 160 185The degrees of freedom for Expected 61.7 123.3a contingency table is the Total 100 200 300number of columns -1 timesthe number of rows -1. To test whether the observed frequencies are statistically different from theConceptually, the degrees expected frequency, we calculate a Chi-square: For each cell subtract the expectedof freedom is a count of frequency (count) from the observed frequency, square the difference, and dividehow many cells in which by the expected value for that cell, then sum these values for all the cells.you are free to enter anynumber you want given that The greater the discrepancy between the expected and actual frequencies, the largeryou know the margins. In the Chi-square becomes. Whether there will be a statistically significant differencegeneral it indicates the depends on the size of the difference and the degrees of freedom. As the degrees ofnumber of data points you freedom increase, the larger the value of Chi-square needed to be statistically sig-can specify before all nificant.remaining data points aredetermined. How to: In this exercise, you’ll use a data file for hair color and eye color that will generate the Chi-square table shown above. 1. From the menu, select File > Open > Data. 2. Navigate to the file named ChiSquare.sav and open it. 3. When prompted to save the current data, click Yes. 38 SPSS Step-by-Step
• Measuring differences4. From the menu, select Analyze > Descriptive Statistics > Crosstabs.5. Select Hair Color and move it to the Columns pane.6. Select Eye Color and move it to the Rows pane.7. Click Statistics.8. Select Chi-square by clicking its check box.9. Click Continue.10. Click Cells.11. Select Observed and Expected.12. Click Continue.13. Click OK. The Chi-square results appear in the Output window. Notice that all the significance levels are less than .001. Something is definitely going on here.Measuring differencesTypically, differences in one or more continuous dependent variables based on dif-ferences in one of more categorical variables are evaluated using a t-test or an anal-ysis of variance. If we have only one continuous dependent variable and only onecategorical independent variable with no more than two values, the t-test can beused to look for differences. If we have more that one dependent continuous vari-able or more than two values across the categorical independent variables or wehave both categorical and continuous independent variables, we need to use ananalysis of variance (ANOVA).T-TestsThere are three types of t-tests:An Independent Samples t-test is used when cases are randomly assigned to one oftwo groups. After a differential treatment has been applied to the two groups, ameasurement is taken which is related to the effect of the treatment. The t-test iscalculated to determine if any difference between the two groups is statistically sig-nificant.A Paired Samples t-test can be used to evaluate differences between two groupswho have been matched on one or more characteristics or evaluate differences inbefore/after measures on same person. If you want to use pre/post measures, makeSPSS Step-by-Step 39
• Measuring differences sure the post-test is the same as the pre-test. This is one of the most common errors in research. A One Sample t-test is used to evaluate whether the mean of a continuous depen- dent variable is different from zero. To test if the mean is different than some other value, subtract that value from each observation and the test to see if the mean of the new values is zero. In the independent samples t-test, the purpose of randomly assigning the people to groups is to control for the effect of other differences between people that might have affected the effect we’re measuring. Using the matched pairs has the same goal, but now the theory is that we have matched the subjects on the concomitant variables. What if you can’t randomly assign people to groups and you can’t match them? For example, if you want to examine salary differences based on gender? One thing you can do is assume that the only difference between people that could result in a dif- ference in income is gender. With that assumption in hand, we can do an indepen- dent sample t-test to see if there’s a gender difference in income. If you’re one of those folks who think that assumption doesn’t pass the “smell” test, you can go back to our earlier friend the correlation, put gender in as a binary vari- able (where 1 is female and 0 is not female), and look at the correlation. This does not mean that gender has caused the difference in salaries, but you can measure how well gender is associated with income level. More sophisticated analyses could be done by using partial correlations or multiple regression to control for con- comitant factors. How to: In this exercise, you’ll split a file based upon a single variable and test the set of cases for each value of the variable as a separate sample. First you’ll instruct SPSS to treat each group (in this case, each result from a specific machine) as a separate sample; this procedure is called splitting the file. Then you’ll run the T-test to determine whether all the machines are meeting the production specifications. 1. From the menu, select File > Open > Data. 2. Navigate to the folder named SPSSTutorialData and open brakes.sav. 3. From the menu, select Data > Split File to open the Split File window (Figure 20).40 SPSS Step-by-Step
• Measuring differences FIGURE 20. Split file window4. Click Compare Groups.5. Select Machine Number and move it to the Groups Based on pane. (Figure 21) FIGURE 21. Completed Split File window6. Click OK. Even though you haven’t created separate files, you have instructed SPSS to treat each group within the file as if it were a separate sample, with a group being defined by the machine number.7. From the menu, select Analyze > Compare Means > One-Sample T Test (Figure 22). You’re going to test against a known value which in this case is the diame- ter of the disc brake.SPSS Step-by-Step 41
• Measuring differences FIGURE 22. One-sample T-Test window 8. Select Disc Brake Diameter and move it to the Test Variables pane. 9. Select the text in the Test Value field and type: 322 10. Click Options to set the confidence level. (Figure 23) FIGURE 23. One-sample T-Test: Options window 11. Change the confidence interval to 90. 12. Click Continue. 13. Click OK. The test results are displayed in the output window. Homework: The exercise above is based on the SPSS tutorial. To find the tutorial, select from the menu Help > Topics. In the keyword field, type One-Sample T Test. Click Show Me. ANOVA If we have more than one dependent continuous variable or more than two values across the categorical independent variables or we have both categorical and con- tinuous independent variables, we need to use an analysis of variance to measure42 SPSS Step-by-Step
• Measuring differencesdifferences. There are a number of particularly useful special cases for the generalanalysis of variance model.If we have only one dependent continuous variable and one independent categoricalvariable we can use a One-Way Analysis of Variance or One-Way ANOVA. If wehave only one dependent continuous variable, but more than one independent cate-gorical variable we can use a general Univariate Analysis of Variance or UnivariateANOVA. If we have one or more independent continuous variables, we can use theUnivariate Analysis of Covariance or Univariate ANCOVAMore advanced models are available for more than one dependent continuous vari-able. If the model has one or more independent categorical variables, we would usea Multivariate Analysis of Variance or MANOVA. If the model also included oneor more continuous independent variables, we would use a Multivariate Analysis ofCovariance or MANCOVA.1One-Way ANOVAOne-way analysis of variance is an extension of the t-test in that, in a t-test youhave two groups, one that received a treatment and one that did not, while in a one-way analysis of variance you have more than two groups where the groups receiveddifferent variation of the same treatment. For example, a treatment factor could bewhether you fry donuts in vegetable oil, butter, or lard.How to: In this exercise, you’re going to identify the most effective number of training hours to achieve a given result, with results scored between zero and 100, 100 being the optimal score.1. From the menu, select File > Open > Data. Navigate to the folder named SPSS- TutorialData and open Training.sav.2. First you’ll graph the means and standard error, so select Graphs > Error Bar (Figure 24).1. For more on determine the statistical procedure to use with your data, consult DataStep’s Statstical Selection Guide, included with this tutorial.SPSS Step-by-Step 43
• Measuring differences FIGURE 24. Error bar window 3. You’ll use the default choices, so click Define. (Figure 25) FIGURE 25. Error bar graph: defining summaries for groups of cases 4. Select Skills Training group and move it to the Category Axis field. 5. Select Score on training exam and move it to the Variable field. 6. In the Bars represent field, open the drop-down list and select Standard error of mean. (Figure 26)44 SPSS Step-by-Step
• Measuring differences FIGURE 26. Completed Error Bar window7. Click OK. The results are displayed in the output window. (Figure 27) FIGURE 27. Standard error graph for skills training groups 90 Mean +- 2 SE Score on training exam 80 70 60 50 N= 20 20 20 1 2 3 Skills training groupSPSS Step-by-Step 45
• Measuring differences Note that the three bars do not represent equal variance among the three groups; instead, variance decreases as days of training increase. These data may not be appropriate for ANOVA. In the next step, you’ll run the ANOVA to test the data. 8. From the menu, select Analyze > Compare Means > One-Way ANOVA. (Figure 28) FIGURE 28. One-Way ANOVA window 9. Move Score on training exam to the Dependent List. 10. Move Skills training group to the Factor list. 11. Click Options. (Figure 29) FIGURE 29. One-Way ANOVA Options window46 SPSS Step-by-Step
• Summary12. Select Descriptive and Homogeneity of variance test by clicking their check boxes.13. Click Continue.14. Click OK. The results appear in the output window.Homework: This exercise is based on the One-Way ANOVA test in the SPSS tutorial. The SPSS tutorial explains the meaning of the results and the displayed output. To use the tutorial, select Help > Topics. In the keyword field, type One-Way ANOVA. In the Topics Found window, double-click One-Way ANOVA. In the Help window, click Show Me to view the tutorial.SummaryThis chapter has provided you with a brief introduction to some of the capabilitiesof the SPSS application. You should now feel comfortable navigating the interface,creating contingency tables (crosstabs), conducting some of the most common sta-tistical tests, and working with charts to present your data in a graphial format.Most importantly, you should now know how to learn more about the SPSS func-tions. You might even want to check out some of those 2,032 books at Ama-zon.com. Yes, you have noticed correctly. Sometime between last week and thisweek we lost two books in the search. Who knows, perhaps you’ll write the nextone.SPSS Step-by-Step 47
• Summary48 SPSS Step-by-Step