# Kofi Nyanteng: Cleaning and Screening Data Using SPSS

Simple notes on cleaning data using SPSS

Yacar-Yacara Consults – 2015, +2330501377013, yacaraconsults@yahoo.com

## Screening and Cleaning Data

Before you start to analyse your data, it is essential that you check your data set for errors. It is very easy to make mistakes when entering data, and some errors can distort the entire analysis. For example, entering 35 when you meant to enter 3 can distort the results.

The data screening process involves three steps:

- Step 1: Checking for errors. First, you need to check each of your variables for scores that are out of range (i.e. not within the range of possible scores).
- Step 2: Finding the error in the data file. Second, you need to find where in the data file this error occurred (i.e. which case is involved).
- Step 3: Correcting the error in the data file. Finally, you need to correct the error in the data file itself.

### Step 1: Checking for errors

When checking for errors you are primarily looking for values that fall outside the range of possible values for a variable. For example, if sex is coded 1=male, 2=female, you should not find any scores other than 1 or 2 for this variable. Scores that fall outside the possible range can distort your statistical analyses, so it is very important that all these errors are corrected before you start.

To check for errors you will need to inspect the frequencies for each of your variables. This includes all of the individual items that make up the scales. Errors must be corrected before total scores for these scales are calculated.
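
The idea behind Step 1 can also be sketched outside SPSS. The Python example below is only an illustration under stated assumptions: the `CODEBOOK` dictionary, the `out_of_range` helper, the coding for `marital` and the sample records are all hypothetical and do not come from a real data file. It compares every value against the codes the codebook allows and reports anything else.

```python
# Hypothetical codebook: variable name -> set of valid codes.
# These codes are assumptions for illustration only.
CODEBOOK = {
    "sex": {1, 2},            # 1 = male, 2 = female
    "marital": {1, 2, 3, 4},  # hypothetical coding
}


def out_of_range(records, codebook):
    """Return {variable: sorted list of values the codebook does not allow}."""
    problems = {}
    for variable, valid_codes in codebook.items():
        bad = {
            row[variable]
            for row in records
            if row.get(variable) is not None and row[variable] not in valid_codes
        }
        if bad:
            problems[variable] = sorted(bad)
    return problems


# Made-up data with two deliberate entry errors.
records = [
    {"id": 1, "sex": 1, "marital": 2},
    {"id": 2, "sex": 2, "marital": 4},
    {"id": 3, "sex": 3, "marital": 1},   # 3 is not a valid code for sex
    {"id": 4, "sex": 1, "marital": 35},  # 35 typed instead of 3
]

problems = out_of_range(records, CODEBOOK)
print(problems)
```

Missing values (`None`) are skipped here rather than reported, since a missing answer is not the same thing as an out-of-range code.
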

### Checking categorical variables

From the main menu at the top of the screen:

- Click on Analyze, then Descriptive Statistics, then Frequencies.
- Choose the variables that you wish to check (e.g. sex, marital, educ).
- Click on the arrow button to move these into the Variable(s) box.
- Click on the Statistics button. Tick Minimum and Maximum in the Dispersion section.
- Click on Continue and then on OK.

There are two parts to the output. The first table provides a summary of each of the variables you requested. The remaining tables give you a breakdown, for each variable, of the range of responses (these are listed using the value label, rather than the code number that was used).

Check your minimum and maximum values to see whether they are within the range of possible scores on that variable. Check this against your codebook.

Check the number of valid cases and missing cases. If cases are missing because a blank row has been entered in the data file, open your Data Editor window, move down to the empty case row, click in the shaded area where the case number appears and press Delete on your keyboard. Then rerun the Frequencies procedure to get the correct values.

### Checking continuous variables

From the menu at the top of the screen:

- Click on Analyze, then Descriptive Statistics, then Descriptives.
- Click on the variables that you wish to check (e.g. age). Click on the arrow button to move them into the Variable(s) box.
- Click on the Options button. You can ask for a range of statistics; the main ones at this stage are mean, standard deviation, minimum and maximum. Click on the statistics you wish to generate.
- Click on Continue, and then on OK.
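
For readers who also script their checks, the continuous-variable check above can be sketched in Python. This is a minimal sketch, not SPSS output: the `describe` helper and the `ages` list (including a deliberately mistyped 230) are made up for illustration.

```python
import statistics


def describe(values):
    """Summarise a continuous variable, skipping missing (None) values."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "missing": len(values) - len(valid),
        "minimum": min(valid),
        "maximum": max(valid),
        "mean": statistics.mean(valid),
        "std_dev": statistics.stdev(valid),  # sample SD (n - 1 denominator)
    }


# Made-up ages; 230 stands in for a data-entry slip (23 typed as 230).
ages = [23, 31, 45, None, 28, 230, 19, 37]

summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A maximum of 230 is impossible for age, and it also drags the mean well above most of the individual values, which is why both the range and the mean are worth a look.
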

Check the minimum and maximum values. Do these make sense? Does the mean score also make sense?

### Steps 2 and 3: Finding and correcting the error in the data file

How can you find out where the mistake is in your data set?

- Make sure that the Data Editor window is open and on the screen in front of you with the data showing.
- Click on the variable name of the variable in which the error has occurred (e.g. sex). Click once to highlight the column.
- Click on Edit from the menu across the top of the screen, then click on Find.
- In the Find box, type in the incorrect value that you are looking for (e.g. 3).
- Click on Find Next. SPSS will scan through the file and will stop at the first occurrence of the value that you specified. Take note of the ID number of this case (shown in the ID column of that row). You will need this to check your records or questionnaires to find out what the value should be.
- Click on Find Next again to continue searching for other cases with the same incorrect value. You may need to do this a number of times before you reach the end of the data set.
- To correct an error, place the cursor in the cell, make sure that it is highlighted, and then type in the correct value.
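
The find-and-correct routine above can be sketched in Python as follows. Everything here is hypothetical: the `find_cases` and `correct_case` helpers, the case IDs and the corrected values are invented for illustration, and in practice the right value has to come from the original questionnaire or record.

```python
def find_cases(records, variable, value):
    """Return the IDs of all cases where `variable` equals `value`."""
    return [row["id"] for row in records if row.get(variable) == value]


def correct_case(records, case_id, variable, new_value):
    """Overwrite one cell, identified by case ID and variable name."""
    for row in records:
        if row["id"] == case_id:
            row[variable] = new_value
            return True
    return False  # no case with that ID was found


# Made-up data: sex should only be 1 or 2.
records = [
    {"id": 101, "sex": 1},
    {"id": 102, "sex": 3},  # invalid code
    {"id": 103, "sex": 2},
    {"id": 104, "sex": 3},  # invalid code
]

# Step 2: locate every case holding the incorrect value.
bad_ids = find_cases(records, "sex", 3)
print(bad_ids)

# Step 3: apply corrections (hypothetical values, as if checked on paper).
corrections = {102: 2, 104: 1}
for case_id, value in corrections.items():
    correct_case(records, case_id, "sex", value)

# Rerun the search to confirm that no incorrect values remain.
remaining = find_cases(records, "sex", 3)
print(remaining)
```

Collecting all matching IDs in one pass replaces the repeated Find Next clicks, and rerunning the search afterwards mirrors rerunning Frequencies to confirm the fix.
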

## Descriptive Statistics

Descriptive statistics have a number of uses. These include:

- To describe the characteristics of your sample in the Method section of your report;
- To check your variables for any violation of the assumptions underlying the statistical techniques that you will use to address your research questions; and
- To address specific research questions.

Tools:

- Frequencies
- Descriptives
- Explore
- Cross tabulation
- Ratio

### Frequencies (mostly for categorical variables)

- Click on Analyze, then Descriptive Statistics, then Frequencies.
- Select the categorical variables you are interested in (e.g. sex) and move them into the Variable(s) box.
- Click on the Statistics button. In the Dispersion section tick Minimum and Maximum.
- Click on Continue and then OK.

### Descriptives (mostly for continuous variables)

- Click on Analyze, then Descriptive Statistics, then Descriptives.
- Select all the continuous variables for which you wish to obtain descriptive statistics (e.g. age).
- Click on the arrow button to move them into the Variable(s) box.
- Click on the Options button.
- Click on mean, standard deviation, minimum and maximum.
- Click on Continue, and then OK.
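
The first use listed above, describing the sample, usually comes down to reporting counts and percentages per category. The Python sketch below shows one way to build such a table, separating the percentage of all cases from the percentage of valid (non-missing) cases, as a Frequencies table does. The `frequency_table` helper, the labels and the data are hypothetical.

```python
from collections import Counter


def frequency_table(values, labels):
    """Return (rows, missing); each row is (label, count, percent, valid percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    for code in sorted(counts):
        n = counts[code]
        rows.append((
            labels.get(code, str(code)),      # fall back to the raw code
            n,
            round(100 * n / total, 1),        # percent of all cases
            round(100 * n / len(valid), 1),   # percent of valid cases only
        ))
    return rows, total - len(valid)


# Made-up data: 1 = male, 2 = female, None = missing.
sex_labels = {1: "male", 2: "female"}
sex = [1, 2, 2, 1, 2, None, 2, 1, 2, None]

rows, missing = frequency_table(sex, sex_labels)
for label, count, percent, valid_percent in rows:
    print(f"{label:<8}{count:>4}{percent:>8}{valid_percent:>8}")
print(f"missing {missing:>4}")
```

When there are missing values the two percentage columns differ, so a report should state which one it uses.
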

### Explore

Explore gives a detailed descriptive analysis of the variables and presents a range of information about them. It provides results for:

- Mean
- Median
- Variance
- Percentiles
- Kurtosis
- Skewness
- etc.

To run it:

- Click on Analyze, then Descriptive Statistics, then Explore.
- Select all the variables that you wish to explore (e.g. age).
- Click on the arrow button to move them into the Dependent List box.
- Click on the Statistics button.
- Tick the descriptive statistics you want (e.g. Descriptives, Percentiles).
- Click on Continue, and then OK.

### Cross Tabulation

Cross tabulation is used to examine the relationship between categorical variables. It gives a descriptive presentation of the relationship between variables.

- Click on Analyze, then Descriptive Statistics, then Crosstabs.
- Select the categorical variables you are interested in (e.g. sex and marital) and move one into the Row(s) box and the other into the Column(s) box.
- Click on the Statistics button and tick any statistics you need (e.g. Chi-square).
- Click on Continue and then OK.
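
What a cross tabulation computes can be sketched in a few lines of Python. This is an illustration only: the `crosstab` helper and the records are hypothetical, and the sketch produces plain cell counts without any test statistics.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count cases for every combination of two categorical variables."""
    counts = Counter(
        (r[row_var], r[col_var])
        for r in records
        if r.get(row_var) is not None and r.get(col_var) is not None
    )
    row_levels = sorted({key[0] for key in counts})
    col_levels = sorted({key[1] for key in counts})
    # A Counter returns 0 for combinations that never occur.
    return {
        row: {col: counts[(row, col)] for col in col_levels}
        for row in row_levels
    }


# Made-up data; the last case has a missing marital status and is skipped.
records = [
    {"sex": "male", "marital": "single"},
    {"sex": "female", "marital": "married"},
    {"sex": "female", "marital": "single"},
    {"sex": "male", "marital": "married"},
    {"sex": "female", "marital": "married"},
    {"sex": "male", "marital": None},
]

table = crosstab(records, "sex", "marital")
for row_label, cells in table.items():
    print(row_label, cells)
```

Cases with a missing value on either variable are left out of the table, so the cell counts can sum to fewer than the total number of cases.
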
