3. SPSS is
A software package (program) used for statistical analysis.
Long produced by SPSS Inc. (Incorporated), it was acquired by IBM in 2009
(US$1.2 billion in cash).
The current versions are officially named IBM SPSS Statistics.
Widely used for statistical analysis in social science and by market researchers, health
researchers, survey companies, government, education researchers, marketing organizations,
and data miners.
4. Why SPSS???
Popular and used extensively in medical and biological
research.
More user-friendly (data analysis, presentation) than other
statistical software (e.g., S-Plus, R, SAS): drop-down menus, not
commands!
Contains the statistical procedures most researchers need.
6. Data editor has two “views”
Data View: shows actual data values.
Variable View: shows variable information for all variables.
Two tabs at the bottom left-hand side switch between
them.
13. Variable View
Create variable names and define the attributes of each variable.
Name: the name for each variable. Use only letters and the underscore (_).
No purely numeric names, no spaces, no dots.
Type: specify the type of the variable, string or numeric.
Width: how many spaces the entries in the Data View will be allowed for this variable.
Decimals: how many decimal places will be shown for this variable in the Data View.
Label: give the variable a label or title. This makes all output much more
readable.
14. Values: specify numerical values for each category of a categorical variable.
For example, for the variable SEX: 1 for male, 2 for female.
Missing: specify the values of a variable that indicate missing data.
Columns: how many spaces will be allocated for the variable in the Data View. It differs
from Width in that Width limits the number of spaces for the actual number, whereas
Columns limits how many spaces will be visible in the Data View.
Align: left-aligns, centers, or right-aligns the entries for the variable.
15. Measure: the type of variable.
A scale variable is a quantitative variable.
An ordinal variable is a categorical variable where the categories have a
natural order, such as poor, fair, good, better, best.
A nominal variable is a categorical variable with no natural order to the categories,
such as male, female.
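The Values and Missing attributes described above can be pictured as a small lookup. The sketch below is plain Python, not SPSS; the SEX codes come from the slide, while the missing-data code 9 and the function name decode are assumptions made only for illustration.

```python
# Value labels for SEX as given on the slide.
SEX_LABELS = {1: "Male", 2: "Female"}

# Hypothetical missing-data code (an assumption, not from the slides).
SEX_MISSING = {9}


def decode(code, labels, missing):
    """Turn a stored numeric code into its label, or None if it is missing."""
    if code is None or code in missing:
        return None
    # Codes without a label are returned unchanged so they stay visible.
    return labels.get(code, code)


raw_codes = [1, 2, 9, 2]
decoded = [decode(c, SEX_LABELS, SEX_MISSING) for c in raw_codes]
print(decoded)
```

Keeping the numeric codes in the data and the labels in a separate table mirrors how the Data View stores numbers while the output shows the labels.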
27. Importing data from an Excel spreadsheet into SPSS
In SPSS, go to:
File > Open > Data
Select the type of file you want to open (for example, Excel)
Select the name of the file you want to open
28. DATA CLEANING
• Check for data entry and coding errors
• Wild-code checking (codes outside the specified set of valid codes)
• Consistency checking
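As a rough illustration of the two checks listed above, the following Python sketch flags wild codes and inconsistent records. The variable names (sex, pregnant), the valid code sets, and the consistency rule are hypothetical examples, not taken from the slides.

```python
def find_wild_codes(rows, allowed):
    """Return (row index, variable, value) for every code outside its valid set."""
    problems = []
    for i, row in enumerate(rows):
        for var, codes in allowed.items():
            value = row.get(var)
            if value not in codes:
                problems.append((i, var, value))
    return problems


def find_inconsistencies(rows, rules):
    """Return (row index, rule description) for every record that breaks a rule."""
    problems = []
    for i, row in enumerate(rows):
        for description, is_consistent in rules:
            if not is_consistent(row):
                problems.append((i, description))
    return problems


# Hypothetical data set and coding scheme.
rows = [
    {"sex": 1, "pregnant": 0},
    {"sex": 3, "pregnant": 0},   # wild code: 3 is not a valid sex code
    {"sex": 1, "pregnant": 1},   # inconsistent: male but coded pregnant
]
allowed = {"sex": {1, 2}, "pregnant": {0, 1}}
rules = [
    ("pregnant cases must be female (sex == 2)",
     lambda r: r["pregnant"] != 1 or r["sex"] == 2),
]

wild = find_wild_codes(rows, allowed)
inconsistent = find_inconsistencies(rows, rules)
print(wild)
print(inconsistent)
```

Wild-code checks look at one variable at a time, whereas consistency checks compare two or more variables within the same case.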
36. Data merging in SPSS (1)
1. Make sure that both files are sorted by the key variable in ascending order.
37. Data merging in SPSS (2)
4. Select the dataset you want to merge into the working file.
38. Data merging in SPSS (3)
5. Click on “Match cases on key variables in sorted files”.
6. Click on “Both files provide cases”.
7. Highlight ID in the Excluded Variables box, then click ► next to Key Variables.
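To make the effect of “Both files provide cases” concrete, here is a minimal Python sketch of a merge on a key variable in which every case from either file is kept. It is an illustration of the idea, not SPSS itself; the function name merge_on_key, the example variables (age, weight), and the choice that the working file's value wins when both files share a variable are assumptions of this sketch.

```python
def merge_on_key(working, other, key="ID"):
    """Merge two lists of records on a key; cases from both files are kept."""
    # Collect every variable name in first-seen order.
    columns = []
    for row in working + other:
        for name in row:
            if name not in columns:
                columns.append(name)

    # Combine records that share the same key value.
    merged = {}
    for row in working + other:
        target = merged.setdefault(row[key], {})
        for name, value in row.items():
            # setdefault keeps the value already present (working file first).
            target.setdefault(name, value)

    # Output sorted by key; variables absent for a case become None.
    return [
        {name: merged[k].get(name) for name in columns}
        for k in sorted(merged)
    ]


working_file = [{"ID": 1, "age": 30}, {"ID": 2, "age": 41}]
other_file = [{"ID": 2, "weight": 70}, {"ID": 3, "weight": 82}]

result = merge_on_key(working_file, other_file)
for record in result:
    print(record)
```

Cases present in only one file still appear in the result, with None standing in for the variables they lack.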
44. Go to
• Analyze
• Descriptive Statistics
• Explore
• Put suitable variables in the Dependent List and the Factor List (e.g. area
and gender)
• Click Plots, then tick Histogram and “Normality plots with tests”
• Click Continue
• Click OK
45. Now see the output…
• Focus on the skewness and kurtosis values
• Look at each statistic and its standard error (S.E.)
• The skewness and kurtosis measures in SPSS should be as close to 0
as possible.
• In reality, however, data are often skewed or kurtotic.
• A small departure from 0 is not a problem as long as the
measures are not too large compared with their S.E.
• Divide each measure by its S.E.
• This gives a z value, which should lie between -1.96 and 1.96
for the departure from normality to be non-significant (at the 5% level).
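The division described above can be sketched in Python. The formulas used here are the usual bias-adjusted sample skewness and excess kurtosis with their standard errors; that these are the formulas behind the SPSS output is an assumption of this sketch, and the function name and example data are hypothetical.

```python
import math


def skew_kurtosis_z(values):
    """Return skewness, kurtosis, their standard errors, and the z values."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 values are needed")

    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sd == 0:
        raise ValueError("all values are identical")

    sum_z3 = sum(((x - mean) / sd) ** 3 for x in values)
    sum_z4 = sum(((x - mean) / sd) ** 4 for x in values)

    # Bias-adjusted sample skewness and excess kurtosis.
    skewness = n / ((n - 1) * (n - 2)) * sum_z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # Standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = math.sqrt(24 * n * (n - 1) ** 2
                        / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))

    return {
        "skewness": skewness,
        "se_skewness": se_skew,
        "z_skewness": skewness / se_skew,
        "kurtosis": kurtosis,
        "se_kurtosis": se_kurt,
        "z_kurtosis": kurtosis / se_kurt,
    }


result = skew_kurtosis_z([1, 2, 3, 4, 5])
for name, value in result.items():
    print(f"{name}: {value:.3f}")

# Rule from the slide: each z value should lie between -1.96 and 1.96.
print(all(abs(result[k]) < 1.96 for k in ("z_skewness", "z_kurtosis")))
```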
46. Now see the tests of normality
• If the Shapiro-Wilk significance value (Sig.) is more than 0.05, normality is
not rejected and we treat the data as approximately normally distributed.
• Look at the histograms; they should follow a more or less normal curve.
• Now look at the normal Q-Q plot; the dots should lie along the line for
a normal distribution.
• Skip the detrended Q-Q plots.
• Inspect the box plots; they should be approximately
symmetrical.
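To show what the dots on a normal Q-Q plot represent, the sketch below pairs each sorted observation with the normal quantile expected at its rank. It uses only the Python standard library; the plotting-position formula (Blom's) and the function names are assumptions of this sketch, so the points need not match the SPSS plot exactly.

```python
import math
from statistics import NormalDist


def qq_points(values):
    """Pair each sorted observation with its expected standard-normal quantile."""
    xs = sorted(values)
    n = len(xs)
    standard_normal = NormalDist()
    points = []
    for rank, x in enumerate(xs, start=1):
        # Blom's plotting position (an assumed choice for this sketch).
        p = (rank - 0.375) / (n + 0.25)
        points.append((x, standard_normal.inv_cdf(p)))
    return points


def qq_correlation(points):
    """Pearson correlation between observed values and expected quantiles."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


points = qq_points([3, 1, 2, 5, 4])
for observed, expected in points:
    print(f"observed {observed}, expected normal {expected:.3f}")
print(f"correlation: {qq_correlation(points):.3f}")
```

A correlation close to 1 means the dots lie close to a straight line; it is a numerical companion to the visual check, not a replacement for the Shapiro-Wilk test.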
48. How to detect outliers
• Analyze
• Descriptive Statistics
• Explore
• Put suitable variables in the Dependent List
• Under Display, select Plots
• Click Plots
• Select “Dependents together” and deselect all other options (so the variables appear on the same graph)
• Click Paste (to see the syntax of the procedure you are running)
• Select the syntax and run it
• See the output
• Circles are outliers; asterisks are extreme outliers
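The circles and asterisks on the box plot follow a distance rule that can be sketched in Python. The sketch assumes the common convention that an outlier lies more than 1.5 box lengths beyond the box and an extreme outlier more than 3 box lengths, with the box edges taken as Tukey's hinges; the function names and data are hypothetical.

```python
from statistics import median


def tukey_hinges(values):
    """Lower and upper hinge: medians of the lower and upper halves."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # the median belongs to both halves when n is odd
    return median(xs[:half]), median(xs[n - half:])


def classify_boxplot_outliers(values):
    """Map each flagged value to 'outlier' (circle) or 'extreme' (asterisk)."""
    lower, upper = tukey_hinges(values)
    box_length = upper - lower
    flagged = {}
    for x in values:
        distance = max(lower - x, x - upper)  # how far x lies outside the box
        if distance > 3 * box_length:
            flagged[x] = "extreme"
        elif distance > 1.5 * box_length:
            flagged[x] = "outlier"
    return flagged


data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 30, 60]
print(tukey_hinges(data))
print(classify_boxplot_outliers(data))
```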
49. The plot shows the case number of each outlier
• Now you have to deal with it
Might be due to
a typing error
a measurement error
• Go to Data View, find the case by its number, and check whether the value
can be corrected to bring it into range
• OR
50. Create z scores
• Analyze
• Descriptive Statistics
• Descriptives
• Check “Save standardized values as variables”
• Click OK
• Right-click on the z-score variable
• Sort ascending or descending
• Any z score beyond ±3.29 is a serious outlier, whereas
one beyond ±2.58 (or ±2.5) is an outlier.
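The same screening can be sketched outside SPSS. The Python below standardizes each value with the sample mean and sample standard deviation (n - 1 in the denominator, assumed here to match the saved z scores) and applies the 2.58 and 3.29 cut-offs from the slide; the function name and the example scores are hypothetical.

```python
from statistics import mean, stdev


def flag_by_z(values, outlier_cut=2.58, serious_cut=3.29):
    """Return (case number, z score, label) for every case."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 in the denominator
    flags = []
    for case, x in enumerate(values, start=1):
        z = (x - m) / s
        if abs(z) > serious_cut:
            label = "serious outlier"
        elif abs(z) > outlier_cut:
            label = "outlier"
        else:
            label = None
        flags.append((case, z, label))
    return flags


# Hypothetical scores: twenty ordinary cases and one unusually high value.
scores = [48, 52] * 10 + [58]
for case, z, label in flag_by_z(scores):
    if label is not None:
        print(f"case {case}: z = {z:.2f} ({label})")
```

Comparing the absolute value of z with the cut-offs catches unusually low values as well as unusually high ones.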
51. A general heuristic is that if more than 1% of all the cases have z-scores beyond ±2.58 (or
just ±2.5), then we have an outlier problem. If any are beyond ±3.29 (or just ±3), then we
have serious outliers (the most likely candidates for remedial action).
• If our rule is to remove all z-scores outside ±2.5, then if the SD is 9
and the mean is 60: 9 × 2.5 = 22.5. Add this to the mean: 60
+ 22.5 = 82.5. So remove all cases with a value larger than 82.5. Do
the same for the bottom end of the scale: 60 − 22.5 = 37.5, so remove all cases with a value smaller than 37.5.
• The major strategies are:
Remove the outlier
Transform the data
Just investigate to determine the scope of outliers and keep the
findings in the back of your mind for later action or non-action.
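The cut-off arithmetic in the worked example above (mean 60, SD 9, rule of 2.5) can be written as a short Python sketch. The function names and the example scores are hypothetical; removing cases is only one of the strategies listed, so the sketch returns the kept and removed values separately rather than discarding anything.

```python
def z_cutoffs(mean, sd, z_limit=2.5):
    """Raw-score limits that correspond to z = -z_limit and z = +z_limit."""
    margin = sd * z_limit
    return mean - margin, mean + margin


def split_by_cutoffs(values, lower, upper):
    """Separate values inside the limits from those outside them."""
    kept = [x for x in values if lower <= x <= upper]
    removed = [x for x in values if x < lower or x > upper]
    return kept, removed


lower, upper = z_cutoffs(mean=60, sd=9, z_limit=2.5)
print(lower, upper)

# Hypothetical raw scores checked against the limits.
kept, removed = split_by_cutoffs([35, 40, 58, 61, 70, 82, 90], lower, upper)
print(kept)
print(removed)
```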