This document provides training on descriptive statistics for analyzing continuous variables in SPSS. It defines key descriptive statistics such as the mean, median, mode, variance and standard deviation, and introduces the Explore function for examining a continuous variable across the categories of a categorical variable. Examples show how to generate summary statistics for a continuous variable, interpret the output, and explore the relationship between one continuous and one categorical variable.
1. Data analysis: Explore
GAP Toolkit 5
Training in basic drug abuse data management and analysis
Training session 9
2. Objectives
• To define a standard set of descriptive statistics used to analyse continuous variables
• To examine the Explore facility in SPSS
• To introduce the analysis of a continuous variable according to the values of a categorical variable, an example of bivariate analysis
• To introduce further SPSS Help options
• To reinforce the use of SPSS syntax
8. Measures of central tendency
• Most commonly:
– Mode
– Median
– Mean
• 5 per cent trimmed mean
9. The mode
• The mode is the most frequently occurring value in a
dataset
• Suitable for nominal and higher levels of measurement
• Example:
– The mode of the variable "first most frequently used drug" is Alcohol, with 717 cases, approximately 46 per cent of valid responses
10. Bimodal
• Describes a distribution in which two categories have a large number of cases
• Example:
– The distribution of Employment is bimodal: the employed and unemployed categories have similar numbers of cases, and more cases than any other category
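The mode and a bimodal pattern can be checked with Python's standard statistics module. This is a minimal sketch on hypothetical responses invented for illustration (not the session's dataset). Note that multimode only reports exact ties, whereas "bimodal" on the slide means two categories with similar, large counts, so comparing the largest counts directly is often more informative.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical responses, invented purely for illustration
first_drug = ["Alcohol", "Cannabis", "Alcohol", "Heroin", "Alcohol", "Cannabis"]
employment = ["Employed", "Unemployed", "Student", "Employed", "Unemployed", "Retired"]

print(mode(first_drug))                    # the single most frequent category
print(multimode(employment))               # all categories tied for the highest count
print(Counter(employment).most_common(3))  # the largest counts, to judge "similar"
```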
11. The median
• The median is the middle value when the data are ordered from low to high
• Half the data values lie below the median and half above
• Because the data have to be ordered, the median is not suitable for nominal data, but it is suitable for ordinal and higher levels of measurement
12. Example: median
• Seizures of opium in Germany, 1994-1998 (kilograms)
• Source: United Nations (2000). World Drug Report 2000 (United Nations publication, Sales No. GV.E.00.0.10).

Year      1994   1995   1996   1997   1998
Seizure     36     15     45     42    286
13. • Sort the seizure data in ascending order
• The middle value is the median: the median annual seizure of opium in Germany between 1994 and 1998 was 42 kilograms

Year      1995   1994   1997   1996   1998
Seizure     15     36     42     45    286
Rank         1      2      3      4      5
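The ordering step above can be reproduced in Python as a quick check. This sketch uses the opium seizure figures from the table; median_by_hand is a hypothetical helper written to show the rule (single middle value for an odd count, average of the two middle values for an even count), and statistics.median gives the same answer.

```python
from statistics import median

# Opium seizures in Germany, 1994-1998 (kilograms), from the table above
seizures = {1994: 36, 1995: 15, 1996: 45, 1997: 42, 1998: 286}

def median_by_hand(values):
    """Hypothetical helper: sort, then take the middle value(s)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2:                                # odd count: one middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # even count: average the two middle values

ordered = sorted(seizures.values())
print(ordered, median_by_hand(ordered), median(ordered))
```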
14. The mean
• Add the values in the data set and divide by the number of values
• The mean is only truly applicable to interval and ratio data, as it involves adding the values
• It is sometimes applied to ordinal data, or to ordinal scales constructed from a number of Likert items, but this requires the assumption that the differences between adjacent values on the scale are equal, e.g. the difference between 1 and 2 is the same as between 5 and 6
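Applied to the opium seizure data from the median example, the calculation is a sum and a division. This small sketch shows it; comparing the result with the median of 42 kg shows how far the single large 1998 value pulls the mean upwards.

```python
# Opium seizures in Germany, 1994-1998 (kilograms)
seizures = [36, 15, 45, 42, 286]

total = sum(seizures)           # 424
mean = total / len(seizures)    # 424 / 5 = 84.8
print(total, mean)
```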
16. The 5 per cent trimmed mean
• The 5 per cent trimmed mean is the mean calculated on the data set with the top 5 per cent and bottom 5 per cent of values removed
• An estimator that is more resistant to outliers than the mean
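A trimmed mean can be sketched as "sort, drop the ends, average the rest". This is a simplified version, assuming only whole cases are dropped (the floor of 5 per cent of n from each end); SPSS may weight fractional cases, so on small samples its 5% Trimmed Mean can differ from this sketch. The data are hypothetical, chosen so that one extreme value is visible.

```python
import math
from statistics import mean

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of cases.

    Simplification: only whole cases are removed, floor(proportion * n)
    from each end; fractional cases are not weighted.
    """
    data = sorted(values)
    k = math.floor(proportion * len(data) + 1e-9)  # cases cut from each end
    kept = data[k:len(data) - k]
    return mean(kept)

# Hypothetical data: twenty values, one of them extreme
values = list(range(1, 20)) + [1000]
print(mean(values), trimmed_mean(values))  # 59.5 10.5
```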
17. 95 per cent confidence interval for the mean
• An indication of the expected error (precision) when the sample mean is used to estimate the population mean
• In repeated sampling, about 95 out of every 100 confidence intervals calculated in this way around the sample mean would contain the population mean
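The interval is the sample mean plus or minus a critical value times the standard error (the standard deviation divided by the square root of n). A minimal sketch using the male figures from the Explore output for AGE in this session (mean 31.43, standard deviation 12.012, N = 1247 from the boxplot). SPSS uses the t distribution with n - 1 degrees of freedom; with N this large the normal quantile (about 1.96) is very close, so the sketch uses Python's statistics.NormalDist as an approximation. Because it starts from the rounded mean, the last digit may differ from the SPSS output.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Approximate confidence interval for a population mean.

    Assumption: n is large, so the normal quantile stands in for the
    t quantile with n - 1 degrees of freedom that SPSS uses.
    """
    se = sd / math.sqrt(n)                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95 per cent
    return mean - z * se, mean + z * se, se

lower, upper, se = mean_ci(31.43, 12.012, 1247)
print(f"SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```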
18. Measures of dispersion
• The range
• The inter-quartile range
• The variance
• The standard deviation
19. The range
• A measure of the spread of the data
• Range = maximum – minimum
20. Quartiles
• 1st quartile: 25 per cent of the values lie below the 1st quartile and 75 per cent above
• 2nd quartile: the median, with 50 per cent of values below and 50 per cent above
• 3rd quartile: 75 per cent of values below and 25 per cent above
21. Inter-quartile range
• IQR = 3rd Quartile – 1st Quartile
• The inter-quartile range measures the spread of the middle 50 per cent of the data
• Requires ordinal level of measurement or above
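The range and inter-quartile range of the opium seizure data can be sketched with Python's statistics.quantiles. With method="exclusive" the p-th quantile sits at position (n + 1)p in the sorted data, interpolating between neighbours; we assume this corresponds to SPSS's default weighted-average percentile definition, but other definitions (and the Tukey hinges used for boxplots) give different quartiles on a sample as small as this one.

```python
from statistics import quantiles

# Opium seizures in Germany, 1994-1998 (kilograms)
seizures = [36, 15, 45, 42, 286]

data_range = max(seizures) - min(seizures)                 # 286 - 15 = 271
q1, q2, q3 = quantiles(seizures, n=4, method="exclusive")  # (n + 1)p positions
iqr = q3 - q1
print(data_range, q1, q2, q3, iqr)
```

The large gap between the range (271) and the inter-quartile range (140) reflects the single large 1998 seizure, which affects the range directly but only the upper quartile partially.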
22. Variance
• The average squared difference from the mean
• Measured in units squared
• Requires interval or ratio levels of measurement
$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
23. Standard deviation
• The square root of the variance
• Returns the units to those of the original variable
$$\text{Standard deviation} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$
24. Example: standard deviation and variance
Seizures of opium in Germany, 1994-1998 (kilograms)

Year     Seizure   Deviation from mean   Squared deviation
1994          36                 -48.8             2381.44
1995          15                 -69.8             4872.04
1996          45                 -39.8             1584.04
1997          42                 -42.8             1831.84
1998         286                 201.2            40481.44
Total        424                     0             51150.8
Count          5                                         5

Mean = 84.8   Variance = 10230   Standard deviation = 101
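The worked example can be reproduced step by step in Python. This sketch prints each deviation and squared deviation, then divides the total by n as the slide does (statistics.pvariance and pstdev). It also shows statistics.variance and stdev, which divide by n - 1: that sample version is what SPSS reports in Explore and Descriptives, so SPSS output for these five values will not show 10230.

```python
from statistics import pvariance, pstdev, variance, stdev

# Opium seizures in Germany, 1994-1998 (kilograms)
seizures = {1994: 36, 1995: 15, 1996: 45, 1997: 42, 1998: 286}
values = list(seizures.values())
mean = sum(values) / len(values)  # 84.8

for year, x in seizures.items():
    d = x - mean
    print(year, x, round(d, 2), round(d * d, 2))  # deviation, squared deviation

ss = sum((x - mean) ** 2 for x in values)          # total of squared deviations, 51150.8
print(round(ss / len(values), 2), round(pvariance(values), 2))  # divide by n
print(round(pstdev(values), 2))                                  # square root of that
print(round(variance(values), 2), round(stdev(values), 2))       # divide by n - 1
```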
25. Distribution or shape of the data
• The normal distribution
• Skewness:
– Positive or right-hand skewed
– Negative or left-hand skewed
• Kurtosis:
– Platykurtic
– Mesokurtic
– Leptokurtic
26. • Symmetrical data: the mean, the median and the mode coincide
[Figure: The normal distribution, f(X) plotted against X, with the mean, median and mode at the same central point]
27. Right-hand skew (+)
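• Right-hand skew: the extreme large values drag the mean towards them
[Figure: a right-skewed distribution, f(X) plotted against X, with the mode, median and mean in that order from left to right]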
28. Left-hand skew (-)
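• Left-hand skew: the extreme small values drag the mean towards them
[Figure: a left-skewed distribution, f(X) plotted against X, with the mean, median and mode in that order from left to right]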
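Skewness and kurtosis can be computed from the second, third and fourth moments about the mean. This sketch uses the small-sample adjusted formulas (often written G1 and G2, the same as spreadsheet SKEW and KURT); we assume these match the statistics SPSS reports, so treat that as an assumption to check. A positive skewness indicates right-hand skew; kurtosis is reported relative to the normal distribution, so values near 0 suggest mesokurtic, positive values leptokurtic and negative values platykurtic. On the opium data the mean lies well above the median, as the right-hand skew slide describes.

```python
import math
from statistics import mean, median

def skewness(values):
    """Adjusted sample skewness (G1); assumed to match the SPSS statistic."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def kurtosis(values):
    """Adjusted excess kurtosis (G2); 0 for a normal distribution. Needs n > 3."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

seizures = [36, 15, 45, 42, 286]  # opium seizures, Germany 1994-1998 (kg)
print(mean(seizures), median(seizures))  # mean pulled above the median
print(round(skewness(seizures), 3), round(kurtosis(seizures), 3))
```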
34. Descriptives: AGE by Gender

Gender   Measure                                         Statistic   Std. Error
Male     Mean                                            31.43       .340
         95% Confidence Interval for Mean, Lower Bound   30.76
         95% Confidence Interval for Mean, Upper Bound   32.09
         5% Trimmed Mean                                 31.03
         Median                                          30.00
         Variance                                        144.286
         Std. Deviation                                  12.012
         Minimum                                         1
         Maximum                                         70
         Range                                           69
         Interquartile Range                             19.00
         Skewness                                        .370        .069
         Kurtosis                                        -.573       .138
Female   Mean                                            33.39       .789
         95% Confidence Interval for Mean, Lower Bound   31.84
         95% Confidence Interval for Mean, Upper Bound   34.94
         5% Trimmed Mean                                 32.77
         Median                                          33.00
         Variance                                        193.593
         Std. Deviation                                  13.914
         Minimum                                         14
         Maximum                                         77
         Range                                           63
         Interquartile Range                             23.00
         Skewness                                        .472        .138
         Kurtosis                                        -.602       .376
36. Boxplot of Age vs Gender
[Figure: boxplots of Age (vertical axis, -20 to 100) for Female (N = 311) and Male (N = 1247), annotated to show the median, the inter-quartile range and an outlier labelled 183]
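The box and the flagged points can be sketched in Python. As we understand the SPSS convention, the box edges are Tukey's hinges (medians of the lower and upper halves of the sorted data), cases more than 1.5 box lengths beyond the box are drawn as outliers and cases more than 3 box lengths beyond it as extremes; treat these rules as assumptions to check against your SPSS version. The sketch uses the opium seizure data because the Age values are not listed here; note that the hinges (36 and 45) differ from the interpolated quartiles of the same data.

```python
from statistics import median

def tukey_hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves of the
    sorted data (the middle value is shared by both halves when n is odd)."""
    data = sorted(values)
    half = (len(data) + 1) // 2
    return median(data[:half]), median(data[len(data) - half:])

def flag_cases(values):
    """Label cases more than 1.5 box lengths (outlier) or more than
    3 box lengths (extreme) beyond the edges of the box."""
    lower, upper = tukey_hinges(values)
    box = upper - lower
    flagged = []
    for x in sorted(values):
        distance = max(lower - x, x - upper)  # positive only outside the box
        if distance > 3 * box:
            flagged.append((x, "extreme"))
        elif distance > 1.5 * box:
            flagged.append((x, "outlier"))
    return flagged

seizures = [36, 15, 45, 42, 286]  # opium seizures, Germany 1994-1998 (kg)
print(tukey_hinges(seizures), flag_cases(seizures))
```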
37. Syntax: Explore
EXAMINE
VARIABLES=age BY gender /ID=id
/PLOT BOXPLOT HISTOGRAM
/COMPARE GROUP
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
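For readers who also work outside SPSS, the core of this EXAMINE command (summary statistics of age within each gender) can be approximated in plain Python. This is a rough analogue on hypothetical records, not a replacement for Explore: explore_by_group is a hypothetical helper, missing values are represented as None and dropped (similar in spirit to /MISSING LISTWISE), no overall total is computed (as with /NOTOTAL), no plots are produced, and quartiles use the (n + 1)p interpolation, which may not match SPSS exactly.

```python
from collections import defaultdict
from statistics import mean, median, stdev, quantiles

def explore_by_group(records, value_key, group_key):
    """Hypothetical helper: summary statistics of a continuous variable
    within each category of a grouping variable. Records missing either
    variable (None) are dropped. Each group needs at least two values."""
    groups = defaultdict(list)
    for r in records:
        v, g = r.get(value_key), r.get(group_key)
        if v is None or g is None:
            continue
        groups[g].append(v)
    summary = {}
    for g, vals in sorted(groups.items()):
        q1, _, q3 = quantiles(vals, n=4, method="exclusive")
        summary[g] = {"n": len(vals), "mean": mean(vals), "median": median(vals),
                      "sd": stdev(vals), "min": min(vals), "max": max(vals),
                      "iqr": q3 - q1}
    return summary

# Hypothetical records, invented for illustration (not the session's dataset)
records = [
    {"id": 1, "age": 23, "gender": "Male"},
    {"id": 2, "age": 28, "gender": "Female"},
    {"id": 3, "age": 31, "gender": "Male"},
    {"id": 4, "age": 35, "gender": "Female"},
    {"id": 5, "age": 40, "gender": "Male"},
    {"id": 6, "age": None, "gender": "Female"},  # missing age: excluded
    {"id": 7, "age": 44, "gender": "Female"},
    {"id": 8, "age": 19, "gender": "Male"},
]

summary = explore_by_group(records, "age", "gender")
for group, stats in summary.items():
    print(group, {k: round(v, 2) for k, v in stats.items()})
```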
38. Summary
• Measures of central tendency
• Measures of variation
• Quantiles
• Measures of shape
• Bivariate analysis for a categorical independent variable and continuous dependent variable
• Histograms
• Boxplots