Chapter 022

Copyright © 2013, 2009, 2005, 2001, 1997 by Saunders, an imprint of Elsevier Inc.
Chapter 22
Using Statistics To Describe
Variables

Using Statistics to Describe
Variables
 Two major classes of statistics
 Descriptive statistics
• To reveal characteristics of the sample dataset
 Inferential statistics
• To gain information about effects in the population being
studied

Using Statistics to Describe
 All quantitative research uses descriptive
statistics
 For description of the sample
 For initial description of variables
 For analysis of the primary research problem
 Descriptive statistics for descriptive research
 Inferential statistics for interventional and
correlational research

Using Statistics to Summarize
Data
 Terms: the number of elements in a sample is
the “n” of the sample
 Data set: 45, 26, 59, 51, 42, 28, 26, 32, 31, 55, 43,
47, 67, 39, 52, 48, 36, 42, 61, 57
 n = 20
 Descriptive statistics
 Frequency distributions
 Measures of central tendency
 Measures of dispersion

Frequency Distributions
 Table or figure (line graph, pie chart, etc.)
 Continuous variable: the higher numbers
represent more of that variable, and the lower
numbers represent less of that variable

Frequency Table
 Lists every observed value in the first column
and the frequency (tally) of each value in the
second column
 Data set: 45, 26, 59, 51, 42, 28, 26, 32, 31,
55, 43, 47, 67, 39, 52, 48, 36, 42, 61, 57
(ages)
 Sort from lowest to highest values
 Tally each value

Ungrouped Frequency
Distribution
 List every category of the variable
for which data exist, and tally
each datum against the listing
Age Frequency
26 2
28 1
31 1
32 1
36 1
39 1
42 2
43 1
45 1
47 1
48 1
51 1
52 1
55 1
57 1
59 1
61 1
67 1
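The tally above can be reproduced with a short script. This is a minimal sketch using the slide's age data set; the variable names (`ages`, `frequency`, `ungrouped`) are illustrative and not from the source.

```python
from collections import Counter

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

# Tally each value, then list the values from lowest to highest.
frequency = Counter(ages)
ungrouped = sorted(frequency.items())

print("Age Frequency")
for age, count in ungrouped:
    print(age, count)
```

Only values that actually occur in the data appear in the listing, which is why the table has 18 rows for 20 observations.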

Grouped Frequency Distribution
 Categories are grouped into ranges
 Ranges must be exhaustive (every value fits in a
range) and mutually exclusive (no value fits in two)
Age Frequency
20 - 29 3
30 - 39 4
40 - 49 6
50 - 59 5
60 - 69 2

Grouped Frequency Distribution
with Percentages
Adult Age Range   Frequency (f)   Percentage (%)   Cumulative Percentage
20 – 29           3               15               15
30 – 39           4               20               35
40 – 49           6               30               65
50 – 59           5               25               90
60 – 69           2               10               100
Total             20              100
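As a rough sketch, the grouped table with percentages can be computed from the same age data set. The 10-year bin boundaries follow the slide; the names `rows` and `cumulative` are illustrative assumptions.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]
n = len(ages)

rows = []
cumulative = 0.0
for lower in range(20, 70, 10):          # bins 20-29, 30-39, ..., 60-69
    upper = lower + 9
    f = sum(1 for age in ages if lower <= age <= upper)
    percent = 100 * f / n                # share of the whole sample
    cumulative += percent                # running total of the percentages
    rows.append((f"{lower} - {upper}", f, percent, cumulative))

for label, f, percent, cum in rows:
    print(f"{label:8} {f:3d} {percent:5.0f} {cum:5.0f}")
```

Because the bins do not overlap and together cover every observed age, each value is counted exactly once and the cumulative percentage ends at 100.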

Frequency Distributions
Presented in Figures
 Graphs
 Charts
 Histograms
 Frequency polygons

Line Graph

Frequency Table of Smoking
Status
Smoking Status Frequency Percent
Current smoker 1 10
Former Smoker 6 60
Never Smoked 3 30
Total 10 100

Histogram of Smoking Status

Measures of Central Tendency
 Statistics that identify the center or most typical
value of a data set
 Mode
 Median (MD)
 Mean

Mode
 The most common value in a data set
 Bimodal: two modes exist
 Multimodal: more than two modes
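To illustrate, the age data set used earlier in the slides turns out to be bimodal. This sketch relies on `statistics.multimode` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import multimode

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

# multimode returns every value tied for the highest frequency.
modes = multimode(ages)
print(modes)       # 26 and 42 each occur twice
print(len(modes))  # two modes, so the data set is bimodal
```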

Median (MD)
 The middle value in the data set (after sorting
values from lowest to highest)
 If the “n” is even, the two values in the middle
are averaged
 The 50th percentile
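A minimal sketch of the even-n case, using the slides' age data set (n = 20): the two middle values after sorting are averaged. The manual steps are shown next to the standard library's `statistics.median` for comparison.

```python
from statistics import median

# Age data set from the slides (n = 20, an even number).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

ordered = sorted(ages)                       # sort lowest to highest
n = len(ordered)
middle_two = (ordered[n // 2 - 1], ordered[n // 2])
manual_median = sum(middle_two) / 2          # average the two middle values

print(middle_two, manual_median)             # (43, 45) 44.0
print(median(ages))                          # same result from the library
```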

Mean
 Arithmetic average of all a variable's values
 Most commonly reported measure of central
tendency
 Sum of the scores divided by the number of
values in the data set
 Formula: $\bar{X} = \frac{\sum X}{n}$
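As a worked example, the mean of the slides' age data set is the sum of the scores (887) divided by the number of values (20). The variable names are illustrative.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

total = sum(ages)            # sum of the scores
n = len(ages)                # number of values
mean_age = total / n

print(total, n, mean_age)    # 887 20 44.35
```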

When to Use Mean
 Mean: normally distributed values measured
at the interval or ratio level
 Ordinal level data from a rating scale, if:
 The n is large
 The data are normally distributed
 Small values denote very little of the measured
quantity; large ones denote a lot
 Mean is sensitive to extreme scores such as
outliers
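The sensitivity to outliers can be seen in a small sketch. The scores below are hypothetical values invented for illustration, not data from the slides.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [60]     # add one extreme score

print(mean(scores), median(scores))              # 5.4 5
print(mean(with_outlier), median(with_outlier))  # 14.5 5.5
```

One extreme score moves the mean from 5.4 to 14.5, while the median shifts only from 5 to 5.5.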

When to Use Median and Mode
 Median: used for non-normal distributions
with small n
 Mode: used for nominal values

Using Statistics to Explore
Deviations in the Data
 Using measures of central tendency to
describe the nature of a data set obscures
the impact of extreme values or deviations in
the data
 Measures of dispersion provide important
insight into the nature of the data

Measures of Dispersion
 Quantify how tightly the sample values
cluster around the mean:
 Tightly clustered = fairly homogeneous
 Widely dispersed = heterogeneous
 Range
 Difference score
 Variance
 Standard deviation

Range
 Presented in two ways:
 The lowest score and the highest score (2 through 17)
 The difference between the highest and the lowest
score (range of 15)
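Both ways of presenting the range can be sketched for the age data set from the earlier slides (the "2 through 17" figures above come from a different, unshown example). The variable names are illustrative.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

lowest, highest = min(ages), max(ages)
value_range = highest - lowest

print(f"{lowest} through {highest}")   # 26 through 67
print(f"range of {value_range}")       # range of 41
```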

Difference Score
 Subtract the mean from each score
 Sometimes referred to as a deviation score
 The difference score is positive when score is
above the mean, and negative when score is
below the mean
 The total of all the difference scores is zero
 Formula: $X - \bar{X}$
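A short sketch of the properties listed above, using the slides' age data set: each difference score is the score minus the mean, and the scores sum to zero (up to floating-point rounding). The variable names are illustrative.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

mean_age = sum(ages) / len(ages)                   # 44.35
differences = [age - mean_age for age in ages]     # score minus mean

# Scores above the mean give positive values, scores below give negative ones.
print(round(differences[0], 2))   # 45 - 44.35 = 0.65
print(round(differences[1], 2))   # 26 - 44.35 = -18.35

# The positive and negative differences cancel out.
print(abs(sum(differences)) < 1e-9)   # True
```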

Mean Deviation
 Average difference score, using the absolute
values
 Example:
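The slide's own example is not reproduced in this text. As a stand-in sketch, the mean deviation of the age data set from the earlier slides can be computed by averaging the absolute difference scores; the variable names are illustrative.

```python
# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

mean_age = sum(ages) / len(ages)                       # 44.35
absolute_differences = [abs(age - mean_age) for age in ages]
mean_deviation = sum(absolute_differences) / len(ages)

print(round(mean_deviation, 2))   # 9.85
```

Taking absolute values keeps the positive and negative difference scores from cancelling to zero.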

Variance (s²)
 Variance commonly used
 “s²” is used to represent a sample variance
 “σ²” is used to represent population variance
 Never negative (zero only when all values are identical), has no upper limit
 Bigger variances = more spread
 Formula: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
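A minimal sketch of the sample variance for the slides' age data set. It assumes the usual sample formula, the sum of squared difference scores divided by n - 1, which is also what `statistics.variance` in the Python standard library computes.

```python
from statistics import variance

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
mean_age = sum(ages) / n
sum_of_squares = sum((age - mean_age) ** 2 for age in ages)

# Assumption: sample variance, so the divisor is n - 1 rather than n.
sample_variance = sum_of_squares / (n - 1)

print(round(sample_variance, 2))
print(round(variance(ages), 2))   # library result for comparison
```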

Standard Deviation (s)
 Square root of the variance
 Sometimes reported as SD
 Most commonly reported measure of
dispersion
 Formula: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
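As a sketch, the standard deviation of the slides' age data set is the square root of its sample variance. The standard library's `statistics.stdev` and `statistics.variance` (both using the n - 1 divisor) are used here; the variable names are illustrative.

```python
import math
from statistics import stdev, variance

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

sd_from_variance = math.sqrt(variance(ages))   # square root of the variance
sd = stdev(ages)                               # same quantity, computed directly

print(round(sd_from_variance, 2), round(sd, 2))
```

Unlike the variance, the standard deviation is expressed in the original units of the variable (here, years of age).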

The Normal Curve

The Normal Curve (Cont’d)
 Represents the frequency distribution of a variable
that is perfectly normally distributed
 Signifies:
 The mean is the most commonly occurring value
 There are just as many values above the mean as there are
below the mean
 When frequency table is constructed, values are perfectly
symmetric
 About 68% of values lie within 1 standard deviation of the mean (–1 to +1)
 About 95% of values lie within 2 standard deviations of the mean (–2 to +2)
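The 68% and 95% figures are rounded. A quick check with the standard normal distribution, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later), shows the underlying proportions; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values between -k and +k standard deviations of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_1_96_sd = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(within_1_sd, 4))      # about 0.6827
print(round(within_2_sd, 4))      # about 0.9545
print(round(within_1_96_sd, 4))   # about 0.95
```

Exactly 95% of values fall within about 1.96 standard deviations, which is why 2 is used as a convenient approximation.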

z-Score

z-Score (Cont’d)
 Synonymous with a standard deviation unit
 A z value of 1.0 represents 1 standard deviation
unit above the mean
 A z value of –1.0 represents 1 standard deviation
unit below the mean
 Formula: $z = \frac{X - \bar{X}}{s}$
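A small sketch of z-scores for the slides' age data set: each score's distance from the mean is expressed in standard deviation units. The helper `z_score` is a hypothetical name introduced for illustration.

```python
from statistics import mean, stdev

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

mean_age = mean(ages)
sd = stdev(ages)

def z_score(value):
    """Distance of a value from the sample mean, in standard deviation units."""
    return (value - mean_age) / sd

z_oldest = z_score(67)      # positive: above the mean
z_youngest = z_score(26)    # negative: below the mean
z_at_mean = z_score(mean_age)

print(round(z_oldest, 2), round(z_youngest, 2), z_at_mean)
```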

Sampling Error
 Described by the statistic “standard error”
 Standard error of the mean is calculated to determine
the magnitude of the variability associated with the
mean
 Formula: $s_{\bar{X}} = \frac{s}{\sqrt{n}}$
where
 $s_{\bar{X}}$ = standard error of the mean
 s = standard deviation
 n = sample size
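As a sketch, the standard error of the mean for the age data set from the earlier slides is the standard deviation divided by the square root of the sample size. The variable names are illustrative.

```python
import math
from statistics import stdev

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

s = stdev(ages)      # sample standard deviation
n = len(ages)        # sample size
standard_error = s / math.sqrt(n)

print(round(standard_error, 2))
```

Because n sits in the denominator, a larger sample with the same standard deviation gives a smaller standard error.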

Confidence Interval
 Determines how closely a sample value
approximates a population value
 Can be created for many statistics, such as a
mean, proportion, and odds ratio
 The t-value for the desired confidence level
(usually 95%) is looked up in a table of
statistical values

Confidence Interval (Cont’d)
 To calculate a 95% confidence interval
around a mean, for example:
 Calculate the mean
 Calculate the standard error of the mean
 Calculate the degrees of freedom (df) [df = n – 1]
 Look up the two-tailed t-value for α = 0.05 at that df
 Compute mean ± (t-value × standard error of the mean)
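The steps above can be sketched for the age data set from the earlier slides. The final step, taking the mean plus and minus the t-value times the standard error, is the standard way to finish the calculation. The t-value of 2.093 (two-tailed, α = 0.05, df = 19) is hard-coded from a standard t table rather than computed, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Age data set from the slides (n = 20).
ages = [45, 26, 59, 51, 42, 28, 26, 32, 31, 55,
        43, 47, 67, 39, 52, 48, 36, 42, 61, 57]

n = len(ages)
sample_mean = mean(ages)                     # step 1: the mean
standard_error = stdev(ages) / math.sqrt(n)  # step 2: standard error of the mean
df = n - 1                                   # step 3: degrees of freedom

# Step 4: two-tailed t-value for alpha = 0.05 at df = 19, taken from a t table.
t_critical = 2.093

margin = t_critical * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(df, round(lower, 2), round(upper, 2))
```

For a different sample size, the hard-coded t-value must be replaced with the table entry for the new df.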

Degrees of Freedom
 The number of independent pieces of
information that are free to vary
 For confidence interval, the degrees of
freedom (df) are n – 1
 This means that there are n – 1 independent
observations in the sample that are free to vary (to
be any value) to estimate the lower and upper
limits of the confidence interval
