This document provides an overview of key concepts in statistics, including:
- Descriptive statistics involves collecting, describing, and analyzing data without drawing inferences, while inferential statistics analyzes a subset of data to make inferences about the whole.
- Parameters describe populations and statistics describe samples.
- Levels of measurement include nominal, ordinal, interval, and ratio.
- Measures of location describe where values lie within a data set and include the minimum, maximum, percentiles, deciles, and quartiles.
- Measures of variation describe data spread and include range, inter-quartile range, variance, standard deviation, and coefficient of variation. Variance and standard deviation are particularly important measures.
2. Areas of Statistics
Descriptive statistics: methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a larger group.
Inferential statistics: methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.
3. Key Definitions
Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha), and β (beta).
Statistics are numerical measures that describe a sample.
5. Nominal Level of Measurement
The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme.
Example: gender, civil status, nationality, religion, etc.
6. Ordinal Level of Measurement
The ordinal level of measurement involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Example: good, better, or best speakers; 1-star, 2-star, 3-star movies; employee rank.
7. Interval Level of Measurement
The interval level of measurement is like the ordinal level, with the additional property that meaningful differences between data values can be determined. However, there is no inherent (natural) zero starting point.
Example: body temperature, year (1955, 1843, 1776, 1123, etc.)
8. Ratio Level of Measurement
The ratio level of measurement is the interval level modified to include an inherent zero starting point. For values at this level, both differences and ratios are meaningful.
Example: weights of plastic, lengths of videos, distances traveled.
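A short sketch of why ratios need a true zero, assuming the standard unit-conversion formulas; the helper names `celsius_to_fahrenheit` and `kg_to_lb` are hypothetical and only for illustration. A ratio of interval-level values changes when the units change, while differences keep their meaning; ratios of ratio-level values do not change.

```python
# Hypothetical helpers for illustration (not from the slides).
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg / 0.45359237  # 1 lb is defined as exactly 0.45359237 kg

# Interval level: "20 C is twice 10 C" does not survive a change of units.
print(20 / 10)                                                 # 2.0
print(celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10))   # 1.36

# Differences are still meaningful: a 10 C gap is always an 18 F gap.
print(celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))   # 18.0

# Ratio level: 20 kg is twice 10 kg in any unit, because 0 kg = 0 lb.
print(kg_to_lb(20) / kg_to_lb(10))                             # 2.0
```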
9. Median
Divides the observations, arranged in order of magnitude, into two equal parts.
If the number of observations is odd, the median is the middle number.
If the number of observations is even, the median is the average of the two middle numbers.
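The odd/even rule above can be sketched in Python as follows. `median` is a hypothetical helper written for illustration; the standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Median of a list of numbers, following the odd/even rule."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)   # arrange the observations into an array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle number.
        return ordered[mid]
    # Even count: the average of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))      # 5
print(median([4, 1, 3, 2]))   # 2.5
```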
10. Measures of Location
A measure of location summarizes a data set by giving a value, within the range of the data values, that describes a position relative to the entire data set arranged according to magnitude (called an array).
Some Common Measures:
Minimum, Maximum
Percentiles, Deciles, Quartiles
11. Maximum and Minimum
Minimum is the smallest value in the data set, denoted as MIN.
Maximum is the largest value in the data set, denoted as MAX.
12. Percentiles
Numerical measures that give the position of a data value relative to the entire data set.
Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.
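Textbooks differ on exactly which value to report when j% of the data does not fall on a single observation. The sketch below assumes one common convention: linear interpolation, where Pj sits at 0-based position (n - 1)·j/100 in the sorted array. The function name `percentile` is hypothetical.

```python
import math

def percentile(values, j):
    """j-th percentile P_j by linear interpolation between closest ranks.

    Assumed convention: P_j lies at 0-based position (n - 1) * j / 100
    in the sorted array; other textbooks use slightly different rules.
    """
    if not 0 <= j <= 100:
        raise ValueError("j must be between 0 and 100")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one observation")
    pos = (len(ordered) - 1) * j / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    # Move the fraction `frac` of the way from the lower to the upper value.
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


scores = [35, 12, 50, 20, 15, 40]
print(percentile(scores, 10))   # 13.5
print(percentile(scores, 50))   # 27.5 (the median)
print(percentile(scores, 90))   # 45.0
```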
13. Deciles
Deciles, denoted by Dj, divide an array into ten equal parts, each part having ten percent of the distribution of the data values.
The 1st decile is the 10th percentile, the 2nd decile is the 20th percentile, and so on up to the 9th decile, which is the 90th percentile.
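Python's standard `statistics.quantiles` (Python 3.8+) can return the nine deciles directly. Its `method="inclusive"` option treats the minimum and maximum as the 0th and 100th percentiles and interpolates between data values. The data set here is hypothetical, and other percentile conventions may give slightly different deciles.

```python
import statistics

# Hypothetical data set: the integers 1 through 20.
data = list(range(1, 21))

# n=10 requests the 9 cut points D1..D9 that split the array into tenths.
deciles = statistics.quantiles(data, n=10, method="inclusive")

for j, value in enumerate(deciles, start=1):
    print(f"D{j} = P{10 * j} = {value}")
# D1 = P10 = 2.9, ..., D5 = P50 = 10.5 (the median), ..., D9 = P90 = 18.1
```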
14. Quartiles
Quartiles, denoted by Qj, divide an array into four equal parts, each part having 25% of the distribution of the data values.
The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.
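A minimal check, assuming the same `statistics.quantiles` interpolation convention, that Q1, Q2, and Q3 come out as the 25th, 50th, and 75th percentiles and that Q2 matches the median. The six observations are hypothetical.

```python
import statistics

# Hypothetical sample of six observations.
data = [39, 7, 41, 15, 40, 36]

# n=4 requests the 3 cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)                     # 20.25 37.5 39.75
print(q2 == statistics.median(data))  # True: Q2 is the median
```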
15. Measures of Variation
A measure of variation is a single value that is used to describe the spread of the distribution.
16. Two Types of Measures of Dispersion
Absolute Measures of Dispersion:
Range
Inter-quartile Range
Variance
Standard Deviation
Relative Measure of Dispersion:
Coefficient of Variation
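A sketch of three of these measures on a hypothetical sample, using the standard definitions: range = MAX - MIN, inter-quartile range = Q3 - Q1, and coefficient of variation = (s / x̄) × 100%. The quartiles assume the `statistics.quantiles` "inclusive" convention, so the IQR can differ slightly under other quartile rules. The CV here uses the sample SD; the population version uses σ/μ instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Absolute measures (same units as the data).
data_range = max(data) - min(data)                          # MAX - MIN
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                               # Q3 - Q1

# Relative measure (unit-free): sample SD as a percentage of the mean.
cv = statistics.stdev(data) / statistics.mean(data) * 100

print("range:", data_range)   # 7
print("IQR:", iqr)            # 1.5
print(f"CV: {cv:.1f}%")
```

Because the CV divides out the units, it can compare the spread of data sets measured on different scales, which the absolute measures cannot.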
17. Variance
important measure of variation
shows variation about the mean
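Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Here N and μ are the population size and mean, and n and x̄ are the sample size and mean.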
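The two variance formulas can be sketched directly in Python; the only difference is the divisor (N for a population, n - 1 for a sample). The function names are hypothetical, and the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics

def population_variance(values):
    """sigma^2: sum of squared deviations from mu, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """s^2: sum of squared deviations from x-bar, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data, mean 5
print(population_variance(data))      # 4.0  (32 / 8)
print(sample_variance(data))          # about 4.571  (32 / 7)
print(statistics.pvariance(data), statistics.variance(data))  # cross-check
```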
18. Standard Deviation (SD)
most important measure of variation
square root of Variance
has the same units as the original data
Population SD:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Sample SD:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
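A minimal sketch of the SD as the square root of the variance. The data are hypothetical lengths in centimetres, so the resulting SD is also in centimetres, while the variance would be in square centimetres. The function names are hypothetical; `statistics.pstdev` and `statistics.stdev` serve as a cross-check.

```python
import math
import statistics

def population_sd(values):
    """sigma = sqrt(sum((X_i - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_sd(values):
    """s = sqrt(sum((x_i - x_bar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample SD needs at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


lengths_cm = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical lengths in cm
print(population_sd(lengths_cm))        # 2.0 (cm)
print(sample_sd(lengths_cm))            # about 2.14 (cm)
print(statistics.pstdev(lengths_cm), statistics.stdev(lengths_cm))  # cross-check
```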
19. Remarks on Standard Deviation
If there is a large amount of variation, then on average the data values will be far from the mean. Hence, the SD will be large.
If there is only a small amount of variation, then on average the data values will be close to the mean. Hence, the SD will be small.
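To illustrate these remarks, the sketch below compares two hypothetical data sets with the same mean. The one whose values lie farther from the mean on average also has the larger SD. It uses the sample SD from Python's `statistics` module.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(values)
    avg_distance = sum(abs(x - mean) for x in values) / len(values)
    sd = statistics.stdev(values)  # sample SD
    print(f"{name}: mean={mean}, mean |x - mean|={avg_distance}, SD={sd:.2f}")
# tight: mean=50, mean |x - mean|=1.2, SD=1.58
# wide: mean=50, mean |x - mean|=24.0, SD=31.62
```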
20. Guiding Principle
The larger the value of a measure of variation, the more dispersed (more varied) the observations are.