Review in
STATISTICS
Areas of Statistics
Descriptive statistics
• Methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a larger group.
Inferential statistics
• Methods concerned with analyzing a subset of data (a sample) in order to make predictions or inferences about the entire set of data (the population).
Key Definitions
• Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha), and β (beta).
• Statistics are numerical measures that describe a sample.
Levels of
Measurement
Nominal Level of Measurement
• The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme.
• Examples: gender, civil status, nationality, religion.
Ordinal Level of Measurement
• The ordinal level of measurement involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
• Examples: good, better, or best speakers; 1-star, 2-star, or 3-star movies; employee rank.
Interval Level of Measurement
• The interval level of measurement is like the ordinal level, with the additional property that meaningful differences between data values can be determined. However, there is no inherent (natural) zero starting point.
• Examples: body temperature (in °C or °F), calendar year (1955, 1843, 1776, 1123, etc.).
Ratio Level of Measurement
• The ratio level of measurement is the interval level modified to include an inherent zero starting point. For values at this level, both differences and ratios are meaningful.
• Examples: weights of plastic, lengths of videos, distances traveled.
Median
• Divides the observations, arranged in order of magnitude, into two equal parts.
• If the number of observations is odd, the median is the middle number.
• If the number of observations is even, the median is the average of the two middle numbers.
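The odd/even rule for the median can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `median` and the sample values are chosen here for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median requires at least one observation")
    array = sorted(values)      # arrange the observations by magnitude
    n = len(array)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return array[middle]
    # Even number of observations: average of the two middle values.
    return (array[middle - 1] + array[middle]) / 2


print(median([7, 1, 5]))     # three observations, middle value after sorting
print(median([4, 1, 3, 2]))  # four observations, average of the two middle values
```

Sorting first matters: the "middle" value is only meaningful once the observations are arranged in order.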
Measures of Location
• A measure of location summarizes a data set by giving a value within the range of the data values that describes its location relative to the entire data set arranged according to magnitude (called an array).
Some Common Measures:
• Minimum, Maximum
• Percentiles, Deciles, Quartiles
Maximum and Minimum
• Minimum is the smallest value in the data set, denoted as MIN.
• Maximum is the largest value in the data set, denoted as MAX.
Percentiles
• Numerical measures that give the position of a data value relative to the entire data set.
• Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
• The jth percentile, denoted as Pj, is the value that separates the bottom j% of the data from the top (100-j)%.
Deciles
• Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.
• The 1st decile is the 10th percentile, the 2nd decile is the 20th percentile, and so on.
Quartiles
• Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
• The 1st quartile is the 25th percentile, the 2nd quartile is the 50th percentile (also the median), and the 3rd quartile is the 75th percentile.
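The relationship between percentiles, deciles, and quartiles can be sketched in Python. The slides do not say how to locate Pj when it falls between two data values, so the sketch below assumes one common convention (linear interpolation between the two nearest values in the sorted array); other textbooks and software use slightly different rules and may give slightly different answers. The function names are illustrative.

```python
def percentile(values, j):
    """Return the j-th percentile (0 <= j <= 100) of a non-empty sequence.

    Assumption: linear interpolation between the two nearest values
    of the sorted array. This is one convention among several.
    """
    if not values:
        raise ValueError("percentile requires at least one observation")
    if not 0 <= j <= 100:
        raise ValueError("j must be between 0 and 100")
    array = sorted(values)                # the array: data ordered by magnitude
    position = (len(array) - 1) * j / 100
    lower = int(position)                 # index of the value at or below Pj
    upper = min(lower + 1, len(array) - 1)
    fraction = position - lower
    return array[lower] + fraction * (array[upper] - array[lower])


def decile(values, j):
    """The j-th decile Dj is the (10*j)-th percentile."""
    return percentile(values, 10 * j)


def quartile(values, j):
    """The j-th quartile Qj is the (25*j)-th percentile."""
    return percentile(values, 25 * j)


data = [10, 20, 30, 40]
print(quartile(data, 2))   # Q2, which is also the median and P50
print(decile(data, 5))     # D5 is the same value as Q2
```

Defining `decile` and `quartile` in terms of `percentile` mirrors the slides: every decile and quartile is just a particular percentile.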
Measures of Variation
• A measure of variation is a single value that is used to describe the spread of the distribution.
Two Types of Measures of Dispersion
Absolute Measures of Dispersion:
• Range
• Inter-quartile Range
• Variance
• Standard Deviation
Relative Measure of Dispersion:
• Coefficient of Variation
Variance
• An important measure of variation
• Shows variation about the mean
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
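As a rough illustration of the two variance formulas, the sketch below sums the squared deviations from the mean and divides by N for a population or by n - 1 for a sample. The function names and the data values are made up for this example.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values with mean 5
print(population_variance(data))  # treats data as the whole population
print(sample_variance(data))      # treats data as a sample; slightly larger
```

For the same data the sample variance is always at least as large as the population variance, because the same sum is divided by the smaller number n - 1.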
Standard Deviation (SD)
• The most important measure of variation
• The square root of the variance
• Has the same units as the original data
Population SD:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Sample SD:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
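A short Python sketch of the standard deviation follows; it simply takes the square root of the corresponding variance. The function names and data are illustrative assumptions, not part of the slides.

```python
import math


def population_sd(values):
    """Square root of the population variance (divide by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """Square root of the sample variance (divide by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


heights_cm = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up measurements
print(population_sd(heights_cm))        # in the same units as the data
print(sample_sd(heights_cm))
```

Because of the square root, the result is expressed in the original units of the data, whereas the variance is in squared units.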
Remarks on Standard Deviation
• If there is a large amount of variation, then on average the data values will be far from the mean. Hence, the SD will be large.
• If there is only a small amount of variation, then on average the data values will be close to the mean. Hence, the SD will be small.
Guiding Principle
• The larger the value of the measure, the more dispersed (more varied) the observations are.
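The guiding principle can be checked on two small made-up data sets that share the same mean but differ in spread. The sketch uses Python's standard `statistics` module. The slides list the range, inter-quartile range, and coefficient of variation without giving formulas, so the definitions used here are the usual ones and are assumptions of this example: range = MAX - MIN, IQR = Q3 - Q1, and CV = (sample SD / mean) × 100%. The quartiles use the module's "inclusive" method, which is one of several conventions.

```python
import statistics


def dispersion_summary(values):
    """Return common measures of dispersion for a sequence of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sd = statistics.stdev(values)            # sample SD (divides by n - 1)
    mean = statistics.mean(values)
    return {
        "range": max(values) - min(values),  # MAX - MIN
        "iqr": q3 - q1,                      # Q3 - Q1
        "sd": sd,
        "cv_percent": 100 * sd / mean,       # relative measure, unit-free
    }


tight = [48, 49, 50, 51, 52]    # values close to the mean of 50
spread = [10, 30, 50, 70, 90]   # same mean, much more varied

for name, data in (("tight", tight), ("spread", spread)):
    print(name, dispersion_summary(data))
```

Every measure comes out larger for the more varied data set, which is what the guiding principle predicts.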
