GROUP NO: 04
• Descriptive statistics were used in censuses taken by the
Babylonians and Egyptians between 4500 and 3000 B.C.
• In addition, the Roman Emperor Augustus
(27 B.C. to A.D. 14) conducted surveys on births and
deaths of the citizens of the empire, as well as the
number of livestock each owned and the crops each
citizen harvested yearly.
Data: the information that has been collected from an
experiment, a survey, a historical record, etc.
A variable is a characteristic or attribute that can
assume different values.
A statistic is a characteristic or measure obtained by
using the data values from a sample.
A parameter is a characteristic or measure obtained
by using all the data values from a specific population.
Descriptive statistics consists of the collection, organization,
summarization, and presentation of data.
Summarize, describe, and characterize the sample being studied.
Determine whether the sample is normally distributed (bell
curve); many statistical tests require the sample to have
an approximately normal distribution.
Determine whether the sample can be compared to the larger population.
Descriptive statistics are displayed as tables, charts, percentages, and frequency
distributions, and reported as measures of central tendency.
Central tendency: the sample mean, median, and mode
Measures of Position
Measures of variability: range, variance, and standard deviation
Exploratory Data Analysis
The mean is the sum of the values, divided by the
total number of values.
The symbol $\bar{X}$ represents the sample mean.
$\bar{X} = \frac{\sum X}{n}$
$\sum X$ = sum of all data values
$n$ = number of data values in the sample
$N$ = number of data values in the population
The mean is sensitive to extreme scores
(outliers) in the sample
For a population, the Greek letter $\mu$ (mu) is
used for the mean: $\mu = \frac{\sum X}{N}$, where $N$ is the number of data values in the population.
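As a minimal sketch of the definition above, the mean can be computed directly as the sum of the values divided by their count. The five scores are made-up illustrative data, and the name `x_bar` is only a stand-in for the sample-mean symbol.

```python
from statistics import mean

# Hypothetical sample of five exam scores (illustrative data only).
sample = [70, 75, 80, 85, 90]

# Mean = sum of the values divided by the total number of values.
x_bar = sum(sample) / len(sample)

# The standard-library helper should agree with the hand calculation.
library_mean = mean(sample)

print(x_bar, library_mean)
```

The same arithmetic applies to a population mean; only the symbol and the set of values being summed change.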
The median is the midpoint of the data array. The
symbol for the median is MD.
It is the middle value, or 50th percentile: the value of the
observation that divides the sorted data into two almost equal halves.
• The median is not sensitive to extreme scores
• When n is odd: the median is the middle observation
• When n is even: the median is the average of the two
middle observations
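The odd and even cases can be sketched as follows. The helper `median_by_hand` and both samples are hypothetical, written only to mirror the rules on this slide; Python's `statistics.median` is used as a cross-check.

```python
from statistics import median

def median_by_hand(values):
    """Return the median following the odd/even rule (illustrative helper)."""
    data = sorted(values)          # the median is defined on the sorted data
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # n odd: the single middle observation
    # n even: average of the two middle observations
    return (data[mid - 1] + data[mid]) / 2

odd_sample = [7, 1, 5, 3, 9]        # sorted: 1 3 5 7 9
even_sample = [7, 1, 5, 3, 9, 11]   # sorted: 1 3 5 7 9 11

odd_median = median_by_hand(odd_sample)
even_median = median_by_hand(even_sample)

print(odd_median, even_median)
```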
The value that occurs most often in a data set is called the mode.
The midrange is defined as the sum of the lowest
and highest values in the data set,
divided by 2. The symbol MR is used for the
midrange.
Min and max
The range is the highest value minus the lowest value.
The symbol R is used for the range.
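A short sketch of how the minimum, maximum, range, and midrange relate to one another. The data values and variable names are illustrative assumptions, not taken from the slides.

```python
# Hypothetical data set (illustrative values only).
data = [12, 7, 3, 15, 8]

lowest = min(data)
highest = max(data)

# Range R: highest value minus lowest value.
data_range = highest - lowest

# Midrange MR: sum of the lowest and highest values, divided by 2.
midrange = (lowest + highest) / 2

print(lowest, highest, data_range, midrange)
```

Both measures depend only on the two extreme values, so a single unusual observation changes them directly.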
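The variance is the average of the squares of the distance
each value is from the mean. The symbol for the
population variance is $\sigma^2$:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
where $\mu$ is the population mean and $N$ is the number of data values in the population.
The standard deviation is the square root of the variance.
The symbol for the population standard deviation is $\sigma$:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
The formula for the sample variance, denoted by $s^2$, is
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
where $\bar{X}$ is the sample mean and $n$ is the number of data values in the sample.
The standard deviation of a sample (denoted by s) is
$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$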
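The difference between the population and sample versions can be sketched in a few lines. This assumes the usual conventions: divide by the number of values for a population, and by one less than that for a sample. The data set is made up for illustration.

```python
import math
from statistics import pvariance, pstdev, variance, stdev

# Hypothetical data set (illustrative values only); its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean_value = sum(data) / n

# Sum of squared distances of each value from the mean.
squared_distances = sum((x - mean_value) ** 2 for x in data)

# Population variance and standard deviation: divide by n.
pop_var = squared_distances / n
pop_sd = math.sqrt(pop_var)

# Sample variance and standard deviation: divide by n - 1.
samp_var = squared_distances / (n - 1)
samp_sd = math.sqrt(samp_var)

# Cross-check against the standard library.
assert math.isclose(pop_var, pvariance(data))
assert math.isclose(pop_sd, pstdev(data))
assert math.isclose(samp_var, variance(data))
assert math.isclose(samp_sd, stdev(data))

print(pop_var, pop_sd, samp_var, samp_sd)
```

Treating the same eight values as a sample rather than a population gives a slightly larger variance, because the divisor is smaller.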
Applications of the Variance and Standard Deviation
1. To determine the spread of the data
2. To determine the consistency of a variable
ex: in the manufacture of fittings, such as nuts and
bolts, the variation in the diameters must be small,
or the parts will not fit together.
3. To determine the number of data values that fall
within a specified interval in a distribution
About 68% of the population in a normal distribution is within
1 standard deviation of the mean.
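The 68% figure can be checked numerically. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later) with a standard normal distribution; the choice of mean 0 and standard deviation 1 is an assumption made for convenience, since the proportion is the same for any normal distribution.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (assumed for illustration).
dist = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within 1 standard deviation of the mean.
within_one_sd = dist.cdf(1) - dist.cdf(-1)

print(round(within_one_sd, 4))
```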
The coefficient of variation, denoted by CVar, is the
standard deviation divided by the
mean. The result is expressed as a percentage.
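As a rough sketch, the coefficient of variation can be computed by dividing the standard deviation by the mean and multiplying by 100. The example assumes the data are a sample (so the sample standard deviation is used), and the values are illustrative only.

```python
from statistics import mean, stdev

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# CVar = standard deviation / mean, expressed as a percentage.
cvar_percent = stdev(data) / mean(data) * 100

print(f"CVar = {cvar_percent:.2f}%")
```

Because the result is a percentage with no units, it can be used to compare the spread of variables measured on different scales.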
A z score or standard score
Quartiles and Deciles
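A z score or standard score for a value is obtained by
subtracting the mean from the
value and dividing the result by the standard deviation.
The symbol for a standard score
is z. The formula is
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
For a sample, $z = \frac{X - \bar{X}}{s}$; for a population, $z = \frac{X - \mu}{\sigma}$.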
The z score represents the number of standard
deviations that a data value falls above or below the
mean.
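The z score calculation can be sketched as a one-line function. The function name `z_score` and the numbers used (a score of 65 on a test with mean 50 and standard deviation 10) are hypothetical.

```python
def z_score(value, mean_value, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean_value) / standard_deviation

# Hypothetical test: mean 50, standard deviation 10.
z_above = z_score(65, 50, 10)   # a score above the mean
z_below = z_score(40, 50, 10)   # a score below the mean

print(z_above, z_below)
```

A positive result means the value is above the mean, and a negative result means it is below.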
Percentiles divide the data set into 100 equal groups.
Quartiles divide the distribution into four groups,
separated by Q1, Q2, Q3
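Quartiles can be sketched with `statistics.quantiles` (Python 3.8 and later). Textbooks and software use slightly different conventions for locating quartiles and percentiles; this example relies on the library's default "exclusive" method, which is an assumption and may not match the procedure used in a given course. The data are illustrative.

```python
from statistics import median, quantiles

# Hypothetical data set (illustrative values only).
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 returns the three cut points Q1, Q2, Q3 that split the data into four groups.
q1, q2, q3 = quantiles(data, n=4)

# Q2 is the 50th percentile, so it should equal the median.
assert q2 == median(data)

# n=100 returns the 99 cut points that split the data into 100 groups (percentiles).
percentile_cuts = quantiles(data, n=100)

print(q1, q2, q3, len(percentile_cuts))
```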
An outlier is an extremely high or an extremely low
data value when compared with the rest of the data
An outlier can strongly affect the mean and standard
deviation of a variable
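The effect of an outlier can be illustrated by computing the same measures with and without one extreme value. The data are made up, and the value 100 is simply assumed to be the outlier.

```python
from statistics import mean, median, stdev

# Hypothetical data set, then the same data with one extreme value added.
clean = [10, 12, 11, 13, 12]
with_outlier = clean + [100]

mean_clean, mean_outlier = mean(clean), mean(with_outlier)
sd_clean, sd_outlier = stdev(clean), stdev(with_outlier)
median_clean, median_outlier = median(clean), median(with_outlier)

# The mean and standard deviation shift noticeably; the median does not.
print(f"mean:   {mean_clean:.2f} -> {mean_outlier:.2f}")
print(f"stdev:  {sd_clean:.2f} -> {sd_outlier:.2f}")
print(f"median: {median_clean} -> {median_outlier}")
```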
Descriptive statistics consists of the collection, organization, summarization, and
presentation of data.
A data set that has only one value that occurs with the greatest frequency is said to
be unimodal.
If a data set has two values that occur with the same greatest frequency, both values
are considered to be the mode and the data set is said to be bimodal. If a data set has more
than two values that occur with the same greatest frequency, each value is used as the
mode, and the data set is said to be multimodal. When no data value occurs more than
once, the data set is said to have no mode. A data set can have more than one mode or no
mode at all. These situations will be shown in some of the examples that follow.
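These cases can be sketched with `statistics.multimode` (Python 3.8 and later). The helper `describe_modes` and the four data sets are hypothetical. The helper follows the convention stated above that a data set in which no value occurs more than once has no mode.

```python
from collections import Counter
from statistics import multimode

def describe_modes(values):
    """Return (modes, label) using the unimodal/bimodal/multimodal/no-mode convention."""
    if not values:
        return [], "no mode"
    counts = Counter(values)
    # If no value occurs more than once, the data set has no mode.
    if max(counts.values()) == 1:
        return [], "no mode"
    modes = multimode(values)   # all values tied for the greatest frequency
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label

one_mode = describe_modes([1, 2, 2, 3])
two_modes = describe_modes([1, 1, 2, 2, 3])
many_modes = describe_modes([1, 1, 2, 2, 3, 3, 4])
no_mode = describe_modes([1, 2, 3])

print(one_mode, two_modes, many_modes, no_mode)
```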