Stat Chapter 3.pptx, proved detail statistical issues
1.
Chapter three
Numerical Descriptive Measures
Objectives
Describe data using measures of central tendency, such as
the mean, median, mode, and midrange.
Summarize data using measures of variation, such as the
range, variance, and standard deviation.
Determine the position of a data value in a data set using
various measures of position, such as percentiles, deciles and
quartiles.
2.
A. Measures of central tendency
A measure of central tendency is an important tool that
locates the center of a histogram or a frequency
distribution curve.
The main measures are the mean, the median, and the
mode, each computed for two cases (ungrouped and
grouped data sets).
3.
The mean
◦ The most commonly used measure of central tendency is the
mean (or the average).
• Also known as the arithmetic average.
• Calculated by adding all the values in the group and then
dividing by the number of values.
• Helps to summarize the essential features of the data and
enables comparison.
4.
Cont…
The mean is the sum of the values divided by the
number of values. The mean of a set of
numbers x1, x2, ..., xn is typically denoted by "$\bar{x}$".
This is the arithmetic mean: the "standard"
average, often simply called the "mean".
For ungrouped data, the mean is obtained
by dividing the sum of all values by the
number of values in that data set.
5.
Cont…
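The Mean for Ungrouped Data
is calculated as
Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
where N is the population size and n is the sample size.
Example 1: Find the mean score of 10 students
in a midterm exam in a class if their scores
are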
6.
Cont…
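25 27 30 23 16 27 29 14 20 28
Solution: The scores cover all 10 students, so this is a population mean:
$\mu = \frac{\sum x}{N} = \frac{239}{10} = 23.9$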
Example 2: Continuing Example 1, suppose we
take a sample of 4 students from the class
and find their scores to be 23, 27, 16, and 29.
Find the mean of these scores.
$\bar{x} = \frac{\sum x}{n} = \frac{23 + 27 + 16 + 29}{4} = \frac{95}{4} = 23.75$
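The two worked examples can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and treating the 10 scores as the whole population is an assumption taken from the wording of the examples.

```python
from statistics import mean

# Scores of all 10 students (treated as the population, an assumption from the example)
scores = [25, 27, 30, 23, 16, 27, 29, 14, 20, 28]
population_mean = sum(scores) / len(scores)   # sum of values / number of values

# Sample of 4 students drawn from the same class
sample = [23, 27, 16, 29]
sample_mean = mean(sample)                    # same formula, via the standard library

print(population_mean)  # 23.9
print(sample_mean)      # 23.75
```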
7.
ii. Weighted Mean
If $x_1, x_2, \dots, x_n$ represent the values of the items and $w_1, w_2, \dots, w_n$ are the
corresponding weights, then the weighted mean $\bar{x}_w$ is given by
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$
Example: A student's final marks in Mathematics, Physics, Chemistry and Biology are A,
B, D and C respectively. If the respective credits for these courses are 4, 4, 3
and 2, determine the approximate average mark the student obtained.
Solution: With grade points A = 4, B = 3, C = 2 and D = 1:
$x_i$       4   3   1   2
$w_i$       4   4   3   2
$x_i w_i$  16  12   3   4
$\bar{x}_w = \frac{16 + 12 + 3 + 4}{4 + 4 + 3 + 2} = \frac{35}{13} \approx 2.69$
That is, the average mark of the student is 2.69.
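As a minimal sketch, the weighted mean can be written as a small Python function. The function name `weighted_mean` and the variable names are illustrative assumptions; the grade points and credits are those of the example.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

grade_points = [4, 3, 1, 2]   # A, B, D, C
course_credits = [4, 4, 3, 2] # credits used as weights
average_mark = weighted_mean(grade_points, course_credits)
print(round(average_mark, 2))  # 2.69
```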
8.
iii. Combined mean
When a set of observations is divided into k groups, where $\bar{x}_1$ is the mean
of the $n_1$ observations in group 1, $\bar{x}_2$ is the mean of the $n_2$ observations
in group 2, …, and $\bar{x}_k$ is the mean of the $n_k$ observations in group k,
the combined mean, denoted by $\bar{x}_c$, of all observations
taken together is given by
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k}$
Example: There are two classes, Class A and Class B. Class A has 30 students
with an average score of 70 on a test. Class B has 20 students with an average
score of 80. What is the combined average score for both classes?
Solution:
$\bar{x}_c = \frac{30(70) + 20(80)}{30 + 20} = \frac{3700}{50} = 74$
The combined mean for all students is 74.
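The combined mean is a weighted mean of the group means, with the group sizes as weights. A short sketch (the function name `combined_mean` is an illustrative assumption):

```python
def combined_mean(group_means, group_sizes):
    """Combine group means, weighting each by its group size."""
    total = sum(n * m for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)

# Class A: 30 students, mean 70; Class B: 20 students, mean 80
print(combined_mean([70, 80], [30, 20]))  # 74.0
```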
9.
Note:
If a constant c is added to or subtracted from every value in the data set, the
mean increases or decreases by that constant:
New Mean = Old Mean + c, when c is added;
New Mean = Old Mean - c, when c is subtracted.
If each value in the data set is multiplied by a constant k, the mean is also
multiplied by k: New Mean = k × Old Mean.
Question 1: If the mean of a data set is 50, what will the new mean be if a constant
value of 5 is added to every value in the data set?
Solution: Given mean = 50 and constant = 5, New mean = 50 + 5 = 55.
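Both properties can be verified numerically. The data set and the constants below are hypothetical, chosen only so that the old mean is 50 as in Question 1.

```python
from statistics import mean

data = [40, 50, 60]   # hypothetical data set whose mean is 50
c, k = 5, 3           # hypothetical constants

old_mean = mean(data)
shifted_mean = mean([x + c for x in data])   # add c to every value
scaled_mean = mean([k * x for x in data])    # multiply every value by k

print(old_mean, shifted_mean, scaled_mean)
```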
10.
The midrange
The midrange (MR) is defined as the sum of the lowest and highest
values in the data set divided by 2.
$MR = \frac{\text{Lowest value} + \text{Highest value}}{2}$
Example: Find the midrange (MR) for the following data:
11, 13, 20, 30, 9, 4, 15
Solution: The lowest value is 4 and the highest value is 30, so
$MR = \frac{4 + 30}{2} = \frac{34}{2} = 17$
Note that the midrange is a weak measure of central tendency,
since it depends on only two of the values in
the data set.
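A one-line sketch of the midrange (the function name is illustrative):

```python
def midrange(data):
    """Average of the lowest and highest values."""
    return (min(data) + max(data)) / 2

print(midrange([11, 13, 20, 30, 9, 4, 15]))  # 17.0
```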
11.
Mean for Grouped data
If data are given in the form of a continuous frequency distribution,
the sample mean can be computed as
$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k}$,
where $x_i f_i$ is the product of the midpoint and the frequency of class i.
Class boundary   60-62  62-64  64-66  66-68  68-70  70-72  Total
Frequency (fi)       5     18     42     20      8      7    100
xi                  61     63     65     67     69     71
xifi               305   1134   2730   1340    552    497   6558
Solution:
The formula to be used for the mean is as follows:
$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$
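The table calculation can be reproduced as follows. This is a sketch with illustrative variable names; the midpoints are computed from the class boundaries rather than typed in.

```python
# Class boundaries and frequencies from the table
boundaries = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
freqs = [5, 18, 42, 20, 8, 7]

midpoints = [(lo + hi) / 2 for lo, hi in boundaries]       # 61, 63, ..., 71
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))      # sum of midpoint * frequency
grouped_mean = sum_xf / sum(freqs)

print(sum_xf)        # 6558.0
print(grouped_mean)  # 65.58
```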
12.
Median
• It is the value of the middle item of a series when the items are
arranged in ascending or descending order of value.
• It divides the series into two halves.
• It is a positional average.
When n is odd:
$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}}$ value
03/08/2025 By: Menberu T.
13.
Cont…
Example: Find the median for the data set:
312, 257, 421, 289, 526, 374, 497
Solution: First, rank the data in increasing order:
x1   x2   x3   x4   x5   x6   x7
257  289  312  374  421  497  526
Since there are 7 values in this data set, the $\frac{7+1}{2} = 4$th term
in the ranked data is the median. Therefore
Median = $\left(\frac{n+1}{2}\right)$th item = 4th item = 374
14.
Cont…
Median for an Even Number of Observations
Step 1: Arrange the data in either ascending or descending
order.
Step 2: If the number of observations (say n) is even,
identify the (n/2)th and [(n/2) + 1]th observations.
Step 3: The average of these two observations (identified
in step 2) is the median of the given data.
15.
Cont…
Example: Find the median for the data set:
8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11
Solution: First, we rank the data in
increasing order:
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12
 7   8   9  10  11  12  13  13  14   17   17   45
Since there are 12 values in this data set, the
median is given by the average of the two
middle values, whose ranks are n/2 = 6 and (n/2) + 1 = 7:
Median = (x6 + x7)/2 = (12 + 13)/2 = 12.5
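The odd and even cases can be handled by one small function. This is a sketch; the function name `median_value` is illustrative, and the two calls use the data sets from the odd and even examples.

```python
def median_value(data):
    """Middle value of the ranked data (average of the two middle values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # ((n + 1) / 2)th value, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of (n/2)th and (n/2 + 1)th

print(median_value([312, 257, 421, 289, 526, 374, 497]))             # 374
print(median_value([8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]))   # 12.5
```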
16.
Median for grouped data
For grouped data, the median is obtained
by the following formula.
$\text{Median} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) h$
Where L = lower limit of the median class
n = number of observations
f = frequency of the median class
cf = cumulative frequency of the class
preceding the median class
h = class width
17.
Example: The water percentage in the body of a species of fish is given below.
Calculate the median.
Class interval  15-24  25-34  35-44  45-54  55-64  Total
Frequency           7     17     16      6      4     50
Solution: Construct the less-than cumulative frequency distribution:
Class interval    15-24  25-34  35-44  45-54  55-64  Total
Frequency             7     17     16      6      4     50
Cumulative freq.      7     24     40     46     50
Since n = 50, n/2 = 25, so the median class is 35-44, the first class whose
cumulative frequency reaches 25. Then
L = 35
f = 16
h = 9
cf = 24
$\tilde{x} = L + \left(\frac{\frac{n}{2} - cf}{f}\right) h = 35 + \left(\frac{25 - 24}{16}\right) 9 \approx 35.56$
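A sketch of the grouped-median formula in Python. The function name and arguments are illustrative assumptions. The call uses the slide's own values (L = 35, h = 9); if class boundaries were used instead (L = 34.5, h = 10), the same formula would give 35.125.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((n/2 - cf) / f) * h for a frequency table."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:          # first class reaching n/2 is the median class
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

lower_limits = [15, 25, 35, 45, 55]
freqs = [7, 17, 16, 6, 4]
fish_median = grouped_median(lower_limits, freqs, width=9)
print(round(fish_median, 2))  # 35.56
```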
18.
The mode
The mode is another measure of central tendency; it is
the most common value in a data set.
Data set with no mode: each value
occurs only once.
Data set with one mode: only one value
occurs with the highest frequency. The data set in this
case is called unimodal.
Data set with two modes: two values
occur with the same (highest) frequency. The distribution, in
this case, is said to be bimodal.
Data set with more than two modes: more
than two values occur with the same (highest) frequency.
The data set is then said
to be multimodal.
19.
Cont…
Example: Find the mode for the given data set:
22, 19, 21, 19, 27, 21, 29, 22, 19, 25, 21, 22, 25
Solution: Each of the three values 19, 21, and 22 occurs three
times, which is the highest frequency in the data set.
Therefore each of these is a mode; that is, the modes for this
data set are 19, 21, and 22.
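The four cases (no mode, one mode, two modes, many modes) can be covered by counting frequencies. A minimal sketch, with an illustrative function name:

```python
from collections import Counter

def modes(data):
    """Return all values that occur with the highest frequency ([] if every value occurs once)."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs only once: no mode
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([22, 19, 21, 19, 27, 21, 29, 22, 19, 25, 21, 22, 25]))  # [19, 21, 22]
```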
20.
Mode for grouped data
The formula for calculating the mode of grouped data
is:
$\text{Mode} = L + \left(\frac{f_1 - f_0}{2 f_1 - f_0 - f_2}\right) h$
In this formula, the variables are:
• L: The lower limit of the modal class
• h: The size of the class interval
• f1: The frequency of the modal class
• f0: The frequency of the class preceding the modal
class
• f2: The frequency of the class succeeding the modal
class
21.
Example: The following table shows the distribution of scores obtained by
students in an exam:
Score Range   Number of Students (Frequency)
50 – 60        8
60 – 70       12
70 – 80       25
80 – 90       10
90 – 100       5
What is the mode of the exam
scores?
Answer: The modal class is 70 – 80, since it has the highest frequency.
• L = lower boundary of the modal class = 70
• f1 = frequency of the modal class = 25
• f0 = frequency of the class before the modal class =
12
• f2 = frequency of the class after the modal class = 10
• h = class width = 10
• Using the formula: $\text{Mode} = 70 + \left(\frac{25 - 12}{2(25) - 12 - 10}\right) 10 = 70 + \frac{130}{28} \approx 74.64 \approx 75$
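The calculation can be checked with the standard grouped-mode formula, Mode = L + ((f1 − f0) / (2f1 − f0 − f2)) × h. The function below is a sketch with illustrative names; the unrounded result is about 74.64, which the slide reports as 75.

```python
def grouped_mode(lower, width, f1, f0, f2):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f1 - f0 - f2 is zero")
    return lower + (f1 - f0) / denominator * width

exam_mode = grouped_mode(lower=70, width=10, f1=25, f0=12, f2=10)
print(round(exam_mode, 2))  # 74.64
```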
22.
Relationships Between Mean, Median and Mode:
For a moderately skewed distribution, the empirical relationship
between the mean, median and mode is that the mode is approximately
equal to 3 times the median minus 2 times the
mean.
That is, Mean – Mode = 3 (Mean – Median), or
Mode = 3 Median – 2 Mean.
Example: If the difference between the mean and the mode of a
population is 48 and the median is 12, find the mean.
Solution:
Mean – Mode = 3(Mean – Median);
48 = 3(Mean – 12);
16 = Mean – 12;
Mean = 28.
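Rearranging the empirical relationship gives Mean = Median + (Mean – Mode)/3, which a short sketch can evaluate (the function name is an illustrative assumption):

```python
def mean_from_empirical_relation(mean_minus_mode, median):
    """Solve Mean - Mode = 3 * (Mean - Median) for the mean."""
    return median + mean_minus_mode / 3

print(mean_from_empirical_relation(48, 12))  # 28.0
```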
23.
B. Measures of dispersion
• An average can represent a series only as well as a single
figure can; it cannot reveal the entire story of
the phenomenon under study.
• A measure of dispersion shows the degree to which numerical data tend to spread
around an average value (the mean).
• Averages do not tell us anything about the scatter of
observations within the distribution.
• To measure the degree of scatter, statistical devices
called measures of dispersion are calculated.
24.
Range = highest value – lowest value
It shows the difference between the highest value and the
lowest value. Because it depends only on these two extreme values,
it is the weakest measure of dispersion.
Variance
First calculate the mean, then subtract the mean from
each value in the group, square each result, sum the squares,
and divide the sum by the number of values.
The variance is used as a measure of how far a set of
numbers is spread out.
It describes how far the numbers lie from the mean
(expected value).
25.
Standard deviation
The most reliable measure of the degree to which the
data are spread around the mean.
It is obtained by taking the square root of the variance:
$\text{Var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$, $SD = \sqrt{\text{Var}(x)}$
26.
Example: Find the mean, median, mode, range, variance
and standard deviation for the following raw data.
ID Age of respondent
1 53
2 44
3 56
4 70
5 45
6 62
7 36
8 23
9 56
10 55
27.
Solution: A) Mean
= ∑xi/n = (53 + 44 + 56 + 70 + 45 + 62 + 36 + 23 + 56 + 55)/10 = 500/10 = 50
B) Median: first arrange the raw data in ascending or descending order
as follows:
23, 36, 44, 45, 53, 55, 56, 56, 62, 70. Since n is even,
Median = (53 + 55)/2 = 54
C) Mode: 56 is the mode of the given data, since it occurs most frequently
(twice). The data set is unimodal.
D) Range = largest value – lowest value = 70 – 23 = 47
E) Variance = $\frac{\sum (x_i - \bar{x})^2}{n} = \frac{1636}{10} = 163.6$
F) Standard deviation = $\sqrt{163.6} \approx 12.79$
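The variance and standard deviation steps can be traced in Python. This sketch divides by n, following the formula in part E; variable names are illustrative.

```python
import math

ages = [53, 44, 56, 70, 45, 62, 36, 23, 56, 55]
n = len(ages)

mean_age = sum(ages) / n                                   # 50.0
squared_deviations = [(x - mean_age) ** 2 for x in ages]   # (x_i - mean)^2 for each value
variance = sum(squared_deviations) / n                     # divide by n, as in part E
std_dev = math.sqrt(variance)

print(sum(squared_deviations))  # 1636.0
print(variance)                 # 163.6
print(round(std_dev, 2))        # 12.79
```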
Measures of dispersion for Grouped Data
• Sample variance formula for grouped data: $s^2 = \frac{\sum f (m_i - \bar{x})^2}{n - 1}$
• Population variance formula for grouped data: $\sigma^2 = \frac{\sum f (m_i - \bar{x})^2}{n}$
• where,
• f is the frequency of each interval
• mi is the midpoint of the ith interval
• $\bar{x}$ is the mean of the grouped data
• n is the total number of observations (the sum of the frequencies)
30.
Cont…
• Find the variance and the standard deviation for the following frequency distribution
of a sample:
Class     Frequency (f)  Midpoint (m)   fm
5 – 9           2              7         14
10 – 14         4             12         48
15 – 19         7             17        119
20 – 24         3             22         66
25 – 29         1             27         27
30 – 34         3             32         96
Total          20                       370
Cont…
• Mean = ∑fm/n = 370/20 = 18.5
• Variance = ∑f(m – $\bar{x}$)²/(n – 1) = 1105/19 = 58.158
• Standard deviation = √58.158 = 7.626
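The figures 1105/19 and 7.626 can be reproduced from the frequency table. This is a sketch with illustrative variable names; midpoints are computed from the class limits and the sample formula (divide by n − 1) is used.

```python
import math

class_limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
freqs = [2, 4, 7, 3, 1, 3]

midpoints = [(lo + hi) / 2 for lo, hi in class_limits]          # 7, 12, 17, 22, 27, 32
n = sum(freqs)                                                  # 20
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n # 370 / 20
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))

sample_variance = sum_sq / (n - 1)      # sample formula: divide by n - 1
sample_sd = math.sqrt(sample_variance)

print(grouped_mean)               # 18.5
print(sum_sq)                     # 1105.0
print(round(sample_variance, 3))  # 58.158
print(round(sample_sd, 3))        # 7.626
```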
33.
C. Measures of relationship
1. Coefficient of variation
The coefficient of variation (CV) is a normalized measure of dispersion.
It is also known as unitized risk or the variation coefficient.
It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.
CV is a relative measure of dispersion, defined by:
$CV = \frac{SD}{\text{Mean}} \times 100\%$
34.
Example: If the standard deviation of a given distribution is 0.20
and the mean is 0.50, what is the coefficient of variation (CV)?
CV = (0.20/0.50) × 100% = 40%
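A small sketch of the CV calculation (the function name is an illustrative assumption; the result is expressed as a percentage):

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

cv = coefficient_of_variation(0.20, 0.50)
print(round(cv, 2))  # 40.0
```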
2. Covariance
The covariance between X and Y is a measure of how much
the two variables change together.
Covariance indicates how two variables are related.
A positive covariance means the variables are positively related,
while a negative covariance means the variables are inversely
related.
The formula for calculating the covariance of sample data is:
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
35.
Note: divide by N for a population and by (n – 1) for a sample.
The magnitude of the covariance often has no direct meaning,
because it depends on the units of the variables. Thus
we focus on the sign.
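A sketch of the covariance calculation with both divisors. The function name, the `sample` flag and the two small data sets are hypothetical, chosen only to illustrate the sign.

```python
def covariance(xs, ys, sample=True):
    """Sum of (x - mean_x) * (y - mean_y), divided by n - 1 (sample) or n (population)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)

xs = [1, 2, 3, 4, 5]           # hypothetical data
ys_up = [2, 4, 6, 8, 10]       # rises with xs
ys_down = [10, 8, 6, 4, 2]     # falls as xs rises

print(covariance(xs, ys_up))                # 5.0  (positive: positively related)
print(covariance(xs, ys_up, sample=False))  # 4.0  (population divisor N)
print(covariance(xs, ys_down))              # -5.0 (negative: inversely related)
```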
36.
3. Correlation
Covariance only shows the direction of the relationship. It has no upper or lower
bound.
Correlation tells the degree to which the variables tend to move
together.
The most familiar measure of dependence between two quantities is
"Pearson's correlation."
It is obtained by dividing the covariance of the two variables by the
product of their standard deviations.
The Pearson correlation is defined only if both of the standard
deviations are finite and both of them are nonzero.
The correlation coefficient is symmetric: corr(X, Y) = corr(Y, X).
37.
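The Pearson correlation is +1 if there is a perfect positive linear
relationship and −1 if there is a perfect negative linear relationship.
If the variables are independent, Pearson's correlation
coefficient is 0. The converse does not hold in general, because the
coefficient detects only linear dependence.
The sample correlation coefficient is written
$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$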
38.
The correlation between two random variables, X and Y, is
a measure of the degree of linear association between
the two variables.
The population correlation, denoted by $\rho$, can take on any
value from -1 to 1.
$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship
The absolute value of $\rho$ indicates the strength or exactness of the
relationship.
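A sketch of the sample Pearson correlation, computed as the sum of cross-deviations divided by the square root of the product of the sums of squared deviations (equivalent to covariance divided by the product of the standard deviations). The function name and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                     # hypothetical data
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive linear relationship)
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative linear relationship)
```

Swapping the two arguments gives the same value, which reflects the symmetry corr(X, Y) = corr(Y, X).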
D. Shape of Frequency Distribution
Skewness
It refers to the symmetry or asymmetry of the distribution.
A distribution is symmetric if its left half is a mirror image of
its right half.
The skewness value can be positive or negative.
A symmetric distribution with a single peak and a bell shape is
known as a normal distribution.
41.
Kurtosis:
It refers to the peakedness or flatness of the distribution.
Higher kurtosis means more of the variance is the result
of infrequent extreme deviations.
The fourth standardized moment is defined as
$KU = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1) S^4}$
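A sketch of the kurtosis formula KU = Σ(xi − x̄)⁴ / ((n − 1) S⁴). It assumes that S is the sample standard deviation (divisor n − 1), which the slide does not state explicitly; the function name and the data set are hypothetical.

```python
def kurtosis(data):
    """Fourth-moment kurtosis: sum((x - mean)^4) / ((n - 1) * S^4)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(data) / n
    s_squared = sum((x - mean_x) ** 2 for x in data) / (n - 1)  # assumed: S^2 uses n - 1
    if s_squared == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth = sum((x - mean_x) ** 4 for x in data)
    return fourth / ((n - 1) * s_squared ** 2)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
ku = kurtosis(sample)
print(round(ku, 3))  # 2.434
```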