Summary statistics (1)

MEASURES OF CENTRAL TENDENCY
• We depend on volumes of data to make
various strategic decisions in business.
• Dealing with large volumes of data comes with
various challenges.
• To the production foreman, detail is of the
essence, but
• To top management, it is better to summarise
the data so that it is easy to manage, because their
prime interest is overall profitability.

• Two items are always of utmost importance:
– Measure of central tendency (where the middle of the
distribution lies)
– Measure of dispersion (how widely the values are spread
about the centre)

Mean, Mode, Median and Geometric Mean
• The Mean
– It is the sum of the data divided by the number of items
constituting the data:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
– For the following set of n = 7 measurements:
2, 4, 2, 3, 3, 5, 2, the mean is calculated as
follows:

x̄ = (2+4+2+3+3+5+2)/7
x̄ = 21/7
x̄ = 3
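As a quick check of the arithmetic above, here is a minimal Python sketch; `arithmetic_mean` is our own illustrative name, not part of any library (Python's standard `statistics.mean` would give the same answer).

```python
def arithmetic_mean(values):
    """Sum of the data divided by the number of items: x-bar = sum(x_i) / n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [2, 4, 2, 3, 3, 5, 2]
print(arithmetic_mean(data))  # 3.0
```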

The mode
• It is the most frequently occurring value in a set of
measurements
• For the following set of measurements:
2, 4, 2, 3, 3, 5, 2, the mode is found as
follows:
1. Arrange the values in order of magnitude
2, 2, 2, 3, 3, 4, 5
2. Determine the value that occurs most
frequently: 2.
In the above, the mode is 2.

• There is no doubt that the mode is important, but:
– It may not be unique: a set of values can have two or
more modes
– It cannot be expressed algebraically, hence very few
statistical operations are developed around it.
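A minimal sketch of finding the mode, assuming Python's `collections.Counter` for the frequency count. The helper `modes` is a hypothetical name; it returns every most-frequent value, which makes the non-uniqueness point above concrete. The second data set is made up purely for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values sharing the highest frequency (there may be several)."""
    counts = Counter(values)          # value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())        # highest frequency observed
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 4, 2, 3, 3, 5, 2]))  # [2]
print(modes([1, 1, 2, 2, 3]))        # [1, 2]  (hypothetical data with two modes)
```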

The median
• It is the middle value when the values are arranged
in ascending order of magnitude, if the number of
values is odd,
• or the arithmetic average of the middle two
values if the number of values is even.

Example
• Calculate the median of the values below:
2, 4, 2, 3, 3, 5, 2
Solution:
Arrange in order: 2, 2, 2, 3, 3, 4, 5
There are 7 values (an odd number), so the median is
the 4th value:
Median = 3
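The odd/even rule for the median can be sketched in Python as follows; `median` here is an illustrative helper rather than a library function, and the six-value list is a hypothetical even-sized example.

```python
def median(values):
    """Middle value of the ordered data; for an even count, the mean of the two middle values."""
    ordered = sorted(values)              # arrange in ascending order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle two


print(median([2, 4, 2, 3, 3, 5, 2]))  # 3
print(median([2, 2, 2, 3, 3, 4]))     # 2.5
```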

• The median divides the set of values into two
halves: one containing values below the median
and the other containing values above the
median.
• Similarly, quartiles divide the ordered values into four
equal parts, deciles into tenths and percentiles into
hundredths.
• A disadvantage of the median is that it requires the
laborious arrangement of figures in order of
magnitude.
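Quartiles, deciles and percentiles can be obtained with Python's standard `statistics.quantiles` function, shown below as one possible approach. There are several conventions for interpolating between ordered values; the function's default ("exclusive") method is assumed here, and other methods or textbooks may give slightly different cut points.

```python
import statistics

data = [2, 4, 2, 3, 3, 5, 2]

# statistics.quantiles sorts the data itself and returns the n - 1 cut points
quartiles = statistics.quantiles(data, n=4)      # Q1, Q2 (the median), Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)  # [2.0, 3.0, 4.0]
```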

The Mean of Grouped Data
• Raw data may be presented in the form of a
frequency table, as in the example below of the daily
receipts of a shopping mall over 500 days.

Daily receipts (₵)   Number of days
0 to <100            10
100 to <200          30
200 to <300          50
300 to <400          80
400 to <500          100
500 to <600          85
600 to <700          75
700 to <800          40
800 to <900          25
900 to <1000         5

The formula:
If the r class intervals are numbered 1, …, r, and mi and fi are
respectively the midpoint of, and the number of measurements in,
the ith interval, the mean x̄ is given by
$$\bar{x} = \frac{\sum_{i=1}^{r} f_i m_i}{\sum_{i=1}^{r} f_i}$$
Thus, if n = ∑ fi is the total number of measurements, the
group mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{r} f_i m_i$$

Daily receipts (₵)   Days (fi)   Midpoint (mi)   fi mi
0 to <100            10          50              500
100 to <200          30          150             4,500
200 to <300          50          250             12,500
300 to <400          80          350             28,000
400 to <500          100         450             45,000
500 to <600          85          550             46,750
600 to <700          75          650             48,750
700 to <800          40          750             30,000
800 to <900          25          850             21,250
900 to <1000         5           950             4,750
Total                500                         242,000

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{r} f_i m_i$$
x̄ = 242,000/500
x̄ = GHC 484.00
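The grouped-mean calculation can be reproduced with a short Python sketch that recomputes the midpoints and the fi mi products directly from the frequency table; `grouped_mean` is a hypothetical helper name. Like the formula, it assumes each class is adequately represented by its midpoint.

```python
# Class intervals (lower, upper) in cedis and the number of days in each
classes = [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500),
           (500, 600), (600, 700), (700, 800), (800, 900), (900, 1000)]
freqs = [10, 30, 50, 80, 100, 85, 75, 40, 25, 5]


def grouped_mean(intervals, frequencies):
    """Return (n, sum of f*m, mean) using each class midpoint m."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    n = sum(frequencies)
    total = sum(f * m for f, m in zip(frequencies, midpoints))
    return n, total, total / n


n, sum_fm, group_mean = grouped_mean(classes, freqs)
print(n, sum_fm, group_mean)  # 500 242000.0 484.0
```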

Dispersion
• A measure of centrality alone does not provide an adequate
summary of a set of values
• Consider the two sets of values below
(a) -2, -1, 0, 0, 1, 2
(b) 0, 0, 0, 0, 0, 0
In both sets, the mean, mode and median are all 0.
The difference between them lies not in their centrality but in
their variation about the central value.
In measuring dispersion, there are various statistics:
– Standard Deviation and Variance are of utmost importance
– The Range (difference between the highest and lowest
measurements)
– Inter-quartile range (difference between the upper quartile,
Q3, and the lower quartile, Q1).
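To see the point about sets (a) and (b) numerically, the sketch below computes measures of centre and spread with Python's `statistics` module. The helper `describe` is our own illustrative name, and the inter-quartile range depends on the quartile convention (here the `statistics.quantiles` default); `pstdev` divides by n, as in the worked example that follows.

```python
import statistics

set_a = [-2, -1, 0, 0, 1, 2]
set_b = [0, 0, 0, 0, 0, 0]


def describe(values):
    """Centre and spread of a data set; the IQR uses the default 'exclusive' quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std_dev": statistics.pstdev(values),
    }


for label, values in (("(a)", set_a), ("(b)", set_b)):
    print(label, describe(values))
```

Both sets report the same mean, median and mode; only the range, IQR and standard deviation tell them apart.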

Standard Deviation and Variance
• Finding the standard deviation (σ) of a set of values
given as:
2, 4, 2, 3, 3, 5, 2
– Find the mean of the distribution
(2+4+2+3+3+5+2)/7 = 21/7 = 3
– Find the deviation of each measurement from the mean
-1, 1, -1, 0, 0, 2, -1
– Square the deviation of each measurement from the mean
1, 1, 1, 0, 0, 4, 1

– Sum the squared deviations
1+1+1+0+0+4+1 = 8
– Divide the sum of the squared deviations by the total
number of measurements to give the variance
σ² = 8/7 = 1.142857
– Take the square root of the variance to undo the squaring
and return to the original units = √(8/7) or √1.142857 = 1.069045
Therefore, σ = 1.069045
NOTE: S.D. = σ = √(variance), i.e. variance = σ²
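The six steps above translate directly into Python. This is a minimal sketch that divides by n, as the text does (the population form); `statistics.pstdev` from the standard library should agree with it.

```python
import math

data = [2, 4, 2, 3, 3, 5, 2]
n = len(data)

mean = sum(data) / n                          # step 1: mean = 3.0
deviations = [x - mean for x in data]         # step 2: -1, 1, -1, 0, 0, 2, -1
squared = [d ** 2 for d in deviations]        # step 3: 1, 1, 1, 0, 0, 4, 1
sum_sq = sum(squared)                         # step 4: 8.0
variance = sum_sq / n                         # step 5: 8/7
std_dev = math.sqrt(variance)                 # step 6: square root of the variance

print(round(variance, 6), round(std_dev, 6))  # 1.142857 1.069045
```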

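Formula
• This is given by the formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \quad\text{or}\quad \sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2}$$
where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Large sample (or whole population), with frequencies f:
$$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}}$$
Sample (divide by n − 1, where n = ∑f):
$$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$$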
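The formulas can be sketched in Python for data given as a frequency table. `grouped_std` and `shortcut_std` are hypothetical helper names; the `sample` flag switches the divisor from ∑f to n − 1. The example rewrites 2, 4, 2, 3, 3, 5, 2 as distinct values with their frequencies.

```python
import math


def grouped_std(values, freqs, sample=False):
    """S = sqrt(sum f*(X - mean)^2 / divisor); divisor is sum(f), or sum(f) - 1 if sample."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    divisor = n - 1 if sample else n
    return math.sqrt(ss / divisor)


def shortcut_std(data):
    """Population S.D. via sqrt(sum x^2 / n - (sum x / n)^2)."""
    n = len(data)
    return math.sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)


# 2, 4, 2, 3, 3, 5, 2 written as a frequency table
values = [2, 3, 4, 5]
freqs = [3, 2, 1, 1]
print(round(grouped_std(values, freqs), 6))               # 1.069045
print(round(grouped_std(values, freqs, sample=True), 6))  # 1.154701
print(round(shortcut_std([2, 4, 2, 3, 3, 5, 2]), 6))      # 1.069045
```

The shortcut form needs only running totals of x and x², but when the values are large relative to their spread it subtracts two nearly equal numbers and can lose floating-point precision; the deviation form avoids this.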

REGRESSION AND CORRELATION ANALYSIS
• Statistics involves the analysis of variables:
– Dependent variable
– Independent variable
• Regression analysis explains the exact form of the dependence of
one variable on another
• Correlation analysis measures the degree of dependence of one variable
on the other
• Both regression and correlation analysis study the association
between a set of variables
• The power of these analyses lies in prediction: given the value of one
variable, we can predict its effect on another.
• E.g. we can predict output levels given the man-hours worked, or predict
sales volume given the amount of money spent on promotion.
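The text does not fix a particular measure of the degree of dependence; one widely used choice, assumed here, is Pearson's product-moment correlation coefficient r, which ranges from −1 (perfect negative) to +1 (perfect positive). Below is a minimal sketch with a hypothetical helper `pearson_r` and made-up man-hours/output figures (Python 3.10+ also offers `statistics.correlation`).

```python
import math


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Hypothetical illustration: man-hours worked vs units of output
hours = [10, 20, 30, 40]
output = [15, 25, 35, 45]              # output rises exactly in step with hours
print(pearson_r(hours, output))        # 1.0
print(pearson_r(hours, output[::-1]))  # -1.0
```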

Linear regression
Y = mX + c
This is a simple linear equation where X and Y are variables and m
and c are constants: m is the slope (gradient) of the line and c is the
intercept on the Y-axis.
Y = 4X + 6
Y = 3.5X + 7.2
Y = 13.8X + 76.1
are all examples of linear equations that can be represented
graphically as a straight-line relationship. In regression, however, the
objective is to determine the line of best fit through the points of a
scatter diagram, found by minimising the sum of squared deviations
using partial differentiation.
It is also possible to have non-linear relationships whose graphical
representation is not in the form of a straight line.

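Least Squares Regression
• In least squares regression analysis, we solve two normal
equations simultaneously:
$$\sum y_i = nc + m\sum x_i \qquad (1)$$
$$\sum x_i y_i = c\sum x_i + m\sum x_i^2 \qquad (2)$$
Solving (1) and (2) for the m and c that minimise the sum of
squared deviations gives:
$$m = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
$$c = \frac{\left(\sum y_i\right)\left(\sum x_i^2\right) - \left(\sum x_i\right)\left(\sum x_i y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{1}{n}\left(\sum y_i - m\sum x_i\right)$$
OR, equivalently, using deviations from the means:
$$m = \frac{\sum xy}{\sum x^2}, \qquad c = \bar{Y} - m\bar{X}$$
where x = X − X̄ and y = Y − Ȳ.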
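The closed-form solution of the normal equations can be sketched in Python as below. The helper names `least_squares` and `least_squares_deviation` are hypothetical; the first uses the sums form of m and c, the second the deviation form. On points lying exactly on a line (here an illustrative Y = 2X + 1), both should recover that line.

```python
def least_squares(xs, ys):
    """m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), c = (Sy - m*Sx) / n."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    m = (n * sxy - sx * sy) / denom
    c = (sy - m * sx) / n
    return m, c


def least_squares_deviation(xs, ys):
    """Same fit via deviations: m = sum(xy) / sum(x^2), c = Y-bar - m * X-bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # illustrative points on Y = 2X + 1
print(least_squares(xs, ys))            # (2.0, 1.0)
print(least_squares_deviation(xs, ys))  # (2.0, 1.0)
```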

Example
• The output of A&B Co is given in table 1. You are to
find the line of best fit using the least squares approach.

Table 1
Week   Total Output (X)         Total Cost (Y)
       (independent variable)   (dependent variable)
1      2                        11.2
2      3                        15.6
3      5                        20.3
4      4                        20.8
5      1                        7.8
6      3                        10.6
7      2                        12.3
8      4                        21.5
9      5                        22
10     6                        27.6

SOLUTION
Week   Output (X)   Cost (Y)   XY      X²    Y²
1      2            11.2       22.4    4     125.44
2      3            15.6       46.8    9     243.36
3      5            20.3       101.5   25    412.09
4      4            20.8       83.2    16    432.64
5      1            7.8        7.8     1     60.84
6      3            10.6       31.8    9     112.36
7      2            12.3       24.6    4     151.29
8      4            21.5       86      16    462.25
9      5            22         110     25    484
10     6            27.6       165.6   36    761.76
TOTAL  35           169.7      679.7   145   3,246.03

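• With n = 10 and the totals above:
m = [(10 x 679.7) – (35 x 169.7)] / [(10 x 145) – 35²]
  = (6,797 – 5,939.5) / (1,450 – 1,225)
  = 857.5/225 = 3.811
c = (1/10)[169.7 – (3.811 x 35)] = 3.63
Equivalently, in deviation form: ∑xy = 679.7 – (35 x 169.7)/10 = 85.75
and ∑x² = 145 – 35²/10 = 22.5, so m = 85.75/22.5 = 3.811.
Y = 3.811X + 3.63
That is, each additional unit of output is estimated to add about 3.81
to total cost, while about 3.63 of cost does not vary with output.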
