SUMMARY STATISTICS
July 2014
MEASURES OF CENTRAL TENDENCY
• We depend on volumes of data to make
various strategic decisions in business.
• Dealing with large volumes of data comes with
various challenges.
• To the production foreman, detail is of the
essence.
• To top management, it is better to summarise
the data for easy management, because their prime
interest is overall profitability.
• Two items are always of utmost importance:
– A measure of central tendency (where the middle of the
distribution lies)
– A measure of dispersion (how widely the values are spread about that centre)
Mean, Mode, Median and Geometric Mean
• The Mean
– It is the sum of the data divided by the number of items
constituting the data:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
– If we have the following set of measurements:
2, 4, 2, 3, 3, 5, 2 (so n = 7), the mean is calculated as
follows:
x̄ = (2+4+2+3+3+5+2)/7
x̄ = 21/7
x̄ = 3
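The calculation of the mean can be sketched in a few lines of Python. This is a minimal illustration only; the function name `mean` and the variable `values` are chosen here for the example and are not part of the slides.

```python
# Data from the worked example above.
values = [2, 4, 2, 3, 3, 5, 2]

def mean(data):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(data) / len(data)

print(mean(values))  # 3.0
```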
The mode
• It is the most frequently occurring value in a set of
measurements.
• If we have the following set of measurements:
2, 4, 2, 3, 3, 5, 2, the mode is found as
follows:
1. Arrange the values in order of magnitude:
2, 2, 2, 3, 3, 4, 5
2. Determine the value that occurs most
frequently: 2 occurs three times.
In the above, the mode is 2.
• There is no doubt that the mode is important but:
– It may not be unique: a set of values can have two or more modes
– It cannot be expressed algebraically, hence very few
statistical operations are developed around it.
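A short Python sketch of finding the mode follows. It returns every value that ties for the highest frequency, which illustrates the point that the mode need not be unique. The helper name `modes` and the second data set are assumptions made for this illustration.

```python
from collections import Counter

def modes(data):
    """Return, in ascending order, every value that attains the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([2, 4, 2, 3, 3, 5, 2]))  # [2]
print(modes([1, 1, 2, 2, 3]))        # [1, 2]  (two modes)
```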
The median
• It is the middle value when the values are
arranged in ascending order of magnitude,
if the set contains an odd number of
values
• or the arithmetic average of the middle two
values if the set contains an even number of
values.
Example
• Calculate the median of the values below:
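2, 4, 2, 3, 3, 5, 2
Solution:
2, 2, 2, 3, 3, 4, 5
There are seven values, so the median is the fourth value:
Median = 3
With an even number of values, the two middle values would be averaged instead, e.g. (3+3)/2 = 6/2 = 3.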
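Both cases of the median rule (odd and even number of values) can be sketched in Python as below. The function name `median` and the four-value data set are illustrative assumptions.

```python
def median(data):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 2, 3, 3, 5, 2]))  # 3
print(median([2, 2, 3, 4]))           # 2.5
```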
• The median divides the set of values into two
halves: one containing values below the median
and the other containing values above the
median.
• We could also have quartiles (division into 4),
deciles (division into tenths) and percentiles
(division into hundredths).
• The disadvantage of the median is that it involves
laborious arrangement of figures in their order of
magnitude.
The Mean of Grouped Data
• Raw data may be presented in the form of a
frequency table as in the example below about daily
receipts of a shopping mall for 500 days.
Daily receipts (₵)    Number of days
0 to <100             10
100 to <200           30
200 to <300           50
300 to <400           80
400 to <500           100
500 to <600           85
600 to <700           75
700 to <800           40
800 to <900           25
900 to <1000          5
The formula:
The mean x̄ is given as below if the r class intervals are
numbered 1, …, r and m_i, f_i are the midpoint of, and the
number of measurements in, the i-th interval respectively.
$$\bar{x} = \frac{\sum_{i=1}^{r} f_i m_i}{\sum_{i=1}^{r} f_i}$$
Thus, if n = ∑ f_i is the total number of measurements, the
grouped mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{r} f_i m_i$$
Daily receipts (₵)    Number of days (f_i)    Midpoint (m_i)    f_i m_i
0 to <100             10                      50                500
100 to <200           30                      150               4,500
200 to <300           50                      250               12,500
300 to <400           80                      350               28,000
400 to <500           100                     450               45,000
500 to <600           85                      550               46,750
600 to <700           75                      650               48,750
700 to <800           40                      750               30,000
800 to <900           25                      850               21,250
900 to <1000          5                       950               4,750
Total                 500                                       242,000
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{r} f_i m_i$$
x̄ = 242,000/500
x̄ = GHC 484.00
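The grouped-mean calculation can be recomputed from the frequency table with a short Python sketch. The tuple layout `(lower, upper, days)` and the variable names are assumptions made for this illustration; the class limits and frequencies are those in the table.

```python
# Each class: (lower limit, upper limit, number of days)
classes = [
    (0, 100, 10), (100, 200, 30), (200, 300, 50), (300, 400, 80),
    (400, 500, 100), (500, 600, 85), (600, 700, 75), (700, 800, 40),
    (800, 900, 25), (900, 1000, 5),
]

# n is the total frequency; each class contributes frequency * midpoint.
n = sum(days for _, _, days in classes)
sum_fm = sum(days * (lower + upper) / 2 for lower, upper, days in classes)
grouped_mean = sum_fm / n

print(n, sum_fm, grouped_mean)  # 500 242000.0 484.0
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the mean of the raw receipts.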
Dispersion
• A measure of centrality alone does not provide an
adequate summary of a set of values
• Consider the two sets of values below
(a) -2, -1, 0, 0, 1, 2
(b) 0, 0, 0, 0, 0, 0
In both sets, the mean, mode and median are all 0.
The difference in character lies not in their centrality but in
their variation about the central value.
In the measurement of dispersion, there are various statistics:
– Standard deviation and variance, which are of utmost importance
– The range (difference between the highest and lowest
measurements)
– The inter-quartile range (difference between the upper and
lower quartiles).
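As a rough sketch, the range and inter-quartile range of the two sets can be compared in Python. The helper name `spread` is hypothetical, and `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method); other textbooks may compute quartiles slightly differently.

```python
import statistics

def spread(data):
    """Return (range, inter-quartile range) for a list of numbers."""
    value_range = max(data) - min(data)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return value_range, q3 - q1

set_a = [-2, -1, 0, 0, 1, 2]
set_b = [0, 0, 0, 0, 0, 0]

print(spread(set_a))  # range is 4; the inter-quartile range is positive
print(spread(set_b))  # both measures are zero
```

Both sets share the same centre, but only set (a) shows any dispersion.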
Standard Deviation and Variance
• Finding the standard deviation (σ) of a set of values
given as:
2, 4, 2, 3, 3, 5, 2
– Find the mean of the distribution
(2+4+2+3+3+5+2)/7 = 21/7 = 3
– Find the deviation of each measure from the mean
-1, 1, -1, 0, 0, 2, -1
– Square the deviation of each measure from the mean
1, 1, 1, 0, 0, 4, 1
– Sum the squared deviations
1+1+1+0+0+4+1 = 8
– Divide the sum of the squared deviations by the total
number of measurements to give the variance
σ² = 8/7 = 1.142857
– Take the square root of the variance to return to
the original units of measurement: √(8/7) = √1.142857 = 1.069045
Therefore, σ = 1.069045
NOTE: S.D. = σ = √(variance)
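The steps above can be followed one by one in Python. This is a minimal sketch with variable names chosen for the example; it divides by n, as the worked example does.

```python
import math

values = [2, 4, 2, 3, 3, 5, 2]
n = len(values)

mean = sum(values) / n                      # step 1: the mean
deviations = [x - mean for x in values]     # step 2: deviations from the mean
squared = [d ** 2 for d in deviations]      # step 3: squared deviations
sum_squared = sum(squared)                  # step 4: sum of squared deviations
variance = sum_squared / n                  # step 5: variance
std_dev = math.sqrt(variance)               # step 6: standard deviation

print(sum_squared)                            # 8.0
print(round(variance, 6), round(std_dev, 6))  # 1.142857 1.069045
```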
Formula
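• This is given by the formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}} \quad\text{or}\quad \sigma = \sqrt{\frac{\sum_{i=1}^{n}x_i^2}{n} - \left(\frac{\sum_{i=1}^{n}x_i}{n}\right)^2}$$
where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$$
Large sample:
$$S = \sqrt{\frac{\sum f(X-\bar{X})^2}{\sum f}}$$
Sample:
$$S = \sqrt{\frac{\sum f(X-\bar{X})^2}{n-1}}$$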
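A quick numerical check, sketched in Python, shows that the definitional and computational ("sum of squares") forms agree, and how the divisor n-1 changes the result for a sample. The data set is the one used earlier in the slides; the variable names are assumptions for the illustration, and the standard-library `statistics` functions are used only as a cross-check.

```python
import math
import statistics

values = [2, 4, 2, 3, 3, 5, 2]
n = len(values)
mean = sum(values) / n

# Definitional form: root of the mean squared deviation.
definitional = math.sqrt(sum((x - mean) ** 2 for x in values) / n)

# Computational form: mean of the squares minus the square of the mean.
computational = math.sqrt(sum(x * x for x in values) / n - mean ** 2)

# Sample form: divide by n - 1 instead of n.
sample = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

print(round(definitional, 6), round(computational, 6))  # 1.069045 1.069045
print(round(sample, 6))                                 # 1.154701
```

The sample figure is slightly larger because dividing by n-1 compensates for estimating the mean from the same data.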
REGRESSION AND CORRELATION ANALYSIS
• Statistics involves the analysis of variables:
– Dependent variable
– Independent variable
• Regression analysis explains how one variable depends on
another
• Correlation analysis measures the degree of dependence of one variable
on the other
• Both regression and correlation analysis study the form of association
between a set of variables
• The power of these analyses lies in prediction: given the value of one
variable, we can predict the value of another.
• E.g. we can predict output levels given the man-hours, or
sales volume given the amount of money spent on promotion.
Linear regression
Y = mX + c
This is a simple linear equation where X and Y are variables and m (the slope)
and c (the intercept) are constants.
Y = 4X + 6
Y = 3.5X + 7.2
Y = 13.8X + 76.1
are all examples of linear equations that can be represented
graphically as a straight line. The objective, however, is to
determine the line of best fit through a scatter diagram, which is found by
partial differentiation of the sum of squared errors with respect to m and c.
It is also possible to have non-linear relationships whose graphical
representation is not in the form of a straight line.
Least Squares Regression
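• In least squares regression analysis, we are given two
equations to solve:
$$\sum y_i = nc + m\sum x_i \qquad (1)$$
$$\sum x_i y_i = c\sum x_i + m\sum x_i^2 \qquad (2)$$
Solving them simultaneously gives
$$m = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
$$c = \frac{\left(\sum y_i\right)\left(\sum x_i^2\right) - \left(\sum x_i\right)\left(\sum x_i y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{1}{n}\left(\sum y_i - m\sum x_i\right)$$
OR, in deviation form, y = (∑xy / ∑x²) x,
where x = X − X̄ and y = Y − Ȳ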
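For readers who want the missing step, equations (1) and (2) can be obtained as follows. This is a standard derivation sketched here for completeness; the symbol S for the sum of squared errors is introduced for this note and does not appear in the slides.

$$S(m, c) = \sum_{i=1}^{n} \left(y_i - m x_i - c\right)^2$$

$$\frac{\partial S}{\partial c} = -2\sum_{i=1}^{n}\left(y_i - m x_i - c\right) = 0 \;\Rightarrow\; \sum y_i = nc + m\sum x_i$$

$$\frac{\partial S}{\partial m} = -2\sum_{i=1}^{n} x_i\left(y_i - m x_i - c\right) = 0 \;\Rightarrow\; \sum x_i y_i = c\sum x_i + m\sum x_i^2$$

Setting both partial derivatives to zero picks out the values of m and c that minimise S, which is what "best fit" means in the least squares sense.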
Example
• The output of A&B Co is given in Table 1. You are to
provide the best fit using the least squares approach.
Week    Total Output (X), independent variable    Total Cost (Y), dependent variable
1       2                                         11.2
2       3                                         15.6
3       5                                         20.3
4       4                                         20.8
5       1                                         7.8
6       3                                         10.6
7       2                                         12.3
8       4                                         21.5
9       5                                         22
10      6                                         27.6
SOLUTION
Week     Total Output (X)    Total Cost (Y)    XY       X²     Y²
1        2                   11.2              22.4     4      125.44
2        3                   15.6              46.8     9      243.36
3        5                   20.3              101.5    25     412.09
4        4                   20.8              83.2     16     432.64
5        1                   7.8               7.8      1      60.84
6        3                   10.6              31.8     9      112.36
7        2                   12.3              24.6     4      151.29
8        4                   21.5              86       16     462.25
9        5                   22                110      25     484
10       6                   27.6              165.6    36     761.76
TOTAL    35                  169.7             679.7    145    3,246.03
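• Given the deviation form y = (∑xy / ∑x²) x, where x = X − X̄ and y = Y − Ȳ:
X̄ = 35/10 = 3.5 and Ȳ = 169.7/10 = 16.97
∑xy = 679.7 − (10 × 3.5 × 16.97) = 85.75
∑x² = 145 − (10 × 3.5²) = 22.5
m = 85.75/22.5 = 3.811
Or, from the formulas for m and c,
m = [(10 × 679.7) − (35 × 169.7)] / [(10 × 145) − (35)²] = 857.5/225 = 3.811
c = (1/10)[169.7 − (3.811 × 35)] = 3.63
Y = 3.811X + 3.63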
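The fit can be reproduced from the raw data with a short Python sketch. The variable names and the `predict` helper are assumptions made for this illustration; the formulas are the closed-form least squares expressions for m and c.

```python
# Weekly output (X) and total cost (Y) for A&B Co, from Table 1.
x = [2, 3, 5, 4, 1, 3, 2, 4, 5, 6]
y = [11.2, 15.6, 20.3, 20.8, 7.8, 10.6, 12.3, 21.5, 22, 27.6]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Closed-form least squares slope and intercept.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
c = (sum_y - m * sum_x) / n

def predict(output):
    """Estimated total cost for a given output level, using the fitted line."""
    return m * output + c

print(round(m, 3), round(c, 2))  # 3.811 3.63
```

Calling `predict` with an output level inside the observed range (1 to 6) gives the estimated cost on the fitted line; predictions far outside that range should be treated with caution.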
