MEDIAN:
The median is the middle value of a series when the values of the variable are placed in order of magnitude.
The median is defined as a value which divides a set of data into two halves, one half comprising observations greater than it and the other half observations smaller than it. More precisely, the median is a value at or below which 50% of the data lie.
The median value can be ascertained by inspection in many series. For instance, in this very example, the data that we obtained was:
EXAMPLE-1:
The average number of floors in the buildings at the centre of a city:
5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27
Arranging these values in ascending order, we obtain
3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 8, 20, 27, 32
Picking up the middle value (the 8th of the 15 ordered values), we obtain the median equal to 5.
Interpretation:
The median number of floors is 5. Out of those 15 buildings, 7 have up to 5 floors and 7 have 5 floors or more. We noticed earlier that the arithmetic mean was pulled toward the few extremely high values in the series and hence became unrepresentative. The median = 5 is much more representative of this series.
Height of buildings (number of floors):
3, 3, 4, 4, 4, 4, 5        7 lower
5                          = median height
5, 5, 6, 8, 20, 27, 32     7 higher
Retail price of motor-car (£) (several makes and sizes):
415, 480, 525, 608             4 below
719                            = median price
1,090, 2,059, 4,000, 6,000     4 above
A slight complication arises when there is an even number of observations in the series, for now there are two middle values.
Number of passengers travelling on a bus at six different times during the day:
4, 9, 14, 18, 23, 47
The two middle values are 14 and 18, so
$$\text{Median} = \frac{14 + 18}{2} = 16 \text{ passengers}$$
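The odd and even cases described above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes; the function name `median` and the variable names are chosen here for convenience.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # place the values in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of observations: a single middle value
        return ordered[mid]
    # even number of observations: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


floors = [5, 4, 3, 4, 5, 4, 3, 4, 5, 20, 5, 6, 32, 8, 27]
prices = [415, 480, 525, 608, 719, 1090, 2059, 4000, 6000]
passengers = [4, 9, 14, 18, 23, 47]

print(median(floors))      # 5
print(median(prices))      # 719
print(median(passengers))  # 16.0
```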
Median in Case of a Frequency Distribution of a Continuous Variable:
In case of a frequency distribution, the median is given by the formula
$$\tilde{X} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where
l = lower class boundary of the median class (i.e. that class for which the cumulative frequency is just in excess of n/2),
h = class interval size of the median class,
f = frequency of the median class,
n = Σf (the total number of observations),
c = cumulative frequency of the class preceding the median class.
Example:
Going back to the example of the EPA mileage ratings, we have

Mileage Rating    No. of Cars    Class Boundaries    Cumulative Frequency
30.0 – 32.9            2         29.95 – 32.95                2
33.0 – 35.9            4         32.95 – 35.95                6
36.0 – 38.9           14         35.95 – 38.95               20
39.0 – 41.9            8         38.95 – 41.95               28
42.0 – 44.9            2         41.95 – 44.95               30

In this example, n = 30 and n/2 = 15. The third class is the first whose cumulative frequency (20) exceeds 15, so it is the median class. The median lies somewhere between 35.95 and 38.95. Applying the above formula, we obtain
$$\tilde{X} = 35.95 + \frac{3}{14}(15 - 6) = 35.95 + 1.93 = 37.88 \approx 37.9$$
Interpretation:
This result implies that half of the cars have mileage less than or up to 37.88 miles per gallon whereas the other half of the cars have mileage greater than 37.88 miles per gallon. As discussed earlier, the median is preferable to the arithmetic mean when there are a few very high or low figures in a series. It is also exceedingly valuable when one encounters a frequency distribution having open-ended class intervals.
The concept of an open-ended frequency distribution can be understood with the help of the following example.
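As a rough sketch (not part of the original notes), the grouped-data median formula can be written as a small Python function and checked against the EPA figures. The name `grouped_median` and its argument layout are illustrative assumptions.

```python
def grouped_median(boundaries, freqs):
    """Median of a grouped frequency distribution.

    boundaries: (lower, upper) class boundaries in ascending order
    freqs:      the corresponding class frequencies
    """
    n = sum(freqs)
    c = 0  # cumulative frequency of the classes preceding the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= n / 2:
            h = upper - lower                     # class interval size
            return lower + (h / f) * (n / 2 - c)  # l + (h/f)(n/2 - c)
        c += f
    raise ValueError("no observations")


epa_boundaries = [(29.95, 32.95), (32.95, 35.95), (35.95, 38.95),
                  (38.95, 41.95), (41.95, 44.95)]
epa_freqs = [2, 4, 14, 8, 2]

print(round(grouped_median(epa_boundaries, epa_freqs), 2))  # 37.88
```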
WAGES OF WORKERS IN A FACTORY

Monthly Income (in Rupees)    No. of Workers
Less than 2000/-                   100
2000/- to 2999/-                   300
3000/- to 3999/-                   500
4000/- to 4999/-                   250
5000/- and above                    50
Total                             1200

In this example, both the first class and the last class are open-ended classes, because we do not have exact figures to begin the first class or to end the last class. The advantage of computing the median in the case of an open-ended frequency distribution is that, except in the unlikely event of the median falling within an open-ended group occurring at the beginning of our frequency distribution, there is no need to estimate the upper or lower boundary.
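To illustrate with the wage data, here is a worked computation using the median formula for grouped data. It assumes class boundaries of 2999.5 and 3999.5 for the class 3000/- to 3999/- (so h = 1000); these boundaries are an assumption made here and are not stated in the table. With n = 1200 and n/2 = 600, the cumulative frequencies are 100, 400, 900, so the third class is the median class:

```latex
\tilde{X} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)
          = 2999.5 + \frac{1000}{500}\,(600 - 400)
          = 2999.5 + 400
          = 3399.5
```

Under that assumption the median wage is about Rs. 3400, and neither open-ended boundary enters the calculation.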
This is so because, if the median falls in an intermediate class, then the boundaries of the open-ended classes are not involved in its computation.
The next concept that we will discuss is the empirical relation between the mean, median and the mode. This is a concept which is not based on a rigid mathematical formula; rather, it is based on observation. In fact, the word ‘empirical’ implies ‘based on observation’.
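QUARTILES
The quartiles, together with the median, achieve the division of the total area into four equal parts. The first, second and third quartiles are given by the formulae:
First quartile:
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
Second quartile (i.e. median):
$$Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right) = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
Third quartile:
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$
In each case l, h and f refer to the class containing the quartile, and c is the cumulative frequency of the class preceding it.
[Figure: frequency curve (f against X) with the total area divided into four parts of 25% each by Q1, Q2 = X̃ and Q3.]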
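The deciles and the percentiles give the division of the total area into 10 and 100 equal parts respectively.
$$D_1 = l + \frac{h}{f}\left(\frac{n}{10} - c\right)$$
$$D_2 = l + \frac{h}{f}\left(\frac{2n}{10} - c\right)$$
$$D_3 = l + \frac{h}{f}\left(\frac{3n}{10} - c\right)$$
$$P_1 = l + \frac{h}{f}\left(\frac{n}{100} - c\right)$$
$$P_2 = l + \frac{h}{f}\left(\frac{2n}{100} - c\right)$$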
Again, it is easily seen that the 50th percentile is the same as the median, the 25th percentile is the same as the 1st quartile, the 75th percentile is the same as the 3rd quartile, the 40th percentile is the same as the 4th decile, and so on. All these measures, i.e. the median, quartiles, deciles and percentiles, are collectively called quantiles.
The question is, “What is the significance of this concept of partitioning? Why is it that we wish to divide our frequency distribution into two, four, ten or hundred parts?”
The answer to the above questions is: In certain situations, we may be interested in describing the relative quantitative location of a particular measurement within a data set. Quantiles provide us with an easy way of achieving this. Out of these various quantiles, one of the most frequently used is percentile ranking.
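Since the quartile, decile and percentile formulae differ only in the fraction of n they use, they can be folded into one function. The following Python sketch is illustrative only (the name `grouped_quantile` and its parameters `k` and `parts` are assumptions made here); it uses the EPA mileage table from the median example to confirm the equalities stated above.

```python
import math


def grouped_quantile(boundaries, freqs, k, parts):
    """k-th quantile of grouped data split into `parts` equal parts.

    Implements l + (h / f) * (k * n / parts - c), where l, h, f describe
    the class containing the quantile and c is the cumulative frequency
    of the preceding classes.
    """
    n = sum(freqs)
    target = k * n / parts
    c = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= target:
            return lower + ((upper - lower) / f) * (target - c)
        c += f
    raise ValueError("quantile not found; check k, parts and the data")


epa_boundaries = [(29.95, 32.95), (32.95, 35.95), (35.95, 38.95),
                  (38.95, 41.95), (41.95, 44.95)]
epa_freqs = [2, 4, 14, 8, 2]

q1 = grouped_quantile(epa_boundaries, epa_freqs, 1, 4)     # first quartile
q2 = grouped_quantile(epa_boundaries, epa_freqs, 2, 4)     # median
q3 = grouped_quantile(epa_boundaries, epa_freqs, 3, 4)     # third quartile
d4 = grouped_quantile(epa_boundaries, epa_freqs, 4, 10)    # 4th decile
p40 = grouped_quantile(epa_boundaries, epa_freqs, 40, 100) # 40th percentile

print(round(q1, 2), round(q2, 2), round(q3, 2))
print(math.isclose(d4, p40))  # the 4th decile equals the 40th percentile
```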
THE MODE:
The mode is defined as that value which occurs most frequently in a set of data, i.e. it indicates the most common result.
EXAMPLE:
Suppose that the marks of eight students in a particular test are as follows:
2, 7, 9, 5, 8, 9, 10, 9
Obviously, the most common mark is 9. In other words, Mode = 9.
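Counting how often each value occurs can be sketched in Python as follows. This is an illustration only; the function name `modes` is an assumption, and returning an empty list when no value repeats is a convention adopted here for a series without a mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)          # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return [v for v, c in counts.items() if c == highest]


marks = [2, 7, 9, 5, 8, 9, 10, 9]
print(modes(marks))         # [9]
print(modes([1, 2, 3, 4]))  # []
```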
THE MODE IN CASE OF THE FREQUENCY DISTRIBUTION OF A CONTINUOUS VARIABLE:
In case of grouped data, the modal group is easily recognizable (the one that has the highest frequency).
At what point within the modal group does the mode lie?
The answer is contained in the following formula:
Mode:
$$\hat{X} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$
where
l = lower class boundary of the modal class,
fm = frequency of the modal class,
f1 = frequency of the class preceding the modal class,
f2 = frequency of the class following the modal class, and
h = length of class interval of the modal class.
Mileage Rating    Class Boundaries    No. of Cars
30.0 – 32.9       29.95 – 32.95            2
33.0 – 35.9       32.95 – 35.95            4 = f1
36.0 – 38.9       35.95 – 38.95           14 = fm
39.0 – 41.9       38.95 – 41.95            8 = f2
42.0 – 44.9       41.95 – 44.95            2

It is evident that the third class is the modal class. The mode lies somewhere between 35.95 and 38.95.
In order to apply the formula for the mode, we note that fm = 14, f1 = 4 and f2 = 8.
$$\hat{X} = 35.95 + \frac{14 - 4}{(14 - 4) + (14 - 8)} \times 3 = 35.95 + \frac{10}{10 + 6} \times 3 = 35.95 + 1.875 = 37.825$$
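The same calculation can be sketched in Python. The function name `grouped_mode` is illustrative, and treating a missing neighbouring class as having frequency 0 (when the modal class is the first or last one) is an assumption made here, not something stated in the notes.

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data: l + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    # index of the modal class (the first one with the highest frequency)
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    lower, upper = boundaries[i]
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # assumed 0 if absent
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumed 0 if absent
    denominator = (fm - f1) + (fm - f2)
    if denominator == 0:
        raise ValueError("formula undefined: fm equals both f1 and f2")
    h = upper - lower                               # class interval length
    return lower + (fm - f1) / denominator * h


epa_boundaries = [(29.95, 32.95), (32.95, 35.95), (35.95, 38.95),
                  (38.95, 41.95), (41.95, 44.95)]
epa_freqs = [2, 4, 14, 8, 2]

print(round(grouped_mode(epa_boundaries, epa_freqs), 3))  # 37.825
```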
DESIRABLE PROPERTIES OF THE MODE:
• The mode is easily understood and easily ascertained in case of a discrete frequency distribution.
• It is not affected by a few very high or low values.
The question arises, “When should we use the mode?”
The answer to this question is that the mode is a valuable concept in certain situations such as the one described below:
Suppose the manager of a men’s clothing store is asked about the average size of hats sold. He will probably think not of the arithmetic or geometric mean size, or indeed the median size. Instead, he will in all likelihood quote that particular size which is sold most often. This average is of far more use to him as a businessman than the arithmetic mean, geometric mean or the median. The modal size of all clothing is the size which the businessman must stock in the greatest quantity and variety in comparison with other sizes. Indeed, in most inventory (stock level) problems, one needs the mode more often than any other measure of central tendency.
It should be noted that in some situations there may be no mode, as in a simple series where no value occurs more than once.
Measures of Variability
Consider the following two data sets.
Set I: 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11
Set II: 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8
Compute the mean, median, and mode of each of the two data sets. As you see from your results, the two data sets have the same mean, the same median, and the same mode, all equal to 6. The two data sets also happen to have the same number of observations, n = 12. But the two data sets are different. What is the main difference between them?
The figure shows data sets I and II. The two data sets have the same central tendency, but they have a different variability. In particular, we see that data set I is more variable than data set II. The values in set I are more spread out: they lie farther away from their mean than do those of set II.
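The comparison can be checked with Python's standard `statistics` module. This is a sketch for illustration; using the range and the population standard deviation (`pstdev`, dividing by n) as the measures of spread is a choice made here.

```python
import statistics

set_1 = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
set_2 = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]

for name, data in (("Set I", set_1), ("Set II", set_2)):
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data),
          "range:", max(data) - min(data),
          "std dev:", round(statistics.pstdev(data), 2))
```

Both sets report a mean, median and mode of 6, while the range and standard deviation are clearly larger for Set I.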
Shortcut Formula for Variance and Standard Deviation:
$$S^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
and
$$S = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
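Life (in Hundreds of Hours)    No. of Bulbs f    Mid-point x       fx          fx²
0 – 5                                4               2.5          10.0        25.0
5 – 10                               9               7.5          67.5       506.25
10 – 20                             38              15.0         570.0      8550.0
20 – 40                             33              30.0         990.0     29700.0
40 and over                         16              50.0         800.0     40000.0
Total                              100                          2437.5     78781.25

For a frequency distribution, the shortcut formula becomes
$$S = \sqrt{\frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2} = \sqrt{\frac{78781.25}{100} - \left(\frac{2437.5}{100}\right)^2}$$
= 13.9 hundred hours = 1390 hours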
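The arithmetic for the bulb data can be reproduced with a short Python sketch. The function name `grouped_std` is an illustrative assumption; the mid-points, including 50.0 for the open-ended class "40 and over", are taken from the table as given.

```python
import math


def grouped_std(midpoints, freqs):
    """Standard deviation of grouped data via the shortcut formula."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    # S = sqrt( sum(f x^2)/n - (sum(f x)/n)^2 )
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


midpoints = [2.5, 7.5, 15.0, 30.0, 50.0]
bulbs = [4, 9, 38, 33, 16]

s = grouped_std(midpoints, bulbs)
print(round(s, 1))  # 13.9 (hundred hours)
```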
Coefficient of Variation:
The coefficient of variation (CV) describes the standard deviation as a percent of the mean.
$$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$$
The table at the left shows the heights (in inches) and weights (in pounds) of the members of a basketball team. Find the coefficient of variation for each data set. What can you conclude?
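Since the basketball table is not reproduced here, the following Python sketch applies the formula to Set I and Set II from the section on measures of variability instead. The function name `coefficient_of_variation` is an assumption, and the population standard deviation (dividing by n) is used.

```python
import statistics


def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    return sd / mean * 100


set_1 = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
set_2 = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]

cv_1 = coefficient_of_variation(set_1)
cv_2 = coefficient_of_variation(set_2)
print(round(cv_1, 1), round(cv_2, 1))
```

The larger CV for Set I reflects its greater spread relative to the common mean of 6.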