
- 1. Topic 4 Measures of Central Tendency

  LEARNING OUTCOMES

  By the end of this topic, you should be able to:
  1. explain the concept of measure of central tendency in the description of data distribution;
  2. obtain mean, mode, median and quantiles;
  3. state the empirical relationship between mean, mode and median;
  4. calculate the inter-quartile range; and
  5. estimate the median and quartiles from cumulative distribution.

  INTRODUCTION

  In this topic, you will learn about the position measurements consisting of mean, mode, median, and quantiles. The quantiles include quartiles, deciles and percentiles. A good understanding of these concepts is important as they will help you to describe the data distribution.

  What role do position measurements play in describing a data distribution?
- 2. 4.1 MEASUREMENT OF CENTRAL TENDENCY

  These measurements are real numbers located on the horizontal line on which the original raw data are plotted. This real line is sometimes called the line of data. The numbers are obtained by using an appropriate formula; the mean, mode and median are examples. In general, they describe properties or characteristics of the data distribution, and they can further be used to infer the characteristics of the population distribution.

  Some roles of these measurements are:

  (a) Describing a Quantitative Feature of Sample Data

  Consider the mean of a data distribution. Because this number is calculated by averaging all the data, it can be regarded as the centre of the whole set of observations. It tells us that, in general, the observations are likely to be scattered around the mean. For example, if the mean of a given data set is 40, we would expect the majority of observations to be located around 40 as their centre.

  The second quantitative feature is that the observations tend to have the same order of magnitude as their mean. Here, the observations are probably two-digit numbers fairly close to 40. A possible data set is 30, 32, 36, 38, 40, 41, 42, 43, 48 and 50. Numbers of three digits or of higher order, such as 100 or 1000, are less likely to belong to a data set whose mean is 40.

  The third feature is that, as a centre, the mean tells us the location or position of the data distribution. A data set with mean 60 will be located to the right of the above data set; similarly, a data set with mean 10 will be located to its left.

  (b) Describing the Proportion Feature of the Data Set

  Suppose the raw data have been arranged in ascending order and plotted on the line of data. The number Q1 on the data line such that about 25% (a proportion of one fourth) of the observations have values less than Q1 is called the first quartile. The number Q2 such that about 50% (a proportion of one half) of the observations have values less than Q2 is called the second quartile.

- 3. The second quartile is also called the median of the distribution; it divides the whole distribution into two equal parts. The number Q3 such that about 75% (a proportion of three fourths) of the observations have values less than Q3 is called the third quartile. Besides the mean and standard deviation, these three quartiles are commonly used to describe the distribution of data. Together, the three quartiles divide the whole distribution into four equal parts. Figure 4.1 shows the positions of the first two quartiles. Can you locate the third quartile on the same figure?

  There are many other quantities describing proportions, such as deciles and percentiles. The nine deciles divide the whole distribution into ten equal parts, and the 99 percentiles divide it into 100 equal parts. Deciles and percentiles are described in Section 4.5.

  Figure 4.1: The positions of the first two quartiles of the distribution of books on weekly sales given in Topic 2

  4.2 THE MEAN

  The mean, or arithmetic mean, of a set of n numbers $x_1, x_2, \ldots, x_n$ is denoted by the symbol $\mu$ (read "mu"). It is defined as the average of all the numbers and is given by the following formula:
- 4. $$\mu = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(Formula 4.1)}$$

  In this module, all calculations involve all observations, so we treat the given data as a population. The sample mean, which we are not using, has the same denominator n; the denominator n − 1 arises for the sample variance, not for the mean.

  Example 4.1

  Calculate the mean of the set of numbers 3, 6, 7, 2, 4, 5, and 8.

  Solution

  By Formula 4.1, the arithmetic mean is

  $$\mu = \frac{3 + 6 + 7 + 2 + 4 + 5 + 8}{7} = \frac{35}{7} = 5.0$$

  Example 4.1(a)

  Find the mean of books on weekly sales.

  Solution

  $$\mu = \frac{3296}{50} = 65.92 \approx 66 \text{ books}$$
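Formula 4.1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` is our own choice, and the weekly-sales total 3296 and count 50 are taken directly from Example 4.1(a) rather than recomputed from raw data.

```python
def mean(values):
    """Arithmetic mean (Formula 4.1): sum of the observations divided by n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Example 4.1: the seven numbers sum to 35, so the mean is 35 / 7.
example_4_1 = [3, 6, 7, 2, 4, 5, 8]
print(mean(example_4_1))  # 5.0

# Example 4.1(a): only the total and the count are quoted in the text.
weekly_sales_total = 3296
weekly_sales_count = 50
print(weekly_sales_total / weekly_sales_count)  # 65.92
```

Because the function divides by `len(values)`, it always uses the denominator n.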
- 5. 4.2.1 The Mean of Repeated Numbers

  Suppose we have k different numbers with frequencies of repetition as given in the following table:

  | Numbers | $x_1$ | $x_2$ | … | $x_{k-1}$ | $x_k$ |
  |---|---|---|---|---|---|
  | Frequency | $f_1$ | $f_2$ | … | $f_{k-1}$ | $f_k$ |

  Then their mean is given by Formula 4.1(a):

  $$\mu = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{f_1 + f_2 + \cdots + f_{k-1} + f_k} = \frac{\sum f_i x_i}{\sum f_i}, \quad i = 1, 2, \ldots, k \qquad \text{(Formula 4.1(a))}$$

  Here the total frequency $f_1 + f_2 + \cdots + f_k = \sum f_i = n$ is the total number of observations.

  Example 4.2

  Obtain the mean of the following set of data:

  2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9, 5, 7, 3, 5, 8, 4, 6, 2, 9

  Solution

  (a) Arrange the data in ascending order:

  2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9

  (b) Form the frequency table of each individual number:

  | Number (x) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
  |---|---|---|---|---|---|---|---|---|
  | Frequency (f) | 3 | 3 | 4 | 6 | 3 | 4 | 2 | 2 |
- 6. (c) By Formula 4.1(a), the mean is

  $$\mu = \frac{3(2) + 3(3) + 4(4) + 6(5) + 3(6) + 4(7) + 2(8) + 2(9)}{3 + 3 + 4 + 6 + 3 + 4 + 2 + 2} = \frac{141}{27} \approx 5.2$$

  As we can see, the numbers are scattered around the mean value 5.2, and they are of the same order as the mean, as shown in Figure 4.2.

  Figure 4.2: Centre of data

  For small data sets, a scatter plot such as Figure 4.2 can be used to clarify the concept of the mean as the centre of the distribution. For large data sets, a histogram of the frequency distribution is more appropriate.

  Formulas 4.1 and 4.1(a) show that the calculation of the mean involves all the data, from the smallest to the largest value in the data set. Thus an extremely large value, an extremely small value, or both will affect the value of the mean.

  Example 4.2(a)

  Here is the set of annual incomes of four employees of a company: RM4,000; RM5,000; RM5,500; and RM30,000.

  (a) Obtain the mean of the annual income.
  (b) Comment on the values of the income.
  (c) Comment on the value of the mean obtained. Can the mean play the role of centre of the given data set?
- 7. Solution

  (a) The mean is given by

  $$\mu = \frac{\text{RM}4{,}000 + \text{RM}5{,}000 + \text{RM}5{,}500 + \text{RM}30{,}000}{4} = \text{RM}11{,}125$$

  (b) RM30,000 is extremely large compared with the other three incomes. Apparently this value does not belong to the group of the first three incomes.

  (c) The extreme value RM30,000 shifts the position of the mean to the right. Since the majority of the incomes are less than RM6,000, RM11,125 is not an appropriate centre for the first three incomes. It would be better to remove the fourth employee from the group. The mean of the first three incomes then becomes RM4,833, which better represents the centre of the majority of the incomes.

  The Mean of Grouped Data

  When we have a large number of data, they should be grouped into k classes. Each class is represented by its class mid-point. Let the k class mid-points be $x_1, x_2, \ldots, x_k$ and their respective frequencies be $f_1, f_2, \ldots, f_k$, as given in the following table:

  | Class mid-points | $x_1$ | $x_2$ | … | $x_{k-1}$ | $x_k$ |
  |---|---|---|---|---|---|
  | Frequency | $f_1$ | $f_2$ | … | $f_{k-1}$ | $f_k$ |

  Then the mean of the data is estimated by the mean of the above mid-points and is given by Formula 4.1(b):

  $$\mu = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{f_1 + f_2 + \cdots + f_{k-1} + f_k} = \frac{\sum f_i x_i}{\sum f_i}, \quad i = 1, 2, \ldots, k \qquad \text{(Formula 4.1(b))}$$
- 8. Recall that the individual observations in each class have been "forgotten" and replaced by the class mid-point. The mean obtained through Formula 4.1(b) is therefore only an approximation to the actual mean of the data.

  Example 4.3

  Refer back to the frequency distribution of books on weekly sales given in Table 2.6 of Topic 2 and obtain the approximate mean number of books on weekly sales. The table is copied to Table 4.1 below together with the mid-point of each class.

  Table 4.1: The Frequency Distribution of Books on Weekly Sales

  | Class | Class mid-point (x) | Frequency (f) | f × x |
  |---|---|---|---|
  | 34-43 | 38.5 | 2 | 77 |
  | 44-53 | 48.5 | 5 | 242.5 |
  | 54-63 | 58.5 | 12 | 702 |
  | 64-73 | 68.5 | 18 | 1233 |
  | 74-83 | 78.5 | 10 | 785 |
  | 84-93 | 88.5 | 2 | 177 |
  | 94-103 | 98.5 | 1 | 98.5 |
  | Sum | | 50 | 3315 |

  Solution

  Finding the mean using Formula 4.1(b):

  $$\mu = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{f_1 + f_2 + \cdots + f_{k-1} + f_k} = \frac{3315}{50} = 66.3 \approx 66 \text{ books}$$

  (As a comparison, the actual mean is 65.92 ≈ 66 books.)
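Formulas 4.1(a) and 4.1(b) are the same frequency-weighted average, applied once to distinct values and once to class mid-points. The sketch below illustrates this with the frequency table of Example 4.2 and with Table 4.1. The helper name `weighted_mean` and the variable names are our own, and the mid-point of each class is assumed to be the average of its two class limits, which reproduces the mid-points listed in Table 4.1.

```python
def weighted_mean(values, freqs):
    """Frequency-weighted mean: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Example 4.2: distinct numbers and how often each one occurs.
numbers = [2, 3, 4, 5, 6, 7, 8, 9]
counts = [3, 3, 4, 6, 3, 4, 2, 2]
print(round(weighted_mean(numbers, counts), 1))  # 5.2  (141 / 27)

# Table 4.1: class limits and class frequencies for the weekly book sales.
class_limits = [(34, 43), (44, 53), (54, 63), (64, 73),
                (74, 83), (84, 93), (94, 103)]
class_freqs = [2, 5, 12, 18, 10, 2, 1]
midpoints = [(low + high) / 2 for low, high in class_limits]
print(weighted_mean(midpoints, class_freqs))  # 66.3  (3315 / 50)
```

The grouped result 66.3 differs slightly from the actual mean 65.92 because every observation in a class is replaced by the class mid-point.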
- 9. The Mean of Combined Sets of Data

  Suppose we have two sets of data with the following characteristics:

  Set 1: the size of the data is $n_1$; either the sum of the data, $H_1$, or the mean, $\mu_1$, is known.
  Set 2: the size of the data is $n_2$; either the sum of the data, $H_2$, or the mean, $\mu_2$, is known.

  Then for Set 1 we have the relationship

  $$\mu_1 = \frac{H_1}{n_1} \quad \Rightarrow \quad H_1 = n_1 \mu_1,$$

  and for Set 2 we have the relationship

  $$\mu_2 = \frac{H_2}{n_2} \quad \Rightarrow \quad H_2 = n_2 \mu_2.$$

  The combined Set 1 and Set 2 has a total size of $n_1 + n_2$, and the combined mean is given by

  $$\mu = \frac{H_1 + H_2}{n_1 + n_2} \qquad \text{(Formula 4.2(a))}$$

  or

  $$\mu = \frac{n_1 \mu_1 + n_2 \mu_2}{n_1 + n_2} \qquad \text{(Formula 4.2(b))}$$

  Formulas 4.2(a) and 4.2(b) can easily be extended to any number of data sets.
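As a hedged illustration of how Formula 4.2(b) extends to any number of data sets, the sketch below combines several group means weighted by group size. The function name `combined_mean` is our own; the sizes and mean scores are those of the five tutorial groups in Example 4.4.

```python
def combined_mean(sizes, means):
    """Combined mean of several data sets (Formula 4.2(b), extended).

    Each set contributes its sum H_i = n_i * mu_i; the combined mean is
    the total of these sums divided by the total size.
    """
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    total_size = sum(sizes)
    if total_size == 0:
        raise ValueError("total size must be positive")
    total_sum = sum(n * m for n, m in zip(sizes, means))
    return total_sum / total_size


# Group sizes and mean scores of the five tutorial groups in Example 4.4.
group_sizes = [40, 41, 42, 38, 39]
group_means = [62, 67, 58, 70, 65]
print(round(combined_mean(group_sizes, group_means), 2))  # 64.29  (12858 / 200)
```

Note that simply averaging the five group means would ignore the differing group sizes; weighting by size is what Formula 4.2(b) adds.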
- 10. Example 4.4

  There are five tutorial groups of students taking first-year statistics. Their respective numbers of students are 40, 41, 42, 38, and 39. They have taken the final examination in a given semester and their respective mean scores are 62, 67, 58, 70, and 65. Obtain the overall mean score of all students in the examination.

  Solution

  In this problem we are given five groups or classes. For each class, the question provides the total number of students and the class mean score. Therefore, with some extension, we can use Formula 4.2(b), and the overall mean is:

  $$\mu = \frac{n_1 \mu_1 + n_2 \mu_2 + n_3 \mu_3 + n_4 \mu_4 + n_5 \mu_5}{n_1 + n_2 + n_3 + n_4 + n_5} = \frac{40(62) + 41(67) + 42(58) + 38(70) + 39(65)}{40 + 41 + 42 + 38 + 39} = \frac{12858}{200} = 64.29$$

  Which of the following calculation methods is the easiest: calculation of the mean of repeated numbers, calculation of the mean of grouped data, or calculation of the mean of combined data sets?

  4.3 MEDIAN

  The median is another measure of central tendency that can be used to describe the distribution of data: about 50% of the data have values less than the median, and the other 50% have values larger than the median. Since the calculation of the median depends only on the middle observation(s) of the ordered data and not on all observations, it is not affected by extreme values.
- 11. Definition of Median

  When all observations are arranged in ascending (or descending) order, the median is defined as the observation at the middle position (for an odd number of observations), or as the average of the two observations at the middle (for an even number of observations).

  Clearly, for an odd number of observations there will always be an observation at the middle position. For an even number of observations there is no observation at the middle position; instead, there are two observations at the middle, and the average of these two becomes the median. Let n be the number of observations; then the median is at position (n + 1)/2.

  Calculating Median of Ungrouped Data

  For ungrouped data, the median is calculated directly from its definition with the following steps:

  Step 1: Arrange the given data in ascending order.
  Step 2: Get the position of the median.
  Step 3: Identify the median, or calculate the average of the two middle observations when the number of observations is even.

  Example 4.5

  Obtain the median of the following sets of data.

  (a) 2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9, 5, 7, 3, 5, 8, 4, 6, 2, 9 (Data as given in Example 2.1.2)
  (b) 3, 4, 7, 5, 8, 9, 10, 11, 2, 12

  Solution

  Set (a)

  Step 1: 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9
  Step 2: In this set there are n = 27 observations. Thus the position of the median is (27 + 1)/2 = 14, i.e. the 14th observation.
- 12. Step 3: The median is the number 5, i.e. the fourth number 5 in the ordered list.

  Set (b)

  Step 1: 2, 3, 4, 5, 7, 8, 9, 10, 11, 12
  Step 2: In this set there are n = 10 observations. Thus the position of the median is (10 + 1)/2 = 5.5, midway between the 5th and 6th positions. The observation at the 5th position is 7, and the observation at the 6th position is 8.
  Step 3: Thus the median is the average (7 + 8)/2 = 7.5, which is at position 5.5.

  Calculating Median of Grouped Data

  When the data size is large, it is common to group the data into several classes. The methods of grouping data have been explained in Topic 2. Suppose we have k class mid-points $x_1, x_2, \ldots, x_k$ with respective frequencies $f_1, f_2, \ldots, f_k$. The median is obtained by the following steps:

  Step 1: Get the position of the median. The position of the median is (n + 1)/2. Let us call this number $M_0$.

  Step 2: Identify the median class, i.e. the class in which the median is located, as follows:
  (a) Accumulate the frequencies until the SUM exceeds $M_0$.
  (b) The last frequency added, which makes the condition in (a) happen, is the frequency of the median class.
  (c) Then record the following:
    (i) the lower boundary $L_B$ of the median class;
    (ii) the class frequency $f_m$ of the median class;
    (iii) C, the class interval or class width of the median class; and
    (iv) $F_B$, the SUM of frequencies before the condition in (a) happens.

  Step 3: Calculate the median $\tilde{x}$ using the following formula:

- 13. $$\tilde{x} = L_B + C\left(\frac{\dfrac{n+1}{2} - F_B}{f_m}\right) \qquad \text{(Formula 4.3)}$$

  For illustration, we use the frequency distribution of books on weekly sales in Table 2.6 of Topic 2.

  Step 1: The position of the median is (n + 1)/2 = (50 + 1)/2 = 25.5 = $M_0$.

  Step 2: Getting the SUM:
  (a) SUM = $f_1 + f_2 + f_3$ = 19 (< $M_0$ = 25.5); and $f_1 + f_2 + f_3 + f_4$ = 19 + 18 = 37 (> $M_0$ = 25.5).
  (b) The fourth frequency makes the SUM greater than $M_0$; therefore the fourth class is the median class.
  (c) The median class is 64-73, with the following records: $f_m$ = 18, $L_B$ = 63.5, C = 10, $F_B$ = 19.

  Step 3: The calculation using Formula 4.3 gives the median

  $$\tilde{x} = 63.5 + 10\left(\frac{25.5 - 19}{18}\right) = 67.11 \approx 67 \text{ books}$$

  4.4 MODE

  The mode of a set of data is the observation (or number) that has the largest frequency. A set of data having only one mode is called unimodal. A set of data may have two modes, in which case it is called bimodal. A set with more than two modes is called multimodal.
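The median procedures of Section 4.3 can be sketched in Python as follows. The function names are our own, and the list of lower class boundaries (33.5, 43.5, ...) is an assumption derived from the class limits 34-43, 44-53, ... of the weekly book sales table; only the boundary 63.5 is stated explicitly in the text. The grouped version uses `>=` rather than a strict "exceeds" so that a tie returns the upper boundary of the class instead of skipping it.

```python
def median_ungrouped(values):
    """Median from the definition: the middle value of the ordered data,
    or the average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_grouped(lower_boundaries, width, freqs):
    """Median of grouped data by interpolation (Formula 4.3)."""
    n = sum(freqs)
    position = (n + 1) / 2                       # M0 in the text
    cumulative = 0                               # FB: frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("no median class found (is the table empty?)")


set_a = [2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3,
         5, 4, 9, 5, 7, 3, 5, 8, 4, 6, 2, 9]
set_b = [3, 4, 7, 5, 8, 9, 10, 11, 2, 12]
print(median_ungrouped(set_a))  # 5
print(median_ungrouped(set_b))  # 7.5

# Weekly book sales: assumed lower boundaries, class width 10, frequencies.
sales_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5]
sales_freqs = [2, 5, 12, 18, 10, 2, 1]
print(round(median_grouped(sales_boundaries, 10, sales_freqs), 2))  # 67.11
```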
- 14. 4.4.1 Mode of Ungrouped Data

  For a set of data with a moderate number of observations, the mode can be obtained directly from its definition. The data should first be arranged in ascending (or descending) order. The mode is then the observation (or observations) that occurs most frequently.

  Example 4.6

  Obtain the mode of the following data sets:

  (i) 2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3, 5, 4, 9, 5, 7, 3, 5, 8, 4, 6, 2, 9 (This data is taken from Example 2.1.5)
  (ii) 2, 3, 4, 7
  (iii) 2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 12

  Solution

  (i) 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9
  Since the number 5 occurs six times (the highest frequency), the mode is 5.

  (ii) 2, 3, 4, 7
  There is no mode for this data set.

  (iii) 2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 12
  This set is bimodal, and the modes are 4 and 9.

  4.4.2 Mode of Grouped Data

  For a large number of data, it is common to group them into several classes. The modal class is the class with the highest frequency, i.e. the class in which the mode of the distribution is located. The mode can then be obtained by using the following formula:
- 15. The mode is

  $$\hat{x} = L_B + C\left(\frac{B}{B + A}\right) \qquad \text{(Formula 4.4)}$$

  where
  $L_B$ is the lower boundary of the modal class;
  B is the difference between the frequency of the modal class and the frequency of the class immediately before it;
  A is the difference between the frequency of the modal class and the frequency of the class immediately after it; and
  C is the class width of the modal class.

  The following example demonstrates how to use Formula 4.4.

  Example 4.7

  Find the mode of the frequency distribution of books on weekly sales given in Table 2.6 of Topic 2.

  Solution

  By referring to frequency Table 2.6, we get the following figures: the modal class is 64-73; its lower boundary is $L_B$ = 63.5; the class width is C = 10; B = 18 - 12 = 6; and A = 18 - 10 = 8.

  Then, from Formula 4.4, the mode is
- 16. $$\hat{x} = L_B + C\left(\frac{B}{B + A}\right) = 63.5 + 10\left(\frac{6}{6 + 8}\right) = 63.5 + \frac{60}{14} = 67.79 \approx 68 \text{ books}$$

  4.4.3 The Relationship between Mean, Mode and Median

  For a unimodal distribution, we may have two types of relationship: a location relationship and an empirical relationship.

  (a) Location Relationships

  Three different cases can occur, where $\bar{x}$ denotes the mean, $\hat{x}$ the mode, and $\tilde{x}$ the median.

  Symmetrical Distribution: The graph of this type of distribution is shown in Figure 4.3, Case (a). The three measurements have the same location on the horizontal axis:

  $$\text{Mean} = \text{Mode} = \text{Median}, \quad \text{i.e. } \bar{x} = \hat{x} = \tilde{x}$$

  Left Skewed Distribution: The graph of this type of distribution is shown in Figure 4.3, Case (b). The three measurements have different locations on the horizontal axis, with the following relationship:

  $$\text{Mean} < \text{Median} < \text{Mode}, \quad \text{i.e. } \bar{x} < \tilde{x} < \hat{x}$$

  Right Skewed Distribution: The graph of this type of distribution is shown in Figure 4.3, Case (c). The three measurements have different locations on the horizontal axis, with the following relationship:

  $$\text{Mean} > \text{Median} > \text{Mode}, \quad \text{i.e. } \bar{x} > \tilde{x} > \hat{x}$$

- 17. (b) Empirical Relationship

  For a unimodal distribution that is moderately skewed (fairly close to symmetry), we have the following empirical relationship between mean, mode, and median:

  $$(\text{Mean} - \text{Mode}) \approx 3(\text{Mean} - \text{Median}), \quad \text{or} \quad (\bar{x} - \hat{x}) \approx 3(\bar{x} - \tilde{x}) \qquad \text{(Formula 4.5)}$$

  where $\bar{x}$ = mean, $\hat{x}$ = mode, and $\tilde{x}$ = median.

  This means that if Formula 4.5 is fulfilled, we say that the given distribution is moderately skewed. Look at the following cases showing the positions of the mean, mode, and median.

  Case (a): The mean, mode, and median are located at almost the same point. This case happens when the three quantities have approximately equal values.

- 18. Case (b): The mean is smaller than the median, and in turn the median is smaller than the mode.

  Case (c): The mean is larger than the median, and in turn the median is larger than the mode.

  Figure 4.3: The positions of the mean ($\bar{x}$), the mode ($\hat{x}$), and the median ($\tilde{x}$)

  Briefly discuss with your friends in MyLMS the advantages and disadvantages of using the mean, mode, and median.
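The mode calculations of Sections 4.4.1 and 4.4.2 can be sketched as below. The function names are our own. Two conventions in the code are assumptions that go beyond the text: a data set in which no value repeats is reported as having no mode (this matches set (ii) of Example 4.6), and a modal class at either end of the table is treated as having a neighbour of frequency 0. The lower boundaries are again derived from the class limits of the weekly book sales table.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    if highest == 1:
        return []                                # assumed "no mode" convention
    return sorted(v for v, c in counts.items() if c == highest)


def mode_grouped(lower_boundaries, width, freqs):
    """Mode of grouped data (Formula 4.4), using the first modal class."""
    i = freqs.index(max(freqs))
    before = freqs[i - 1] if i > 0 else 0        # assumed 0 at the first class
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    b = freqs[i] - before                        # B in Formula 4.4
    a = freqs[i] - after                         # A in Formula 4.4
    if a + b == 0:
        raise ValueError("Formula 4.4 is undefined when A + B = 0")
    return lower_boundaries[i] + width * b / (b + a)


set_i = [2, 3, 4, 7, 4, 5, 2, 6, 5, 7, 7, 6, 5, 8, 3,
         5, 4, 9, 5, 7, 3, 5, 8, 4, 6, 2, 9]
set_ii = [2, 3, 4, 7]
set_iii = [2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 12]
print(modes(set_i))    # [5]
print(modes(set_ii))   # []
print(modes(set_iii))  # [4, 9]

mode_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5]
mode_freqs = [2, 5, 12, 18, 10, 2, 1]
print(round(mode_grouped(mode_boundaries, 10, mode_freqs), 2))  # 67.79
```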
- 19. 4.5 DECILES AND PERCENTILES

  Quartiles, deciles and percentiles are three types of quantiles used to summarise a frequency distribution. Each of them divides the whole distribution into a certain number of equal proportions, which are normally expressed as percentages. For example, deciles divide the whole distribution into 10 equal parts.

  Figure 4.4 shows how the nine deciles divide the whole frequency distribution into ten equal parts of 10% each.

  Figure 4.4: Deciles divide the whole frequency distribution into ten equal parts

  (a) Deciles

  Deciles are from the root word "decimal", which means tenths. Deciles consist of 9 ordered numbers D1, D2, …, D8, and D9 which divide the whole frequency distribution into 10 (or 9 + 1) equal parts, each expressed as a percentage. Thus about 10% of the observations have values less than or equal to D1, about 20% of the observations have values less than or equal to D2, and so forth. The last 10% of observations have values greater than D9. We call D1, D2, …, D8, and D9 the first, second, third, …, and ninth deciles. Notice that D5 is equal to Q2. See Figure 4.4.

  (b) Percentiles

  Percentiles are from the root word "percent", which means hundredths. Percentiles consist of 99 ordered numbers P1, P2, …, P98 and P99 which divide the whole frequency distribution into 100 (or 99 + 1) equal parts of 1% each. Thus about 1% of the observations have values less than or equal to P1, about 20% of the observations have values less than or equal to P20, and so forth; the last 1% of observations have values greater than P99. We call P1, P2, …, P98 and P99 the first, second, third, …, and ninety-ninth percentiles. Notice that P10 is equal to D1, P25 is equal to Q1, and so on.

- 20. To test whether you understand the concepts, think about the following problems. Why do the following occur?

  (a) Q2, D5 and P50 are the same number.
  (b) D1 = P10; D2 = P20; D9 = P90; and Q3 = P75.

  For illustration purposes, we focus on the calculation of quartiles, as the other two can be calculated in a similar way. Students are advised to refer to Statistik Perihalan dan Kebarangkalian written by Mohd. Kidin Shahran, DBP, 2002 (reprint).

  We will not discuss deciles and percentiles further. You can refer to any textbook for further detail.

  4.5.1 Quartiles of Ungrouped Data

  When the data size is only moderately large, it is not necessary to group the data into several classes. The quartiles may be obtained with the steps below:

  Step 1: Identify the required quartile and find its position. Let $Q_r$ be the required quartile; then its position is given by

  $$\frac{r}{4}(n + 1) \qquad \text{(Formula 4.6)}$$
- 21. where r = 1 for the first quartile, r = 2 for the second quartile, and r = 3 for the third quartile.

  Step 2: Arrange the data in ascending order.
  Step 3: Obtain the quartile.

  Example 4.8

  Obtain the quartiles of the following set of data:

  12, 13, 12, 14, 14, 24, 24, 25, 16, 17, 18, 19, 10, 13, 11, 20, 20, 22

  Solution

  Step 1: The position of the quartiles. The data size is n = 18.

  First quartile, r = 1:

  $$\text{Position} = \frac{r}{4}(n + 1) = \frac{1}{4}(18 + 1) = 4.75 = 4 + 0.75$$

  Q1 is between the fourth and fifth positions, 0.75 above the fourth position.

  Second quartile, r = 2:

  $$\text{Position} = \frac{r}{4}(n + 1) = \frac{2}{4}(18 + 1) = 9.50 = 9 + 0.5$$

  Q2 is between the ninth and tenth positions, 0.5 above the ninth position.

  Third quartile, r = 3:

  $$\text{Position} = \frac{r}{4}(n + 1) = \frac{3}{4}(18 + 1) = 14.25 = 14 + 0.25$$

  Q3 is between the fourteenth and fifteenth positions, 0.25 above the fourteenth position.
- 22. Step 2: Arrange the data in ascending order:

  10, 11, 12, 12, 13, 13, 14, 14, 16, 17, 18, 19, 20, 20, 22, 24, 24, 25

  Step 3: Q1 is between the fourth and fifth positions, 0.75 above the fourth.
  Number at the fourth position = 12; number at the fifth position = 13;
  Q1 = 12 + (0.75)(13 - 12) = 12.75.

  Q2 is between the ninth and tenth positions, 0.5 above the ninth.
  Number at the ninth position = 16; number at the tenth position = 17;
  Q2 = 16 + (0.5)(17 - 16) = 16.5.

  Q3 is between the fourteenth and fifteenth positions, 0.25 above the fourteenth.
  Number at the fourteenth position = 20; number at the fifteenth position = 22;
  Q3 = 20 + (0.25)(22 - 20) = 20.5.

  4.5.2 Quartiles of Grouped Data

  When the data size is large, it is common to group the data into several classes. The methods of grouping data have been explained in Topic 2. Suppose we have k class mid-points $x_1, x_2, \ldots, x_k$ with respective frequencies $f_1, f_2, \ldots, f_k$. The quartiles are obtained by the following steps:

  Step 1: From Formula 4.6, the position of the first quartile is given by

  $$\frac{r}{4}(n + 1), \quad \text{with } r = 1.$$

  Let us call this number $Q_{01}$.

  Step 2: Identify the first quartile class, i.e. the class in which the first quartile is located, as follows:
  (a) Accumulate the frequencies until the SUM exceeds $Q_{01}$.
  (b) The last frequency added, which makes the condition in (a) happen, is the frequency of the first quartile class.
- 23. (c) Then record the following:
    (i) the lower boundary $L_B$ of the first quartile class;
    (ii) the class frequency $f_Q$ of the first quartile class;
    (iii) C, the class interval or class width of the first quartile class; and
    (iv) $F_B$, the SUM of frequencies before the condition in (a) happens.

  Step 3: Calculate the first quartile $Q_1$ using the following formula:

  $$Q_1 = L_B + C\left(\frac{\dfrac{n+1}{4} - F_B}{f_Q}\right) \qquad \text{(Formula 4.6(a))}$$

  For illustration, we use the frequency table of weekly book sales given in Table 2.6 of Topic 2.

  Step 1: The position of Q1 is (n + 1)/4 = (50 + 1)/4 = 12.75 = $Q_{01}$.

  Step 2: Getting the SUM:
  (a) SUM = $f_1 + f_2$ = 7 (< $Q_{01}$ = 12.75); and $f_1 + f_2 + f_3$ = 19 (> $Q_{01}$ = 12.75).
  (b) The third frequency makes the SUM greater than $Q_{01}$; therefore the third class is the class of the first quartile.
  (c) The Q1 class is 54-63, with the following records: $f_Q$ = 12, $L_B$ = 53.5, C = 10, $F_B$ = 7.

  Step 3: The calculation using Formula 4.6(a) gives

- 24. $$Q_1 = 53.5 + 10\left(\frac{12.75 - 7}{12}\right) = 58.29 \approx 58 \text{ books}$$

  Repeat the steps for calculating Q3.

  The position of Q3 is 3(n + 1)/4 = 3(50 + 1)/4 = 38.25 = $Q_{03}$.

  The Q3 class is 74-83, with $f_Q$ = 10, $L_B$ = 73.5, C = 10, $F_B$ = 37, so

  $$Q_3 = 73.5 + 10\left(\frac{38.25 - 37}{10}\right) = 74.75 \approx 75 \text{ books}$$

  Thus, we can conclude that about 25% of the weekly sales are less than or equal to 58 books, and that about 75% of the weekly sales are less than or equal to 75 books.
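The quartile procedures of Sections 4.5.1 and 4.5.2 can be sketched as follows. The function names are our own, and the lower class boundaries are derived from the class limits of the weekly book sales table. The position rule r(n + 1)/4 is the one used in this topic; statistical software often uses other conventions, so library results may differ. The last line computes the inter-quartile range Q3 − Q1 named in the learning outcomes; this figure is our own calculation and is not worked in the text.

```python
import math


def quartile_ungrouped(values, r):
    """Quartile Q_r of ungrouped data, at position r(n + 1)/4 (Formula 4.6),
    interpolating linearly between the two neighbouring observations."""
    ordered = sorted(values)
    n = len(ordered)
    position = r * (n + 1) / 4                   # 1-based position
    if position < 1 or position > n:
        raise ValueError("data set too small for this position rule")
    k = math.floor(position)
    fraction = position - k
    if fraction == 0:
        return ordered[k - 1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


def quartile_grouped(lower_boundaries, width, freqs, r):
    """Quartile Q_r of grouped data (Formula 4.6(a), generalised to r)."""
    n = sum(freqs)
    position = r * (n + 1) / 4
    cumulative = 0                               # FB: frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("no quartile class found (is the table empty?)")


# Ordered data from Step 2 of Example 4.8.
example_4_8 = [10, 11, 12, 12, 13, 13, 14, 14, 16,
               17, 18, 19, 20, 20, 22, 24, 24, 25]
print([quartile_ungrouped(example_4_8, r) for r in (1, 2, 3)])
# [12.75, 16.5, 20.5]

# Weekly book sales: assumed lower boundaries, class width 10, frequencies.
q_boundaries = [33.5, 43.5, 53.5, 63.5, 73.5, 83.5, 93.5]
q_freqs = [2, 5, 12, 18, 10, 2, 1]
q1 = quartile_grouped(q_boundaries, 10, q_freqs, 1)
q3 = quartile_grouped(q_boundaries, 10, q_freqs, 3)
print(round(q1, 2), round(q3, 2))                # 58.29 74.75
print(round(q3 - q1, 2))                         # inter-quartile range: 16.46
```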
- 25. ACTIVITY 4.1

  1. Calculate the mean of each of the following data sets:
    (a) A student's Mathematics marks for five different examinations: 85, 90, 70, 65, 75.
    (b) Diameters (mm) of ten beakers in a science laboratory: 38.5, 40.6, 39.2, 39.5, 40.4, 39.6, 40.3, 39.1, 40.1, 39.8.
    (c) Monthly incomes (in RM) of six factory employees: 650, 1500, 1600, 1800, 1900, 2200.
    Give brief comments on your answers.

  2. There are five groups of students whose sizes are respectively 14, 15, 16, 18, and 20. Their respective average heights (in metres) are 1.6, 1.45, 1.50, 1.42, and 1.65. Obtain the average height of all students.

  3. For the following frequency table, obtain the mean, mode, median, and first and third quartiles.

  | Weights (kg) | 90-94 | 95-99 | 100-104 | 105-109 | 110-114 | 115-119 | 120-124 | 125-129 |
  |---|---|---|---|---|---|---|---|---|
  | Number of Parcels | 2 | 5 | 12 | 17 | 14 | 6 | 3 | 1 |

  In this topic, we have learnt about the mean, mode and median, as well as the quartiles, deciles, and percentiles. The mean, which is affected by extreme values, plays the role of the centre of the distribution. Thus, given the value of the mean, we can say that most observations are located around the mean. The mode describes the most frequent observation in the data.

  We can further interpret that, for any two different distributions, their respective means indicate that they are at two different locations. As such, the mean is sometimes called the location parameter. The median is used if we want to summarise the distribution in two equal parts of 50% each. If we want to break the distribution down further into proportions of 25% each, we should use quartiles. We can also use percentiles to describe the distribution using proportions (in percentages). However, to describe any distribution completely, we also need to describe its shape and the data coverage (the range). The variance or standard deviation, which can describe the shape, will be discussed in the next topic.
