- 1. Measures of Central Tendency. Ms. K. Lavanya, MSc(N)-CHN, Associate Professor
- 2. Definition: • Constructing a frequency distribution from raw data is the first step towards condensing a large data set into a compact form. • It is often necessary to condense the data further into a single value. Such a single value is called an average.
- 3. Definition: • In most data sets the values concentrate around the average, so the average is called a measure of central tendency. • A measure of central tendency is a statistical measure that represents an entire distribution or data set by a single value. It aims to give an accurate summary of all the data in the distribution.
- 4. Properties of a Good Measure of Central Tendency: • It should be rigidly defined. • Its computation should be based on all observations. • It should lend itself to algebraic treatment. • It should be least affected by extreme observations.
- 5. The following are the different measures of central tendency: 1. Average ( Arithmetic mean) 2. Median 3. Mode 4. Quartiles 5. Geometric Mean 6. Harmonic Mean 7. Weighted Mean
- 6. Average / Arithmetic Mean (AM): • This is commonly used. The arithmetic mean (AM), or simply the mean, is the sum of all observations divided by the number of observations. Computation for ungrouped data: the mean of n observations $X_1, X_2, \dots, X_n$ is given by $\text{AM} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\text{Sum of observations}}{\text{Number of observations}}$
- 7. Notation form: • $\text{AM} = \bar{x} = \frac{\sum x}{n}$ • The arithmetic mean is denoted by $\bar{x}$. The symbol $\sum$ is read as "sigma" and $\bar{x}$ as "x bar".
- 8. Merits of AM: 1. It is easy to calculate and understand. 2. It is based on all observations. 3. It is widely familiar and rigidly defined. 4. It is capable of further mathematical treatment. 5. It is least affected by sampling fluctuations, and hence is more stable.
- 9. Demerits of AM: • It can be used only for quantitative data. • It is unduly affected by extreme observations. • It cannot be calculated when the frequency distribution has open-end classes. • It cannot be determined graphically. • Sometimes the AM is not itself an observation in the data.
- 10. Example: Q. Obtain the arithmetic mean of the marks scored by a student in 8 unit tests of the II MBBS class: 58, 62, 67, 65, 68, 70, 69, 61.
- 11. • Solution: $\text{AM} = \frac{61 + 58 + 62 + 67 + 65 + 68 + 70 + 69}{8} = \frac{520}{8} = 65$
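The ungrouped-data calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` and the variable names are assumptions made for this sketch, not part of the original slides.

```python
def arithmetic_mean(observations):
    """Return the sum of the observations divided by their count."""
    if not observations:
        raise ValueError("at least one observation is required")
    return sum(observations) / len(observations)


# Marks scored in the 8 unit tests from the example.
marks = [58, 62, 67, 65, 68, 70, 69, 61]
mean_marks = arithmetic_mean(marks)
print(mean_marks)  # 65.0
```

The result agrees with the hand calculation, 520 / 8 = 65.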
- 12. Short-cut or Assumed Mean Method: • When the observations in a data set are large in size, finding the mean directly is laborious. To avoid this difficulty, the short-cut method is adopted. • Assume an arbitrary mean, i.e. a value from the data set that will simplify the calculations, and subtract this assumed mean from each observation. • The results are known as differences or deviations. • Obtain the mean of the deviations by the usual method.
- 13. Contd…. • Original data: $X_1, X_2, \dots, X_n$ • Differences or deviations: $d_1 = X_1 - a,\ d_2 = X_2 - a,\ \dots,\ d_n = X_n - a$ • Here a is any value from the data set. • Mean of the deviations: $\bar{d} = \frac{\sum d}{n}$. Thus, the mean of the original data is $\bar{X} = a + \bar{d}$.
- 14. Example: • In a series of 10 post-mortems, the following weights of the liver (in g) were recorded: • 1420, 1405, 1425, 1410, 1415, 1435, 1430, 1415, 1445, 1430
- 15. Solution: • Let us take a = 1420. • $d = x - 1420$: 0, -15, 5, -10, -5, 15, 10, -5, 25, 10 • $\bar{d} = \frac{\sum d}{n} = \frac{30}{10} = 3$ • Thus $\bar{x} = a + \bar{d} = 1420 + 3 = 1423$ g.
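A minimal Python sketch of the assumed mean method follows. The function name `assumed_mean` is hypothetical, chosen for this illustration; the data are the liver weights from the example.

```python
def assumed_mean(observations, a):
    """Mean via the short-cut method: a plus the mean of the deviations x - a."""
    deviations = [x - a for x in observations]
    d_bar = sum(deviations) / len(deviations)
    return a + d_bar


liver_weights = [1420, 1405, 1425, 1410, 1415, 1435, 1430, 1415, 1445, 1430]
mean_weight = assumed_mean(liver_weights, a=1420)
print(mean_weight)  # 1423.0
```

Any other choice of `a` gives the same mean; a value close to the centre of the data simply keeps the deviations small when working by hand.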
- 16. Computation for grouped data • In statistics, data play a vital role in estimating different types of parameters. To draw any conclusion from the given data, we first need to arrange the data in such a way that suitable statistical analysis can be performed. Data can be grouped in two ways, namely as a discrete or a continuous frequency distribution.
- 17. Discrete frequency distribution: • Suppose we have observations $x_1, x_2, \dots, x_n$ with corresponding frequencies $f_1, f_2, \dots, f_n$. The AM is defined as • $\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n}$ • In notation form, we have • $\bar{x} = \frac{\sum f x}{\sum f} = \frac{\sum f x}{N} = \frac{\text{Sum of (frequency} \times \text{observation)}}{\text{Total frequency}}$
- 18. Calculate the average number of children per family from the following data:

  | No. of children | No. of families |
  |---|---|
  | 0 | 30 |
  | 1 | 52 |
  | 2 | 60 |
  | 3 | 65 |
  | 4 | 18 |
  | 5 | 10 |
  | 6 | 5 |
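- 19. Solution:

  | No. of children (x) | No. of families (f) | Total no. of children (f·x) |
  |---|---|---|
  | 0 | 30 | 0 × 30 = 0 |
  | 1 | 52 | 1 × 52 = 52 |
  | 2 | 60 | 2 × 60 = 120 |
  | 3 | 65 | 3 × 65 = 195 |
  | 4 | 18 | 4 × 18 = 72 |
  | 5 | 10 | 5 × 10 = 50 |
  | 6 | 5 | 6 × 5 = 30 |
  | Total | 240 | 519 |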
- 20. Contd…. • $\bar{x} = \frac{\sum f x}{\sum f} = \frac{519}{240} = 2.1625 \approx 2.16$ children per family.
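The discrete frequency distribution formula, the sum of f·x divided by the sum of f, can be checked with a short Python sketch. The function name `frequency_mean` is an assumption made for this illustration.

```python
def frequency_mean(values, frequencies):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f


children = [0, 1, 2, 3, 4, 5, 6]          # x: number of children
families = [30, 52, 60, 65, 18, 10, 5]    # f: number of families
mean_children = frequency_mean(children, families)
print(mean_children)  # 2.1625
```

Here the sum of f·x is 519 and the total frequency is 240, so the mean is 519 / 240 = 2.1625 children per family.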
- 21. Continuous frequency distribution: • In a continuous frequency distribution, a frequency is not associated with any single specified value but is spread over an entire class. • This makes it difficult to identify the values $x_1, x_2, \dots, x_n$ to use in the formula. To overcome this difficulty, we make the reasonable assumption that the frequency of each class is associated with the mid-value of that class, i.e. that the frequency is distributed uniformly over the entire class. • Mean $\bar{x} = \frac{\sum f x}{\sum f}$, where x is the class mid-value.
- 22. The steps to calculate the average for a continuous frequency distribution are: • Step 1: Write all class intervals serially in the first column and the corresponding frequencies in the second column. • Step 2: Obtain the mid-value of each class interval by adding its lower and upper class limits and dividing the result by 2; put these values in the third column. • Step 3: Multiply each f by the corresponding mid-value x and write the product in the fourth column. The total of this column gives $\sum f x$.
- 23. Notation form: • $\bar{x} = \frac{\text{Sum of fourth column}}{\text{Sum of second column}} = \frac{\sum f x}{\sum f}$
- 24. Example: • Find the average age (in years) at the time of death in city A.

  | Age interval (years) | No. of deaths |
  |---|---|
  | 0-10 | 16 |
  | 10-20 | 9 |
  | 20-30 | 20 |
  | 30-40 | 11 |
  | 40-50 | 7 |
  | 50-60 | 12 |
  | 60-70 | 9 |
  | 70-80 | 4 |
  | 80-90 | 2 |
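- 25. Solution:

  | Age interval (years) | No. of deaths (f) | Mid-value (x) | Frequency × mid-value (f·x) |
  |---|---|---|---|
  | 0-10 | 16 | 5 | 16 × 5 = 80 |
  | 10-20 | 9 | 15 | 9 × 15 = 135 |
  | 20-30 | 20 | 25 | 20 × 25 = 500 |
  | 30-40 | 11 | 35 | 11 × 35 = 385 |
  | 40-50 | 7 | 45 | 7 × 45 = 315 |
  | 50-60 | 12 | 55 | 12 × 55 = 660 |
  | 60-70 | 9 | 65 | 9 × 65 = 585 |
  | 70-80 | 4 | 75 | 4 × 75 = 300 |
  | 80-90 | 2 | 85 | 2 × 85 = 170 |
  | Total | 90 | | 3130 |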
- 26. • $\bar{x} = \frac{\sum f x}{\sum f} = \frac{3130}{90} = 34.78$ • The average age at death is 34.78 years.
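The three steps for a continuous frequency distribution can be sketched in Python as follows. The function name `grouped_mean` and the representation of each class as a `(lower, upper)` pair are assumptions made for this illustration.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of a continuous frequency distribution using class mid-values."""
    # Step 2: mid-value of each class = (lower limit + upper limit) / 2
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    # Step 3: multiply each frequency by its mid-value and add the products
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)


# Step 1: class intervals and their frequencies (age at death, city A)
age_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
               (50, 60), (60, 70), (70, 80), (80, 90)]
deaths = [16, 9, 20, 11, 7, 12, 9, 4, 2]

mean_age = grouped_mean(age_classes, deaths)
print(round(mean_age, 2))  # 34.78
```

The value is an approximation of the true mean, because it relies on the assumption that each class's frequency sits at the class mid-value.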
- 27. 2. MEDIAN • The mean is unduly affected by extreme observations, and it cannot be calculated for distributions with open-end classes or for qualitative variables such as honesty, sex or religion. • To overcome these drawbacks, we use other measures of central tendency, such as the median.
- 28. Definition: • When all the observations of a variable are arranged in either ascending or descending order, the middle observation is known as median. It divides the whole data into two equal portions. • In other words, 50% of the observations will be smaller than the median while 50% of the observations will be larger than it.
- 29. Merits: • It is easy to understand and easy to calculate. • It can be computed for a distribution with open-end classes. • It is not affected by extreme observations. • It is applicable to quantitative as well as qualitative data. • It can be determined graphically.
- 30. Demerits: • It is not based on all the observations, and hence it is not a proper representative of the data. • It is not as rigidly defined as the arithmetic mean. • It is not capable of further mathematical treatment.
- 31. Computation of Median: Ungrouped Data: • As discussed above, the median is one of the measures of central tendency, which gives the middle value of the given data set. • While finding the median of the ungrouped data, first arrange the given data in ascending order, and then find the median value.
- 32. • If the total number of observations (n) is odd, the median is the ((n+1)/2)th observation. • If the total number of observations (n) is even, the median is the average of the (n/2)th and the ((n/2)+1)th observations.
- 33. Example: Let 6, 4, 7, 3 and 2 be the given data set. • To find the median, arrange the data in ascending order. • The ordered data set is 2, 3, 4, 6 and 7. • Here the number of observations is odd, i.e. n = 5. • Hence, the median is the ((n+1)/2)th observation. • (5+1)/2 = 6/2 = 3, so the median is the 3rd observation. • Therefore, the median of the given data set is 4.
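The odd and even rules for ungrouped data can be sketched in Python as below. The function name `median` is chosen for this illustration, and the four-value data set used for the even case is an added example that does not come from the slides.

```python
def median(observations):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, i.e. index n // 2 when counting from 0
        return ordered[middle]
    # Even n: average of the (n / 2)th and ((n / 2) + 1)th observations
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([6, 4, 7, 3, 2]))  # 4
print(median([6, 4, 7, 3]))     # 5.0
```

In the second call the ordered data are 3, 4, 6, 7, so the median is the average of 4 and 6.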
- 34. Calculation for grouped data • In grouped data, the median cannot be read off directly from the cumulative frequencies, because the middle value lies somewhere inside a class interval. So it is necessary to find the value inside that class interval that divides the whole distribution into two halves. • First, we have to find the median class. • To find the median class, compute the cumulative frequencies of all the classes and n/2. Then locate the first class whose cumulative frequency is greater than or equal to n/2. This class is called the median class.
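- 35. Formula: • Median $= l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$ • where l = lower limit of the median class, n = total number of observations, cf = cumulative frequency of the class preceding the median class, f = frequency of the median class, and h = class size.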
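- 36. Example: • The heights (in cm) of 51 girls are given as a "less than" cumulative distribution: less than 140: 4 girls; less than 145: 11; less than 150: 29; less than 155: 40; less than 160: 46; less than 165: 51. Find the median height.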
- 37. Solution: • To find the median height, we first need the class intervals and their corresponding frequencies. • The given distribution is of the "less than" type, so 140, 145, 150, …, 165 are the upper limits of the classes. Thus, the classes are: below 140, 140-145, 145-150, 150-155, 155-160 and 160-165. • From the given distribution, it is observed that: • 4 girls are below 140, so the frequency of the class "below 140" is 4. • 11 girls have heights less than 145, and 4 of them have heights less than 140. • Hence, the frequency of the class 140-145 is 11 - 4 = 7. • Likewise, the frequency of 145-150 is 29 - 11 = 18. • Frequency of 150-155 = 40 - 29 = 11. • Frequency of 155-160 = 46 - 40 = 6. • Frequency of 160-165 = 51 - 46 = 5.
- 38. Therefore, the frequency distribution table, along with the cumulative frequencies, is given below:

  | Class interval | Frequency | Cumulative frequency |
  |---|---|---|
  | Below 140 | 4 | 4 |
  | 140-145 | 7 | 11 |
  | 145-150 | 18 | 29 |
  | 150-155 | 11 | 40 |
  | 155-160 | 6 | 46 |
  | 160-165 | 5 | 51 |
- 39. Contd…. • Here, n = 51. • Therefore, n/2 = 51/2 = 25.5. • The first class whose cumulative frequency (29) exceeds 25.5 is 145-150, so this is the median class. • Therefore, • Lower class limit, l = 145 • Class size, h = 5 • Frequency of the median class, f = 18 • Cumulative frequency of the class preceding the median class, cf = 11.
- 40. • Now, substituting the values in the formula, we get • Median $= 145 + \left(\frac{25.5 - 11}{18}\right) \times 5$ • Median = 145 + (72.5/18) • Median = 145 + 4.03 • Median = 149.03 • Therefore, the median height for the given data is 149.03 cm.
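The grouped-data median can be sketched in Python, assuming the standard formula Median = l + ((n/2 - cf) / f) × h with the symbols listed in the solution. The function name `grouped_median` is hypothetical, and the lower limit 135 for the open "below 140" class is an assumption made only so that every class has a lower limit; it does not affect the result because that class is not the median class.

```python
def grouped_median(lower_limits, class_width, frequencies):
    """Median of grouped data: l + ((n / 2 - cf) / f) * h for the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # cf: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # First class whose cumulative frequency reaches n / 2: the median class
            return lower + (half - cumulative) / f * class_width
        cumulative += f


# Heights of 51 girls; 135 is an assumed lower limit for the "below 140" class.
lower_limits = [135, 140, 145, 150, 155, 160]
frequencies = [4, 7, 18, 11, 6, 5]

median_height = grouped_median(lower_limits, class_width=5, frequencies=frequencies)
print(round(median_height, 2))  # 149.03
```

The sketch assumes equal class widths, as in the example; unequal widths would need a width stored per class.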
- 41. Practice Problem
- 42. MODE: • In statistics, the mode is the value that occurs most often in a given set. In other words, the value in a data set that has the highest frequency is called the mode or modal value. Along with the mean and the median, it is one of the three most commonly used measures of central tendency. For example, the mode of the set {3, 7, 8, 8, 9} is 8. For a finite number of observations, the mode is easy to find. A set of values may have one mode, more than one mode, or no mode at all.
- 43. Definition: • The mode is defined as the value that has the highest frequency in a given set of values, i.e. the value that appears the greatest number of times. • Example: in the data set 2, 4, 5, 5, 6, 7, the mode is 5, since it appears twice while every other value appears once.
- 44. Bimodal, Trimodal & Multimodal (more than one mode) • When there are two modes in a data set, the set is called bimodal. • For example, the modes of set A = {2,2,2,3,4,4,5,5,5} are 2 and 5, because both 2 and 5 are repeated three times in the set. • When there are three modes in a data set, the set is called trimodal. • For example, the modes of set B = {2,2,2,3,4,4,5,5,5,7,8,8,8} are 2, 5 and 8. • When there are four or more modes in a data set, the set is called multimodal.
- 45. Solution: • The value occurring most frequently in a set of observations is its mode. In other words, the mode is the observation having the highest frequency in the data. More than one observation may share the highest frequency, i.e. a data set can have more than one mode. In such a case, the data set is said to be multimodal. • Let us look at an example. • Example: The following table represents the number of wickets taken by a bowler in 10 matches. Find the mode of the given set of data. • [Table: number of wickets taken in each of the 10 matches] • It can be seen that 2 wickets were taken by the bowler more often than any other number. Hence, the mode of the given data is 2.
- 46. Mode Formula for Grouped Data: • In the case of a grouped frequency distribution, the mode cannot be found just by looking at the frequencies. To determine the mode in such cases, we first find the modal class, i.e. the class with the highest frequency. The mode lies inside the modal class and is given by the formula: Mode $= l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$
- 47. • Where, • l = lower limit of the modal class • h = size of the class interval • f1 = frequency of the modal class • f0 = frequency of the class preceding the modal class • f2 = frequency of the class succeeding the modal class
- 48. Solution: • Let us see how to find the mode of a given data set with the help of examples. Example 1: Find the mode of the data set 3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48. Solution: In the list 3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48, the mode is 15, since it appears more often than any other number. Example 2: Find the mode of the data set 4, 4, 4, 9, 15, 15, 15, 27, 37, 48. Solution: A data set can have more than one mode if more than one value occurs with the same highest frequency. Here both 4 and 15 appear three times, more often than any other value, so both 4 and 15 are modes of the set.
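A small Python sketch of finding the mode or modes of ungrouped data is given below. It uses `collections.Counter` from the standard library; the function name `modes` is an assumption made for this illustration.

```python
from collections import Counter


def modes(observations):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(observations)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48]))  # [15]
print(modes([4, 4, 4, 9, 15, 15, 15, 27, 37, 48]))      # [4, 15]
```

One design choice to be aware of: when every value occurs exactly once, this sketch returns all the values, whereas such a data set is usually described as having no mode. A caller that needs that convention would have to check for it separately.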
- 49. Example: • In a class of 30 students, the marks obtained in mathematics (out of 50) are tabulated as below. Calculate the mode of the given data. • [Table: class intervals of marks and number of students]
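- 50. Solution: • The maximum class frequency is 12 and the class interval corresponding to this frequency is 20-30. Thus, the modal class is 20-30. • Lower limit of the modal class (l) = 20 • Size of the class interval (h) = 10 • Frequency of the modal class (f1) = 12 • Frequency of the class preceding the modal class (f0) = 5 • Frequency of the class succeeding the modal class (f2) = 8 • Substituting these values in the formula, we get: Mode $= l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h = 20 + \left(\frac{12 - 5}{2 \times 12 - 5 - 8}\right) \times 10 = 20 + \frac{70}{11} \approx 26.36$ • Therefore, the mode of the marks is approximately 26.36.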
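The substitution can be checked with a short Python sketch, assuming the standard grouped-data formula Mode = l + ((f1 - f0) / (2·f1 - f0 - f2)) × h. The function name `grouped_mode` is hypothetical; its parameters follow the symbols used in the solution.

```python
def grouped_mode(l, h, f1, f0, f2):
    """Mode of grouped data: l + ((f1 - f0) / (2 * f1 - f0 - f2)) * h."""
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        # Happens when the modal class and both neighbours have equal frequencies
        raise ValueError("formula is undefined when 2*f1 - f0 - f2 is zero")
    return l + (f1 - f0) / denominator * h


# Values read from the solution: modal class 20-30
mode_marks = grouped_mode(l=20, h=10, f1=12, f0=5, f2=8)
print(round(mode_marks, 2))  # 26.36
```

The result lies inside the modal class 20-30 and is pulled towards the neighbouring class with the larger frequency, here 30-40 with f2 = 8.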
- 52. Standard Deviation • The standard deviation measures the spread of statistical data: it expresses how far the data deviate from their mean or average position. The degree of dispersion is computed from the deviations of the individual data points. It is denoted by the symbol 'σ'. • The standard deviation is defined as the positive square root of the arithmetic mean of the squares of the deviations taken from the arithmetic mean: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
- 53. Merits of Standard Deviation • It is rigidly defined • It is based on all observations • It does not ignore the algebraic signs of deviations • It is capable of further mathematical treatment • It is not much affected by sampling fluctuations.
- 54. Demerits of Standard Deviation • It is difficult to understand and calculate. • It cannot be calculated for qualitative data or for distributions with open-end classes. • It is unduly affected by extreme deviations.
- 57. Variance • The square of the standard deviation of a set of observations is called the variance.
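The definitions of variance and standard deviation can be sketched in Python as follows. The function names are assumptions made for this illustration, and the data reuse the unit-test marks from the arithmetic mean example. The sketch follows the definition given above, dividing by n; the sample form that divides by n - 1 is a different convention and is not shown here.

```python
import math


def variance(observations):
    """Arithmetic mean of the squared deviations from the arithmetic mean."""
    n = len(observations)
    if n == 0:
        raise ValueError("at least one observation is required")
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / n


def standard_deviation(observations):
    """Positive square root of the variance."""
    return math.sqrt(variance(observations))


marks = [58, 62, 67, 65, 68, 70, 69, 61]
print(variance(marks))            # 16.0
print(standard_deviation(marks))  # 4.0
```

For these marks the mean is 65, the squared deviations add up to 128, and 128 / 8 = 16, so the standard deviation is 4.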