Empirics of Standard Deviation

In research, data can be analyzed with several different measures. One purpose of these measures is to describe the level of dispersion (Eboh, 2009). Dispersion is the tendency of the values of a variable to scatter away from the mean or midpoint. Data are most commonly summarized with basic statistical tools such as the mean, median and mode, but these describe only the centre of the data. To measure how spread out the data are, the standard deviation is used. The standard deviation measures the typical distance of the observations from their calculated mean, and it is one of the main tools for measuring dispersion. To understand it well, it helps first to explain the terms mean, median, mode and variance, together with their uses.

MEAN

Panneerselvam (2008) defines the mean, which he terms the arithmetic mean, as the ratio between the sum of the observations and the number of observations. Eboh (2009) likewise describes it as the sum of the observations divided by the number of observations. In other words, the mean is the arithmetic average of a set of scores: add all the scores and divide by the number of scores. It is generally written as

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

where $\bar{x}$ is the arithmetic mean, $x_i$ is the $i$th observation and $n$ is the total number of observations.

Example 1.1: Determine the arithmetic mean of the monthly salaries of the employees shown in Table 1.1 below.

Table 1.1
Employee no.              1    2    3    4    5    6    7    8    9
Monthly salary (N'000)   20   27   34   56   34   45   20   29   41

Solution: The number of observations is n = 9. Using the formula above,

$\bar{x} = \frac{20000 + 27000 + 34000 + 56000 + 34000 + 45000 + 20000 + 29000 + 41000}{9} = \frac{306000}{9} = 34000$, that is, N34,000.

Note that before the observations are summed, they must all be in the same units and on the same scale.
This means that values measured on different scales cannot be summed: naira and dollar amounts, for example, cannot be added together directly until they have been converted to a common currency. Consider the following data, which represent the time needed to complete a reading task.
Example 1.2: Reading times (minutes)

6   3   5   5   2   7   6   4   3   2      Total = 43

The mean is the sum of the scores divided by the number of scores:

$\text{Mean} = \frac{\sum X}{N} = \frac{43}{10} = 4.3$

PROPERTIES OF THE MEAN

The mean has certain properties (Eboh, 2009). They include:

1. The sum of the deviations of the observations from the mean is always zero. When the mean is subtracted from each observation, some differences are positive and some are negative, and together they cancel out:

$\sum_{i=1}^{N} (x_i - \bar{x}) = 0$

2. The sum of the squared deviations of the observations from the mean is smaller than the sum of the squared deviations about any other number:

$\sum_{i=1}^{N} (x_i - \bar{x})^2 = \text{minimum}$

In other words, although the plain deviations from the mean sum to zero, once each deviation is squared the total is smaller about the mean than about any other value.

3. When the mean is computed from grouped data, which is a special case, every observation in a class interval is assumed to lie at the midpoint of that interval. The mean is then

$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{N}$

where $f_i$ is the number of cases in the $i$th category, with $\sum_{i=1}^{k} f_i = N$; $m_i$ is the midpoint of the $i$th category; and $k$ is the number of categories. For example, the class interval 1950-2950 has the midpoint (1950 + 2950)/2 = 2450, and 2450 is the value used for that class when computing the mean.

MEDIAN

According to Panneerselvam (2008), the median is the score found at the exact middle of a set of values; it is the midpoint of a series of numbers. To find the median, arrange the values in order from smallest to largest. If there is an odd number of values, the middle value is the median.
If there is an even number of values, the average of the two middle values is the median.
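The mean formula, the first two properties of the mean and the median rule can be checked with a short Python sketch. It is a minimal illustration, assuming the data are non-empty lists of numbers with no missing values; the function names `mean`, `median` and `sum_sq_dev` are illustrative choices, not taken from the sources cited here. The median calls use the data of Examples 1.3 and 1.4, which are worked by hand below.

```python
# Minimal sketch: assumes non-empty lists of numbers with no missing values.


def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd number of values: the middle one is the median
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two middle values


salaries = [20, 27, 34, 56, 34, 45, 20, 29, 41]  # Example 1.1, in N'000
print(mean(salaries))  # 34.0, i.e. N34,000

times = [6, 3, 5, 5, 2, 7, 6, 4, 3, 2]  # Example 1.2, in minutes
m = mean(times)
print(m)  # 4.3

# Property 1: deviations from the mean sum to zero (up to floating-point rounding).
print(abs(sum(x - m for x in times)) < 1e-9)  # True


# Property 2: the sum of squared deviations is smallest about the mean.
def sum_sq_dev(values, c):
    return sum((x - c) ** 2 for x in values)


print(all(sum_sq_dev(times, m) < sum_sq_dev(times, c) for c in (3, 4, 4.5, 5)))  # True

print(median([19, 29, 36, 15, 20]))      # Example 1.3: 20
print(median([67, 28, 92, 37, 81, 75]))  # Example 1.4: 71.0
```

Trying a few other centres (3, 4, 4.5 and 5) is only a spot check of property 2, not a proof; the general result follows from expanding $\sum (x_i - c)^2$.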
Example 1.3: Find the median of 19, 29, 36, 15 and 20.
In order: 15, 19, 20, 29, 36. Since there are 5 values (an odd number), the middle value, 20, is the median.

Example 1.4: Find the median of 67, 28, 92, 37, 81 and 75.
In order: 28, 37, 67, 75, 81, 92. Since there are 6 values (an even number), the median is the average of the two middle values:

$\text{Median} = \frac{67 + 75}{2} = \frac{142}{2} = 71$

MODE

The mode of a set of values is the value that occurs most often. A set of values may have more than one mode or no mode.

Example 1.5: Find the mode of 15, 21, 26, 25, 21, 23, 28, 21.
The mode is 21, since it occurs three times and every other value occurs only once.

Example 1.6: Find the mode of 12, 15, 18, 26, 15, 9, 12, 27.
The modes are 12 and 15, since both occur twice.

Example 1.7: Find the mode of 4, 8, 15, 21, 23.
There is no mode, since all the values occur the same number of times.

Since there are three different measures of centre, it is reasonable to ask which is best to use. Each has advantages and disadvantages, depending on the nature of the data set, as listed below.

Mean
  Advantages: easy to compute; sample means tend to vary less; good properties as the sample size increases.
  Disadvantages: sensitive to extreme values (outliers).
Median
  Advantages: resistant to outlying values; good for skewed data (see below).
  Disadvantages: harder to calculate; less useful than the mean for inference.
Mode
  Advantages: easy to compute; good for qualitative (categorical) data.
  Disadvantages: not very useful for quantitative data.

Skewness

Using the mean, median and mode together can help to describe the skewness of a data set. A data set is considered skewed if its values extend further to one side of the distribution than to the other (Schuetter, 2007).
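The mode rule, including the special cases of Examples 1.6 (two modes) and 1.7 (no mode), can be sketched with `collections.Counter` from Python's standard library. The helper name `modes` is illustrative. The sketch follows this text's convention that a data set in which every value occurs equally often has no mode; Python's `statistics.multimode` behaves differently in that case (it returns all the values), which is why it is not used here.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order.

    Follows the convention in the text: if all values occur the same
    number of times, the data set has no mode and an empty list is returned.
    Assumes a non-empty list.
    """
    counts = Counter(values)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []  # no value occurs more often than the others
    return sorted(v for v, c in counts.items() if c == top)


print(modes([15, 21, 26, 25, 21, 23, 28, 21]))  # Example 1.5: [21]
print(modes([12, 15, 18, 26, 15, 9, 12, 27]))   # Example 1.6: [12, 15]
print(modes([4, 8, 15, 21, 23]))                # Example 1.7: []
```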
VARIANCE

The variance ($S^2$) is the average squared deviation from the mean. It is the square of the standard deviation; equivalently, the standard deviation is the square root of the variance, so either measure can be obtained from the other. The defining formula is

$S^2 = \frac{\sum (x - M)^2}{N - 1}$

where $x$ is each individual score in the distribution, $M$ is the mean of the distribution and $N$ is the number of scores. Dividing by $N - 1$ rather than $N$ gives the sample variance, which is the form used throughout this section. This is illustrated below.

Example 1.8: Calculation of a variance

x      x²     x - M    (x - M)²
3       9      -2         4
5      25       0         0
2       4      -3         9
7      49       2         4
9      81       4        16
4      16      -1         1
Σ 30  184       0        34

$M = 30/6 = 5.0$, $S^2 = 34/5 = 6.8$ (Keronanton, 2004)

Standard Deviation

Although the variance is frequently used as a measure of spread in certain statistical calculations, it has the disadvantage of being expressed in units different from those of the summarized data: because the deviations are squared, the variance is in squared units (for example, minutes squared when the data are times in minutes). The variance can, however, easily be converted into a measure of spread expressed in the same unit of measurement as the original scores, namely the standard deviation ($S$). The standard deviation indicates how much the values fluctuate around their mean, and it is the most popular measure of spread. To convert the variance to the standard deviation, simply take its square root. The formula for the standard deviation is

$S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1}}$
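The defining formula for the variance and the formula for the standard deviation give the same result, because expanding the square gives $\sum (x - M)^2 = \sum x^2 - \frac{(\sum x)^2}{N}$; for Example 1.8 this is $184 - 30^2/6 = 184 - 150 = 34$. The minimal Python sketch below computes the sample variance both ways and takes the square root to get the standard deviation. The function names are illustrative, and the sketch assumes lists of at least two numbers.

```python
import math


def sample_variance(values):
    """Defining formula: S^2 = sum((x - M)^2) / (N - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_variance_computational(values):
    """Computational formula: S^2 = (sum(x^2) - (sum(x))^2 / N) / (N - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the variance, in the data's own units."""
    return math.sqrt(sample_variance(values))


scores = [3, 5, 2, 7, 9, 4]  # Example 1.8
print(sample_variance(scores))                # 6.8
print(sample_variance_computational(scores))  # 6.8
print(round(sample_sd(scores), 2))            # 2.61

times = [6, 3, 5, 5, 2, 7, 6, 4, 3, 2]  # reading times used in Example 1.9
print(round(sample_sd(times), 2))  # 1.77
```

Python's standard library also provides `statistics.variance` and `statistics.stdev`, which likewise divide by $N - 1$ and could replace the hand-written functions; `statistics.pvariance` and `statistics.pstdev` divide by $N$ instead.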
Example 1.8 (continued): Calculation of standard deviation

Find the standard deviation of the data in Example 1.8:

$S = \sqrt{6.8} \approx 2.61$

Example 1.9: Find the standard deviation of the following time scores, whose mean is 4.3.

To get the standard deviation, subtract the mean from each score, square each deviation, and then add up the squared deviations. The process is outlined below.

Time score   Mean   Score - mean   (Score - mean)²
6            4.3     1.7            2.89
3            4.3    -1.3            1.69
5            4.3     0.7            0.49
5            4.3     0.7            0.49
2            4.3    -2.3            5.29
7            4.3     2.7            7.29
6            4.3     1.7            2.89
4            4.3    -0.3            0.09
3            4.3    -1.3            1.69
2            4.3    -2.3            5.29
                                    Total = 28.10

Dividing the total by N - 1 = 9 and taking the square root, the standard deviation becomes

$S = \sqrt{\frac{28.10}{9}} = \sqrt{3.12} \approx 1.77$

Grouped Data

Data are often reported as grouped observations, and the standard deviation is then obtained with a slightly different formula that is easier to apply in this situation. An example is presented below.

Example 1.10

Ages       f
51 - 60    3
41 - 50   10
31 - 40   15
21 - 30   11
11 - 20    5

To calculate the mean and standard deviation of grouped data, you must determine the midpoint of each group of observations (Panneerselvam, 2008). The midpoints are obtained by adding the lower and upper limits of each interval and dividing by two. For example, the midpoint of the first group is (51 + 60)/2 = 111/2 = 55.5. I have redrawn the data below with the midpoints inserted. I have also added a column headed f × Midpoint, which is simply the midpoint multiplied by the frequency.

Ages       f    Midpoint   f × Midpoint
51 - 60    3    55.5        166.5
41 - 50   10    45.5        455.0
31 - 40   15    35.5        532.5
21 - 30   11    25.5        280.5
11 - 20    5    15.5         77.5
Total     44               1512.0

The mean is the sum of the f × Midpoint column divided by the sample size, which is found by adding the f column (N = 44). Therefore, the mean = 1512.0/44 = 34.36 (rounded to 34.4). To determine the standard deviation, one more column is needed: f × Midpoint², the frequency multiplied by the squared midpoint. The table is redrawn below with this column added.

Ages       f    Midpoint   f × Midpoint   f × Midpoint²
51 - 60    3    55.5        166.5           9240.75
41 - 50   10    45.5        455.0          20702.50
31 - 40   15    35.5        532.5          18903.75
21 - 30   11    25.5        280.5           7152.75
11 - 20    5    15.5         77.5           1201.25
Total     44               1512.0          57201.00

Applying the standard deviation formula with N = 44:

$S = \sqrt{\frac{57201.0 - \frac{(1512.0)^2}{44}}{43}} = \sqrt{\frac{57201 - 51957.82}{43}} = \sqrt{121.93} \approx 11.04$
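The grouped-data calculation of Example 1.10 can be sketched in Python as follows. It assumes, as the text does, that every observation in a class lies at the class midpoint. Each class is represented as a `(lower, upper, frequency)` tuple, and the function name `grouped_mean_sd` is hypothetical, chosen for this illustration only.

```python
import math

# Example 1.10: (lower limit, upper limit, frequency) for each age class
classes = [(51, 60, 3), (41, 50, 10), (31, 40, 15), (21, 30, 11), (11, 20, 5)]


def grouped_mean_sd(classes):
    """Mean and sample standard deviation of grouped data.

    Every observation in a class is treated as lying at the class midpoint.
    """
    n = sum(f for _, _, f in classes)                     # N = sum of f
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]  # class midpoints
    sum_fx = sum(f * x for x, f in mids)                  # sum of f x Midpoint
    sum_fx2 = sum(f * x * x for x, f in mids)             # sum of f x Midpoint^2
    mean = sum_fx / n
    sd = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
    return n, sum_fx, sum_fx2, mean, sd


n, sum_fx, sum_fx2, mean, sd = grouped_mean_sd(classes)
print(n, sum_fx, sum_fx2)            # 44 1512.0 57201.0
print(round(mean, 2), round(sd, 2))  # 34.36 11.04

print((1950 + 2950) / 2)  # 2450.0, the class-midpoint example given for the mean
```

Keeping the running sums of $f \times m$ and $f \times m^2$, rather than expanding each class into individual observations, mirrors the extra table columns used in the hand calculation.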
Chapter Exercises

1. A magazine is interested in expanding its readership to "yuppies", defined as people aged between 30 and 40. The following are the ages of a random selection of the magazine's readers. Is there any reason to be concerned? Draw the theoretical distribution and contrast the actual distribution of scores with it. Is the sample adequate?

23, 31, 29, 21, 25, 27, 25, 21, 29, 30, 35, 41, 23, 35, 19, 20, 26, 24, 26, 25, 28, 27, 51, 15, 28, 21, 23, 25

2. A more extensive examination of the readership was undertaken after the start of a one-year advertising programme designed to widen the readership age range. The data collected in this more extensive study are given below. Draw the theoretical distribution of the readers' ages. What do you conclude? Is the sample adequate?

Ages      Frequency
20 - 24   26
25 - 29   45
30 - 34   87
35 - 39   30
40 - 44    8
45 - 49    4

References

Eboh, E. (2009). Social and Economic Research: Principles and Method. Enugu: African Institute for Applied Method.
Keronanton, A. (2004). Statistics: Median, Mode and Frequency Distribution. Dublin: Dublin Institute of Technology.
Obadan, M. I. (2012). Research Process, Report Writing and Referencing. Ugbowo, Benin City, Nigeria: Goldmark Press Limited.
Panneerselvam, R. (2008). Research Method. New Delhi: Prentice Hall of India Limited.
Schuetter, J. (2007). Chapter 1. In J. Schuetter, Measures of Dispersion (pp. 45-54).
Walonick, D. S. (1993). The Research Process. Minneapolis.