Lecture 4 Measures of Central Tendency

  1. Lecture 4: Measures of Central Tendency
  2. Looking at a Data Set
     What is its most typical number?
     ◦ Measure of central tendency
     ◦ There are several different kinds
     How spread out, or varied, is the data?
     ◦ Measure of variation (next lecture)
     What positions do various members of the data set have relative to the most typical member?
     ◦ Measures of position (the lecture after that)
     Measuring Data Median, Mean, Midrange Resistant Statistics Skew Weighted Means
  3. Measure of Central Tendency
     Which measure of central tendency is appropriate for a data set depends on whether the variable is qualitative or quantitative
     ◦ If it's quantitative, we must also consider the data set's level of measurement
     ◦ If a data set is at the nominal level, the only fitting measure is the mode
     ◦ Mode: the value that occurs most frequently in a given set of data
     ◦ A data set can have more than one mode, as long as those values share the same frequency and that frequency is the highest
     ◦ A set with two modes is called bimodal
  4. Measure of Central Tendency
     ◦ If it's at a higher level (ordinal, interval, or ratio), other measures of central tendency can be used
     ◦ Median: the value of the variable that has half the data set less than or equal to it and half of the data set greater than or equal to it
     ◦ The median divides the data set into two parts of equal frequency
     ◦ Take a look at the last five women's ages as an example (from the Class Data Base, or CDB):
       Raw data:          x = 37, 17, 29, 19, 20
       Array (ascending): x = 17, 19, 20, 29, 37
     ◦ The raw data has not been sorted
     ◦ We can sort quantitative data one of two ways
       ◦ Ascending: lowest to highest
       ◦ Descending: highest to lowest
     ◦ Data that has been sorted is called an array
  5. Finding the Median
     Array (ascending): x = 17, 19, 20, 29, 37
     ◦ To quickly find the location of the median, use the expression $\frac{n+1}{2}$
     ◦ If we're just looking at this data, what is n?
     ◦ $\frac{n+1}{2} = \frac{5+1}{2} = 3$
     ◦ This means that the median is in the 3rd position of the array, so the median is 20
     ◦ What if n is even?
     Example if n is even: x = 17, 18, 19, 20, 29, 37
     ◦ Clearly, there is no exact middle number
     ◦ If there were, where would it be? (Here $\frac{6+1}{2} = 3.5$, halfway between the 3rd and 4th positions)
     ◦ In this case, we define the median as the number halfway between the two numbers in the middle
     ◦ $\frac{19+20}{2} = 19.5$
     ◦ Note that there are exactly three numbers less than 19.5 and three numbers greater than 19.5
     ◦ Is there a mode?
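
The position rule on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median` and the choice to raise an error on an empty data set are assumptions, not part of the lecture.

```python
def median(values):
    """Median found with the (n + 1) / 2 position rule on the sorted array."""
    array = sorted(values)              # ascending array
    n = len(array)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2              # 1-based location of the median
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that element.
        return array[int(position) - 1]
    # Even n: the position falls halfway between the two middle elements.
    lower = array[n // 2 - 1]
    upper = array[n // 2]
    return (lower + upper) / 2


ages = [37, 17, 29, 19, 20]             # raw data from the slide
print(median(ages))                     # 20
print(median(ages + [18]))              # 19.5
```

For the five ages, the position is 3 and the third value in the array is 20; adding 18 gives six values, and the result is the number halfway between 19 and 20.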
  6. What Most People Mean by Average
     …do you see what I did there?
     Array (ascending): x = 17, 19, 20, 29, 37
     The mean is the most commonly used measure of central tendency
     ◦ Appropriate to use when the variable is at the interval or ratio level of measurement
     ◦ To find the mean, add up all the values in your data and divide by n, the number of values in your data set
     ◦ $\sum x = 17 + 19 + 20 + 29 + 37 = 122$
     ◦ $\frac{\sum x}{n} = \frac{122}{5} = 24.4$
     ◦ Thus, the mean of our sample is 24.4
     ◦ Please note that this is the mean of our sample, not the population
  7. Measures of the Center of the Data
     We will talk about two different kinds of means:
     Sample Mean
     ◦ Indicated by $\bar{x}$ (pronounced x-bar)
     ◦ In our previous example, $\bar{x} = \frac{122}{5} = 24.4$
     ◦ In general, $\bar{x} = \frac{\sum x}{n}$
     ◦ If this isn't a whole number, we will usually round to one decimal place beyond the data
     Suppose you went bowling and you bowled a 121, a 140, and a 168
     ◦ $\sum x = 429$, so $\bar{x} = \frac{\sum x}{n} = \frac{429}{3} = 143$
     ◦ If you bowled three games at 143 each, the total would still be 429
     ◦ The mean is a kind of balancing point
     ◦ If you take each member of the data set, subtract the mean from it, and then add up all the differences, you always get zero
       121 - 143 = -22
       140 - 143 = -3
       168 - 143 = 25
       (-22) + (-3) + 25 = 0
     ◦ This always works
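
The balancing-point property can be checked numerically. The following is a minimal sketch using the bowling scores from this slide; the helper name `sample_mean` is an assumption made for illustration.

```python
def sample_mean(values):
    """Sample mean: the sum of the values divided by n."""
    return sum(values) / len(values)


scores = [121, 140, 168]                     # bowling scores from the slide
x_bar = sample_mean(scores)                  # 429 / 3
deviations = [x - x_bar for x in scores]     # each value minus the mean

print(x_bar)             # 143.0
print(deviations)        # [-22.0, -3.0, 25.0]
print(sum(deviations))   # 0.0
```

With other data, floating-point rounding can leave the sum of the deviations a tiny distance from zero, so a tolerance-based comparison such as `math.isclose(total, 0.0, abs_tol=1e-9)` is a safer check than strict equality.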
  8. Measures of the Center of the Data
     What if the data set consists of the entire population we're studying?
     Population Mean
     ◦ Indicated by µ
       ◦ Pronounced "mew"
       ◦ Spelled mu in English
       ◦ It's the Greek "m"
     ◦ Parameters are often symbolized by Greek letters, while statistics have English letters
     ◦ The formula for µ is $\mu = \frac{\sum x}{N}$
     ◦ Note the CAPITAL N
       ◦ Sample size is n
       ◦ Population size is N
  9. Midrange
     The final measure of central tendency
     Array (ascending): x = 17, 19, 20, 29, 37
     ◦ Appropriate when the variable is at the interval or ratio level of measurement
     ◦ Minimum (Min): the smallest value of a data set
     ◦ Maximum (Max): the largest value of a data set
     ◦ Midrange: the number halfway between the smallest and the largest value of the variable
     ◦ $\text{Midrange} = \frac{\text{Min} + \text{Max}}{2}$
     ◦ $\frac{17+37}{2} = \frac{54}{2} = 27$
  10. Measures of Central Tendency
      Thus, we have found four measures of central tendency for our data set, each one different:
      Array (ascending): x = 17, 19, 20, 29, 37
      Mode      None
      Median    20
      Mean      24.4
      Midrange  27
      Which is best to use? Depends on the situation…
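
As a rough cross-check of the four values, here is a short Python sketch using the standard library. The helpers `modes` and `midrange` are hypothetical names introduced here, and `modes` assumes the convention used in this lecture that a data set in which no value repeats has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """All values tied for the highest frequency; empty if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                        # assumed convention: no repeats, no mode
    return sorted(v for v, c in counts.items() if c == highest)


def midrange(values):
    """The number halfway between the minimum and the maximum."""
    return (min(values) + max(values)) / 2


ages = [17, 19, 20, 29, 37]
summary = {
    "mode": modes(ages),
    "median": median(ages),
    "mean": mean(ages),
    "midrange": midrange(ages),
}
for name, value in summary.items():
    print(name, value)
```

On a data set with repeats, for example `modes([1, 2, 2, 3, 3])`, the helper returns both 2 and 3, which matches the definition of a bimodal set.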
  11. Resistant Statistics
      Suppose you are applying for a job where the boss makes $120K per year and each of the four employees makes $20K per year.
      ◦ Total payroll: $200K
      If the boss tells you that the average salary is $40K, which average is he using?
      ◦ The mean ($200K / 5 = $40K); let's further suppose you are happy to hear the salary is $40K
      When you take the job, you find out you're only making $20K
      ◦ What would have been a better average to use here?
      ◦ Median: $20K
      ◦ The median is a much more resistant statistic and should be used whenever the data set contains a few unusually large or unusually small values of the variable
      ◦ This is why the median is used for home values: a few mansions will hardly affect the median, but they will make the mean unbelievably large
  12. Resistant Statistics
      In this example, the mode would give the same value as the median ($20K), but the mode is really only good for nominal-level data; it doesn't usually do much for us mathematically
      The midrange is affected by a high or low value even more than the mean is. (Here the midrange is ($20K + $120K) / 2 = $70K.)
      Why is the mean used so much?
      ◦ It is usually the most useful for us mathematically
      ◦ Thus, even when it is not the most appropriate measure of central tendency to use, people tend to use it
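
The salary example can be reproduced with a few lines of Python to see how each measure reacts to the one large value. This is a sketch only; the variable names are assumptions, and the salaries are the ones given in the example.

```python
from statistics import mean, median

# One boss at $120K and four employees at $20K each.
salaries = [120_000, 20_000, 20_000, 20_000, 20_000]

mean_salary = mean(salaries)                              # pulled up by the boss
median_salary = median(salaries)                          # resistant to the boss
midrange_salary = (min(salaries) + max(salaries)) / 2     # depends only on the extremes

print(mean_salary)       # 40000
print(median_salary)     # 20000
print(midrange_salary)   # 70000.0
```

Raising the boss's salary further changes the mean and the midrange but leaves the median at $20K, which is what "resistant" means here.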
  13. Skewness and the Mean, Median, Mode, and Midrange
      [Histogram: symmetric distribution, with the mode, median, mean, and midrange marked]
      ◦ This histogram is symmetric
      ◦ You can draw a vertical line at some point in the graph so that the left and right sides are mirror images of each other
      ◦ Notice that in this data set, the median, mean, and mode are the same
      ◦ In a perfectly symmetrical distribution, the mean and median are the same
      ◦ It is possible to have a bimodal symmetric distribution
      ◦ The two modes would be different from the mean and the median
  14. Skewness and the Mean, Median, Mode, and Midrange
      [Histogram: distribution skewed to the left, with the mode, median, mean, and midrange marked]
      ◦ This histogram is not symmetric
      ◦ The right hand side seems to be 'chopped off' compared to the left hand side
      ◦ We call this skewed to the left because it is pulled out to the left
      ◦ The mean is 6.4, the median is 6.5, the mode is 7, and the midrange is 6
      ◦ Notice that the mean is less than the median, and they are both less than the mode
      ◦ The mean and the median both reflect the skewing, but the mean reflects it more
  15. Skewness and the Mean, Median, Mode, and Midrange
      [Histogram: distribution skewed to the right, with the midrange, mean, median, and mode marked]
      ◦ This histogram is not symmetric
      ◦ The left hand side seems to be 'chopped off' compared to the right hand side
      ◦ We call this skewed to the right because it is pulled out to the right
      ◦ The mean is 7.7, the median is 7.5, the mode is 7, and the midrange is 8
      ◦ Notice that, of the mean, median, and mode, the mean is the largest and the mode is the smallest
      ◦ The mean and the median both reflect the skewing, but the mean reflects it more
  16. Weighted Means
      The mean height of men in the Class Data Base (CDB): $\bar{x}_M = 70.214$
      The mean height of women in the CDB: $\bar{x}_F = 64.100$
      Is it possible to determine from these figures the mean of the entire sample? (Call it $\bar{x}_H$)
      How NOT to do it:
      ◦ $\bar{x}_H = \frac{\bar{x}_M + \bar{x}_F}{2} = \frac{70.214 + 64.100}{2} = 67.157$ inches (to the nearest thousandth of an inch)
      If we take the mean of the entire sample, though, we'll find $\bar{x}_H = 66.618$ inches.
      Why do we get different values?
  17. Weighted Means
      There are different numbers of men and women making up the separate means, and we have to weight them accordingly.
      ◦ Men: 42 ($n_M = 42$)
      ◦ Women: 60 ($n_F = 60$)
      ◦ Since there are more women than men, should the sample mean be closer to $\bar{x}_M = 70.214$ or $\bar{x}_F = 64.100$?
      ◦ The halfway point between them is what we found with $\frac{\bar{x}_M + \bar{x}_F}{2} = 67.157$
      ◦ 66.618 < 67.157, so the sample mean is closer to the women's mean height
  18. Weighted Means
      ◦ Recall: the mean of a data set is the value that, if every member of the set had that value, would leave the sum of the values unchanged
      ◦ In other words, if 42 men were each 70.214 inches tall, their total height would be the same as the actual total height of the men
      ◦ Imagine all the men standing on top of each other's heads in a tall column
        ◦ Their total height would be $n_M \cdot \bar{x}_M$
      ◦ If we do the same for the women
        ◦ Their total height would be $n_F \cdot \bar{x}_F$
      ◦ Now, put the two columns on top of each other, and you have the total height of the sample:
        ◦ $n_M \cdot \bar{x}_M + n_F \cdot \bar{x}_F$
      ◦ And what is the size of the sample?
        ◦ $n_M + n_F$
      ◦ Putting it all together, the mean of the whole sample is
        ◦ $\bar{x}_H = \frac{n_M \cdot \bar{x}_M + n_F \cdot \bar{x}_F}{n_M + n_F} = \frac{42 \times 70.214 + 60 \times 64.100}{42 + 60} \approx 66.618$
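
The weighted-mean formula above can be sketched in Python to compare it with the naive average of the two group means. The function name `weighted_mean` and its argument order are assumptions made for this illustration; the group means and sizes are the ones given in the lecture.

```python
def weighted_mean(group_means, group_sizes):
    """Combine group means, weighting each by the size of its group."""
    if len(group_means) != len(group_sizes):
        raise ValueError("need exactly one size per group mean")
    # Total "stacked height": each group's size times its mean.
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)


x_bar_m, n_m = 70.214, 42       # men: mean height and count
x_bar_f, n_f = 64.100, 60       # women: mean height and count

naive = (x_bar_m + x_bar_f) / 2                              # ignores group sizes
x_bar_h = weighted_mean([x_bar_m, x_bar_f], [n_m, n_f])      # weights by group size

print(round(naive, 3))      # 67.157
print(round(x_bar_h, 3))    # 66.618
```

The two results agree only when the groups are the same size; here the larger group of women pulls the combined mean toward 64.100.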

