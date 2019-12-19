
  1. QUANTILES SKEWED DISTRIBUTIONS 12/19/2019 Presentation by SADIA NOOR
  2. QUANTILES
     • When the number of observations is quite large, the principle by which a distribution or an ordered data set is divided into equal parts may be extended to any number of divisions.
     • Three kinds of quantile are in common use:
     • QUARTILES: the 3 values that divide the distribution into four equal parts, denoted by Q1, Q2, Q3.
     • DECILES: the 9 values that divide the distribution into ten equal parts, denoted by D1, D2, ..., D9.
     • PERCENTILES: the 99 values that divide the distribution into a hundred equal parts, denoted by P1, P2, ..., P99.
  3. Formulas for quantiles (ungrouped data):
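As a rough illustration of quantiles for ungrouped data, the sketch below uses Python's standard `statistics.quantiles`. The slide's own formulas did not survive in this transcript, so the sketch assumes the common textbook convention in which the j-th quartile sits at position j(n+1)/4 of the ordered data (the function's default "exclusive" method). The data values are made up for illustration.

```python
import statistics

# Hypothetical ungrouped observations (not taken from the slides).
data = [2, 4, 4, 5, 7, 8, 10, 12, 13, 15, 18]

# n=4 gives the 3 quartiles, n=10 the 9 deciles, n=100 the 99 percentiles.
quartiles = statistics.quantiles(data, n=4)
deciles = statistics.quantiles(data, n=10)
percentiles = statistics.quantiles(data, n=100)

print("Q1, Q2, Q3:", quartiles)
print("number of deciles:", len(deciles))
print("number of percentiles:", len(percentiles))
```

With only 11 observations the extreme percentiles are extrapolated, so they are shown here only to confirm the count of cut points.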
  4. Formulas for quantiles (grouped data):
  5. Activity

     Daily wages   Frequency
     10-11          3
     11-12          1
     12-13          4
     13-14          7
     14-15         20
     15-16          8
     16-17          2
     17-18          2
     18-19          1
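One possible way to work the activity is sketched below. Because the grouped-data formula on the earlier slide is missing from this transcript, the sketch assumes the standard interpolation formula, quantile = l + (h/f)(jn/k − c), where l is the lower boundary of the quantile class, h its width, f its frequency and c the cumulative frequency before it. The function name `grouped_quantile` is hypothetical.

```python
# Lower class boundaries and frequencies from the daily-wages table.
lower_bounds = [10, 11, 12, 13, 14, 15, 16, 17, 18]
frequencies = [3, 1, 4, 7, 20, 8, 2, 2, 1]
class_width = 1


def grouped_quantile(j, k):
    """Return the j-th of the k-quantiles (j=1, k=4 gives Q1)."""
    n = sum(frequencies)
    target = j * n / k      # position of the quantile in the ordered data
    cumulative = 0          # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, frequencies):
        if cumulative + f >= target:
            # Linear interpolation inside the quantile class.
            return lower + class_width * (target - cumulative) / f
        cumulative += f
    raise ValueError("j must not exceed k")


q1 = grouped_quantile(1, 4)
q2 = grouped_quantile(2, 4)
q3 = grouped_quantile(3, 4)
print(f"Q1 = {q1:.4f}, Q2 = {q2:.4f}, Q3 = {q3:.4f}")
```

Under this assumed formula the 48 observations give Q1 ≈ 13.57, Q2 = 14.45 and Q3 = 15.125; deciles and percentiles follow by calling the same function with k=10 or k=100.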
  6. Normally distributed
     • We often test whether our data are normally distributed because normality is a common assumption underlying many statistical tests. An example of a normally distributed set of data is presented below:
  7. • When you have a normally distributed sample, you can legitimately use either the mean or the median as your measure of central tendency. In fact, in any symmetrical distribution the mean, median and mode are equal. In this situation, however, the mean is widely preferred as the best measure of central tendency because it includes every value in the data set in its calculation, so a change in any score affects the value of the mean. This is not the case with the median or mode.
     • When the data are skewed, the mean is dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data.
  8. • When data are skewed, as in a right-skewed data set, the mean and the median move apart. The more skewed the distribution, the greater the difference between the median and the mean, and the greater the emphasis that should be placed on the median rather than the mean.
     • A classic example of a right-skewed distribution is income (salary), where a few high earners make the mean a misleading representation of the typical income compared with the median.
  10. • If the data are assumed to be normally distributed but tests of normality show that they are non-normal, it is customary to use the median instead of the mean.
      • However, this is more a rule of thumb than a strict guideline. Researchers sometimes report the mean of a skewed distribution if the median and mean are not appreciably different (a subjective assessment) and if doing so allows easier comparison with previous research.
  11. Skewness
      • A distribution in which the values equidistant from the mean have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.
      • It is important to note that in a perfectly symmetrical distribution the mean, median and mode coincide, and the two tails of the distribution extend equally far from the mean.
      • If the right tail is longer than the left tail, the distribution is said to have positive skewness.
      • If the left tail is longer than the right tail, the distribution is said to have negative skewness.
  13. • In a positively skewed distribution, the mean is greater than the median and the median is greater than the mode.
      • Mean > Median > Mode
      • In a negatively skewed distribution, the mode is greater than the median and the median is greater than the mean.
      • Mode > Median > Mean
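The ordering of the three averages can be checked on a small made-up sample. The sketch below uses Python's `statistics` module; both data sets are hypothetical, and the left-skewed one is simply the mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed sample: a long tail of large values.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]
# Mirror image of the same sample, giving a long tail of small values.
left_skewed = [-x for x in right_skewed]


def averages(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


r_mean, r_median, r_mode = averages(right_skewed)
l_mean, l_median, l_mode = averages(left_skewed)

print("right-skewed: mean, median, mode =", r_mean, r_median, r_mode)
print("left-skewed:  mean, median, mode =", l_mean, l_median, l_mode)
```

This ordering is typical of unimodal skewed data like these samples; it is a useful rule of thumb rather than a guarantee for every data set.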
  14. Measures
      • To measure the degree of skewness of a distribution or curve, Karl Pearson (1857-1936) introduced a coefficient of skewness, denoted by sk and defined by
      • $sk = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$
      • $sk = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
  15. • This coefficient usually varies between −3 (negative skewness) and +3 (positive skewness), and its sign indicates the direction of skewness.
      • Arthur Lyon Bowley (1869-1957), a British statistician, proposed another measure of skewness, based on the quartiles.
      • Bowley's coefficient of skewness is: $sk = \dfrac{Q_1 + Q_3 - 2\,\text{median}}{Q_3 - Q_1}$
      • Its value lies between −1 and +1.
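The two coefficients can be sketched in a few lines of Python. The function names and the sample are hypothetical. Two assumptions are made that the slides do not settle: the sample standard deviation (`statistics.stdev`) is used in Pearson's coefficient, and the quartiles come from the default method of `statistics.quantiles`.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's coefficient: 3 (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / sd


def bowley_skewness(values):
    """Bowley's coefficient: (Q1 + Q3 - 2 median) / (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    median = statistics.median(values)
    return (q1 + q3 - 2 * median) / (q3 - q1)


# Hypothetical right-skewed sample (not from the slides).
sample = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]

pearson_sk = pearson_median_skewness(sample)
bowley_sk = bowley_skewness(sample)
print(f"Pearson sk = {pearson_sk:.4f}, Bowley sk = {bowley_sk:.4f}")
```

Both values come out positive for this sample, matching its longer right tail. Bowley's coefficient depends only on the quartiles, so the extreme value 15 affects it less than it affects Pearson's.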
  16. Example
      • Here are grouped data for heights of 100 randomly selected male students, adapted from Spiegel and Stephens (1999, 68).
      • A histogram shows that the data are skewed left, not symmetric.
  17. • But how highly skewed are they, compared to other data sets? To answer this question, you have to compute the skewness.
      • Begin with the sample size and sample mean. (The sample size was given, but it never hurts to check.)
      • n = 5 + 18 + 42 + 27 + 8 = 100
      • x̅ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100
      • x̅ = (305 + 1152 + 2814 + 1890 + 584) ÷ 100
      • x̅ = 6745 ÷ 100 = 67.45
  18. • Now, with the mean in hand, you can compute the skewness. (Of course in real life you’d probably use Excel or a statistics package, but it’s good to know where the numbers come from.)
  19. • Finally, the skewness is
      • $g_1 = m_3 / m_2^{3/2} = -2.6933 / 8.5275^{3/2} = -0.1082$
      • But wait, there’s more! That would be the skewness if you had data for the whole population. But obviously there are more than 100 male students in the world, or even in almost any school, so what you have here is a sample, not the population. You must compute the sample skewness:
      • $G_1 = \dfrac{\sqrt{n(n-1)}}{n-2}\, g_1 = \dfrac{\sqrt{100 \times 99}}{98} \times \dfrac{-2.6933}{8.5275^{3/2}} = -0.1098$
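The arithmetic of this example can be reproduced with a short script. The class values (61, 64, 67, 70, 73) and frequencies (5, 18, 42, 27, 8) are those used in the mean calculation on the slides; the helper name `central_moment` is hypothetical. This is a sketch of the moment calculation, not a replacement for a statistics package.

```python
import math

# Class values and frequencies from the student-heights example.
midpoints = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n


def central_moment(order):
    """Frequency-weighted central moment: sum of f * (x - mean)**order / n."""
    return sum(f * (x - mean) ** order
               for x, f in zip(midpoints, frequencies)) / n


m2 = central_moment(2)   # variance (divisor n)
m3 = central_moment(3)

g1 = m3 / m2 ** 1.5                          # population skewness
G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1   # sample skewness

print(f"n = {n}, mean = {mean:.2f}")
print(f"m2 = {m2:.4f}, m3 = {m3:.4f}")
print(f"g1 = {g1:.4f}, G1 = {G1:.4f}")
```

The printed values should agree with the slide's figures (m2 = 8.5275, g1 ≈ −0.1082, G1 ≈ −0.1098) up to rounding.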
  20. Interpreting
      • If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite unlikely for real-world data.
      • If skewness is less than −1 or greater than +1, the distribution is highly skewed.
      • If skewness is between −1 and −0.5 or between +0.5 and +1, the distribution is moderately skewed.
      • If skewness is between −0.5 and +0.5, the distribution is approximately symmetric.
      • With a skewness of −0.1098, the sample data for student heights are approximately symmetric.
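These rules of thumb can be written as a small helper. The function name `describe_skewness` is hypothetical, and the treatment of the exact boundary values (±0.5 and ±1) is an assumption, since the slide does not say which category they belong to.

```python
def describe_skewness(skewness):
    """Classify a skewness value using the rule-of-thumb thresholds."""
    size = abs(skewness)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:   # assumption: boundary values count as moderate
        return "moderately skewed"
    return "approximately symmetric"


# Sample skewness of the student-heights data from the example.
print(describe_skewness(-0.1098))
```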
  21. Kurtosis
      • Karl Pearson (1857-1936) introduced the term kurtosis (literally, the amount of hump) for the degree of peakedness or flatness of a unimodal frequency curve.
      • When the values of a variable are closely bunched around the mode, so that the peak of the curve becomes relatively high, we say that the curve is leptokurtic.
      • If the curve is flat-topped, we say that the curve is platykurtic.
      • Since the normal curve (to be described later) is neither very peaked nor very flat-topped, it is taken as a basis for comparison. The normal curve itself is called mesokurtic.
  22. • A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈3 (excess ≈0) is called mesokurtic.
      • A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader.
      • A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to a normal distribution, its tails are longer and fatter, and often its central peak is higher and sharper.
  24. Kurtosis is unfortunately harder to picture than skewness, but these illustrations should help. All three of these distributions have a mean of 0, a standard deviation of 1 and a skewness of 0, and all are plotted on the same horizontal and vertical scale. Look at the progression from left to right, as kurtosis increases.
  25. • The moment coefficient of kurtosis of a data set is computed almost the same way as the coefficient of skewness: just change the exponent 3 to 4 in the formulas:
      • kurtosis: $a_4 = m_4 / m_2^2$ and
      • excess kurtosis: $g_2 = a_4 - 3$
      • $m_4 = \sum (x - \bar{x})^4 / n$
      • $m_2 = \sum (x - \bar{x})^2 / n$
  26. • Example: Let’s continue with the example of the college men’s heights, and compute the kurtosis of the data set. n = 100, x̅ = 67.45 inches, and the variance m2 = 8.5275 in² were computed earlier.
  27. • Finally, the kurtosis is
      • $a_4 = m_4 / m_2^2 = 199.3760 / 8.5275^2 = 2.7418$
      • and the excess kurtosis is
      • $g_2 = 2.7418 - 3 = -0.2582$
      • But this is a sample, not the population, so you have to compute the sample excess kurtosis:
      • $G_2 = \dfrac{n-1}{(n-2)(n-3)}\,[(n+1)\,g_2 + 6] = \dfrac{99}{98 \times 97}\,[101 \times (-0.2582) + 6] = -0.2091$
      • This sample is slightly platykurtic: its peak is just a bit shallower than the peak of a normal distribution.
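The kurtosis figures can be checked the same way as the skewness. The sketch below reuses the class values and frequencies of the student-heights example; the helper name `central_moment` is hypothetical.

```python
# Class values and frequencies from the student-heights example.
midpoints = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n


def central_moment(order):
    """Frequency-weighted central moment: sum of f * (x - mean)**order / n."""
    return sum(f * (x - mean) ** order
               for x, f in zip(midpoints, frequencies)) / n


m2 = central_moment(2)
m4 = central_moment(4)

a4 = m4 / m2 ** 2     # kurtosis
g2 = a4 - 3           # excess kurtosis (population formula)
# Sample excess kurtosis, as in the formula for G2 on the slide.
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(f"m4 = {m4:.4f}, a4 = {a4:.4f}")
print(f"g2 = {g2:.4f}, G2 = {G2:.4f}")
```

The negative excess kurtosis is consistent with the slide's conclusion that the sample is slightly platykurtic.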
  28. Outlier
      • Outliers are extreme values that deviate from the other observations in the data. They may indicate variability in a measurement, experimental errors or a novelty. In other words, an outlier is an observation that diverges from the overall pattern of a sample.
  31. Most common causes of outliers in a data set:
      • Data entry errors (human errors)
      • Measurement errors (instrument errors)
      • Experimental errors (data extraction or experiment planning/executing errors)
      • Intentional (dummy outliers made to test detection methods)
      • Data processing errors (data manipulation or unintended mutations of the data set)
      • Sampling errors (extracting or mixing data from wrong or various sources)
      • Natural (not an error, novelties in data)
  32. Describing a frequency distribution:
      • To describe the major characteristics of a frequency distribution, we need to calculate the following five quantities:
      • The number of observations, which describes the size of the data.
      • A measure of central tendency, such as the mean or median, which provides information about the centre or average value.
      • A measure of dispersion, such as the standard deviation, which indicates the variability of the data.
      • A measure of skewness, which shows the lack of symmetry in the frequency distribution.
      • A measure of kurtosis, which gives information about its peakedness.
  33. Moments are a set of statistical parameters used to describe a distribution. Four moments are commonly used:
      • 1st, Mean: the average.
      • 2nd, Variance:
        – The standard deviation is the square root of the variance: an indication of how closely the values are spread about the mean. A small standard deviation means the values are all similar. If the distribution is normal, about 68% of the values will be within 1 standard deviation of the mean.
  34. • 3rd, Skewness: measures the asymmetry of a distribution about its peak; it is a number that describes the shape of the distribution.
        – It is often approximated by Skew = (Mean − Median) / (Std dev).
        – If skewness is positive, the mean is bigger than the median and the distribution has a long tail of high values.
        – If skewness is negative, the mean is smaller than the median and the distribution has a long tail of small values.
      • 4th, Kurtosis: measures the peakedness or flatness of a distribution.
        – Positive excess kurtosis indicates a thin, pointed distribution.
        – Negative excess kurtosis indicates a broad, flat distribution.
      • Higher moments tend to be less robust.

