# Statr sessions 4 to 6

Praxis Weekend Analytics

Published in: Education, Technology

### Statr sessions 4 to 6

1. Measures of Variability: Ungrouped Data
   - Measures of variability are tools that describe the spread, or dispersion, of a set of data.
   - They are more informative when used together with measures of central tendency, or when comparing one group with another.
2. Measures of Spread or Dispersion: Ungrouped Data
   - Common measures of variability:
     - Range
     - Interquartile range
     - Mean absolute deviation
     - Variance and standard deviation
     - Coefficient of variation
3. Range
   - The difference between the largest and the smallest values in a set of data.
   - Advantage: easy to compute.
   - Disadvantage: affected by extreme values.
4. Interquartile Range
   - The range of values between the first and third quartiles, that is, Q3 − Q1.
   - It spans the "middle half" (the middle 50%) of the data.
   - Useful when researchers are interested in the middle 50% rather than the extremes.
5. Deviations from the mean
   - A deviation is the distance of a value from the mean, $x - \bar{x}$. Deviations about the mean always sum to zero, so they cannot simply be averaged to measure spread.
6. Mean Absolute Deviation (MAD)
   - One solution is to average the absolute values of the deviations about the mean. This is called the mean absolute deviation.
   - Although the MAD is intuitively simple, it is rarely used in practice.
7. Sample Variance
   - Another solution is to take the sum of squared deviations (SSD) about the mean.
   - The sample variance is approximately the average of the squared deviations from the arithmetic mean; the sum is divided by $n - 1$ rather than $n$.
   - The sample variance is denoted by $s^2$.
   - Why use the sum of squared deviations about the mean?
     - Squaring removes the sign of each deviation.
     - Squaring amplifies large deviations.
8. Calculation of Sample Variance
   - $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, where $n - 1$ is the degrees of freedom.
9. Sample Standard Deviation
   - The sample standard deviation is the square root of the sample variance.
   - It is denoted by $s$.
   - Benefit: it has the same units as the original data.
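
The chain from deviations to standard deviation on slides 5 to 9 can be sketched in a few lines of Python. The five-value data set is hypothetical (it does not appear in the slides), and only the standard library is used.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration (not from the slides).
data = [5, 9, 16, 17, 18]

n = len(data)
mean = sum(data) / n                        # arithmetic mean
deviations = [x - mean for x in data]       # deviations from the mean

value_range = max(data) - min(data)         # largest minus smallest value
mad = sum(abs(d) for d in deviations) / n   # mean absolute deviation
ssd = sum(d ** 2 for d in deviations)       # sum of squared deviations
sample_variance = ssd / (n - 1)             # divide by the degrees of freedom
sample_sd = math.sqrt(sample_variance)      # back in the units of the data

# Cross-check the hand computation against the standard library.
assert math.isclose(sample_variance, statistics.variance(data))

print(sum(deviations))                      # deviations sum to zero
print(value_range, mad, sample_variance, round(sample_sd, 2))
```

Dividing by `n` instead of `n - 1` would give the population variance, which is what `statistics.pvariance` computes.
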
10. Standard Deviation: Empirical Rule
    - If a variable is normally distributed, then:
      1. Approximately 68% of the observations lie within 1 standard deviation of the mean.
      2. Approximately 95% of the observations lie within 2 standard deviations of the mean.
      3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean.
    - Notes:
      - The rule also applies to populations.
      - It can be used to judge whether a distribution is approximately normal.
11. Standard Deviation: Empirical Rule
    - [Figure: normal curve with the central 68%, 95% and 99.7% of observations marked between $\bar{x} \pm s$, $\bar{x} \pm 2s$ and $\bar{x} \pm 3s$.]
12. A Note about the Empirical Rule
    - The empirical rule may be used to determine whether a set of data is approximately normally distributed:
      1. Find the mean and standard deviation of the data.
      2. Compute the actual proportion of the data within 1, 2, and 3 standard deviations of the mean.
      3. Compare these proportions with those given by the empirical rule.
      4. If the proportions are reasonably close to those of the empirical rule, the data are approximately normally distributed.
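
The four-step check on slide 12 might look like the following sketch. The slides supply no data set for it, so the sample here is simulated from a normal distribution; the helper name `proportions_within` is an illustrative choice, not something from the slides.

```python
import random
import statistics

def proportions_within(data, ks=(1, 2, 3)):
    """Share of observations within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return {k: sum(abs(x - mean) <= k * sd for x in data) / len(data)
            for k in ks}

# Simulated data, used only because the slides give no data for this check.
random.seed(0)
sample = [random.gauss(50, 10) for _ in range(10_000)]

EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}   # proportions from the empirical rule
observed = proportions_within(sample)

for k in (1, 2, 3):
    print(k, round(observed[k], 3), EMPIRICAL[k])
```

How close counts as "reasonably close" is a judgment call; the slides do not give a threshold.
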
13. z Scores
    - A z score is the number of standard deviations a value $x$ lies above or below the mean of a set of numbers: $z = \dfrac{x - \bar{x}}{s}$.
    - It translates a value's raw distance from the mean into units of standard deviations.
    - When the data are normally distributed, z scores typically range from -3.00 to +3.00.
    - z scores may be used to compare raw scores from different data sets.
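
As a minimal sketch of the z-score conversion, assuming the sample standard deviation is the intended divisor (the slide does not say), and using hypothetical raw scores:

```python
import statistics

def z_scores(data):
    """Express each value as its distance from the mean in standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)
    return [(x - mean) / sd for x in data]

# Hypothetical raw scores, for illustration only.
scores = [5, 9, 16, 17, 18]
z = z_scores(scores)

print([round(v, 2) for v in z])
```

By construction the z scores have mean 0 and standard deviation 1, which is what makes scores from different scales comparable.
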
14. Coefficient of Variation (C.V.)
    - The coefficient of variation (CV) measures the volatility of a value (perhaps a stock portfolio) relative to its mean. It is the ratio of the standard deviation to the mean, expressed as a percentage.
    - Useful for comparing standard deviations computed from data with different means.
    - It is a measure of relative dispersion.
15. Coefficient of Variation (C.V.)
    - Consider two different populations.
    - Since 15.86 > 11.90, the first population is more variable, relative to its mean, than the second population.
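
The comparison on slide 15 can be sketched as below. The means and standard deviations of the two populations did not survive in the transcript, so the inputs here are assumptions, picked only because they reproduce the quoted values 15.86 and 11.90.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

# Assumed inputs: not taken from the slides, chosen to match the quoted CVs.
cv_first = coefficient_of_variation(sd=4.6, mean=29)
cv_second = coefficient_of_variation(sd=10, mean=84)

print(round(cv_first, 2), round(cv_second, 2))
```

Note that under these assumed inputs the second population has the larger standard deviation in absolute terms, yet it is the less variable of the two relative to its mean.
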
16. Calculation of Grouped Mean
    - Sometimes data are already grouped, and we are interested in calculating summary statistics.

    | Class interval | Frequency (f) | Midpoint (M) | f·M |
    |---|---|---|---|
    | 20-under 30 | 6 | 25 | 150 |
    | 30-under 40 | 18 | 35 | 630 |
    | 40-under 50 | 11 | 45 | 495 |
    | 50-under 60 | 11 | 55 | 605 |
    | 60-under 70 | 3 | 65 | 195 |
    | 70-under 80 | 1 | 75 | 75 |
    | Total | 50 | | 2150 |

    - Grouped mean = Σ f·M / Σ f = 2150 / 50 = 43.
17. Median of Grouped Data - Example

    | Class interval | Frequency | Cumulative frequency |
    |---|---|---|
    | 20-under 30 | 6 | 6 |
    | 30-under 40 | 18 | 24 |
    | 40-under 50 | 11 | 35 |
    | 50-under 60 | 11 | 46 |
    | 60-under 70 | 3 | 49 |
    | 70-under 80 | 1 | 50 |

    - N = 50
18. Mode of Grouped Data

    | Class interval | Frequency |
    |---|---|
    | 20-under 30 | 6 |
    | 30-under 40 | 18 |
    | 40-under 50 | 11 |
    | 50-under 60 | 11 |
    | 60-under 70 | 3 |
    | 70-under 80 | 1 |

    - Mode: the midpoint of the modal class.
    - Modal class: the class with the greatest frequency.
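
The three grouped-data measures on slides 16 to 18 can be computed from the class table as in the sketch below. The median uses linear interpolation inside the median class; that formula is an assumption, since the slide's own formula did not survive in the transcript.

```python
# Class intervals as (lower bound, upper bound, frequency), from the slides.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

n = sum(f for _, _, f in classes)

# Grouped mean: weight each class midpoint by its frequency.
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Grouped median: interpolate within the first class whose cumulative
# frequency reaches n / 2 (assumed formula, see the note above).
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= n / 2:
        grouped_median = lo + (n / 2 - cumulative) / f * (hi - lo)
        break
    cumulative += f

# Grouped mode: midpoint of the class with the greatest frequency.
modal_lo, modal_hi, _ = max(classes, key=lambda c: c[2])
grouped_mode = (modal_lo + modal_hi) / 2

print(grouped_mean, round(grouped_median, 2), grouped_mode)
```

All three are approximations, because grouping discards the individual values within each class.
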
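19. Variance and Standard Deviation of Grouped Data
20. Variance and Standard Deviation of Grouped Data

    | Class interval | f | M | f·M | M − mean | (M − mean)² | f·(M − mean)² |
    |---|---|---|---|---|---|---|
    | 20-under 30 | 6 | 25 | 150 | -18 | 324 | 1944 |
    | 30-under 40 | 18 | 35 | 630 | -8 | 64 | 1152 |
    | 40-under 50 | 11 | 45 | 495 | 2 | 4 | 44 |
    | 50-under 60 | 11 | 55 | 605 | 12 | 144 | 1584 |
    | 60-under 70 | 3 | 65 | 195 | 22 | 484 | 1452 |
    | 70-under 80 | 1 | 75 | 75 | 32 | 1024 | 1024 |
    | Total | 50 | | 2150 | | | 7200 |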
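
The table on slide 20 stops at the weighted sum of squared deviations, 7200. A sketch of the remaining step follows; the transcript does not show whether the slides divide by N (population) or by n − 1 (sample), so both are computed here.

```python
import math

# Midpoints and frequencies from the grouped-data table in the slides.
midpoints = [25, 35, 45, 55, 65, 75]
frequencies = [6, 18, 11, 11, 3, 1]

n = sum(frequencies)
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Frequency-weighted sum of squared deviations of the midpoints from the mean.
weighted_ssd = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))

population_variance = weighted_ssd / n      # treating the 50 values as a population
sample_variance = weighted_ssd / (n - 1)    # treating them as a sample instead

print(weighted_ssd, population_variance, math.sqrt(population_variance))
print(round(sample_variance, 2), round(math.sqrt(sample_variance), 2))
```
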
21. Measures of Shape - Skewness
    - Symmetrical: the right half is a mirror image of the left half.
    - Skewed: the distribution lacks symmetry; the data are sparse at one end and piled up at the other.
      - Absence of symmetry
      - Extreme values, or a "tail", on one side of the distribution
      - Positively (right) skewed vs. negatively (left) skewed
22. Measures of Shape - Skewness
    - [Figure: two density curves (y against x), one positively (right) skewed and one negatively (left) skewed.]
23. 5-Number Summary
24. Box-and-Whisker Plot
    - A graphic representation of the 5-number summary:
      - The five numerical values (smallest, first quartile, median, third quartile, and largest) are located on a scale, either vertical or horizontal.
      - The box depicts the middle half of the data, which lies between the two quartiles.
      - The whiskers are line segments that depict the other half of the data.
      - One line segment represents the quarter of the data that is smaller in value than the first quartile.
      - The second line segment represents the quarter of the data that is larger in value than the third quartile.
25. Example: Box-and-Whisker Plot
    - A random sample of students in a sixth grade class was selected. Their weights are given below. Find the 5-number summary for these data and construct a boxplot.
    - Weights, in ascending order: 63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92, 93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112
    - 5-number summary: L = 63, Q1 = 85, median $\tilde{x}$ = 92, Q3 = 99, H = 112
26. Example: Box-and-Whisker Plot
    - [Figure: boxplot titled "Weights from Sixth Grade Class" on a Weight axis running from 60 to 110, with L, Q1, $\tilde{x}$, Q3 and H marked.]
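
The 5-number summary for the sixth grade weights can be checked with the standard library, as in the sketch below. The 25 values are read from the flattened table on slide 25. Quartile conventions differ between textbooks and libraries; the `"inclusive"` method of `statistics.quantiles` is used here only because it reproduces the slide's quartiles for this data set, not because the slides name it.

```python
import statistics

# The 25 weights from slide 25, in ascending order.
weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

# Three cut points that split the data into four quarters.
q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")

five_number_summary = (min(weights), q1, median, q3, max(weights))
iqr = q3 - q1   # width of the box in the box-and-whisker plot

print(five_number_summary, iqr)
```

The box of the plot runs from `q1` to `q3`, and the whiskers extend to the smallest and largest values.
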