Statr sessions 4 to 6

Praxis Weekend Analytics

Published in: Education, Technology

Transcript of "Statr sessions 4 to 6"

  1. Measures of Variability: Ungrouped Data
     • Measures of variability: tools that describe the spread or dispersion of a set of data
       – Provide more meaningful information when used
         • with measures of central tendency
         • in comparison to other groups
  2. Measures of Spread or Dispersion: Ungrouped Data
     • Common measures of variability
       – Range
       – Interquartile range
       – Mean absolute deviation
       – Variance and standard deviation
       – Coefficient of variation
  3. Range
     • The difference between the largest and the smallest values in a set of data
       – Advantage: easy to compute
       – Disadvantage: affected by extreme values
  4. Interquartile Range
     • Interquartile range: the range of values between the first and third quartiles
     • Range of the “middle half” (the middle 50%) of the data
       – Useful when researchers are interested in the middle 50% and not the extremes
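
The contrast between the range and the interquartile range can be sketched in a few lines of Python. The data below are hypothetical (not from the slides), and the quartiles use the "inclusive" convention of the standard library's `statistics.quantiles` (Python 3.8+); other quartile conventions can give slightly different values.

```python
import statistics

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """Q3 minus Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

sample = [5, 9, 16, 17, 18]      # hypothetical data, for illustration only
with_outlier = sample + [90]     # the same data plus one extreme value

print(value_range(sample), interquartile_range(sample))              # 13 8.0
print(value_range(with_outlier), interquartile_range(with_outlier))  # 85 7.0
```

In this sketch the single extreme value moves the range from 13 to 85, while the interquartile range barely changes.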
  5. Deviations from the mean
  6. Mean Absolute Deviation (MAD)
     • Signed deviations from the mean always sum to zero, so they cannot simply be averaged. One solution is to take the absolute value of each deviation around the mean and average those values. This is called the Mean Absolute Deviation
     • Note that while the MAD is intuitively simple, it is rarely used in practice
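
A minimal sketch of the mean absolute deviation, assuming a small hypothetical sample (the values are illustrative and do not come from the slides). It also shows that the signed deviations cancel out.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

sample = [5, 9, 16, 17, 18]                 # hypothetical data
mean = sum(sample) / len(sample)            # 13.0
signed_sum = sum(x - mean for x in sample)  # signed deviations cancel out

print(signed_sum)                           # 0.0
print(mean_absolute_deviation(sample))      # 4.8
```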
  7. Sample Variance
     • Another solution is to take the Sum of Squared Deviations (SSD) about the mean
     • Sample variance is the SSD divided by n − 1, which is approximately the average of the squared deviations from the arithmetic mean
     • Sample variance is denoted by s²
     • Why the sum of squared deviations about the mean?
       – Squaring the deviations removes the sign
       – Larger deviations are amplified
  8. Calculation of Sample Variance
     • s² = Σ(x − x̄)² / (n − 1)
     • Degrees of freedom: n − 1
  9. Sample Standard Deviation
     • Sample standard deviation is the square root of the sample variance
     • Denoted by s
     • Benefit: same units as the original data
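
The sample variance and standard deviation can be sketched as follows, using the usual n − 1 (degrees of freedom) divisor. The data are hypothetical, and the hand-written functions are only an illustration; they are checked against Python's built-in `statistics.variance` and `statistics.stdev`.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    ssd = sum((x - mean) ** 2 for x in values)
    return ssd / (n - 1)

def sample_std(values):
    """Square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(values))

sample = [5, 9, 16, 17, 18]    # hypothetical data
print(sample_variance(sample)) # 32.5  (SSD of 130 divided by 4)
print(sample_std(sample))      # roughly 5.70
```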
  10. Standard Deviation: Empirical Rule
      If a variable is normally distributed, then:
      1. Approximately 68% of the observations lie within 1 standard deviation of the mean
      2. Approximately 95% of the observations lie within 2 standard deviations of the mean
      3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean
      Notes:
      • Also applies to populations
      • Can be used to check whether a distribution is approximately normal
  11. Standard Deviation: Empirical Rule
      [Figure: normal curve with 68%, 95%, and 99.7% of the observations lying within x̄ ± s, x̄ ± 2s, and x̄ ± 3s, respectively]
  12. A Note about the Empirical Rule
      Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed
      1. Find the mean and standard deviation for the data
      2. Compute the actual proportion of data within 1, 2, and 3 standard deviations of the mean
      3. Compare these actual proportions with those given by the empirical rule
      4. If the proportions found are reasonably close to those of the empirical rule, then the data are approximately normally distributed
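
The four steps above can be sketched as a small check. The function name, the data set, and the use of the sample standard deviation are assumptions made for illustration; the slides do not prescribe an implementation, and a sample this small is far too short for a meaningful normality check.

```python
import statistics

# Proportions expected under the empirical rule, keyed by number of SDs.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def proportions_within(values, ks=(1, 2, 3)):
    """Proportion of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    n = len(values)
    return {k: sum(1 for x in values if abs(x - mean) <= k * s) / n for k in ks}

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
observed = proportions_within(data)
for k, expected in EMPIRICAL_RULE.items():
    print(f"within {k} SD: observed {observed[k]:.3f}, empirical rule {expected}")
```

How close is "reasonably close" in step 4 is a judgment call that this sketch leaves to the reader.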
  13. z Scores
      • z score: the number of standard deviations a value (x) lies above or below the mean of a set of numbers
      • Converts a value’s raw distance from the mean into units of standard deviations: z = (x − mean) / standard deviation
      • When the data are approximately normally distributed, z scores typically range from −3.00 to +3.00
      • z scores may be used to compare raw scores from different data sets
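
A short sketch of how z scores allow raw scores on different scales to be compared. The two tests, their means, and their standard deviations are hypothetical values chosen for illustration.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

# Hypothetical example: two raw scores measured on different scales.
z_test_a = z_score(85, mean=70, std=10)   # 1.5 SDs above the mean
z_test_b = z_score(42, mean=30, std=6)    # 2.0 SDs above the mean

print(z_test_a, z_test_b)                 # 1.5 2.0
```

Although 85 is the larger raw score, the score of 42 is further above its own mean in standard-deviation units.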
  14. Coefficient of Variation (C.V.)
      • Coefficient of Variation (CV): measures the volatility of a value (perhaps a stock portfolio) relative to its mean. It is the ratio of the standard deviation to the mean, expressed as a percentage
      • Useful when comparing standard deviations computed from data with different means
      • A measure of relative dispersion
  15. Coefficient of Variation (C.V.)
      Consider two different populations
      Since 15.86 > 11.90, the first population is more variable, relative to its mean, than the second population
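
The comparison can be sketched as below. The transcript does not preserve the means and standard deviations of the two populations, so the inputs here are assumed values, chosen only because they reproduce the quoted percentages of 15.86 and 11.90.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation as a percentage of the mean."""
    return std / mean * 100

# Assumed inputs (not from the transcript), picked to match the quoted CVs.
cv_first = coefficient_of_variation(std=4.6, mean=29)
cv_second = coefficient_of_variation(std=10, mean=84)

print(round(cv_first, 2), round(cv_second, 2))   # 15.86 11.9
```

In this sketch the second population has the larger standard deviation, yet the first is more variable relative to its mean.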
  16. Calculation of Grouped Mean
      Sometimes data are already grouped, and we are interested in calculating summary statistics

      Interval       Frequency (f)   Midpoint (M)    f*M
      20-under 30          6              25          150
      30-under 40         18              35          630
      40-under 50         11              45          495
      50-under 60         11              55          605
      60-under 70          3              65          195
      70-under 80          1              75           75
      Total               50                         2150
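
The grouped mean is the sum of f*M divided by the total frequency, here 2150 / 50 = 43. A minimal sketch, with each class stored as a hypothetical (lower bound, upper bound, frequency) tuple:

```python
# (lower bound, upper bound, frequency) for each class in the table above
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mean(classes):
    """Sum of frequency * class midpoint, divided by the total frequency."""
    total_f = sum(f for _lo, _hi, f in classes)
    total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fm / total_f

print(grouped_mean(classes))   # 43.0
```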
  17. Median of Grouped Data - Example

      Class Interval   Frequency   Cumulative Frequency
      20-under 30           6               6
      30-under 40          18              24
      40-under 50          11              35
      50-under 60          11              46
      60-under 70           3              49
      70-under 80           1              50
      N = 50
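
The transcript does not preserve the formula used on this slide. The sketch below assumes the common interpolation formula, median = L + ((N/2 − cf) / f) × W, where L is the lower bound of the median class, cf the cumulative frequency before it, f its frequency, and W its width. The tuple layout for the classes is also an assumption.

```python
# (lower bound, upper bound, frequency) for each class in the table above
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_median(classes):
    """Interpolated median: L + ((N/2 - cf) / f) * W for the median class."""
    n = sum(f for _lo, _hi, f in classes)
    half = n / 2
    cumulative = 0                      # cumulative frequency before this class
    for lo, hi, f in classes:
        if f > 0 and cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("classes must contain at least one observation")

print(round(grouped_median(classes), 2))   # 40.91
```

With N = 50, the 25th observation falls in the 40-under 50 class (cumulative frequency 24 before it), giving 40 + (1/11) × 10.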
  18. Mode of Grouped Data

      Class Interval   Frequency
      20-under 30           6
      30-under 40          18
      40-under 50          11
      50-under 60          11
      60-under 70           3
      70-under 80           1

      • Mode: midpoint of the modal class
      • Modal class: the class with the greatest frequency
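
Applying the definition above, the modal class is 30-under 40 (frequency 18), so the mode is its midpoint, 35. A short sketch, again assuming a (lower bound, upper bound, frequency) layout for the classes:

```python
# (lower bound, upper bound, frequency) for each class in the table above
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mode(classes):
    """Midpoint of the class with the greatest frequency (first one on ties)."""
    lo, hi, _f = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

print(grouped_mode(classes))   # 35.0
```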
  19. Variance and Standard Deviation of Grouped Data
  20. Variance and Standard Deviation of Grouped Data

      Class Interval    f     M     fM    M − mean   (M − mean)²   f(M − mean)²
      20-under 30       6    25    150      −18          324           1944
      30-under 40      18    35    630       −8           64           1152
      40-under 50      11    45    495        2            4             44
      50-under 60      11    55    605       12          144           1584
      60-under 70       3    65    195       22          484           1452
      70-under 80       1    75     75       32         1024           1024
      Total            50         2150                                 7200
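
The table's totals can be reproduced with the sketch below. The transcript does not show which divisor the slide used, so both are computed: dividing the 7200 by N = 50 treats the data as a population, and dividing by n − 1 = 49 treats them as a sample. The tuple layout for the classes is an assumption.

```python
import math

# (lower bound, upper bound, frequency) for each class in the table above
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_sum_of_squares(classes):
    """Return (N, grouped mean, sum of f * (M - mean)^2) using class midpoints."""
    n = sum(f for _lo, _hi, f in classes)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    ss = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
    return n, mean, ss

n, mean, ss = grouped_sum_of_squares(classes)
population_variance = ss / n          # 7200 / 50 = 144.0
sample_variance = ss / (n - 1)        # 7200 / 49, roughly 146.94

print(mean, ss)                                           # 43.0 7200.0
print(population_variance, math.sqrt(population_variance))  # 144.0 12.0
print(round(sample_variance, 2))                          # 146.94
```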
  21. Measures of Shape - Skewness
      • Symmetrical: the right half is a mirror image of the left half
      • Skewed: the distribution lacks symmetry; the data are sparse at one end and piled up at the other
        – Absence of symmetry
        – Extreme values or a “tail” on one side of the distribution
        – Positively (right) skewed vs. negatively (left) skewed
  22. Measures of Shape - Skewness
      [Figure: two density curves, y against x, illustrating a positively (right) skewed and a negatively (left) skewed distribution]
  23. 5-Number Summary
  24. Box-and-Whisker Plot
      A graphic representation of the 5-number summary:
      • The five numerical values (smallest, first quartile, median, third quartile, and largest) are located on a scale, either vertical or horizontal
      • The box depicts the middle half of the data, which lies between the two quartiles
      • The whiskers are line segments that depict the other half of the data
        – One line segment represents the quarter of the data that is smaller in value than the first quartile
        – The second line segment represents the quarter of the data that is larger in value than the third quartile
  25. Example: Box-and-Whisker Plot
      Example: A random sample of students in a sixth grade class was selected. Their weights are given in the table below. Find the 5-number summary for this data and construct a boxplot:

       63   64   76   76   81
       83   85   86   88   89
       90   91   92   93   93
       93   94   97   99   99
       99  101  108  109  112

      5-number summary: L = 63, Q1 = 85, median (x̃) = 92, Q3 = 99, H = 112
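
The 5-number summary for these 25 weights can be sketched with Python's standard library. The quartile convention is an assumption: the "inclusive" method of `statistics.quantiles` (Python 3.8+) happens to reproduce the values on the slide for this data set, but other conventions may not agree on other data.

```python
import statistics

# The 25 weights from the example
weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) using 'inclusive' quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary(weights))   # (63, 85.0, 92.0, 99.0, 112)
```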
  26. Example: Box-and-Whisker Plot
      [Figure: box-and-whisker plot titled “Weights from Sixth Grade Class”; Weight axis from 60 to 110, with L, Q1, x̃, Q3, and H marked]