5/3/2023 Summary Statistics 1
Summary Statistics
Last week we used stemplots and histograms to
describe the shape, location, and spread of a
distribution. This week we use numerical summaries of
location and spread.
Main Summary Statistics by Type
Central location
• Mean
• Median
• Mode
Spread
• Variance and standard deviation
• Quartiles and interquartile range (IQR)
Shape
• Statistical measures of shape (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)
Notation
n = sample size
X = variable
x_i = value of individual i
Σ = sum all values (capital sigma)
Illustrative example (sample.sav), data:
21 42 5 11 30 50 28 27 24 52
• n = 10
• X = age
• x_1 = 21, x_2 = 42, …, x_10 = 52
• Σx_i = 21 + 42 + … + 52 = 290
Sample Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Illustrative example: n = 10 (data and intermediate calculations on the Notation slide)
$$\bar{x} = \frac{1}{10}(290) = 29.0$$
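As a quick check of the arithmetic, here is a minimal Python sketch of the sample-mean formula. It assumes the ten sample.sav ages listed on the Notation slide; the function name `sample_mean` is illustrative, not part of any course software.

```python
# sample.sav ages from the Notation slide
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

def sample_mean(values):
    """x-bar = (1/n) * sum of x_i."""
    return sum(values) / len(values)

x_bar = sample_mean(ages)
print(f"n = {len(ages)}, sum = {sum(ages)}, mean = {x_bar:.1f}")  # n = 10, sum = 290, mean = 29.0
```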
Population Mean
Same operation as sample mean, but
based on entire population (N =
population size)
Not available in practice, but important conceptually
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Interpretation of x̄
The sample mean is used to predict:
• an observation drawn at random from the sample
• an observation drawn at random from the population
• the population mean
Gravitational center (balance point)
[Figure: sample.sav ages plotted on a 0 to 60 axis, balancing at Mean = 29]
Median – a different kind of average
“Middle value”
Covered last week
• Order the data
• Depth of the median is (n + 1) / 2
• When n is odd → middle value
• When n is even → average of the two middle values
Illustrative example, n = 10 → median has depth (10 + 1) / 2 = 5.5
05 11 21 24 27 28 30 42 50 52
median = average of 27 and 28 = 27.5
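The depth rule can be sketched in Python as follows, assuming the sample.sav ages; `median_by_depth` is a hypothetical helper, and the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

def median_by_depth(values):
    """Median via the depth rule: order the data, depth = (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the depth is a whole number, so take the middle value
        return ordered[n // 2]
    # n even: the depth ends in .5, so average the two middle values
    mid = n // 2
    return (ordered[mid - 1] + ordered[mid]) / 2

print("depth =", (len(ages) + 1) / 2)                  # 5.5
print("median =", median_by_depth(ages))                # 27.5
print("statistics.median =", statistics.median(ages))   # 27.5
```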
Median is “robust”
Robust = resistant to skewness and outliers
This data set has a mean (x̄) of 1600:
1362 1439 1460 1614 1666 1792 1867
This data set has an outlier (9867) and a mean of about 2743:
1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances; it was not influenced by the outlier.
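A short Python sketch makes the contrast concrete using the two data sets above; it relies only on `statistics.mean` and `statistics.median` from the standard library.

```python
from statistics import mean, median

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]   # 1867 replaced by 9867

for label, data in [("original", rates), ("with outlier", with_outlier)]:
    # The mean shifts by (9867 - 1867) / 7, about 1143; the median does not move.
    print(f"{label:>12}: mean = {mean(data):.1f}, median = {median(data)}")
```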
Mode
Mode = value with the greatest frequency
e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7
Used only in very large data sets
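Counting frequencies is all the mode requires. This sketch uses `collections.Counter` and `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest frequency.

```python
from collections import Counter
from statistics import multimode

data = [4, 7, 7, 7, 8, 8, 9]
counts = Counter(data)          # frequency of each value
print(counts.most_common(1))    # [(7, 3)]: value 7 occurs 3 times
print(multimode(data))          # [7]
```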
Mean, Median, Mode
(A) Symmetrical data: mean = median
(B) Positive skew: mean > median (the mean gets “pulled” toward the tail)
(C) Negative skew: mean < median
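[Figure: three distribution shapes. (A) Symmetrical: mean, median, and mode coincide. (B) Positive skew: mode, median, mean from left to right. (C) Negative skew: mean, median, mode from left to right.]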
Spread = Variability
Variability = the amount values spread above and below the average
Measures of spread
• Range and interquartile range
• Standard deviation and variance (this week)
Range = max – min
The range is rarely used in practice because it tends to underestimate the population range and is not robust.
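To see why the range is not robust, this Python sketch reuses the two metabolic-rate data sets from the robustness example; `data_range` is a hypothetical helper name.

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

print(data_range(rates))         # 505
print(data_range(with_outlier))  # 8505
```

A single outlier multiplies the range roughly seventeenfold, while the median (above) does not change.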
Standard deviation
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n-1}$
Sample standard deviation = $s = \sqrt{s^2}$
Most common descriptive measure of spread
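Standard deviation (formula)
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
The sample variance s² is an unbiased estimator of the population variance σ². The sample standard deviation s estimates the population standard deviation σ, although s itself is slightly biased (on average it tends to underestimate σ).
The population standard deviation σ is rarely known in practice.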
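A direct Python translation of the formula, assuming the sample.sav ages as input; `sample_sd` is an illustrative name. The standard library's `statistics.stdev` also divides by n − 1, so it serves as a cross-check, while `statistics.pstdev` divides by N and corresponds to the population standard deviation.

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt( sum((x_i - x_bar)^2) / (n - 1) )."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1))

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(f"s (by formula)        = {sample_sd(ages):.2f}")
print(f"statistics.stdev      = {statistics.stdev(ages):.2f}")   # n - 1 divisor
print(f"statistics.pstdev (N) = {statistics.pstdev(ages):.2f}")  # population divisor
```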
New data set (“Metabolic Rates”)
This example is not in your lecture notes
Metabolic rates (cal/day), n = 7
1792 1666 1362 1614 1460 1867 1439
$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
[Figure: metabolic rates showing the mean (*) and the deviations of the first two observations]
Standard Deviation Calculation
metabolic.sav (the "Metabolic Rates" data set introduced above)
Observation x_i   Deviation x_i − x̄      Squared deviation (x_i − x̄)²
1792              1792 − 1600 = 192      (192)² = 36,864
1666              1666 − 1600 = 66       (66)² = 4,356
1362              1362 − 1600 = −238     (−238)² = 56,644
1614              1614 − 1600 = 14       (14)² = 196
1460              1460 − 1600 = −140     (−140)² = 19,600
1867              1867 − 1600 = 267      (267)² = 71,289
1439              1439 − 1600 = −161     (−161)² = 25,921
SUMS              0*                     SS = 214,870
* The sum of the deviations always equals zero.
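The deviations table can be reproduced with a short Python sketch (variable names are illustrative). Because x̄ = 1600 exactly, every deviation is a whole number and the floating-point sums come out exact.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
x_bar = sum(rates) / len(rates)   # 11200 / 7 = 1600.0

deviations = [x - x_bar for x in rates]   # x_i - x_bar
squared = [d * d for d in deviations]     # (x_i - x_bar)^2

print(f"{'x_i':>6} {'deviation':>10} {'squared':>10}")
for x, d, sq in zip(rates, deviations, squared):
    print(f"{x:>6} {d:>10.0f} {sq:>10,.0f}")

ss = sum(squared)
print(f"{'SUMS':>6} {sum(deviations):>10.0f} {ss:>10,.0f}")
```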
Standard Deviation, Metabolic Data (cont.)
Variance (s²):
$$s^2 = \frac{SS}{n-1} = \frac{214{,}870}{7-1} = 35{,}811.67 \ \text{calories}^2$$
Standard deviation (s):
$$s = \sqrt{s^2} = \sqrt{35{,}811.67} = 189.24 \ \text{calories}$$
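Continuing the metabolic example, the variance and standard deviation follow from SS with the n − 1 divisor. In this sketch `statistics.variance` and `statistics.stdev` provide an independent check.

```python
import math
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)
x_bar = sum(rates) / n
ss = sum((x - x_bar) ** 2 for x in rates)   # 214,870

variance = ss / (n - 1)    # s^2 in calories^2
sd = math.sqrt(variance)   # s in calories
print(f"s^2 = {variance:,.2f} calories^2, s = {sd:.2f} calories")

# Independent check: statistics.variance/stdev also divide by n - 1.
print(f"{statistics.variance(rates):,.2f} {statistics.stdev(rates):.2f}")
```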
General rule for rounding means
and standard deviations
Report the mean to one more decimal place than the data.
To maintain accuracy, carry at least one further decimal place in intermediate calculations.
Illustrative example
• Suppose the data are recorded to one decimal place (i.e., xx.x)
• Report the mean to two decimal places (i.e., xx.xx)
• Carry all intermediate calculations to at least three decimal places (i.e., xx.xxx)
Even more important: always use common sense and judgment.
TI-30XIIS – about $12
In practice, we often use software or a calculator to compute or check the standard deviation.
Interpretation of Standard Deviation
Larger standard deviation → greater variability
• s1 = 15 and s2 = 10 → group 1 has more variability
68-95-99.7 rule (Normal data only)
• About 68% of the data lie within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD
• e.g., if mean = 30 and SD = 10, then about 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50)
Chebyshev's rule (all data)
• At least 75% of the data lie within 2 SD of the mean
• e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
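The interval arithmetic and both rules can be sketched in Python. The general Chebyshev bound, at least 1 − 1/k² of any data set within k standard deviations (k > 1), is standard; the 75% above is the k = 2 case. The function names are hypothetical.

```python
def interval(mean, sd, k):
    """Return the interval mean ± k * sd as a (low, high) tuple."""
    return mean - k * sd, mean + k * sd

def chebyshev_min_fraction(k):
    """Chebyshev: at least 1 - 1/k^2 of any data set lies within k SDs (k > 1)."""
    return 1 - 1 / k**2

# Approximate coverage for Normal data only (68-95-99.7 rule).
normal_coverage = {1: 0.68, 2: 0.95, 3: 0.997}

mean, sd = 30, 10
for k in (1, 2, 3):
    low, high = interval(mean, sd, k)
    cheb = chebyshev_min_fraction(k) if k > 1 else 0.0   # bound is trivial at k = 1
    print(f"k = {k}: {low} to {high} | Normal about {normal_coverage[k]:.1%} "
          f"| any data at least {cheb:.1%}")
```

Chebyshev's guarantee is weaker but holds for any shape of distribution, which is why it is quoted for "all data".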
Quartiles and IQR
Quartiles divide the ordered data into
four equally-sized groups
Q0 = minimum
Q1 = 25th %ile
Q2 = 50th %ile (Median)
Q3 = 75th %ile
Q4 = maximum
Rule for quartiles
Find the median → Q2
Middle of the lower half of the data → Q1
Middle of the upper half of the data → Q3
Bottom half | Top half
05 11 21 24 27 | 28 30 42 50 52
Q1 = 21 (middle of bottom half), Q2 = 27.5 (median), Q3 = 42 (middle of top half)
IQR = Q3 – Q1 = 42 – 21 = 21
The IQR gives the spread of the middle 50% of the data.
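The halves-based rule can be sketched in Python as below; `hinges` is a hypothetical helper. Statistical software often uses interpolation-based quartile definitions instead (NumPy's `numpy.percentile`, for example, interpolates by default), so its Q1 and Q3 may differ slightly from these hand-calculated values.

```python
from statistics import median

def hinges(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When n is odd the median is placed in both halves (as in the
    metabolic.sav example); when n is even the halves split evenly.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: (n + 1) // 2]
    upper_half = ordered[n // 2 :]
    return median(lower_half), median(upper_half)

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
q1, q3 = hinges(ages)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")   # Q1 = 21, Q3 = 42, IQR = 21
```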
5-Point Summary (sample.sav)
Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)
Best descriptive statistics for skewed data
Illustrative example (metabolic.sav)
1362 1439 1460 1614 1666 1792 1867
Median = 1614 (n = 7 is odd, so the median is included in both halves)
Bottom half: 1362 1439 1460 1614 → Q1 = (1439 + 1460) / 2 = 1449.5
Top half: 1614 1666 1792 1867 → Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
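A minimal Python sketch of the 5-point summary, using the same halves rule for Q1 and Q3; `five_point_summary` is an illustrative name, and the two inputs are the metabolic.sav and sample.sav data from these slides.

```python
from statistics import median

def five_point_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using halves-based quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: (n + 1) // 2])   # median shared by both halves when n is odd
    q3 = median(ordered[n // 2 :])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print("metabolic.sav:", five_point_summary(rates))
print("sample.sav:   ", five_point_summary(ages))
```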
Box-and-whiskers plot (boxplot)
5-point summary + “outside values”
Procedure
• Determine the 5-point summary
• Draw a box from Q1 to Q3
• Draw a line at Q2
• Calculate IQR = Q3 – Q1
• Calculate the fences:
  F_Lower = Q1 – 1.5(IQR)
  F_Upper = Q3 + 1.5(IQR)
• Determine whether any values lie outside the fences; if so, plot them separately
• Determine the inside values (the most extreme values within the fences) and draw whiskers from the box to the inside values
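The procedure above can be sketched in Python. This assumes the same halves-based quartiles used on these slides and treats a value lying exactly on a fence as inside; `boxplot_stats` and its dictionary keys are hypothetical names. Plotting libraries may compute quartiles differently, so their whiskers can differ slightly.

```python
from statistics import median

def boxplot_stats(values):
    """Quartiles, 1.5 * IQR fences, outside values, and whisker ends.

    Q1 and Q3 are medians of the lower and upper halves of the ordered
    data (the median is shared by both halves when n is odd). A value
    exactly on a fence is treated as inside.
    """
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: (n + 1) // 2])
    q3 = median(ordered[n // 2 :])
    iqr = q3 - q1
    f_lower = q1 - 1.5 * iqr
    f_upper = q3 + 1.5 * iqr
    outside = [x for x in ordered if x < f_lower or x > f_upper]
    inside = [x for x in ordered if f_lower <= x <= f_upper]
    return {
        "Q1": q1,
        "Q2": median(ordered),
        "Q3": q3,
        "IQR": iqr,
        "F_lower": f_lower,
        "F_upper": f_upper,
        "outside": outside,
        "lower_inside": inside[0],   # lower whisker end
        "upper_inside": inside[-1],  # upper whisker end
    }

# Data from "Boxplot example 2".
example2 = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
for key, value in boxplot_stats(example2).items():
    print(f"{key:>12}: {value}")
```

Passing the sample.sav ages or the metabolic.sav rates instead reproduces the fences in the other two boxplot examples.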
Boxplot example
5-point: 5, 21, 27.5, 42, 52
IQR = 42 – 21 = 21
F_U = 42 + (1.5)(21) = 73.5 → no values above (none outside); upper inside value = 52
F_L = 21 – (1.5)(21) = -10.5 → no values below (none outside); lower inside value = 5
05 11 21 24 27 28 30 42 50 52
[Figure: boxplot on a 0 to 60 axis labeled Upper inside = 52, Q3 = 42, Q2 = 27.5, Q1 = 21, Lower inside = 5]
Boxplot example 2
5-point: 3, 22, 25.5, 29, 51
IQR = 29 – 22 = 7
F_U = 29 + (1.5)(7) = 39.5 → one outside value (51); upper inside value = 31
F_L = 22 – (1.5)(7) = 11.5 → one outside value (3); lower inside value = 21
3 21 22 24 25 26 28 29 31 51
[Figure: boxplot on a 0 to 60 axis labeled Outside value (51), Inside value (31), Upper hinge (29), Median (25.5), Lower hinge (22), Inside value (21), Outside value (3)]
Boxplot example 3 (metabolic.sav)
5-point: 1362, 1449.5, 1614, 1729, 1867 (see the metabolic.sav quartile example)
IQR = 1729 – 1449.5 = 279.5
F_U = 1729 + (1.5)(279.5) = 2148.25 → none outside; upper inside = 1867
F_L = 1449.5 – (1.5)(279.5) = 1030.25 → none outside; lower inside = 1362
1362 1439 1460 1614 1666 1792 1867
[Figure: boxplot of the metabolic rates, N = 7]
Data source: Moore, 2000
Interpretation of boxplots
Location
• Position of the median
• Position of the box
Spread
• Hinge-spread (box length) = IQR
• Whisker-to-whisker spread (the range, or the range excluding outside values)
Shape
• Symmetry of the box
• Length of the whiskers
• Outside values (potential outliers)
Side-by-side boxplots
Boxplots are especially useful for comparing groups:
