12/24/2022 Summary Statistics 1
Summary Statistics
Last week we used stemplots and histograms to
describe the shape, location, and spread of a
distribution. This week we use numerical summaries of
location and spread.
Main Summary Statistics by Type
Central location
• Mean
• Median
• Mode
Spread
• Variance and standard deviation
• Quartiles and interquartile range (IQR)
Shape
• Statistical measures of shape (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)
Notation
n ≡ sample size
X ≡ variable
x_i ≡ value of individual i
Σ ≡ sum all values (capital sigma)
Illustrative example (sample.sav), data:
21 42 5 11 30 50 28 27 24 52
• n = 10
• X = age
• x_1 = 21, x_2 = 42, …, x_10 = 52
• Σx = 21 + 42 + … + 52 = 290
Sample Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Illustrative example: n = 10 (data & intermediate calculations on prior slide)
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(290) = 29.0$$
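The sample-mean arithmetic above can be checked in a few lines. This is a minimal sketch using the ten ages from the illustrative example; the variable names are only illustrative.

```python
# Ages from the illustrative sample (sample.sav).
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)       # sample size
total = sum(ages)   # sum of all x_i
xbar = total / n    # sample mean = (1/n) * sum of x_i

print(f"n = {n}, sum = {total}, mean = {xbar}")
```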
Population Mean
Same operation as sample mean, but based on entire population (N = population size)
Not available in practice, but important conceptually
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Interpretation of xbar
Sample mean used to predict
• an observation drawn at random from a sample
• an observation drawn at random from the population
• the population mean
Gravitational center (balance point)
[Figure: number line from 0 to 60 with the balance point marked at mean = 29]
Median – a different kind of average
“Middle value”
Covered last week
• Order data
• Depth of median is (n+1) / 2
• When n is odd → middle value
• When n is even → average two middle values
Illustrative example, n = 10 → median has depth (10+1) / 2 = 5.5
05 11 21 24 27 28 30 42 50 52
median = average of 27 and 28 = 27.5
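The depth rule translates directly into code. The function below is a sketch, and its name `median_by_depth` is made up for illustration. It sorts the data and computes the depth (n+1)/2. A whole-number depth picks the middle value, and a fractional depth averages the two neighbouring values.

```python
def median_by_depth(values):
    """Median via the depth rule: depth = (n + 1) / 2 in the ordered data."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    if depth.is_integer():
        # n is odd: the value at that depth (depths count from 1).
        return ordered[int(depth) - 1]
    # n is even: average the two values on either side of the depth.
    lower = ordered[int(depth) - 1]
    upper = ordered[int(depth)]
    return (lower + upper) / 2


ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(median_by_depth(ages))
```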
Median is “robust”
Robust ≡ resistant to skews and outliers
This data set has a mean (xbar) of 1600:
1362 1439 1460 1614 1666 1792 1867
This data set has an outlier and a mean of 2743:
1362 1439 1460 1614 1666 1792 9867 (outlier)
The median is 1614 in both instances.
The median was not influenced by the outlier.
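The comparison can be reproduced with Python's standard `statistics` module. This is a small sketch using the two data sets from this slide; only the variable names are invented.

```python
from statistics import mean, median

clean = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    # The mean moves with the outlier; the median does not.
    print(f"{label:>12}: mean = {mean(data):.0f}, median = {median(data)}")
```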
Mode
Mode ≡ value with greatest frequency
e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7
Used only in very large data sets
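As a quick illustration of "value with greatest frequency", the sketch below counts the values in the example set with `collections.Counter`. It assumes a single most frequent value, which holds for this example.

```python
from collections import Counter

data = [4, 7, 7, 7, 8, 8, 9]

# most_common(1) returns the (value, frequency) pair with the highest count.
value, count = Counter(data).most_common(1)[0]
print(f"mode = {value} (appears {count} times)")
```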
Mean, Median, Mode
(A) Symmetrical data: mean = median
(B) Positive skew: mean > median [mean gets “pulled” by tail]
(C) Negative skew: mean < median
[Figure: three distribution curves marking the mean, median, and mode: (A) symmetrical, (B) positive skew, (C) negative skew]
Spread = Variability
Variability ≡ amount values spread above and below the average
Measures of spread
• Range and interquartile range
• Standard deviation and variance (this week)
Range = max – min
The range is rarely used in practice because it tends to underestimate the population range and is not robust
Standard deviation
Deviation = $x_i - \bar{x}$
Sum of squared deviations = $SS = \sum (x_i - \bar{x})^2$
Sample variance = $s^2 = \frac{SS}{n - 1}$
Sample standard deviation = $s = \sqrt{s^2}$
Most common descriptive measure of spread
Standard deviation (formula)
$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
Sample standard deviation s is the usual estimator of the population standard deviation σ (s² is an unbiased estimator of σ²).
Population standard deviation σ is rarely known in practice.
New data set (“Metabolic Rates”)
This example is not in your lecture notes
Metabolic rates (cal/day), n = 7
1792 1666 1362 1614 1460 1867 1439
$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
[Figure: metabolic rates showing mean (*) and deviations of first two observations]
Standard Deviation Calculation
metabolic.sav – introduced slide 15
Observations (xi)   Deviations (xi − xbar)   Squared deviations (xi − xbar)²
1792                1792 − 1600 = 192        (192)² = 36,864
1666                1666 − 1600 = 66         (66)² = 4,356
1362                1362 − 1600 = −238       (−238)² = 56,644
1614                1614 − 1600 = 14         (14)² = 196
1460                1460 − 1600 = −140       (−140)² = 19,600
1867                1867 − 1600 = 267        (267)² = 71,289
1439                1439 − 1600 = −161       (−161)² = 25,921
SUMS                0*                       SS = 214,870
* Sum of deviations will always equal zero
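The table can be regenerated programmatically. The sketch below recomputes each deviation and squared deviation for the metabolic data and confirms that the deviations sum to zero. The variable names are illustrative.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

xbar = sum(rates) / len(rates)            # sample mean
deviations = [x - xbar for x in rates]    # x_i - xbar
squared = [d ** 2 for d in deviations]    # (x_i - xbar)^2
ss = sum(squared)                         # sum of squared deviations

for x, d, sq in zip(rates, deviations, squared):
    print(f"{x:>6} {d:>8.0f} {sq:>10,.0f}")
print(f"sum of deviations = {sum(deviations):.0f}, SS = {ss:,.0f}")
```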
Standard Deviation Metabolic data (cont.)
Variance (s²):
$$s^2 = \frac{SS}{n - 1} = \frac{214{,}870}{7 - 1} = 35{,}811.67 \text{ calories}^2$$
Standard deviation (s):
$$s = \sqrt{s^2} = \sqrt{35{,}811.67} = 189.24 \text{ calories}$$
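The same two steps can be sketched in Python, dividing SS by n − 1 and taking the square root. The hand calculation is cross-checked against `statistics.variance` and `statistics.stdev` from the standard library, which also use the n − 1 divisor.

```python
import math
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
xbar = sum(rates) / n
ss = sum((x - xbar) ** 2 for x in rates)   # sum of squared deviations

variance = ss / (n - 1)      # sample variance s^2
sd = math.sqrt(variance)     # sample standard deviation s

print(f"s^2 = {variance:,.2f}, s = {sd:.2f}")
print("library check:", statistics.variance(rates), statistics.stdev(rates))
```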
General rule for rounding means and standard deviations
Report the mean to one more decimal place than the data
To achieve accuracy, intermediate calculations should carry at least one further decimal place
Illustrative example
• Suppose data are recorded with one-decimal accuracy (i.e., xx.x)
• Report the mean with two-decimal accuracy (i.e., xx.xx)
• Carry all intermediate calculations with at least three-decimal accuracy (i.e., xx.xxx)
Even more important: Always use common sense and judgment.
TI-30XIIS – about $12
In practice, we often use software
or a calculator to check our
standard deviation
Interpretation of Standard Deviation
Larger standard deviation → greater variability
• s1 = 15 and s2 = 10 → group 1 has more variability
68-95-99.7 rule – Normal data only
• 68% of data within 1 SD of mean, 95% within 2 SD of mean, and 99.7% within 3 SD of mean
• e.g., if mean = 30 and SD = 10, then 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50)
Chebychev’s rule – All data
• at least 75% of data within 2 SD of mean
• e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
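Both rules use the same interval, mean ± k·SD, and differ only in the share of data they promise. The sketch below computes that interval and the Chebychev lower bound. It assumes the general form of Chebychev's rule, at least 1 − 1/k² of the data within k SD, of which the 75% figure above is the k = 2 case. The function names are illustrative.

```python
# Approximate shares from the 68-95-99.7 rule (Normal data only).
NORMAL_SHARE = {1: 0.68, 2: 0.95, 3: 0.997}


def sd_interval(mean, sd, k):
    """Interval mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)


def chebyshev_min_share(k):
    """Minimum share of any data set within k SD of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


low, high = sd_interval(30, 10, 2)
print(f"range: {low} to {high}")
print(f"Normal data: about {NORMAL_SHARE[2]:.0%} inside")
print(f"any data: at least {chebyshev_min_share(2):.0%} inside")
```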
Quartiles and IQR
Quartiles divide the ordered data into
four equally-sized groups
Q0 = minimum
Q1 = 25th %ile
Q2 = 50th %ile (Median)
Q3 = 75th %ile
Q4 = maximum
Rule for quartiles
Find the median → Q2
Middle of lower half of data set → Q1
Middle of upper half of the data → Q3
Bottom half | Top half
05 11 21 24 27 | 28 30 42 50 52
Q1 = 21, Q2 = 27.5, Q3 = 42
IQR = Q3 – Q1 = 42 – 21 = 21
gives spread of middle 50% of the data
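The quartile rule can be sketched as a short function. One assumption is made explicit here: when n is odd, the median is counted in both halves, which is how the metabolic example later in the deck splits its data. Library quantile functions may use different conventions and give slightly different quartiles. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of data that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def hinges(values):
    """Return (Q1, Q2, Q3) using the 'middle of each half' rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2   # for odd n the median belongs to both halves
    q1 = median_of_sorted(ordered[:half])
    q2 = median_of_sorted(ordered)
    q3 = median_of_sorted(ordered[n - half:])
    return q1, q2, q3


ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
q1, q2, q3 = hinges(ages)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```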
5-Point Summary (sample.sav)
Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)
Best descriptive statistics for skewed data
Illustrative example (metabolic.sav)
1362 1439 1460 1614 1666 1792 1867
median = 1614 (n is odd, so the median is included in both halves)
Bottom half: 1362 1439 1460 1614
Q1 = (1439 + 1460) / 2 = 1449.5
Top half: 1614 1666 1792 1867
Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
Box-and-whiskers plot (boxplot)
5-point summary + “outside values”
Procedure
• Determine 5-point summary
• Draw box from Q1 to Q3
• Draw line at Q2
• Calculate IQR = Q3 – Q1
• Calculate fences
  • F_Lower = Q1 – 1.5(IQR)
  • F_Upper = Q3 + 1.5(IQR)
• Determine whether there are any outside values; if so, plot them separately
• Determine inside values and draw whiskers from box to inside values
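The numerical part of the procedure (everything except the drawing) can be sketched as follows. The function computes the hinges, IQR, and fences, then separates outside values from the inside values where the whiskers end. It assumes the "middle of each half" quartile rule, with the median counted in both halves when n is odd. The function and key names are illustrative, and the two data sets are the ones used in the deck's boxplot examples.

```python
def _median(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_stats(values):
    """Hinges, fences, outside values, and whisker ends for a boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2
    q1 = _median(ordered[:half])
    q3 = _median(ordered[n - half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outside values lie beyond a fence; whiskers stop at the most
    # extreme values that are still inside the fences.
    outside = [x for x in ordered if x < lower_fence or x > upper_fence]
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outside": outside,
        "whiskers": (inside[0], inside[-1]),
    }


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
example2 = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
for name, data in [("ages", ages), ("example 2", example2)]:
    print(name, boxplot_stats(data))
```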
Boxplot example
05 11 21 24 27 28 30 42 50 52
5-point: 5, 21, 27.5, 42, 52
IQR = 42 – 21 = 21
FU = 42 + (1.5)(21) = 73.5
• No values above (none outside)
• Upper inside value = 52
FL = 21 – (1.5)(21) = –10.5
• No values below (none outside)
• Lower inside value = 5
[Figure: boxplot on a vertical axis from 0 to 60; lower inside = 5, Q1 = 21, Q2 = 27.5, Q3 = 42, upper inside = 52]
Boxplot example 2
3 21 22 24 25 26 28 29 31 51
5-point: 3, 22, 25.5, 29, 51
IQR = 29 – 22 = 7
FU = 29 + (1.5)(7) = 39.5
• One outside (51)
• Upper inside value = 31
FL = 22 – (1.5)(7) = 11.5
• One outside (3)
• Lower inside value = 21
[Figure: boxplot on a vertical axis from 0 to 60; outside value (3), inside value (21), lower hinge (22), median (25.5), upper hinge (29), inside value (31), outside value (51)]
Boxplot example 3 (metabolic.sav)
1362 1439 1460 1614 1666 1792 1867
5-point: 1362, 1449.5, 1614, 1729, 1867 (slide 25)
IQR = 1729 – 1449.5 = 279.5
FU = 1729 + (1.5)(279.5) = 2148.25
• None outside
• Upper inside = 1867
FL = 1449.5 – (1.5)(279.5) = 1030.25
• None outside
• Lower inside = 1362
[Figure: boxplot of the metabolic rates, N = 7, on a vertical axis from 1300 to 2000. Data source: Moore]
Interpretation of boxplots
Location
• Position of median
• Position of box
Spread
• Hinge-spread (box length) = IQR
• Whisker-to-whisker spread (range or range minus the outside values)
Shape
• Symmetry of box
• Size of whiskers
• Outside values (potential outliers)
Side-by-side boxplots
Boxplots are especially useful for comparing groups:

sumstats.ppt

  • 1. 12/24/2022 Summary Statistics 1 Summary Statistics Last week we used stemplots and histograms to describe the shape, location, and spread of a distribution. This week we use numerical summaries of location and spread.
  • 2. 12/24/2022 Summary Statistics 2 Main Summary Statistics by Type Central location  Mean  Median  Mode Spread  Variance and standard deviation  Quartiles and Inter Quartile Range (IQR) Shape  Statistical measures of spread (e.g., skewness and kurtosis) are available but are seldom used in practice (not covered)
  • 3. 12/24/2022 Summary Statistics 3 Notation n  sample size X  variable xi  value of individual i   sum all values (capital sigma) Illustrative example (sample.sav), data: 21 42 5 11 30 50 28 27 24 52  n = 10  X = age  x1= 21, x2= 42, …, x10= 52  x = 21 + 42 + … + 52 = 290
  • 4. 12/24/2022 Summary Statistics 4 Sample Mean     i i x n n x x 1 0 . 29 ) 290 ( 10 1 1     i x n x Illustrative example: n = 10 (data & intermediate calculations on prior slide)
  • 5. 12/24/2022 Summary Statistics 5 Population Mean Same operation as sample mean, but based on entire population (N = population size) Not available in practice, but important conceptually     i i x N N x 1 
  • 6. 12/24/2022 Summary Statistics 6 Interpretation of xbar Sample mean used to predict  an observation drawn at random from a sample  an observation drawn at random from the population  the population mean Gravitational center (balance point) 0 10 20 30 40 50 60 Mean = 29
  • 7. 12/24/2022 Summary Statistics 7 Median – a different kind of average “Middle value” Covered last week  Order data  Depth of median is (n+1) / 2  When n is odd  middle value  When n is even  average two middle values Illustrative example, n = 10  median has depth (10+1) / 2 = 5.5 05 11 21 24 27 28 30 42 50 52  median = average of 27 and 28 = 27.5
  • 8. 12/24/2022 Summary Statistics 8 Median is “robust” Robust  resistant to skews and outliers This data set has a mean (xbar) of 1600: 1362 1439 1460 1614 1666 1792 1867 This data set has an outlier and a mean of 2743: 1362 1439 1460 1614 1666 1792 9867 Outlier The median is 1614 in both instances. The median was not influenced by the outlier.
  • 9. 12/24/2022 Summary Statistics 9 Mode Mode  value with greatest frequency e.g., {4, 7, 7, 7, 8, 8, 9} has mode = 7 Used only in very large data sets
  • 10. 12/24/2022 Summary Statistics 10 Mean, Median, Mode (A) Symmetrical data: mean = median (B) positive skew: mean > median [mean gets “pulled” by tail] (C) negative skew: mean < median Mean Mode Median (A)Symmetrica l Mode Median Mean Mean Median Mode (B)PositiveSkew (B)NegativeS kew
  • 11. 12/24/2022 Summary Statistics 11 Spread = Variability Variability  amount values spread above and below the average Measures of spread  Range and inter-quartile range  Standard deviation and variance (this week)
  • 12. 12/24/2022 Summary Statistics 12 Range = max – min The range is rarely used in practice b/c it tends to underestimate population range and is not robust
  • 13. 12/24/2022 Summary Statistics 13 Standard deviation x xi  Deviation =  2    x x SS i Sum of squared deviations = 1 2   n SS s Sample variance = 2 s s  Sample standard deviation = Most common descriptive measure of spread
  • 14. 12/24/2022 Summary Statistics 14 Standard deviation (formula)     2 ) ( 1 1 x x n s i Sample standard deviation s is the unbiased estimator of population standard deviation . Population standard deviation  is rarely known in practice.
  • 15. 12/24/2022 Summary Statistics 15 New data set (“Metabolic Rates”) This example is not in your lecture notes Metabolic rates (cal/day), n = 7 1792 1666 1362 1614 1460 1867 1439 1600 7 200 , 11 7 1439 1867 1460 1614 1362 1666 1792          x
  • 16. 12/24/2022 Summary Statistics 16 Metabolic rates showing mean (*) and deviations of first two observations
  • 17. 12/24/2022 Summary Statistics 17 Standard Deviation Calculation metabolic.sav – introduced slide 15 Observations Deviations Squared deviations 1792 1792 1600 = 192 (192)2 = 36,864 1666 1666 1600 = 66 (66)2 = 4,356 1362 1362 1600 = -238 (-238)2 = 56,644 1614 1614 1600 = 14 (14)2 = 196 1460 1460 1600 = -140 (-140)2 = 19,600 1867 1867 1600 = 267 (267)2 = 71,289 1439 1439 1600 = -161 (-161)2 = 25,921 SUMS  0* SS = 214,870 x xi  i x  2 x xi  * Sum of deviations will always equal zero
  • 18. 12/24/2022 Summary Statistics 18 Standard Deviation Metabolic data (cont.) Variance: $s^2 = \dfrac{SS}{n - 1} = \dfrac{214{,}870}{7 - 1} = 35{,}811.67\ \text{calories}^2$ Standard deviation: $s = \sqrt{s^2} = \sqrt{35{,}811.67} = 189.24\ \text{calories}$
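The hand calculation for the metabolic data (deviations, sum of squares, variance, standard deviation) can be retraced step by step. This is a minimal sketch in Python; the variable names are illustrative and not from the slides.

```python
from math import sqrt

# Metabolic rates (cal/day) from slide 15, n = 7.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
xbar = sum(rates) / n                    # sample mean
deviations = [x - xbar for x in rates]   # x_i - xbar; these sum to zero
ss = sum(d ** 2 for d in deviations)     # sum of squared deviations
variance = ss / (n - 1)                  # sample variance s^2
sd = sqrt(variance)                      # sample standard deviation s

print(f"mean = {xbar:.0f}, SS = {ss:.0f}, s^2 = {variance:.2f}, s = {sd:.2f}")
# prints: mean = 1600, SS = 214870, s^2 = 35811.67, s = 189.24
```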
  • 19. 12/24/2022 Summary Statistics 19 General rule for rounding means and standard deviations Report the mean to one more decimal place than the data. To preserve accuracy, intermediate calculations should carry at least one further decimal place. Illustrative example: Suppose data are recorded with one-decimal accuracy (i.e., xx.x). Report the mean with two-decimal accuracy (i.e., xx.xx). Carry all intermediate calculations with at least three-decimal accuracy (i.e., xx.xxx). Even more important: always use common sense and judgment.
  • 20. 12/24/2022 Summary Statistics 20 TI-30XIIS – about $12 In practice, we often use software or a calculator to check our standard deviation
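As one way to check a hand-computed standard deviation with software, Python's standard `statistics` module can be used. This is an illustrative choice and not the tool named on the slide; `variance` and `stdev` divide by n – 1, matching the sample formulas, whereas `pvariance` and `pstdev` divide by n.

```python
from statistics import mean, stdev, variance

# Metabolic rates (cal/day) from slide 15.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

xbar = mean(rates)
s2 = variance(rates)   # sample variance (denominator n - 1)
s = stdev(rates)       # sample standard deviation

print(f"mean = {xbar}, s^2 = {s2:.2f}, s = {s:.2f}")
```

The results should agree with the hand calculation on slides 17 and 18 (s² ≈ 35,811.67 and s ≈ 189.24).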
  • 21. 12/24/2022 Summary Statistics 21 Interpretation of Standard Deviation Larger standard deviation → greater variability: if s1 = 15 and s2 = 10, group 1 has more variability. 68-95-99.7 rule (Normal data only): 68% of data lie within 1 SD of the mean, 95% within 2 SD of the mean, and 99.7% within 3 SD of the mean; e.g., if mean = 30 and SD = 10, then 95% of individuals are in the range 30 ± (2)(10) = 30 ± 20 = (10 to 50). Chebychev’s rule (all data): at least 75% of data lie within 2 SD of the mean; e.g., if mean = 30 and SD = 10, then at least 75% of individuals are in the range 30 ± (2)(10) = (10 to 50)
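The two rules above can be sketched as small helper functions. The slide states Chebychev's rule only for 2 SD; the general bound 1 – 1/k² used here is the standard form of the inequality and is an addition, and the function names are hypothetical.

```python
from statistics import mean, stdev

def sd_interval(m, s, k):
    """Return the interval mean ± k standard deviations."""
    return (m - k * s, m + k * s)

def chebyshev_bound(k):
    """Minimum fraction of any data set within k SD of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of the data within k sample SDs of the sample mean."""
    lo, hi = sd_interval(mean(data), stdev(data), k)
    return sum(lo <= x <= hi for x in data) / len(data)

# Slide example: mean = 30, SD = 10, k = 2 gives the range 10 to 50.
print(sd_interval(30, 10, 2))
print(chebyshev_bound(2))        # 0.75, i.e. at least 75%

# Metabolic rates (cal/day) from slide 15: observed share within 2 SD.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
print(fraction_within(rates, 2))
```

For the metabolic data every observation falls within 2 SD of the mean, which is consistent with the "at least 75%" guarantee.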
  • 22. 12/24/2022 Summary Statistics 22 Quartiles and IQR Quartiles divide the ordered data into four equally-sized groups Q0 = minimum Q1 = 25th %ile Q2 = 50th %ile (Median) Q3 = 75th %ile Q4 = maximum
  • 23. 12/24/2022 Summary Statistics 23 Rule for quartiles Find the median → Q2. Middle of lower half of the data set → Q1. Middle of upper half of the data set → Q3. Bottom half | Top half: 05 11 21 24 27 | 28 30 42 50 52, so Q1 = 21, Q2 = 27.5, Q3 = 42. IQR = Q3 – Q1 = 42 – 21 = 21 gives the spread of the middle 50% of the data
  • 24. 12/24/2022 Summary Statistics 24 5-Point Summary (sample.sav) Q0 = 5 (minimum) Q1 = 21 (lower hinge) Q2 = 27.5 (median) Q3 = 42 (upper hinge) Q4 = 52 (maximum) Best descriptive statistics for skewed data
  • 25. 12/24/2022 Summary Statistics 25 Illustrative example (metabolic.sav) Ordered data: 1362 1439 1460 1614 1666 1792 1867, median = 1614. Because n is odd, the median is included in both halves. Bottom half: 1362 1439 1460 1614 → Q1 = (1439 + 1460) / 2 = 1449.5. Top half: 1614 1666 1792 1867 → Q3 = (1666 + 1792) / 2 = 1729. 5-point summary: 1362, 1449.5, 1614, 1729, 1867
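The quartile rule used in the two examples (split the ordered data at the median, sharing the median between halves when n is odd) can be sketched as follows. The function name `hinge_quartiles` is hypothetical. Other software often uses different quartile conventions and may return slightly different values for the same data.

```python
from statistics import median

def hinge_quartiles(data):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # half size; the middle value is shared when n is odd
    q1 = median(xs[:half])       # middle of the bottom half
    q2 = median(xs)              # overall median
    q3 = median(xs[n - half:])   # middle of the top half
    return q1, q2, q3

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]        # sample.sav
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]    # metabolic.sav

print(hinge_quartiles(ages))    # (21, 27.5, 42)
print(hinge_quartiles(rates))   # (1449.5, 1614, 1729)
```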
  • 26. 12/24/2022 Summary Statistics 26 Box-and-whiskers plot (boxplot) 5-point summary + “outside values” Procedure: (1) Determine the 5-point summary. (2) Draw a box from Q1 to Q3. (3) Draw a line at Q2. (4) Calculate IQR = Q3 – Q1. (5) Calculate the fences: FLower = Q1 – 1.5(IQR) and FUpper = Q3 + 1.5(IQR). (6) Determine whether there are any outside values (values beyond the fences); if so, plot them separately. (7) Determine the inside values (the most extreme values still within the fences) and draw whiskers from the box to the inside values.
  • 27. 12/24/2022 Summary Statistics 27 Boxplot example Data: 05 11 21 24 27 28 30 42 50 52. 5-point: 5, 21, 27.5, 42, 52. IQR = 42 – 21 = 21. FU = 42 + (1.5)(21) = 73.5 → no values above (outside) → upper inside value = 52. FL = 21 – (1.5)(21) = –10.5 → no values below (outside) → lower inside value = 5. [Figure: boxplot on a 0 to 60 axis with box from Q1 = 21 to Q3 = 42, line at Q2 = 27.5, and whiskers to lower inside = 5 and upper inside = 52]
  • 28. 12/24/2022 Summary Statistics 28 Boxplot example 2 Data: 3 21 22 24 25 26 28 29 31 51. 5-point: 3, 22, 25.5, 29, 51. IQR = 29 – 22 = 7. FU = 29 + (1.5)(7) = 39.5 → one outside (51) → inside value = 31. FL = 22 – (1.5)(7) = 11.5 → one outside (3) → inside value = 21. [Figure: boxplot on a 0 to 60 axis with lower hinge (22), median (25.5), upper hinge (29), whiskers to inside values (21) and (31), and outside values (3) and (51) plotted separately]
  • 29. 12/24/2022 Summary Statistics 29 Boxplot example 3 (metabolic.sav) Data: 1362 1439 1460 1614 1666 1792 1867 (n = 7). 5-point: 1362, 1449.5, 1614, 1729, 1867 (slide 25). IQR = 1729 – 1449.5 = 279.5. FU = 1729 + (1.5)(279.5) = 2148.25 → none outside → upper inside = 1867. FL = 1449.5 – (1.5)(279.5) = 1030.25 → none outside → lower inside = 1362. [Figure: boxplot on a 1300 to 1900 axis. Data source: Moore, 2000]
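The fence calculations in the three boxplot examples can be sketched as one helper. This is a minimal illustration: `boxplot_fences` is a hypothetical name, and the hinges Q1 and Q3 are passed in as computed on the slides rather than recomputed.

```python
def boxplot_fences(data, q1, q3):
    """Compute IQR, fences, outside values, and whisker ends for a boxplot."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr    # lower fence
    upper = q3 + 1.5 * iqr    # upper fence
    outside = [x for x in data if x < lower or x > upper]
    inside = [x for x in data if lower <= x <= upper]
    return {
        "iqr": iqr,
        "lower_fence": lower,
        "upper_fence": upper,
        "outside": outside,                        # plotted as separate points
        "whiskers": (min(inside), max(inside)),    # lower and upper inside values
    }

# Boxplot example 2: hinges 22 and 29 taken from the slide.
example2 = [3, 21, 22, 24, 25, 26, 28, 29, 31, 51]
print(boxplot_fences(example2, q1=22, q3=29))

# Boxplot example 3 (metabolic.sav): hinges 1449.5 and 1729.
rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
print(boxplot_fences(rates, q1=1449.5, q3=1729))
```

Passing the hinges in keeps the sketch independent of any particular quartile convention.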
  • 30. 12/24/2022 Summary Statistics 30 Interpretation of boxplots Location  Position of median  Position of box Spread  Hinge-spread (box length) = IQR  Whisker-to-whisker spread (range or range minus the outside values) Shape  Symmetry of box  Size of whiskers  Outside values (potential outliers)
  • 31. 12/24/2022 Summary Statistics 31 Side-by-side boxplots Boxplots are especially useful for comparing groups:
