STAT 3615: BIOLOGICAL STATISTICS
Hamdy F. F. Mahmoud, PhD
Collegiate Assistant Professor
Statistics Department @ VT
Chapter 2: Describing distributions with numbers – Part II
IN THIS CHAPTER, WE COVER
❑ Measuring the center of data by
➢ Mean
➢ Median
❑ 5-number summary and boxplot
❑ Interquartile range and detecting outliers
❑ Modified boxplot when outliers exist
❑ Measuring the spread of data by
➢ Interquartile range (IQR)
➢ Standard deviation
❑ Choosing a good measure for center and for spread
INTERQUARTILE RANGE AND DETECTING OUTLIERS
IQR and outliers
❑ Interquartile Range (IQR)
The IQR is another measure of spread or variability in a data set. It is the distance between the first and third quartiles: IQR = Q3 - Q1
❑ Detecting outliers:
The 1.5×IQR rule for outliers
➢ Lower limit = Q1 - 1.5×IQR
➢ Upper limit = Q3 + 1.5×IQR
Call an observation an outlier if it is greater than the upper limit or less than the lower limit.
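The 1.5×IQR rule can be sketched in a few lines of Python. The function names `iqr_fences` and `is_outlier` and the quartile values used below are hypothetical, chosen only to illustrate the rule; they are not part of the course material.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) for the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x is less than the lower limit or greater than the upper limit."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper


# Hypothetical quartiles, used only for illustration.
lower, upper = iqr_fences(q1=10.0, q3=20.0)
print(lower, upper)                   # -5.0 35.0
print(is_outlier(36.0, 10.0, 20.0))   # True
print(is_outlier(35.0, 10.0, 20.0))   # False: a value exactly on the limit is not flagged
```

The multiplier is kept as a parameter `k` so the same sketch works if a different cutoff is ever wanted; the slides use 1.5 throughout.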
Example 7: The spider silk data again:
164.0 478.7 251.3 351.7 173.0 448.9 300.6
362.0 272.4 740.2 329.0 327.2 270.5 332.1
288.8 176.1 282.2 236.1 358.2 270.5 290.7
 Calculate the IQR.
 Check if the data have outliers or not.
 Draw the modified boxplot.
 ANSWER
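5-number summary
Min = 164
Q1 = 260.9
Q2 = 290.7
Q3 = 354.95
Max = 740.2
➢ IQR = Q3 - Q1 = 354.95 - 260.9 = 94.05
➢ Check for outliers: lower limit = 260.9 - 1.5×94.05 = 119.825 and upper limit = 354.95 + 1.5×94.05 = 496.025. The observation 740.2 is greater than the upper limit, so it is an outlier. No observation is less than the lower limit.
➢ Modified boxplot: the box runs from Q1 to Q3 with a line at the median, the whiskers extend to the smallest and largest observations that are not outliers (164 and 478.7), and the outlier 740.2 is plotted as a separate point.
[Figure: modified boxplot of the spider silk data on an axis from 160 to 800]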
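The answer to Example 7 can be checked with a short Python sketch. It assumes that quartiles are computed as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd; this convention reproduces the Q1 and Q3 shown on the slide, but other software may use a different one. The function names are hypothetical.

```python
from statistics import median

# Spider silk data from Example 7.
silk = [164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6,
        362.0, 272.4, 740.2, 329.0, 327.2, 270.5, 332.1,
        288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves; for odd n the overall median belongs to neither half.
    Needs at least two observations.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower_half = xs[:half]
    upper_half = xs[len(xs) - half:]
    return xs[0], median(lower_half), median(xs), median(upper_half), xs[-1]


def modified_boxplot_parts(data, k=1.5):
    """Return (lower whisker end, upper whisker end, sorted outliers)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower <= x <= upper]
    outliers = sorted(x for x in data if x < lower or x > upper)
    return min(inside), max(inside), outliers


print(five_number_summary(silk))      # roughly (164.0, 260.9, 290.7, 354.95, 740.2)
print(modified_boxplot_parts(silk))   # (164.0, 478.7, [740.2])
```

The whisker ends are the most extreme observations that still fall inside the limits, which is what distinguishes a modified boxplot from a plain one whose whiskers always reach the minimum and maximum.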
Side-by-side boxplot: more than one boxplot in the same graph
Below is a side-by-side boxplot that shows the average minutes per day spent in activity (standing and walking) and in lying down for lean subjects and obese subjects.
[Figure: Boxplot of Lie and Stand_Walk by Group (1 = lean subjects, 2 = obese subjects); vertical axis "Data", from 200 to 700]
• Look at the mean and median for both groups.
There is a big difference between lean and obese people in terms of standing and walking time.
There is not much difference between lean and obese people in terms of lying-down time.
Conclusion: if you want to be lean, be active!
SOURCES OF OUTLIERS, AND HOW DO WE DEAL WITH OUTLIERS?
 Human error in recording information
Example 1: In an online survey, undergraduate students enrolled in a biostatistics course were asked to record their heights in inches. Out of the 251 numerical values submitted, two wild observations appeared: 5.3 and 6. These are obvious errors.
Solution: try to correct them.
Example 2: Perhaps the most famous example of an outlier caused by a data-recording error lies in the story of Popeye the Sailor. Created in 1929, Popeye is a friendly cartoon character who gains immediate superhuman strength whenever he eats spinach.
In 1870, a scientific publication reported that spinach has a higher iron content than any other green leafy vegetable. The iron content of spinach was miscalculated by a German chemist when he misplaced a decimal point. While there are just 3.5 milligrams of iron in a 100 g serving of spinach, the accepted number became 35 milligrams thanks to his mistake.
Sorry Popeye, spinach DOESN'T make your muscles big!
SOURCES OF OUTLIERS, AND HOW DO WE DEAL WITH OUTLIERS? (CONT.)
 Human error in experimentation or data collection
Example: A researcher records the temperature every day at 12:00 pm, but one day he records it in the evening. Anyone else who analyzes these data will find that this value differs from the others.
Solution: remove, but report it.
 Unexplained but apparently legitimate wild observations
Example: Most studies in the life sciences collect data on a small sample. Because of this, it can be difficult to determine whether a suspected outlier is truly a wild observation or only appears extreme because the sample is small.
Solution: do the analysis with and without the outliers.
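As an illustration of analyzing data with and without an outlier, the sketch below summarizes the spider silk data from Example 7 twice, once with the flagged value 740.2 and once without it. The choice of statistics to report and the helper name `summarize` are assumptions made for this example.

```python
from statistics import mean, median, stdev

# Spider silk data from Example 7; 740.2 is flagged by the 1.5 x IQR rule.
silk = [164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6,
        362.0, 272.4, 740.2, 329.0, 327.2, 270.5, 332.1,
        288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7]
suspect = 740.2


def summarize(data):
    """Return n, mean, median, and sample standard deviation of the data."""
    return {"n": len(data), "mean": mean(data),
            "median": median(data), "sd": stdev(data)}


with_outlier = summarize(silk)
without_outlier = summarize([x for x in silk if x != suspect])

# Report both analyses side by side so the reader can judge the outlier's effect.
for label, s in (("with", with_outlier), ("without", without_outlier)):
    print(f"{label:>8}: n={s['n']}  mean={s['mean']:.1f}  "
          f"median={s['median']:.1f}  sd={s['sd']:.1f}")
```

If the two sets of results lead to the same conclusion, the outlier matters little; if they differ, both should be reported.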
Measuring Data Spread or Variability
MEASURING SPREAD BY STANDARD DEVIATION
The standard deviation and its close relative, the variance, measure spread by looking at how far the observations are from their mean. The variance $s^2$ of a set of observations $x_1, x_2, \ldots, x_n$ is

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}$$

Or, more compactly,

$$s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$$

The standard deviation $s$ is the square root of the variance $s^2$:

$$s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$$
Measuring spread in a data set
Example 8: Metabolic Rate
A person's metabolic rate is the rate at which the body consumes energy. Metabolic rate is important in studies of weight gain, dieting, and exercise. Here are the metabolic rates of 7 men who took part in a study of dieting. The units are kilocalories (Cal) per 24-hour period.
1792 1666 1362 1614 1460 1867 1439
Because the men differ in metabolic rate, we measure how much they differ by calculating the variance and standard deviation.
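The 7 observations sum to 11200, so the mean is $\bar{x} = 11200/7 = 1600$.

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 1792 | 1792 - 1600 = 192 | 192^2 = 36864 |
| 1666 | 1666 - 1600 = 66 | 4356 |
| 1362 | -238 | 56644 |
| 1614 | 14 | 196 |
| 1460 | -140 | 19600 |
| 1867 | 267 | 71289 |
| 1439 | -161 | 25921 |
| Sum = 11200 | Sum = 0 | Sum = 214870 |

$$s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2 = \frac{214870}{6} \approx 35811.67$$

$$s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2} = \sqrt{35811.67} \approx 189.24$$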
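The hand calculation for Example 8 can be reproduced step by step in Python. This is a minimal sketch that follows the variance formula from the slides; the variable names are arbitrary, and the standard-library functions `variance` and `stdev` are used only as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

# Metabolic rates (Cal per 24 hours) of the 7 men in Example 8.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
x_bar = sum(rates) / n                     # 11200 / 7 = 1600.0
deviations = [x - x_bar for x in rates]    # deviations from the mean sum to 0
squared = [d ** 2 for d in deviations]     # squared deviations

s2 = sum(squared) / (n - 1)                # 214870 / 6
s = sqrt(s2)

print(f"mean = {x_bar}, sum of squared deviations = {sum(squared)}")
print(f"s^2 = {s2:.2f}, s = {s:.2f}")      # s^2 = 35811.67, s = 189.24

# Cross-check: the standard library also divides by n - 1.
assert abs(s2 - variance(rates)) < 1e-9
assert abs(s - stdev(rates)) < 1e-9
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1, so they would give slightly smaller values than the sample formulas used here.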
NOTES ON STANDARD DEVIATION
➢ The variance formula divides by n - 1 (the number of observations minus 1) because the number of degrees of freedom is n - 1.
➢ s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
➢ s is always greater than or equal to zero. s = 0 only when there is no spread, that is, when all observations are equal.
➢ s has the same unit of measurement as the original observations.
➢ Like the mean, s is not resistant. A few outliers can make s very large, although the variance is inflated even more.
MEASURING SPREAD BY INTERQUARTILE RANGE (IQR)
When the data contain outliers or the distribution is skewed, the variance and standard deviation are poor choices because they are not resistant (robust) measures of spread. In this case, we need a resistant measure, which is the IQR.
IQR = Q3 - Q1
Example 9: Calculate the IQR for the metabolic rate data of Example 8.
1792 1666 1362 1614 1460 1867 1439
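One way to work Example 9 is sketched below. It assumes the same quartile convention that matches the Example 7 answer (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out when n is odd); a different convention could give a different IQR. The function name `quartiles` is hypothetical.

```python
from statistics import median

# Metabolic rates (Cal per 24 hours) from Example 8.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]


def quartiles(data):
    """Return (Q1, median, Q3) using medians of the lower and upper halves."""
    xs = sorted(data)                  # 1362 1439 1460 1614 1666 1792 1867
    half = len(xs) // 2
    return median(xs[:half]), median(xs), median(xs[len(xs) - half:])


q1, q2, q3 = quartiles(rates)
iqr = q3 - q1
print(q1, q2, q3, iqr)                 # 1439 1614 1792 353

# Apply the 1.5 x IQR rule to see whether any rate is an outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in rates if x < lower or x > upper]
print(lower, upper, outliers)          # 909.5 2321.5 []
```

Under this convention the IQR is 353 Cal and no observation is flagged as an outlier.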
In practice
CHOOSING MEASURES FOR CENTER AND SPREAD
So far, we have studied the mean and median as measures of center, and the variance, standard deviation, and IQR as measures of spread.
If the data set is symmetric or fairly symmetric, we use the mean and standard deviation. If the data are highly skewed (right or left) or have outliers, we use the median and IQR to measure the center and spread of the data.
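The choice between the two pairs of measures can be sketched as a small helper. This is a deliberate simplification: the hypothetical function `describe` checks only for outliers with the 1.5×IQR rule and does not test for skewness, which in practice is judged from a histogram or boxplot. The quartile convention (medians of the two halves) is also an assumption.

```python
from statistics import mean, median, stdev


def describe(data, k=1.5):
    """Return (center_name, center, spread_name, spread).

    Simplifying assumption: median and IQR are chosen only when the
    k x IQR rule flags at least one outlier; skewness is not checked.
    """
    xs = sorted(data)
    half = len(xs) // 2
    q1, q3 = median(xs[:half]), median(xs[len(xs) - half:])
    iqr = q3 - q1
    has_outliers = any(x < q1 - k * iqr or x > q3 + k * iqr for x in xs)
    if has_outliers:
        return "median", median(xs), "IQR", iqr
    return "mean", mean(xs), "sd", stdev(xs)


silk = [164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6,
        362.0, 272.4, 740.2, 329.0, 327.2, 270.5, 332.1,
        288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7]
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

print(describe(silk))    # median and IQR, because 740.2 is an outlier
print(describe(rates))   # mean and sd, because no outliers are flagged
```

A fuller version would also look at the shape of the distribution before settling on the mean and standard deviation.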
The End of Chapter 2
