SlideShare a Scribd company logo
1 of 53
1
Inferential Statistics
2
Measures of Central Tendency
 Mean
 Arithmetic Average
 Median
 “Middle” Value after
ranking data
 Not affected by “outliers”
 Mode
 Most Occurring Value
 Useful for non-numeric
data
n
X
X i


3
Example
Anthony’s Pizza, a Detroit based company, offers pizza
delivery to its customers. A driver for Anthony’s Pizza will
often make several deliveries on a single delivery run. A
sample of 5 delivery runs by a driver showed that the total
number of pizzas delivered on each run
2 2 5 9 12
What is the Average?
a) 2
b) 5
c) 6
4
Example – 5 Recent Home Sales
 $500,000
 $600,000
 $600,000
 $700,000
 $2,600,000
5
Positively Skewed Data Set
Mean > Median
6
Negatively Skewed Data Set
Mean < Median
7
Symmetric Data Set
Mean = Median
8
Measures of Variability
 Range
 Variance
 Standard Deviation
 Interquartile Range (percentiles)
9
Range
Range = Max(Xi) –Min(Xi) (high – low)
Example – Pizza Delivery
Max = 12 pizzas
Min = 2 pizzas
Range =12 – 2 =10 pizzas
10
Sample Variance
1
)
( 2
2




n
x
x
s i
1
/
)
( 2
2
2



 
n
n
x
x
s i
i
11
Sample Standard Deviation
1
)
( 2




n
x
x
s i
12
Variance and Standard Deviation
2
2
5
9
12
30
X
Xi   2
X
Xi 
i
X
13
Variance and Standard Deviation
2 -4 16
2 -4 16
5 -1 1
9 3 9
12 6 36
30 0 78
X
Xi   2
X
Xi 
i
X
42
.
4
5
.
19
5
.
19
4
78
2




s
s
14
Interpreting the Standard
Deviation
 Chebyshev’s Rule
 At least 100 x (1-(1/k)2)% of any data set must be
within k standard deviations of the mean.
 Empirical Rule (68-95-99 rule)
 Bell shaped data
 68% within 1 standard deviation of mean
 95% within 2 standard deviations of mean
 99.7% within 3 standard deviations of mean
15
Empirical Rule
Example
 An exam has a mean score of 70 and a
standard deviation of 10
 68% of scores are between 60 and 80
 95% of scores are between 50 and 90
 99.7% of scores are between 40 and 100
16
68% of Data
17
95% of Data
18
99.7%% of Data
19
20
Measures of Relative Standing
 Z-score
 Percentile
 Quartiles
 Box Plots
21
Z-score
 The number of Standard Deviations from
the Mean
 Z>0, Xi is greater than mean
 Z<0, Xi is less than mean
s
X
X
Z i 

22
Percentile Rank
Formula for ungrouped data
 The location is (n+1)p (interpolated or rounded)
 n= sample size
 p = percentile
23
Quartiles
 25th percentile is 1st quartile
 50th percentile is median
 75th percentile is 3rd quartile
 75th percentile – 25th percentile is called
the Interquartile Range which
represents the “middle 50%”
24
Daily Minutes upload/download on
the Internet - 30 students
102
71
103
105
109
124
104
116
97
99
108
112
85
107
105
86
118
122
67
99
103
87
87
78
101
82
95
100
125
92
25
Stem and Leaf Graph
6 7
7 18
8 25677
9 25799
10 01233455789
11 268
12 245
26
IQR Tme on Internet data
n+1=31
.25 x 31 = 7.75 location 8 = 87  1st Quartile
.75 x 31 = 23.25 location 23 = 108  3rd Quartile
Interquartile Range (IQR) =108 – 87 = 21
27
Box Plots
 A box plot is a graphical display, based on quartiles,
that helps to picture a set of data.
 Five pieces of data are needed to construct a box
plot:
 Minimum Value
 First Quartile
 Median
 Third Quartile
 Maximum Value.
4-26
28
Boxplot
29
Outliers
 An outlier is data point that is far
removed from the other entries in the
data set.
 Outliers could be
 Mistakes made in recording data
 Data that don’t belong in population
 True rare events
30
Outliers have a dramatic effect
on some statistics
 Example quarterly home sales for
10 realtors:
2 2 3 4 5 5 6 6 7 50
with outlier without outlier
Mean 9.00 4.44
Median 5.00 5.00
Std Dev 14.51 1.81
IQR 3.00 3.50
31
Using Box Plot to find outliers
 The “box” is the region between the 1st and 3rd quartiles.
 Possible outliers are more than 1.5 IQR’s from the box (inner fence)
 Probable outliers are more than 3 IQR’s from the box (outer fence)
 In the box plot below, the dotted lines represent the “fences” that are
1.5 and 3 IQR’s from the box. See how the data point 50 is well
outside the outer fence and therefore an almost certain outlier.
BoxPlot
0 10 20 30 40 50 60
# 1
32
Using Z-score to detect outliers
 Calculate the mean and standard deviation
without the suspected outlier.
 Calculate the Z-score of the suspected
outlier.
 If the Z-score is more than 3 or less than -3,
that data point is a probable outlier.
2
.
25
81
.
1
4
.
4
50



Z
33
Outliers – what to do
 Remove or not remove, there is no clear answer.
 For some populations, outliers don’t dramatically change the
overall statistical analysis. Example: the tallest person in the
world will not dramatically change the mean height of 10000
people.
 However, for some populations, a single outlier will have a
dramatic effect on statistical analysis (called “Black Swan” by
Nicholas Taleb) and inferential statistics may be invalid in
analyzing these populations. Example: the richest person in the
world will dramatically change the mean wealth of 10000
people.
34
Bivariate Data
 Ordered numeric pairs (X,Y)
 Both values are numeric
 Paired by a common characteristic
 Graph as Scatterplot
35
Example of Bivariate Data
 Housing Data
 X = Square Footage
 Y = Price
36
Example of Scatterplot
Housing Prices and Square Footage
0
20
40
60
80
100
120
140
160
180
200
10 15 20 25 30
Size
Price
37
Another Example
Housing Prices and Square Footage - San Jose Only
40
50
60
70
80
90
100
110
120
130
15 20 25 30
Size
Price
38
Correlation Analysis
 Correlation Analysis: A group of statistical
techniques used to measure the strength of the
relationship (correlation) between two variables.
 Scatter Diagram: A chart that portrays the
relationship between the two variables of
interest.
 Dependent Variable: The variable that is being
predicted or estimated. “Effect”
 Independent Variable: The variable that
provides the basis for estimation. It is the
predictor variable. “Cause?” (Maybe!)
12-3
39
The Coefficient of Correlation, r
 The Coefficient of Correlation (r) is a
measure of the strength of the
relationship between two variables.
 It requires interval or ratio-scaled data (variables).
 It can range from -1 to 1.
 Values of -1 or 1 indicate perfect and strong
correlation.
 Values close to 0 indicate weak correlation.
 Negative values indicate an inverse relationship
and positive values indicate a direct relationship.
12-4
40
0 1 2 3 4 5 6 7 8 9 10
10
9
8
7
6
5
4
3
2
1
0
X
Y
12-6
Perfect Positive Correlation
41
Perfect Negative Correlation
0 1 2 3 4 5 6 7 8 9 10
10
9
8
7
6
5
4
3
2
1
0
X
Y
12-5
42
0 1 2 3 4 5 6 7 8 9 10
10
9
8
7
6
5
4
3
2
1
0
X
Y
12-7
Zero Correlation
43
0 1 2 3 4 5 6 7 8 9 10
10
9
8
7
6
5
4
3
2
1
0
X
Y
12-8
Strong Positive Correlation
44
0 1 2 3 4 5 6 7 8 9 10
10
9
8
7
6
5
4
3
2
1
0
X
Y
12-8
Weak Negative Correlation
45
Causation
 Correlation does not necessarily imply
causation.
 There are 4 possibilities if X and Y are
correlated:
1. X causes Y
2. Y causes X
3. X and Y are caused by something else.
4. Confounding - The effect of X and Y are
hopelessly mixed up with other variables.
46
Causation - Examples
 City with more police per capita have
more crime per capita.
 As Ice cream sales go up, shark attacks
go up.
 People with a cold who take a cough
medicine feel better after some rest.
47
Formula for correlation coefficient r
SSY
SSX
SSXY
r


12-9
 
 
 
Y
X
XY
SSXY
Y
Y
SSY
X
X
SSX
n
n
n














1
2
1
2
2
1
2
48
Example
 X = Average Annual Rainfall (Inches)
 Y = Average Sale of Sunglasses/1000
 Make a Scatter Diagram
 Find the correlation coefficient
X 10 15 20 30 40
Y 40 35 25 25 15
49
Example continued
 Make a Scatter Diagram
 Find the correlation coefficient
50
Example continued
X Y X2
Y2
XY
10 40 100 1600 400
15 35 225 1225 525
20 25 400 625 500
30 25 900 625 750
40 15 1600 225 600
115 140 3225 4300 2775
• SSX = 3225 - 1152/5 = 580
• SSY = 4300 - 1402/5 = 380
• SSXY= 2775 - (115)(140)/5 = -445
51
Example continued
 Strong negative correlation
9479
.
0
330
580
445







r
SSY
SSX
SSXY
r
52
Minitab: Graphs>Scatterplot
Minitab - Correlation
53
Stat>
Basic Statistics>
Correlation

More Related Content

Similar to Module 1 Powerpoint 2.pptx

Statr sessions 9 to 10
Statr sessions 9 to 10Statr sessions 9 to 10
Statr sessions 9 to 10Ruru Chowdhury
 
Year 12 Maths A Textbook - Chapter 10
Year 12 Maths A Textbook - Chapter 10Year 12 Maths A Textbook - Chapter 10
Year 12 Maths A Textbook - Chapter 10westy67968
 
Lecture 2 Descriptive statistics.pptx
Lecture 2 Descriptive statistics.pptxLecture 2 Descriptive statistics.pptx
Lecture 2 Descriptive statistics.pptxABCraftsman
 
1. Consider the following partially completed computer printout fo.docx
1. Consider the following partially completed computer printout fo.docx1. Consider the following partially completed computer printout fo.docx
1. Consider the following partially completed computer printout fo.docxjackiewalcutt
 
Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsBabasab Patil
 
Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)Ron_Eick
 
Question 1 Independent random samples taken on two university .docx
Question 1 Independent random samples taken on two university .docxQuestion 1 Independent random samples taken on two university .docx
Question 1 Independent random samples taken on two university .docxIRESH3
 
Frequency Tables - Statistics
Frequency Tables - StatisticsFrequency Tables - Statistics
Frequency Tables - Statisticsmscartersmaths
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data AnalysisNBER
 
raghu veera stats.ppt
raghu veera stats.pptraghu veera stats.ppt
raghu veera stats.pptDevarajuBn
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handoutfatima d
 

Similar to Module 1 Powerpoint 2.pptx (20)

Ch01_03.ppt
Ch01_03.pptCh01_03.ppt
Ch01_03.ppt
 
Chapter12
Chapter12Chapter12
Chapter12
 
Statr sessions 9 to 10
Statr sessions 9 to 10Statr sessions 9 to 10
Statr sessions 9 to 10
 
Year 12 Maths A Textbook - Chapter 10
Year 12 Maths A Textbook - Chapter 10Year 12 Maths A Textbook - Chapter 10
Year 12 Maths A Textbook - Chapter 10
 
Lecture 2 Descriptive statistics.pptx
Lecture 2 Descriptive statistics.pptxLecture 2 Descriptive statistics.pptx
Lecture 2 Descriptive statistics.pptx
 
Chisquared test.pptx
Chisquared test.pptxChisquared test.pptx
Chisquared test.pptx
 
1. Consider the following partially completed computer printout fo.docx
1. Consider the following partially completed computer printout fo.docx1. Consider the following partially completed computer printout fo.docx
1. Consider the following partially completed computer printout fo.docx
 
Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec doms
 
Chapter04
Chapter04Chapter04
Chapter04
 
Chapter04
Chapter04Chapter04
Chapter04
 
Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)
 
Question 1 Independent random samples taken on two university .docx
Question 1 Independent random samples taken on two university .docxQuestion 1 Independent random samples taken on two university .docx
Question 1 Independent random samples taken on two university .docx
 
Normal Distribution.pptx
Normal Distribution.pptxNormal Distribution.pptx
Normal Distribution.pptx
 
Frequency Tables - Statistics
Frequency Tables - StatisticsFrequency Tables - Statistics
Frequency Tables - Statistics
 
Statistics 3, 4
Statistics 3, 4Statistics 3, 4
Statistics 3, 4
 
statistics assignment help
statistics assignment helpstatistics assignment help
statistics assignment help
 
Frequency.pptx
Frequency.pptxFrequency.pptx
Frequency.pptx
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
raghu veera stats.ppt
raghu veera stats.pptraghu veera stats.ppt
raghu veera stats.ppt
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
 

More from ZyrenMisaki

More from ZyrenMisaki (8)

Biodiversity.pdf
Biodiversity.pdfBiodiversity.pdf
Biodiversity.pdf
 
Logic 1
Logic 1Logic 1
Logic 1
 
Valid Arguments
Valid ArgumentsValid Arguments
Valid Arguments
 
I Will Face My Giant.pptx
I Will Face My Giant.pptxI Will Face My Giant.pptx
I Will Face My Giant.pptx
 
Statistical Tools
Statistical ToolsStatistical Tools
Statistical Tools
 
The T-test
The T-testThe T-test
The T-test
 
The Information Age
The Information AgeThe Information Age
The Information Age
 
STS Chapter 2
STS Chapter 2STS Chapter 2
STS Chapter 2
 

Recently uploaded

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 

Recently uploaded (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 

Module 1 Powerpoint 2.pptx

  • 2. 2 Measures of Central Tendency  Mean  Arithmetic Average  Median  “Middle” Value after ranking data  Not affected by “outliers”  Mode  Most Occurring Value  Useful for non-numeric data n X X i  
  • 3. 3 Example Anthony’s Pizza, a Detroit based company, offers pizza delivery to its customers. A driver for Anthony’s Pizza will often make several deliveries on a single delivery run. A sample of 5 delivery runs by a driver showed that the total number of pizzas delivered on each run 2 2 5 9 12 What is the Average? a) 2 b) 5 c) 6
  • 4. 4 Example – 5 Recent Home Sales  $500,000  $600,000  $600,000  $700,000  $2,600,000
  • 5. 5 Positively Skewed Data Set Mean > Median
  • 6. 6 Negatively Skewed Data Set Mean < Median
  • 8. 8 Measures of Variability  Range  Variance  Standard Deviation  Interquartile Range (percentiles)
  • 9. 9 Range Range = Max(Xi) –Min(Xi) (high – low) Example – Pizza Delivery Max = 12 pizzas Min = 2 pizzas Range =12 – 2 =10 pizzas
  • 10. 10 Sample Variance 1 ) ( 2 2     n x x s i 1 / ) ( 2 2 2      n n x x s i i
  • 11. 11 Sample Standard Deviation 1 ) ( 2     n x x s i
  • 12. 12 Variance and Standard Deviation 2 2 5 9 12 30 X Xi   2 X Xi  i X
  • 13. 13 Variance and Standard Deviation 2 -4 16 2 -4 16 5 -1 1 9 3 9 12 6 36 30 0 78 X Xi   2 X Xi  i X 42 . 4 5 . 19 5 . 19 4 78 2     s s
  • 14. 14 Interpreting the Standard Deviation  Chebyshev’s Rule  At least 100 x (1-(1/k)2)% of any data set must be within k standard deviations of the mean.  Empirical Rule (68-95-99 rule)  Bell shaped data  68% within 1 standard deviation of mean  95% within 2 standard deviations of mean  99.7% within 3 standard deviations of mean
  • 16. Example  An exam has a mean score of 70 and a standard deviation of 10  68% of scores are between 60 and 80  95% of scores are between 50 and 90  99.7% of scores are between 40 and 100 16
  • 20. 20 Measures of Relative Standing  Z-score  Percentile  Quartiles  Box Plots
  • 21. 21 Z-score  The number of Standard Deviations from the Mean  Z>0, Xi is greater than mean  Z<0, Xi is less than mean s X X Z i  
  • 22. 22 Percentile Rank Formula for ungrouped data  The location is (n+1)p (interpolated or rounded)  n= sample size  p = percentile
  • 23. 23 Quartiles  25th percentile is 1st quartile  50th percentile is median  75th percentile is 3rd quartile  75th percentile – 25th percentile is called the Interquartile Range which represents the “middle 50%”
  • 24. 24 Daily Minutes upload/download on the Internet - 30 students 102 71 103 105 109 124 104 116 97 99 108 112 85 107 105 86 118 122 67 99 103 87 87 78 101 82 95 100 125 92
  • 25. 25 Stem and Leaf Graph 6 7 7 18 8 25677 9 25799 10 01233455789 11 268 12 245
  • 26. 26 IQR Tme on Internet data n+1=31 .25 x 31 = 7.75 location 8 = 87  1st Quartile .75 x 31 = 23.25 location 23 = 108  3rd Quartile Interquartile Range (IQR) =108 – 87 = 21
  • 27. 27 Box Plots  A box plot is a graphical display, based on quartiles, that helps to picture a set of data.  Five pieces of data are needed to construct a box plot:  Minimum Value  First Quartile  Median  Third Quartile  Maximum Value. 4-26
  • 29. 29 Outliers  An outlier is data point that is far removed from the other entries in the data set.  Outliers could be  Mistakes made in recording data  Data that don’t belong in population  True rare events
  • 30. 30 Outliers have a dramatic effect on some statistics  Example quarterly home sales for 10 realtors: 2 2 3 4 5 5 6 6 7 50 with outlier without outlier Mean 9.00 4.44 Median 5.00 5.00 Std Dev 14.51 1.81 IQR 3.00 3.50
  • 31. 31 Using Box Plot to find outliers  The “box” is the region between the 1st and 3rd quartiles.  Possible outliers are more than 1.5 IQR’s from the box (inner fence)  Probable outliers are more than 3 IQR’s from the box (outer fence)  In the box plot below, the dotted lines represent the “fences” that are 1.5 and 3 IQR’s from the box. See how the data point 50 is well outside the outer fence and therefore an almost certain outlier. BoxPlot 0 10 20 30 40 50 60 # 1
  • 32. 32 Using Z-score to detect outliers  Calculate the mean and standard deviation without the suspected outlier.  Calculate the Z-score of the suspected outlier.  If the Z-score is more than 3 or less than -3, that data point is a probable outlier. 2 . 25 81 . 1 4 . 4 50    Z
  • 33. 33 Outliers – what to do  Remove or not remove, there is no clear answer.  For some populations, outliers don’t dramatically change the overall statistical analysis. Example: the tallest person in the world will not dramatically change the mean height of 10000 people.  However, for some populations, a single outlier will have a dramatic effect on statistical analysis (called “Black Swan” by Nicholas Taleb) and inferential statistics may be invalid in analyzing these populations. Example: the richest person in the world will dramatically change the mean wealth of 10000 people.
  • 34. 34 Bivariate Data  Ordered numeric pairs (X,Y)  Both values are numeric  Paired by a common characteristic  Graph as Scatterplot
  • 35. 35 Example of Bivariate Data  Housing Data  X = Square Footage  Y = Price
  • 36. 36 Example of Scatterplot Housing Prices and Square Footage 0 20 40 60 80 100 120 140 160 180 200 10 15 20 25 30 Size Price
  • 37. 37 Another Example Housing Prices and Square Footage - San Jose Only 40 50 60 70 80 90 100 110 120 130 15 20 25 30 Size Price
  • 38. 38 Correlation Analysis  Correlation Analysis: A group of statistical techniques used to measure the strength of the relationship (correlation) between two variables.  Scatter Diagram: A chart that portrays the relationship between the two variables of interest.  Dependent Variable: The variable that is being predicted or estimated. “Effect”  Independent Variable: The variable that provides the basis for estimation. It is the predictor variable. “Cause?” (Maybe!) 12-3
  • 39. 39 The Coefficient of Correlation, r  The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.  It requires interval or ratio-scaled data (variables).  It can range from -1 to 1.  Values of -1 or 1 indicate perfect and strong correlation.  Values close to 0 indicate weak correlation.  Negative values indicate an inverse relationship and positive values indicate a direct relationship. 12-4
  • 40. 40 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 X Y 12-6 Perfect Positive Correlation
  • 41. 41 Perfect Negative Correlation 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 X Y 12-5
  • 42. 42 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 X Y 12-7 Zero Correlation
  • 43. 43 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 X Y 12-8 Strong Positive Correlation
  • 44. 44 0 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 3 2 1 0 X Y 12-8 Weak Negative Correlation
  • 45. 45 Causation  Correlation does not necessarily imply causation.  There are 4 possibilities if X and Y are correlated: 1. X causes Y 2. Y causes X 3. X and Y are caused by something else. 4. Confounding - The effect of X and Y are hopelessly mixed up with other variables.
  • 46. 46 Causation - Examples  City with more police per capita have more crime per capita.  As Ice cream sales go up, shark attacks go up.  People with a cold who take a cough medicine feel better after some rest.
  • 47. 47 Formula for correlation coefficient r SSY SSX SSXY r   12-9       Y X XY SSXY Y Y SSY X X SSX n n n               1 2 1 2 2 1 2
  • 48. 48 Example  X = Average Annual Rainfall (Inches)  Y = Average Sale of Sunglasses/1000  Make a Scatter Diagram  Find the correlation coefficient X 10 15 20 30 40 Y 40 35 25 25 15
  • 49. 49 Example continued  Make a Scatter Diagram  Find the correlation coefficient
  • 50. 50 Example continued X Y X2 Y2 XY 10 40 100 1600 400 15 35 225 1225 525 20 25 400 625 500 30 25 900 625 750 40 15 1600 225 600 115 140 3225 4300 2775 • SSX = 3225 - 1152/5 = 580 • SSY = 4300 - 1402/5 = 380 • SSXY= 2775 - (115)(140)/5 = -445
  • 51. 51 Example continued  Strong negative correlation 9479 . 0 330 580 445        r SSY SSX SSXY r
  • 53. Minitab - Correlation 53 Stat> Basic Statistics> Correlation

Editor's Notes

  1. © Maurice Geraghty 2008
  2. © Maurice Geraghty 2008
  3. © Maurice Geraghty 2008
  4. © Maurice Geraghty 2008
  5. © Maurice Geraghty 2008
  6. © Maurice Geraghty 2008
  7. © Maurice Geraghty 2008
  8. © Maurice Geraghty 2008
  9. © Maurice Geraghty 2008
  10. © Maurice Geraghty 2008
  11. © Maurice Geraghty 2008
  12. © Maurice Geraghty 2008
  13. © Maurice Geraghty 2008
  14. © Maurice Geraghty 2008
  15. © Maurice Geraghty 2008
  16. © Maurice Geraghty 2008
  17. © Maurice Geraghty 2008
  18. © Maurice Geraghty 2008
  19. © Maurice Geraghty 2008
  20. © Maurice Geraghty 2008
  21. © Maurice Geraghty 2008
  22. © Maurice Geraghty 2008
  23. © Maurice Geraghty 2008
  24. © Maurice Geraghty 2008
  25. © Maurice Geraghty 2008
  26. © Maurice Geraghty 2008
  27. © Maurice Geraghty 2008
  28. © Maurice Geraghty 2008
  29. © Maurice Geraghty 2008
  30. © Maurice Geraghty 2008
  31. © Maurice Geraghty 2008
  32. © Maurice Geraghty 2008
  33. © Maurice Geraghty 2008
  34. © Maurice Geraghty 2008
  35. © Maurice Geraghty 2008
  36. © Maurice Geraghty 2008
  37. © Maurice Geraghty 2008
  38. © Maurice Geraghty 2008
  39. © Maurice Geraghty 2008
  40. © Maurice Geraghty 2008
  41. © Maurice Geraghty 2008
  42. © Maurice Geraghty 2008