INTRODUCTION TO
STATISTICS & PROBABILITY
Chapter 2:
Looking at Data–Relationships (Part 3)
Dr. Nahid Sultana
Chapter 2:
Looking at Data–Relationships
2.1: Scatterplots
2.2: Correlation
2.3: Least-Squares Regression
2.5: Data Analysis for Two-Way Tables
2.3 Least-Squares Regression
Objectives
• Regression lines
• Prediction and Extrapolation
• Correlation and $r^2$
Regression line
• Correlation tells us about the strength and direction of the linear
relationship between two quantitative variables.
• In regression, we study the association between two variables in
order to explain the values of one from the values of the other
(i.e., make predictions).
• When there is a linear association between two variables, a
straight-line equation can be used to model the relationship.
• In regression, the distinction between the response variable and the
explanatory variable is important.
Regression line (Cont…)
• A regression line is a line that best describes the linear
relationship between the two variables. It is expressed by
an equation of the form $\hat{y} = b_0 + b_1 x$,
where $b_1$ is the slope and $b_0$ is the intercept.
• Once the equation of the regression line is established, we can
use it to predict the response y for a specific value of the
explanatory variable x.
The least-squares regression line
The least-squares regression line is the line that makes the sum of
the squares of the vertical distances of the data points from the
line as small as possible.
The least-squares regression line (Cont.)
The equation of the least-squares regression line of y on x is
$\hat{y} = b_0 + b_1 x$
where
$\hat{y}$ is the predicted y value (y hat),
$b_1$ is the slope,
$b_0$ is the y-intercept.
First we calculate the slope of the line,
$b_1 = r \dfrac{s_y}{s_x}$
where
r is the correlation,
$s_y$ is the standard deviation of the response variable y,
$s_x$ is the standard deviation of the explanatory variable x.
Once we know $b_1$, the slope, we can calculate $b_0$, the y-intercept:
$b_0 = \bar{y} - b_1 \bar{x}$
where $\bar{x}$ and $\bar{y}$ are the sample means of the x and y variables.
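The two formulas above can be tried out in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the numbers are the powerboat and manatee counts from the data table that appears later in these slides.

```python
from statistics import mean, stdev

# Powerboat registrations (in 1000s) and manatee deaths, 1977-1990,
# copied from the data table later in these slides.
powerboats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
manatees = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]


def correlation(x, y):
    """Sample correlation r: sum of products of standardized values over n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


def least_squares_line(x, y):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    r = correlation(x, y)
    b1 = r * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1


b0, b1 = least_squares_line(powerboats, manatees)
print(f"y-hat = {b0:.1f} + {b1:.3f} x")
```

Rounded to three significant digits, the slope and intercept should agree with the line quoted for the manatee example.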
How to plot the least-squares regression line
Typically, we use stats software.
How to plot the least-squares regression line (Cont…)
To plot the regression line, you only need to plug two x values into the
equation, compute $\hat{y}$ for each, and draw the line that goes through those points.
Hint: The regression line always passes through the point $(\bar{x}, \bar{y})$, the means of x and y.
The points you use for drawing the regression line are derived from the
equation. They are NOT points from your sample data (except by pure
coincidence).
Two different regression lines can be drawn if we
interchange the roles of x and y.
Example:
The correlation coefficient of NEA and fat gain, r = -0.779, stays the same in both cases.
[Fitted line plot: fat gain (kilograms) against nonexercise activity (calories)]
Fat = 3.505 - 0.003441 NEA
[Fitted line plot: nonexercise activity (calories) against fat gain (kilograms)]
NEA = 745.3 - 176.1 Fat
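One way to see why r is unchanged while the two lines differ: the slope of y on x is $r\,s_y/s_x$ and the slope of x on y is $r\,s_x/s_y$, so the product of the two slopes is $r^2$. The short check below uses only the numbers quoted above; the variable names are illustrative.

```python
# Slopes and correlation quoted for the two fitted lines above.
slope_fat_on_nea = -0.003441   # Fat = 3.505 - 0.003441 NEA
slope_nea_on_fat = -176.1      # NEA = 745.3 - 176.1 Fat
r = -0.779

# slope(y on x) * slope(x on y) = (r * s_y / s_x) * (r * s_x / s_y) = r**2,
# so the product should match r squared up to rounding in the quoted values.
product = slope_fat_on_nea * slope_nea_on_fat
print(f"product of slopes = {product:.4f}, r^2 = {r ** 2:.4f}")
```

The small gap between the two printed values comes from the rounding of the published coefficients.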
BEWARE!!!
Not all calculators and software use the same convention. Some use
$\hat{y} = a + bx$
and some use
$\hat{y} = ax + b$
Make sure you know what YOUR calculator gives you for a and b before
you answer homework or exam questions.
Making predictions
The equation of the least-squares regression line allows you to predict y
for any x within the range studied.
$\hat{y} = 0.0144x + 0.0008$
Nobody in the study drank 6.5 beers, but by finding the value of $\hat{y}$
from the regression line for x = 6.5, we would expect a blood alcohol
content of 0.094 mg/ml:
$\hat{y} = 0.0144 \times 6.5 + 0.0008 = 0.0936 + 0.0008 = 0.0944$ mg/ml
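The same prediction can be written as a small function. This is a sketch with a hypothetical function name; the coefficients are the ones given in the fitted equation above.

```python
def predict_bac(beers):
    """Predicted blood alcohol content (mg/ml) from y-hat = 0.0144x + 0.0008."""
    return 0.0144 * beers + 0.0008


# Prediction for x = 6.5 beers, a value nobody in the study actually drank.
print(f"{predict_bac(6.5):.4f} mg/ml")
```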
Year   Powerboats (in 1000s)   Dead manatees
1977   447   13
1978   460   21
1979   481   24
1980   498   16
1981   513   24
1982   512   20
1983   526   15
1984   559   34
1985   585   33
1986   614   33
1987   645   39
1988   675   43
1989   711   50
1990   719   47
• There is a positive linear relationship between the number of
powerboats registered and the number of manatee deaths.
• The least-squares regression line has the equation $\hat{y} = 0.125x - 41.4$,
where x is the number of powerboat registrations in thousands.
• Thus, if we were to limit the number of powerboat registrations to
500,000, what could we expect for the number of manatee deaths?
$\hat{y} = 0.125(500) - 41.4 = 62.5 - 41.4 = 21.1$
• Roughly 21 manatees.
Could we use this regression line to predict the number of manatee
deaths for a year with 200,000 powerboat registrations?
Extrapolation
Extrapolation is the use of a regression line for prediction
far outside the range of values of x used to obtain the line.
Such predictions are often not accurate.
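A rough illustration of why extrapolation is risky, using the manatee line $\hat{y} = 0.125x - 41.4$ and the observed range of x (447 to 719 thousand registrations). The function name and the simple "outside the observed range" flag are assumptions made for this sketch, not part of the slides.

```python
X_MIN, X_MAX = 447, 719  # observed range of powerboat registrations (in 1000s)


def predict_manatee_deaths(powerboats_thousands):
    """Return (prediction, is_extrapolation) for y-hat = 0.125x - 41.4."""
    y_hat = 0.125 * powerboats_thousands - 41.4
    # Flag any x outside the range of the data used to fit the line.
    is_extrapolation = not (X_MIN <= powerboats_thousands <= X_MAX)
    return y_hat, is_extrapolation


for x in (500, 200):
    y_hat, flagged = predict_manatee_deaths(x)
    note = " (extrapolation, not trustworthy)" if flagged else ""
    print(f"x = {x}: y-hat = {y_hat:.1f}{note}")
```

For 200,000 registrations the line gives a negative number of deaths, which is impossible and shows that the line should not be used that far from the data.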
The y intercept
Sometimes the y-intercept is not biologically possible.
Here we have negative blood alcohol content, which makes no sense.
[Figure: the y-intercept of the fitted line shows negative blood alcohol]
But the negative value is appropriate for the equation of the regression line.
There is a lot of scatter in the data, and the line is just an estimate.
Coefficient of determination, $r^2$
• Least-squares regression looks at the distances of the data points
from the line only in the y direction.
• The variables x and y play different roles in regression.
• Even though correlation r ignores the distinction between x and y,
there is a close connection between correlation and regression.
• $r^2$ is called the coefficient of determination.
• $r^2$ is the fraction of the variation in y that is explained by the
least-squares regression of y on x. The vertical scatter of the points
about the regression line is the part left unexplained.
r = -1, $r^2$ = 1
Changes in x explain 100% of the variation in y.
y can be entirely predicted for any given value of x.
r = 0, $r^2$ = 0
Changes in x explain 0% of the variation in y.
The values y takes have no linear relationship with the values x takes.
r = 0.87, $r^2$ = 0.76
Here the change in x explains only 76% of the change in y. The rest of
the change in y (the vertical scatter, shown as red arrows in the plot)
must be explained by something other than x.
r = –0.3, $r^2$ = 0.09, or 9%
The regression model explains not even 10% of the variation in y.
r = –0.7, $r^2$ = 0.49, or 49%
The regression model explains nearly half of the variation in y.
r = –0.99, $r^2$ = 0.9801, or ~98%
The regression model explains almost all of the variation in y.
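The link between $r^2$ and "variation explained" can be checked numerically: for a least-squares line, one minus the ratio of the squared residuals to the total squared deviation of y from its mean equals $r^2$. The sketch below uses the powerboat and manatee data from these slides; it computes the slope as $S_{xy}/S_{xx}$, which is algebraically the same as $r\,s_y/s_x$, and the variable names are illustrative.

```python
from statistics import mean

# Powerboat registrations (in 1000s) and manatee deaths from the slides' data table.
powerboats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
manatees = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]

x_bar, y_bar = mean(powerboats), mean(manatees)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(powerboats, manatees))
s_xx = sum((x - x_bar) ** 2 for x in powerboats)
s_yy = sum((y - y_bar) ** 2 for y in manatees)  # total variation in y

b1 = s_xy / s_xx              # least-squares slope
b0 = y_bar - b1 * x_bar       # least-squares intercept
r = s_xy / (s_xx * s_yy) ** 0.5

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(powerboats, manatees))
explained_fraction = 1 - sse / s_yy

print(f"r^2 = {r ** 2:.3f}, explained fraction = {explained_fraction:.3f}")
```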
Residuals
A residual is the difference between an observed value of the
response variable and the value predicted by the regression line:
residual = observed y – predicted y = $y - \hat{y}$
The residual is the signed vertical distance between the observed y and the predicted $\hat{y}$.
Points above the line have a positive residual.
Points below the line have a negative residual.
The sum of the residuals from the least-squares line is always 0.
Residual plots
• A residual plot is a scatterplot of the regression residuals against
the explanatory variable.
• Residual plots help us assess the fit of a regression line.
• If the residuals are scattered randomly around 0, chances are your
data fit a linear model, are normally distributed, and contain no
outliers.
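The residual definition and the zero-sum property can be checked on the powerboat and manatee data from these slides. This is a minimal sketch with illustrative variable names; the (x, residual) pairs it builds are exactly what a residual plot would display.

```python
from statistics import mean

# Powerboat registrations (in 1000s) and manatee deaths from the slides' data table.
powerboats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
manatees = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]

x_bar, y_bar = mean(powerboats), mean(manatees)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(powerboats, manatees))
      / sum((x - x_bar) ** 2 for x in powerboats))
b0 = y_bar - b1 * x_bar

# residual = observed y - predicted y
residuals = [y - (b0 + b1 * x) for x, y in zip(powerboats, manatees)]

# Points for a residual plot: explanatory variable on the x-axis, residual on the y-axis.
residual_plot_points = list(zip(powerboats, residuals))

print(f"sum of residuals = {sum(residuals):.10f}")
print(f"points above the line: {sum(res > 0 for res in residuals)}")
print(f"points below the line: {sum(res < 0 for res in residuals)}")
```

The printed sum is zero up to floating-point rounding.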
The x-axis in a residual plot is the same as on the scatterplot.
Only the y-axis is different.
Residuals are randomly scattered: good!
Curved pattern: the relationship you are looking at is not linear.
A change in variability across a plot is a warning sign. You need to
find out why it occurs, and remember that predictions made in areas of
larger variability will not be as good.
2.5 Data Analysis for Two-Way Tables
Objectives
• The Two-Way Table
• Marginal Distribution
• Conditional Distributions
Two-way tables
Two-way tables summarize data about two categorical variables (or
factors) collected on the same set of individuals.
Example (Smoking Survey in Arizona): High school students were
asked whether they smoke and whether their parents smoke.
Does parental smoking influence the smoking habits of their high school
children?
Explanatory variable: Smoking habit of student’s parents
(both smoke / one smokes / neither smokes)
Response variable: Smoking habit of student
(smokes / does not smoke)
To analyze the relationship, we can summarize the results in a two-way
table.
Two-way tables (Cont …)
Explanatory (row) variable: Smoking habit of student’s parents (first factor)
Response (column) variable: Smoking habit of student (second factor)
This 3×2 two-way table has 3 rows and 2 columns. The numbers are counts
(frequencies).
                        Student smokes   Student does not smoke
Both parents smoke            400               1380
One parent smokes             416               1823
Neither parent smokes         188               1168
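Margins
Margins show the total for each column and each row.
                        Student smokes   Student does not smoke   Total
Both parents smoke            400               1380              1780
One parent smokes             416               1823              2239
Neither parent smokes         188               1168              1356
Total                        1004               4371              5375
The "Total" column is the margin for parental smoking; the "Total" row is the margin for student smoking.
• For each cell, we can compute a proportion by dividing the cell
entry by the total sample size.
• The collection of these proportions is the joint distribution of the
two categorical variables.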
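As a sketch of the joint distribution described above, the snippet below divides each cell count of the smoking survey by the grand total. The dictionary layout and label strings are assumptions chosen for readability.

```python
# Counts from the smoking survey: (student smokes, student does not smoke)
counts = {
    "Both parents smoke": (400, 1380),
    "One parent smokes": (416, 1823),
    "Neither parent smokes": (188, 1168),
}

grand_total = sum(smokes + does_not for smokes, does_not in counts.values())

# Joint distribution: each cell count divided by the total sample size.
joint = {}
for parents, (smokes, does_not) in counts.items():
    joint[(parents, "student smokes")] = smokes / grand_total
    joint[(parents, "student does not smoke")] = does_not / grand_total

for cell, proportion in joint.items():
    print(cell, f"{proportion:.3f}")
```

The six proportions add up to 1, as they must for a distribution over all cells.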
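Marginal distributions
(used when we examine the distribution of a single variable in a two-way table)
• Marginal distributions: the distribution of the column variable alone (or
of the row variable alone), expressed in counts or percents.
                        Student smokes   Student does not smoke   Percent of total
Both parents smoke            400               1380              33.1%
One parent smokes             416               1823              41.7%
Neither parent smokes         188               1168              25.2%
Percent of total             18.7%             81.3%              100%
For example, $\frac{1780}{5375} \approx 33.1\%$ of the students have two parents who smoke, and $\frac{1004}{5375} \approx 18.7\%$ of the students smoke.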
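The marginal percents can be reproduced from the raw counts. In this sketch the variable names are illustrative; the counts are those of the smoking survey table.

```python
# Counts from the smoking survey: (student smokes, student does not smoke)
counts = {
    "Both parents smoke": (400, 1380),
    "One parent smokes": (416, 1823),
    "Neither parent smokes": (188, 1168),
}
columns = ("Student smokes", "Student does not smoke")

row_totals = {label: sum(row) for label, row in counts.items()}
column_totals = [sum(row[j] for row in counts.values()) for j in range(len(columns))]
grand_total = sum(row_totals.values())

# Marginal distribution of each variable: its totals divided by the grand total.
parent_marginal = {label: 100 * total / grand_total
                   for label, total in row_totals.items()}
student_marginal = {name: 100 * total / grand_total
                    for name, total in zip(columns, column_totals)}

for label, pct in parent_marginal.items():
    print(f"{label}: {pct:.1f}%")
for name, pct in student_marginal.items():
    print(f"{name}: {pct:.1f}%")
```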
Marginal distribution (Cont..)
The marginal distributions can be displayed on separate bar
graphs, typically expressed as percents instead of raw counts.
Each graph represents only one of the two variables, ignoring
the second one. Each marginal distribution can also be shown
in a pie chart.
[Bar graph: percent of students interviewed by student smoking (Smoker, Nonsmoker)]
[Bar graph: percent of students interviewed by parental smoking (Both, One, Neither)]
Conditional Distribution
A conditional distribution is the distribution of one factor for each
level of the other factor.
A conditional percent is computed using the counts within a single row
or a single column. The denominator is the corresponding row or
column total (rather than the table grand total).
Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
Conditional distributions (Cont…)
Conditional distribution of student smokers for different parental smoking statuses:
Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
Percent of students who smoke when one parent smokes = 416/2239 = 18.6%
Percent of students who smoke when neither parent smokes = 188/1356 = 13.9%
• Comparing conditional distributions helps us describe the “relationship”
between the two categorical variables.
• We can compare the percent of individuals in one level of factor 1 for
each level of factor 2.
Conditional distributions (Cont…)
Conditional distribution of student smoking status for different levels of parental
smoking status:
                        Percent who smoke   Percent who do not smoke   Row total
Both parents smoke            22%                  78%                 100%
One parent smokes             19%                  81%                 100%
Neither parent smokes         14%                  86%                 100%
The conditional distributions can be compared graphically by displaying the percents
making up one level of one factor, for each level of the other factor.
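The row-conditional percents in the table above come from dividing each cell by its row total rather than by the grand total. A minimal sketch, with illustrative names and the survey counts:

```python
# Counts from the smoking survey: (student smokes, student does not smoke)
counts = {
    "Both parents smoke": (400, 1380),
    "One parent smokes": (416, 1823),
    "Neither parent smokes": (188, 1168),
}

# Conditional distribution of student smoking given parental smoking:
# each cell is divided by its own row total.
conditional = {}
for parents, (smokes, does_not) in counts.items():
    row_total = smokes + does_not
    conditional[parents] = (100 * smokes / row_total, 100 * does_not / row_total)

for parents, (pct_smoke, pct_not) in conditional.items():
    print(f"{parents}: {pct_smoke:.1f}% smoke, {pct_not:.1f}% do not smoke")
```

Each row sums to 100%, which is what distinguishes a conditional distribution from the joint distribution.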
Conditional Distribution
• In the table below, the 25 to 34 age group occupies the first column.
Conditional distributions (Cont…)
Here the percents are calculated by age range (columns):
$29.30\% = \dfrac{11071}{37785} = \dfrac{\text{cell total}}{\text{column total}}$
The conditional distributions can be graphically compared using side-by-side
bar graphs of one variable for each value of the other variable.
Music and wine purchase decision
Does background music in supermarkets influence customer purchasing decisions?
What is the relationship between the type of music played in supermarkets and the type of
wine purchased?
We want to compare the conditional distributions of the response variable
(wine purchased) for each value of the explanatory variable (music
played). Therefore, we calculate column percents.
We calculate the column conditional percents similarly for
each of the nine cells in the table:
$\dfrac{30}{84} = 35.7\% = \dfrac{\text{cell total}}{\text{column total}}$
Calculations: When no music was played, there
were 84 bottles of wine sold. Of these, 30 were
French wine. 30/84 = 0.357, so 35.7% of the wine
sold was French when no music was played.
For every two-way table, there are two
sets of possible conditional distributions:
• Wine purchased for each kind of music played (column percents)
• Music played for each kind of wine purchased (row percents)

More Related Content

What's hot

What's hot (20)

Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
STATISTICS: Normal Distribution
STATISTICS: Normal Distribution STATISTICS: Normal Distribution
STATISTICS: Normal Distribution
 
Simple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisSimple & Multiple Regression Analysis
Simple & Multiple Regression Analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Statistical inference
Statistical inferenceStatistical inference
Statistical inference
 
Poisson distribution
Poisson distributionPoisson distribution
Poisson distribution
 
Correlation
CorrelationCorrelation
Correlation
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Probability Distributions
Probability DistributionsProbability Distributions
Probability Distributions
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression
RegressionRegression
Regression
 
Correlation Analysis
Correlation AnalysisCorrelation Analysis
Correlation Analysis
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Binomial and Poisson Distribution
Binomial and Poisson  DistributionBinomial and Poisson  Distribution
Binomial and Poisson Distribution
 
Regression
RegressionRegression
Regression
 

Viewers also liked

Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material Bryan Yang
 
The least square method
The least square methodThe least square method
The least square methodkevinlefol
 
Space Hustlers Comic
Space Hustlers ComicSpace Hustlers Comic
Space Hustlers ComicSteve Owen
 
Department of clinical pharmacy an overview with renal system (2)
Department of clinical pharmacy an overview with renal system (2)Department of clinical pharmacy an overview with renal system (2)
Department of clinical pharmacy an overview with renal system (2)Andrew Agbenin
 
Портфолио Чекусовой
Портфолио Чекусовой Портфолио Чекусовой
Портфолио Чекусовой Harokol
 
Портрет слова группа 2
Портрет слова группа 2Портрет слова группа 2
Портрет слова группа 2Harokol
 
DMDL EditorXとToad Editorの紹介
DMDL EditorXとToad Editorの紹介DMDL EditorXとToad Editorの紹介
DMDL EditorXとToad Editorの紹介hishidama
 
samoupravlenye
samoupravlenyesamoupravlenye
samoupravlenyeHarokol
 
Презентация памятники Волгодонска. Петрова Алла
Презентация памятники Волгодонска. Петрова АллаПрезентация памятники Волгодонска. Петрова Алла
Презентация памятники Волгодонска. Петрова АллаHarokol
 
Проект Павленко "Безопасные каникулы".
Проект Павленко "Безопасные каникулы".Проект Павленко "Безопасные каникулы".
Проект Павленко "Безопасные каникулы".Harokol
 
samoupravlenie
samoupravleniesamoupravlenie
samoupravlenieHarokol
 
LABORATORY AND PHYSICAL ASSESSMENT DATA (1)
LABORATORY AND PHYSICAL ASSESSMENT DATA (1)LABORATORY AND PHYSICAL ASSESSMENT DATA (1)
LABORATORY AND PHYSICAL ASSESSMENT DATA (1)Andrew Agbenin
 
Expecting Parents Guide to Birth Defects ebook
Expecting Parents Guide to Birth Defects ebookExpecting Parents Guide to Birth Defects ebook
Expecting Parents Guide to Birth Defects ebookPerey Law
 
2016: A good year to invest in Spanish property?
2016: A good year to invest in Spanish property?2016: A good year to invest in Spanish property?
2016: A good year to invest in Spanish property?Simon Birch
 
Chapter 2 part2-Correlation
Chapter 2 part2-CorrelationChapter 2 part2-Correlation
Chapter 2 part2-Correlationnszakir
 
Report submitted to (1)
Report submitted to (1)Report submitted to (1)
Report submitted to (1)Andrew Agbenin
 
JJUG CCC 2016 Fall hishidama
JJUG CCC 2016 Fall hishidamaJJUG CCC 2016 Fall hishidama
JJUG CCC 2016 Fall hishidamahishidama
 
Chapter 3 part2- Sampling Design
Chapter 3 part2- Sampling DesignChapter 3 part2- Sampling Design
Chapter 3 part2- Sampling Designnszakir
 
why rape jokes are bad
why rape jokes are badwhy rape jokes are bad
why rape jokes are badAmy Robison
 

Viewers also liked (20)

Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 
The least square method
The least square methodThe least square method
The least square method
 
Space Hustlers Comic
Space Hustlers ComicSpace Hustlers Comic
Space Hustlers Comic
 
Department of clinical pharmacy an overview with renal system (2)
Department of clinical pharmacy an overview with renal system (2)Department of clinical pharmacy an overview with renal system (2)
Department of clinical pharmacy an overview with renal system (2)
 
Портфолио Чекусовой
Портфолио Чекусовой Портфолио Чекусовой
Портфолио Чекусовой
 
Портрет слова группа 2
Портрет слова группа 2Портрет слова группа 2
Портрет слова группа 2
 
DMDL EditorXとToad Editorの紹介
DMDL EditorXとToad Editorの紹介DMDL EditorXとToad Editorの紹介
DMDL EditorXとToad Editorの紹介
 
samoupravlenye
samoupravlenyesamoupravlenye
samoupravlenye
 
Презентация памятники Волгодонска. Петрова Алла
Презентация памятники Волгодонска. Петрова АллаПрезентация памятники Волгодонска. Петрова Алла
Презентация памятники Волгодонска. Петрова Алла
 
Проект Павленко "Безопасные каникулы".
Проект Павленко "Безопасные каникулы".Проект Павленко "Безопасные каникулы".
Проект Павленко "Безопасные каникулы".
 
samoupravlenie
samoupravleniesamoupravlenie
samoupravlenie
 
Health literacy
Health literacyHealth literacy
Health literacy
 
LABORATORY AND PHYSICAL ASSESSMENT DATA (1)
LABORATORY AND PHYSICAL ASSESSMENT DATA (1)LABORATORY AND PHYSICAL ASSESSMENT DATA (1)
LABORATORY AND PHYSICAL ASSESSMENT DATA (1)
 
Expecting Parents Guide to Birth Defects ebook
Expecting Parents Guide to Birth Defects ebookExpecting Parents Guide to Birth Defects ebook
Expecting Parents Guide to Birth Defects ebook
 
2016: A good year to invest in Spanish property?
2016: A good year to invest in Spanish property?2016: A good year to invest in Spanish property?
2016: A good year to invest in Spanish property?
 
Chapter 2 part2-Correlation
Chapter 2 part2-CorrelationChapter 2 part2-Correlation
Chapter 2 part2-Correlation
 
Report submitted to (1)
Report submitted to (1)Report submitted to (1)
Report submitted to (1)
 
JJUG CCC 2016 Fall hishidama
JJUG CCC 2016 Fall hishidamaJJUG CCC 2016 Fall hishidama
JJUG CCC 2016 Fall hishidama
 
Chapter 3 part2- Sampling Design
Chapter 3 part2- Sampling DesignChapter 3 part2- Sampling Design
Chapter 3 part2- Sampling Design
 
why rape jokes are bad
why rape jokes are badwhy rape jokes are bad
why rape jokes are bad
 

Similar to Chapter 2 part3-Least-Squares Regression

Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Neeraj Bhandari
 
Exploring bivariate data
Exploring bivariate dataExploring bivariate data
Exploring bivariate dataUlster BOCES
 
REGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HEREREGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HEREShriramKargaonkar
 
Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).pptMuhammadAftab89
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.pptRidaIrfan10
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.pptkrunal soni
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.pptMoinPasha12
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Sciencessuser71ac73
 
ML-UNIT-IV complete notes download here
ML-UNIT-IV  complete notes download hereML-UNIT-IV  complete notes download here
ML-UNIT-IV complete notes download herekeerthanakshatriya20
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxbudbarber38650
 

Similar to Chapter 2 part3-Least-Squares Regression (20)

Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
ML Module 3.pdf
ML Module 3.pdfML Module 3.pdf
ML Module 3.pdf
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
 
Exploring bivariate data
Exploring bivariate dataExploring bivariate data
Exploring bivariate data
 
REGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HEREREGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HERE
 
Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
 
Reg
RegReg
Reg
 
Correlation
CorrelationCorrelation
Correlation
 
Regression
RegressionRegression
Regression
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
ML-UNIT-IV complete notes download here
ML-UNIT-IV  complete notes download hereML-UNIT-IV  complete notes download here
ML-UNIT-IV complete notes download here
 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
 
Regression
RegressionRegression
Regression
 

More from nszakir

Chapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by ContrapositiveChapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by Contrapositivenszakir
 
Chapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVE
Chapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVEChapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVE
Chapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVEnszakir
 
Chapter 2: Relations
Chapter 2: RelationsChapter 2: Relations
Chapter 2: Relationsnszakir
 
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...nszakir
 
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...nszakir
 
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...nszakir
 
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...nszakir
 
Chapter 5 part1- The Sampling Distribution of a Sample Mean
Chapter 5 part1- The Sampling Distribution of a Sample MeanChapter 5 part1- The Sampling Distribution of a Sample Mean
Chapter 5 part1- The Sampling Distribution of a Sample Meannszakir
 
Chapter 4 part4- General Probability Rules
Chapter 4 part4- General Probability RulesChapter 4 part4- General Probability Rules
Chapter 4 part4- General Probability Rulesnszakir
 
Chapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random VariablesChapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random Variablesnszakir
 
Chapter 4 part2- Random Variables
Chapter 4 part2- Random VariablesChapter 4 part2- Random Variables
Chapter 4 part2- Random Variablesnszakir
 
Chapter 4 part1-Probability Model
Chapter 4 part1-Probability ModelChapter 4 part1-Probability Model
Chapter 4 part1-Probability Modelnszakir
 
Chapter 3 part3-Toward Statistical Inference
Chapter 3 part3-Toward Statistical InferenceChapter 3 part3-Toward Statistical Inference
Chapter 3 part3-Toward Statistical Inferencenszakir
 
Chapter 3 part1-Design of Experiments
Chapter 3 part1-Design of ExperimentsChapter 3 part1-Design of Experiments
Chapter 3 part1-Design of Experimentsnszakir
 
Chapter 2 part1-Scatterplots
Chapter 2 part1-ScatterplotsChapter 2 part1-Scatterplots
Chapter 2 part1-Scatterplotsnszakir
 
Density Curves and Normal Distributions
Density Curves and Normal DistributionsDensity Curves and Normal Distributions
Density Curves and Normal Distributionsnszakir
 
Describing Distributions with Numbers
Describing Distributions with NumbersDescribing Distributions with Numbers
Describing Distributions with Numbersnszakir
 
Displaying Distributions with Graphs
Displaying Distributions with GraphsDisplaying Distributions with Graphs
Displaying Distributions with Graphsnszakir
 

More from nszakir (18)

Chapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by ContrapositiveChapter-4: More on Direct Proof and Proof by Contrapositive
Chapter-4: More on Direct Proof and Proof by Contrapositive
 
Chapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVE
Chapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVEChapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVE
Chapter-3: DIRECT PROOF AND PROOF BY CONTRAPOSITIVE
 
Chapter 2: Relations
Chapter 2: RelationsChapter 2: Relations
Chapter 2: Relations
 
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
 
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
 
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
 
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
 
Chapter 5 part1- The Sampling Distribution of a Sample Mean
Chapter 5 part1- The Sampling Distribution of a Sample MeanChapter 5 part1- The Sampling Distribution of a Sample Mean
Chapter 5 part1- The Sampling Distribution of a Sample Mean
 
Chapter 4 part4- General Probability Rules
Chapter 4 part4- General Probability RulesChapter 4 part4- General Probability Rules
Chapter 4 part4- General Probability Rules
 
Chapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random VariablesChapter 4 part3- Means and Variances of Random Variables
Chapter 4 part3- Means and Variances of Random Variables
 
Chapter 4 part2- Random Variables
Chapter 4 part2- Random VariablesChapter 4 part2- Random Variables
Chapter 4 part2- Random Variables
 
Chapter 4 part1-Probability Model
Chapter 4 part1-Probability ModelChapter 4 part1-Probability Model
Chapter 4 part1-Probability Model
 
Chapter 3 part3-Toward Statistical Inference
Chapter 3 part3-Toward Statistical InferenceChapter 3 part3-Toward Statistical Inference
Chapter 3 part3-Toward Statistical Inference
 
Chapter 3 part1-Design of Experiments
Chapter 3 part1-Design of ExperimentsChapter 3 part1-Design of Experiments
Chapter 3 part1-Design of Experiments
 
Chapter 2 part1-Scatterplots
Chapter 2 part1-ScatterplotsChapter 2 part1-Scatterplots
Chapter 2 part1-Scatterplots
 
Density Curves and Normal Distributions
Density Curves and Normal DistributionsDensity Curves and Normal Distributions
Density Curves and Normal Distributions
 
Describing Distributions with Numbers
Describing Distributions with NumbersDescribing Distributions with Numbers
Describing Distributions with Numbers
 
Displaying Distributions with Graphs
Displaying Distributions with GraphsDisplaying Distributions with Graphs
Displaying Distributions with Graphs
 

Chapter 2 part3-Least-Squares Regression

  • 1. INTRODUCTION TO STATISTICS & PROBABILITY. Chapter 2: Looking at Data–Relationships (Part 3). Dr. Nahid Sultana
  • 2. Chapter 2: Looking at Data–Relationships. 2.1: Scatterplots; 2.2: Correlation; 2.3: Least-Squares Regression; 2.5: Data Analysis for Two-Way Tables
  • 3. 2.3 Least-Squares Regression. Objectives: Regression lines; Prediction and Extrapolation; Correlation and $r^2$
  • 4. Regression line. Correlation tells us about the strength and direction of the linear relationship between two quantitative variables. In regression we study the association between two variables in order to explain the values of one from the values of the other (i.e., make predictions). When there is a linear association between two variables, a straight-line equation can be used to model the relationship. In regression the distinction between the response and explanatory variables is important.
  • 5. Regression line (Cont.). A regression line is a line that best describes the linear relationship between the two variables. It is expressed by an equation of the form $\hat{y} = b_0 + b_1 x$, where $b_1$ is the slope and $b_0$ is the intercept. Once the equation of the regression line is established, we can use it to predict the response y for a specific value of the explanatory variable x.
  • 6. The least-squares regression line. The least-squares regression line is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
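In symbols, the criterion on this slide can be written as below. The notation $(x_i, y_i)$ for the n data points is an assumption introduced here; $b_0$ and $b_1$ are the intercept and slope used on the neighbouring slides.

```latex
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)^2
```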
  • 7. The least-squares regression line (Cont.). The equation of the least-squares regression line of y on x is $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted y value (y hat), $b_1$ is the slope, and $b_0$ is the y-intercept.
  • 8. How to plot the least-squares regression line. First we calculate the slope of the line, $b_1 = r \frac{s_y}{s_x}$, where r is the correlation, $s_y$ is the standard deviation of the response variable y, and $s_x$ is the standard deviation of the explanatory variable x. Once we know $b_1$, the slope, we can calculate $b_0$, the y-intercept: $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{x}$ and $\bar{y}$ are the sample means of the x and y variables. Typically, we use stats software.
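The two formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's software: the function name `least_squares_line` and the small data set are hypothetical, and only the standard library is used.

```python
from statistics import mean, stdev


def least_squares_line(x, y):
    """Return (b0, b1) using b1 = r * sy / sx and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    # Correlation r: sum of products of standardized values, divided by n - 1
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
            for xi, yi in zip(x, y)) / (n - 1)
    b1 = r * s_y / s_x       # slope
    b0 = y_bar - b1 * x_bar  # y-intercept
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = least_squares_line(x, y)
print(b0, b1)
```

Because the hypothetical points lie exactly on a line, the fitted intercept and slope should recover that line up to floating-point rounding.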
  • 9. How to plot the least-squares regression line (Cont.). To plot the regression line you only need to plug x values into the equation, get the corresponding y values, and draw the line that goes through those points. Hint: the regression line always passes through the point $(\bar{x}, \bar{y})$, the means of x and y. The points you use for drawing the regression line are derived from the equation. They are NOT points from your sample data (except by pure coincidence).
  • 10. Two different regression lines can be drawn if we interchange the roles of x and y. Example: the correlation coefficient of NEA and Fat, r = -0.779, stays the same in both cases. [Fitted line plots: fat gain (kilograms) against nonexercise activity (calories), Fat = 3.505 - 0.003441 NEA; nonexercise activity (calories) against fat gain (kilograms), NEA = 745.3 - 176.1 Fat.]
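The link between the two fitted lines can be checked numerically. Since the slope of y on x is $r s_y / s_x$ and the slope of x on y is $r s_x / s_y$, their product equals $r^2$. The sketch below uses only the rounded values quoted on this slide; the variable names are illustrative.

```python
# Values quoted on the slide (rounded as printed there)
r = -0.779
slope_fat_on_nea = -0.003441   # from Fat = 3.505 - 0.003441 NEA
slope_nea_on_fat = -176.1      # from NEA = 745.3 - 176.1 Fat

# (r * sy / sx) * (r * sx / sy) = r^2, so the product should be close to r^2
product = slope_fat_on_nea * slope_nea_on_fat
print(product, r ** 2)
```

The two printed numbers agree only approximately, because the slide reports rounded coefficients.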
  • 11. BEWARE!!! Not all calculators and software use the same convention. Some use $\hat{y} = a + bx$, and some use $\hat{y} = ax + b$. Make sure you know what YOUR calculator gives you for a and b before you answer homework or exam questions.
  • 12. Making predictions. The equation of the least-squares regression line allows you to predict $\hat{y}$ for any x within the range studied. Here $\hat{y} = 0.0144x + 0.0008$. Nobody in the study drank 6.5 beers, but by finding the value from the regression line for x = 6.5 we would expect a blood alcohol content of 0.094 mg/ml: $\hat{y} = 0.0144 \times 6.5 + 0.0008 = 0.0936 + 0.0008 = 0.0944$ mg/ml.
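The prediction on this slide is a single evaluation of the fitted equation. A minimal sketch, with the hypothetical function name `predict_bac` and the coefficients taken from the slide:

```python
def predict_bac(beers):
    """Predicted blood alcohol content (mg/ml) from y_hat = 0.0144 x + 0.0008."""
    return 0.0144 * beers + 0.0008


bac = predict_bac(6.5)
print(round(bac, 4))
```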
  • 13. There is a positive linear relationship between the number of powerboats registered and the number of manatee deaths.
      Year   Powerboats (in 1000s)   Dead manatees
      1977   447                     13
      1978   460                     21
      1979   481                     24
      1980   498                     16
      1981   513                     24
      1982   512                     20
      1983   526                     15
      1984   559                     34
      1985   585                     33
      1986   614                     33
      1987   645                     39
      1988   675                     43
      1989   711                     50
      1990   719                     47
    The least-squares regression line has the equation $\hat{y} = 0.125x - 41.4$. If we were to limit the number of powerboat registrations to 500,000, what could we expect for the number of manatee deaths? $\hat{y} = 0.125(500) - 41.4 = 62.5 - 41.4 = 21.1$, or roughly 21 manatees. Could we use this regression line to predict the number of manatee deaths for a year with 200,000 powerboat registrations?
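The coefficients quoted on this slide can be reproduced from the table. The sketch below uses the equivalent form $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, which gives the same slope as $r s_y / s_x$. Variable names are illustrative, and only the standard library is used.

```python
from statistics import mean

# Data from the slide: powerboat registrations (in 1000s) and manatee deaths, 1977-1990
powerboats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
manatees = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]

x_bar, y_bar = mean(powerboats), mean(manatees)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(powerboats, manatees))
s_xx = sum((x - x_bar) ** 2 for x in powerboats)

b1 = s_xy / s_xx         # slope
b0 = y_bar - b1 * x_bar  # y-intercept

# Prediction for 500,000 registrations (x = 500 in the table's units)
pred_500 = b0 + b1 * 500
print(round(b1, 3), round(b0, 1), round(pred_500, 1))
```

The unrounded coefficients differ slightly from the rounded ones on the slide, so the prediction at x = 500 comes out close to, rather than exactly, 21.1.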
  • 14. Extrapolation. Extrapolation is the use of a regression line for prediction far outside the range of values of x used to obtain the line. Such predictions are often not accurate.
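Slide 13 asks about a year with 200,000 powerboat registrations, which is an extrapolation: the observed registrations run from 447 to 719 (in thousands). A minimal sketch, with the hypothetical helper `predict_manatee_deaths` and the rounded equation from that slide, shows what the line returns there:

```python
# Range of x (powerboats, in 1000s) observed in the manatee data on slide 13
X_MIN, X_MAX = 447, 719


def predict_manatee_deaths(powerboats_thousands):
    """Return (prediction, is_extrapolation) using y_hat = 0.125 x - 41.4."""
    y_hat = 0.125 * powerboats_thousands - 41.4
    is_extrapolation = not (X_MIN <= powerboats_thousands <= X_MAX)
    return y_hat, is_extrapolation


y_hat, extrapolated = predict_manatee_deaths(200)
print(y_hat, extrapolated)
```

The predicted value is negative, which is impossible for a count of deaths, and the flag marks the input as lying outside the observed range.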
  • 15. The y-intercept. Sometimes the y-intercept is not biologically possible. Here the y-intercept shows a negative blood alcohol content, which makes no sense. But the negative value is appropriate for the equation of the regression line: there is a lot of scatter in the data, and the line is just an estimate.
  • 16. Coefficient of determination, $r^2$. Least-squares regression looks at the distances of the data points from the line only in the y direction. The variables x and y play different roles in regression. Even though correlation r ignores the distinction between x and y, there is a close connection between correlation and regression. $r^2$ is called the coefficient of determination. $r^2$ represents the fraction of the variance in y that can be explained by changes in x through the regression line; the remainder is the vertical scatter about the line.
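The interpretation of $r^2$ as explained variation can be checked numerically: for a least-squares line, the squared correlation equals one minus the ratio of residual variation to total variation in y. The sketch below uses a small hypothetical data set and illustrative variable names.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y

# Least-squares line and correlation
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
r = s_xy / sqrt(s_xx * s_yy)

# Variation left unexplained by the line (sum of squared residuals)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
explained_fraction = 1 - sse / s_yy
print(r ** 2, explained_fraction)
```

The two printed values agree up to floating-point rounding.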
  • 17. r = -1, $r^2$ = 1: changes in x explain 100% of the variations in y, and y can be entirely predicted for any given value of x. r = 0, $r^2$ = 0: changes in x explain 0% of the variations in y; there is no linear relationship between x and y. r = 0.87, $r^2$ = 0.76: here the change in x only explains 76% of the change in y. The rest of the change in y (the vertical scatter, shown as red arrows) must be explained by something other than x.
  • 18. r = -0.3, $r^2$ = 0.09, or 9%: the regression model explains not even 10% of the variations in y. r = -0.7, $r^2$ = 0.49, or 49%: the regression model explains nearly half of the variations in y. r = -0.99, $r^2$ = 0.9801, or ~98%: the regression model explains almost all of the variations in y.
  • 19. Residuals. A residual is the difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y - predicted y = $y - \hat{y}$. In the plot, the residual is the vertical distance between the observed y and the predicted $\hat{y}$. Points above the line have a positive residual. Points below the line have a negative residual. The sum of the residuals from the least-squares line is always 0.
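The claim that least-squares residuals sum to zero can be verified on any data set. A minimal sketch with hypothetical data and illustrative names:

```python
from statistics import mean

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# residual = observed y - predicted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)
print(residuals, residual_sum)
```

Positive entries correspond to points above the line and negative entries to points below it; the sum is zero up to floating-point rounding.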
  • 20. Residual plots. A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line. If the residuals are scattered randomly around 0, chances are your data fit a linear model, were normally distributed, and had no outliers.
  • 21. The x-axis in a residual plot is the same as on the scatterplot. Only the y-axis is different.
  • 22. Residuals are randomly scattered: good! Curved pattern: the relationship you are looking at is not linear. A change in variability across a plot is a warning sign: you need to find out why it occurs, and remember that predictions made in areas of larger variability will not be as good.
  • 23. 2.5 Data Analysis for Two-Way Tables. Objectives: The Two-Way Table; Marginal Distribution; Conditional Distributions
  • 24. Two-way tables. Two-way tables summarize data about two categorical variables (or factors) collected on the same set of individuals. Example (Smoking Survey in Arizona): High school students were asked whether they smoke and whether their parents smoke. Does parental smoking influence the smoking habits of their high school children? Explanatory variable: smoking habit of the student's parents (both smoke / one smokes / neither smokes). Response variable: smoking habit of the student (smokes / does not smoke). To analyze the relationship we can summarize the result in a two-way table:
  • 25. Two-way tables (Cont.). High school students were asked whether they smoke, and whether their parents smoke. Explanatory (row) variable, first factor: smoking habit of the student's parents. Response (column) variable, second factor: smoking habit of the student. This 3×2 two-way table has 3 rows and 2 columns. The numbers are counts (frequencies).
      Parents           Student smokes   Student does not smoke
      Both smoke        400              1380
      One smokes        416              1823
      Neither smokes    188              1168
  • 26. Margins. Margins show the total for each row (the margin for parental smoking) and each column (the margin for student smoking). For each cell, we can compute a proportion by dividing the cell entry by the total sample size. The collection of these proportions is the joint distribution of the two categorical variables. The cell counts are 400 and 1380 (both parents smoke), 416 and 1823 (one parent smokes), and 188 and 1168 (neither parent smokes).
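  • 27. Marginal distributions (used when we examine the distribution of a single variable in a two-way table). Marginal distributions: the distribution of the column variable alone (or of the row variable alone), expressed in counts or percents. For example, 1780/5375 ≈ 33.1% of students have two parents who smoke, and 1004/5375 = 18.7% of students smoke.
      Parents                   Student smokes   Student does not smoke   Percent of all students
      Both smoke                400              1380                     33.1%
      One smokes                416              1823                     41.7%
      Neither smokes            188              1168                     25.2%
      Percent of all students   18.7%            81.3%                    100%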
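The margins and marginal percents on this slide follow directly from the six cell counts. A minimal sketch in plain Python; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Cell counts from the smoking survey: (student smokes, student does not smoke)
counts = {
    "Both parents smoke": (400, 1380),
    "One parent smokes": (416, 1823),
    "Neither parent smokes": (188, 1168),
}

# Margins: row totals, column totals, and the grand total
row_totals = {parents: sum(cells) for parents, cells in counts.items()}
col_totals = [sum(cells[j] for cells in counts.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Marginal distributions as percents of the grand total
row_percents = {p: 100 * t / grand_total for p, t in row_totals.items()}
col_percents = [100 * t / grand_total for t in col_totals]

for parents, pct in row_percents.items():
    print(f"{parents}: {pct:.1f}%")
print(f"Students who smoke: {col_percents[0]:.1f}%")
print(f"Students who do not smoke: {col_percents[1]:.1f}%")
```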
  • 28. Marginal distribution (Cont.). The marginal distributions can be displayed on separate bar graphs, typically expressed as percents instead of raw counts. Each graph represents only one of the two variables, ignoring the second one. Each marginal distribution can also be shown in a pie chart. [Figures: bar graphs of the percent of students interviewed, by parental smoking (Both, One, Neither) and by student smoking (Smoker, Nonsmoker).]
  • 29. Conditional Distribution. A conditional distribution is the distribution of one factor for each level of the other factor. A conditional percent is computed using the counts within a single row or a single column. The denominator is the corresponding row or column total (rather than the table grand total). Percent of students who smoke when both parents smoke = 400/1780 = 22.5%.
  • 30. Conditional distributions (Cont.). Conditional distribution of student smokers for different parental smoking statuses: percent of students who smoke when both parents smoke = 400/1780 = 22.5%; percent of students who smoke when one parent smokes = 416/2239 = 18.6%; percent of students who smoke when neither parent smokes = 188/1356 = 13.9%. Comparing conditional distributions helps us describe the "relationship" between the two categorical variables. We can compare the percent of individuals in one level of factor 1 for each level of factor 2.
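The three conditional percents above divide each "student smokes" count by its own row total rather than by the grand total. A minimal sketch in plain Python, with an illustrative dictionary layout:

```python
# Cell counts from the smoking survey: (student smokes, student does not smoke)
counts = {
    "Both parents smoke": (400, 1380),
    "One parent smokes": (416, 1823),
    "Neither parent smokes": (188, 1168),
}

# Conditional percent of student smokers within each parental smoking status:
# the denominator is the row total, not the table grand total
percent_smokers = {
    parents: 100 * smokes / (smokes + does_not)
    for parents, (smokes, does_not) in counts.items()
}

for parents, pct in percent_smokers.items():
    print(f"{parents}: {pct:.1f}% of students smoke")
```

Comparing the three percents shows how the student smoking rate changes across the levels of parental smoking.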
  • 31. Conditional distributions (Cont.). Conditional distribution of student smoking status for different levels of parental smoking status:
      Parents                 Percent who smoke   Percent who do not smoke   Row total
      Both parents smoke      22%                 78%                        100%
      One parent smokes       19%                 81%                        100%
      Neither parent smokes   14%                 86%                        100%
    The conditional distributions can be compared graphically by displaying the percents making up one level of one factor, for each level of the other factor.
  • 32. Conditional Distribution. In the table below, the 25 to 34 age group occupies the first column.
  • 33. Conditional distributions (Cont.). Here the percents are calculated by age range (columns): 11071/37785 = 29.30% = cell total / column total.
  • 34. The conditional distributions can be graphically compared using side-by-side bar graphs of one variable for each value of the other variable. Here, the percents are calculated by age range (columns).
  • 35. Music and wine purchase decision. What is the relationship between type of music played in supermarkets and type of wine purchased? We want to compare the conditional distributions of the response variable (wine purchased) for each value of the explanatory variable (music played). Therefore, we calculate column percents. Calculations: when no music was played, there were 84 bottles of wine sold. Of these, 30 were French wine. 30/84 = 0.357, so 35.7% of the wine sold was French when no music was played (cell total / column total = 30/84 = 35.7%). We calculate the column conditional percents similarly for each of the nine cells in the table.
  • 36. For every two-way table, there are two sets of possible conditional distributions: wine purchased for each kind of music played (column percents), and music played for each kind of wine purchased (row percents). Does background music in supermarkets influence customer purchasing decisions?