CORRELATION AND REGRESSION
INTRODUCTION
• We deal with data consisting of random pairs (or sets) of observations, where the elements of each pair are observations on the same subject.
Ways to deal with such data:
• Ignore any relation between the variables and analyze them separately
• Use correlation to describe the strength of association between the two variables
• Use regression analysis to assess the degree and nature of association between the variables
CORRELATION
• If we have two continuous variables X and Y, we can summarize them with five parameters, namely:
• The two means (µx, µy)
• The two variances (σx², σy²), and
• The covariance (σxy)
Covariance
• The sample covariance is calculated as the sum of cross products of deviations from the means, divided by the degrees of freedom (n − 1):
$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
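A minimal Python sketch of the definition above, using only the standard library. The function name `sample_covariance` is our own choice for illustration; the data are the five (X, Y) pairs from the Example later in this deck.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sum of cross products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (len(xs) - 1)  # divide by the degrees of freedom


x = [9, 9, 8, 5, 7]
y = [0, 9, 1, 1, 9]
print(round(sample_covariance(x, y), 4))  # 1.25
print(round(sample_covariance(x, x), 4))  # 2.8, i.e. the sample variance of X
```

Passing the same sample twice gives its sample variance, which is why the covariance generalizes the variance. On Python 3.10 and later, `statistics.covariance(x, y)` should give the same value.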
Properties of covariance
• If large values of X pair with large values of Y, and small values of X with small values of Y, the covariance will be positive.
• If large values of X go with small values of Y and vice versa, the covariance will be negative.
• If X and Y are independent, the covariance will be zero (the converse does not hold in general: zero covariance rules out only a linear relationship).
Correlation coefficient
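• Given the two standard deviations, the correlation coefficient ρ = σxy/(σxσy) can replace the covariance without any loss of information. The population parameter is denoted by ρ, while the sample statistic is denoted by r.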
Computation of r
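The sample correlation coefficient can be calculated by
$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$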
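A short Python sketch of the computational form of r, assuming plain lists of numbers. The name `pearson_r` is hypothetical; the first call uses the X, Y example data from this deck.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation: cross products over the root of the two sums of squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    # The (n - 1) divisors of s_xy, s_x and s_y cancel, so raw sums suffice.
    return ss_xy / math.sqrt(ss_xx * ss_yy)


print(round(pearson_r([9, 9, 8, 5, 7], [0, 9, 1, 1, 9]), 4))  # 0.163
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0: points on a line with positive slope
```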
Properties of correlation coefficient
• The value of r always lies between −1 and 1.
• Positive values indicate a positive association between the variables.
• Negative values indicate a negative association between the variables.
• If r = 1 or r = −1, all of the points fall exactly on a straight line (with positive or negative slope, respectively).
Coefficient of determination
• This is the square of the correlation coefficient, r².
• Recall that the total sum of squares is a measure of the variability of a variable.
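Cont.
• The sum of squares of X may be given as $SS_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2$, and likewise $SS_{yy} = \sum_{i=1}^{n}(y_i-\bar{y})^2$ for Y.
• Similarly, $SS_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$ is a measure of the total joint variability of X and Y.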
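Cont.
• The part of the variability of Y that is accounted for by its joint variability with X, SS[Y|X] = $SS_{xy}^2/SS_{xx}$, is called the sum of squares due to regression of Y on X and is denoted SSR.
• It can be shown that r² is the ratio of SSR to the total sum of squares of Y, $SSTO = SS_{yy}$:
$$r^2 = \frac{SSR}{SSTO} = \frac{SS_{xy}^2}{SS_{xx}\,SS_{yy}}$$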
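A numerical check of the identity r² = SSR/SSTO, sketched in Python on the X, Y example data from this deck. The helper names are hypothetical; SSR is computed as SS_xy²/SS_xx and SSTO as SS_yy, as defined above.

```python
import math


def sums_of_squares(xs, ys):
    """Return SS_xx, SS_yy and SS_xy (sums of squared / cross-product deviations)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xx, ss_yy, ss_xy


def r_squared_two_ways(xs, ys):
    """Return (r squared, SSR / SSTO); the two should agree."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    ssr = ss_xy ** 2 / ss_xx  # sum of squares due to regression of Y on X
    ssto = ss_yy              # total sum of squares of Y
    return r ** 2, ssr / ssto


print(r_squared_two_ways([9, 9, 8, 5, 7], [0, 9, 1, 1, 9]))  # both about 0.0266
```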
Cont.
• The coefficient of determination is therefore the proportion of the variability in Y that is explained by its linear relationship with X.
Properties of r²
• The coefficient of determination lies between 0 and 1.
• When the variables are highly correlated, r² is near 1; when they are not correlated, it is near 0.
Example
• Consider the following data:
X   Y
9   0
9   9
8   1
5   1
7   9
• Find the variances of X and Y and the covariance of X and Y.
• Find the correlation coefficient and the coefficient of determination.
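One worked solution, computed by hand from the five pairs above with n − 1 = 4 degrees of freedom, offered as a check rather than as the deck's own answer:

```latex
\begin{aligned}
\bar{x} &= \tfrac{38}{5} = 7.6, \qquad \bar{y} = \tfrac{20}{5} = 4 \\
SS_{xx} &= 11.2, \qquad SS_{yy} = 84, \qquad SS_{xy} = 5 \\
s_x^2 &= \tfrac{11.2}{4} = 2.8, \qquad s_y^2 = \tfrac{84}{4} = 21, \qquad s_{xy} = \tfrac{5}{4} = 1.25 \\
r &= \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{5}{\sqrt{11.2 \times 84}} = \frac{5}{\sqrt{940.8}} \approx 0.163 \\
r^2 &= \frac{25}{940.8} \approx 0.027
\end{aligned}
```

So X explains under 3% of the variability in Y for these data.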
Spearman rank correlation
Sampling distribution of r
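• The sampling distribution of r is symmetric only when the parameter ρ = 0.
• It becomes increasingly skewed as ρ moves away from 0.
• Hence a normal approximation based on the central limit theorem cannot be used directly to compute confidence intervals for ρ, or to test hypotheses about ρ when ρ ≠ 0.
• As a rough rule of thumb, two variables are regarded as correlated if r > 0.5 and the sample is large enough.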
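Testing hypotheses about ρ
Test of H0: ρ = 0
• Recall that if ρ = 0, the two variables are not (linearly) correlated.
• The test assesses whether there is a correlation between the variables. The test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},$$
which under H0 follows a t distribution with n − 2 degrees of freedom.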
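A minimal Python sketch of this t statistic. The function name is hypothetical, and the critical value 3.182 (two-sided 5% level, 3 degrees of freedom) is hard-coded from standard t tables rather than computed, to keep the sketch standard-library only; r = 0.163 and n = 5 are the values for the X, Y example data in this deck.

```python
import math


def t_statistic_rho_zero(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); t with n - 2 df under H0: rho = 0."""
    if n < 3:
        raise ValueError("need n >= 3")
    if not -1 < r < 1:
        raise ValueError("|r| must be < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# r = 0.163 and n = 5 come from the X, Y example in this deck.
t = t_statistic_rho_zero(0.163, 5)
T_CRIT = 3.182  # two-sided 5% critical value of t with 3 df (from t tables)
print(round(t, 3), abs(t) > T_CRIT)  # 0.286 False
```

For p-values rather than a table lookup, a library such as SciPy (`scipy.stats.pearsonr`) reports the same test directly.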
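Hypothesis testing (cont.)
Test of H0: ρ = ρ0, where ρ0 is not equal to zero.
We apply Fisher's z′ transformation, and the test statistic is
$$Z = \frac{z' - \zeta_0}{\sigma_{z'}},$$
where $z' = \tfrac{1}{2}\ln\frac{1+r}{1-r}$, $\zeta_0 = \tfrac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$, and $\sigma_{z'} = 1/\sqrt{n-3}$; under H0, Z is approximately standard normal.
A 95% C.I. on the transformed scale is given by z′ ± 1.96 × σz′; transforming its endpoints back with ρ = tanh(ζ) gives a 95% C.I. for ρ.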
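A Python sketch of the Fisher z′ test and confidence interval, assuming the formulas above. `math.atanh(r)` equals ½ ln((1 + r)/(1 − r)), and `math.tanh` undoes it. The function names and the inputs r = 0.5, n = 28, ρ0 = 0.3 are hypothetical, chosen only for illustration.

```python
import math


def fisher_z_test(r, rho0, n):
    """Z = (z' - zeta0) / sigma_z, approximately N(0, 1) under H0: rho = rho0."""
    if n < 4:
        raise ValueError("need n >= 4")
    z_r = math.atanh(r)       # z' = 0.5 * ln((1 + r) / (1 - r))
    zeta0 = math.atanh(rho0)  # same transformation applied to rho0
    sigma_z = 1 / math.sqrt(n - 3)
    return (z_r - zeta0) / sigma_z


def fisher_z_ci(r, n, z_crit=1.96):
    """C.I. for rho: z' +/- z_crit * sigma_z, endpoints back-transformed with tanh."""
    if n < 4:
        raise ValueError("need n >= 4")
    z_r = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z_r - half_width), math.tanh(z_r + half_width)


# Hypothetical inputs for illustration: r = 0.5 observed in n = 28 pairs.
print(round(fisher_z_test(0.5, 0.3, 28), 3))
print(tuple(round(v, 3) for v in fisher_z_ci(0.5, 28)))
```

Note that the back-transformed interval is not symmetric around r, which reflects the skewness of the sampling distribution of r discussed earlier.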
Example
• Consider the data used above; are X and Y correlated?
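A worked answer, assuming a two-sided test of H0: ρ = 0 at the 5% level. For these data r = 5/√(11.2 × 84) ≈ 0.163 and n = 5:

```latex
\begin{aligned}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.163\sqrt{3}}{\sqrt{1-0.163^2}} \approx 0.286 \\
|t| &\approx 0.286 < t_{0.025,\,3} = 3.182
\end{aligned}
```

Since |t| is well below the critical value, H0 is not rejected: these five pairs give no evidence that X and Y are linearly correlated.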
REGRESSION
• Models the relationship between one or more covariates and an outcome.
• Often used in exploratory settings.
• Can also be used in confirmatory studies.
• A regression line is an equation that describes the relationship between a response variable y (outcome) and an explanatory variable x (covariate).
Regression continued
• Statistical relationships may be linear, exponential, polynomial, logarithmic, etc.
• The simplest form is the linear one.
• Linear means linear in the coefficients, i.e. y is a linear function of the coefficients (so y = β0 + β1x + β2x² is still a linear model).
• Nonlinear relationships can often be transformed into forms that are approximately linear (for example, by taking logarithms).
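As an illustration of transforming to linearity, a Python sketch that fits an exponential relationship y = a·e^(bx) by regressing ln(y) on x with ordinary least squares. The helper names are hypothetical, and the data are synthetic and noise-free (generated from y = 2e^(0.5x)) purely so the recovered coefficients can be checked.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    return y_bar - slope * x_bar, slope


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (requires y > 0)."""
    if any(y <= 0 for y in ys):
        raise ValueError("log transform needs positive responses")
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope  # a = exp(intercept), b = slope


# Synthetic, noise-free illustration: y = 2 * exp(0.5 * x).
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))  # 2.0 0.5
```

With noisy data, least squares on the log scale minimizes errors in ln(y) rather than in y, so the back-transformed curve need not match a direct nonlinear fit.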
Simple linear regression
• A linear relationship may be summarized using the equation
$$y = \beta_0 + \beta_1 x$$
where β0 is the intercept and β1 the slope of the line.
• For observation i (i = 1, 2, …, n) whose value of the explanatory variable is xi, one would expect the corresponding response yi to be such that
$$E(y_i) = \beta_0 + \beta_1 x_i$$
• The statistical model fitted for a simple linear regression is of the form
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n$$
where εi is the random error term for observation i.
EXAMPLE
Study of the effect of temperature on the rate of development of the potato leafhopper, Empoasca fabae. The response (y) was the mean length of the development period (in days) from egg to adult.
Temperature (°F)   Mean length (days)
59.8               30.2
67.6               27.3
70.0               26.8
70.4               23.3
74.0               19.1
75.3               19.0
78.0               16.5
80.4               15.9
81.4               14.8
83.2               14.2
[Figure: Mean length of development period of potato leafhopper versus temperature]
Genstat analysis using Stats > Regression Analysis > Linear Models…
Output from Genstat
Regression Analysis
Response variate: length
Fitted terms: Constant, temp

Summary of analysis
              d.f.     s.s.      m.s.     v.r.   F pr.
Regression       1   282.28   282.282   120.85   <.001
Residual         8    18.69     2.336
Total            9   300.97    33.441

Estimates of parameters
             estimate     s.e.     t(8)   t pr.
Constant        78.09     5.24    14.90   <.001
temp          -0.7753   0.0705   -10.99   <.001
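A Python sketch that recomputes the main entries of the Genstat table from the data, assuming the standard simple-regression formulas: SSR = SS_xy²/SS_xx, SSE = SSTO − SSR, MSE = SSE/(n − 2), v.r. = SSR/MSE, s.e.(b1) = √(MSE/SS_xx) and s.e.(b0) = √(MSE(1/n + x̄²/SS_xx)). The function name is hypothetical, and the "F pr." and "t pr." columns are omitted because they need F and t distribution functions (available, for example, in SciPy).

```python
import math


def simple_regression_anova(xs, ys):
    """ANOVA entries and coefficient standard errors for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    ssr = ss_xy ** 2 / ss_xx       # regression s.s., 1 d.f.
    sse = ss_yy - ssr              # residual s.s., n - 2 d.f.
    mse = sse / (n - 2)            # residual mean square
    f_ratio = ssr / mse            # "v.r." in the Genstat output
    se_b1 = math.sqrt(mse / ss_xx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_xx))
    return {"ssr": ssr, "sse": sse, "ssto": ss_yy, "mse": mse, "f": f_ratio,
            "b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
            "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1}


temp = [59.8, 67.6, 70.0, 70.4, 74.0, 75.3, 78.0, 80.4, 81.4, 83.2]
length = [30.2, 27.3, 26.8, 23.3, 19.1, 19.0, 16.5, 15.9, 14.8, 14.2]
table = simple_regression_anova(temp, length)
for key, value in table.items():
    print(f"{key:>6}: {value:.4f}")
```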
ASSUMPTIONS
• The error terms have constant variance (homoscedasticity)
• The error terms are independent
• The error terms are normally distributed
• The regression function is linear
• There are no influential outliers
• All important independent variables are included in the model
– These assumptions must be checked.
How to check the assumptions
• Diagnostic plots
• The plots tell you whether a regression model is appropriate at all.
• They include univariate plots, bivariate plots, and residual plots.
Univariate plots of X and Y
• To look for outliers
• To examine the shape of the distribution
• These include box plots, stem-and-leaf plots, histograms, and dot plots for X and Y
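The outlier check behind a box plot can be sketched in Python with the common 1.5 × IQR fence rule. This is an assumed convention (quartile definitions also vary between packages); here `statistics.quantiles` with `method="inclusive"` supplies the quartiles. The function name is hypothetical; the first call uses the temperatures from the leafhopper example.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Flag points outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


temp = [59.8, 67.6, 70.0, 70.4, 74.0, 75.3, 78.0, 80.4, 81.4, 83.2]
print(boxplot_outliers(temp))               # []
print(boxplot_outliers([1, 2, 3, 4, 100]))  # [100]
```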
Bivariate plots
• Plots of X vs Y
• Is the relationship between the two variables
linear?
• Are there two-dimensional outliers?
• Does the assumption of constant variance
look reasonable?
Plots of residuals versus X
• Useful for detecting nonlinearity
• Any systematic pattern in the residuals-versus-X plot indicates a problem with a model assumption.
Plot the residuals versus the fitted values (Ŷ)
• With one predictor variable, this plot carries the same information as the previous one, since Ŷ is a linear function of X.
• In multiple linear regression, the plot lets us examine patterns in the residuals as the fitted response increases.
Plot the standardized residuals versus X
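To pair with these plots, a Python sketch that computes fitted values, residuals and standardized residuals for the potato leafhopper fit; the lists could then be plotted against X or Ŷ with any plotting tool. We assume "standardized" means internally studentized, eᵢ/(s√(1 − hᵢᵢ)) with leverage hᵢᵢ = 1/n + (xᵢ − x̄)²/SS_xx; some texts (and possibly Genstat) scale by s alone instead. The function name is hypothetical.

```python
import math


def residual_diagnostics(xs, ys):
    """Fitted values, residuals, internally studentized residuals, leverages."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # root mean square error
    leverage = [1 / n + (x - x_bar) ** 2 / ss_xx for x in xs]
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return fitted, residuals, standardized, leverage


temp = [59.8, 67.6, 70.0, 70.4, 74.0, 75.3, 78.0, 80.4, 81.4, 83.2]
length = [30.2, 27.3, 26.8, 23.3, 19.1, 19.0, 16.5, 15.9, 14.8, 14.2]
fitted, resid, std_resid, lev = residual_diagnostics(temp, length)
for x, f, e, d in zip(temp, fitted, resid, std_resid):
    print(f"{x:5.1f}  fitted={f:6.2f}  resid={e:6.2f}  std={d:6.2f}")
```

Standardized residuals beyond about ±2 are commonly flagged for a closer look.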
