Prepared by: Sudip Pokhrel
Probability and Statistics
Contents Unit 5: Correlation and Regression Analysis
Correlation
Definition, Scatter diagram, Karl Pearson’s
correlation coefficient
Definition:
 Correlation analysis is a statistical tool which studies the
association or relationship between two or more variables.
 Correlation means association - more precisely it is a measure of
the extent to which two or more variables are related.
• A scatter diagram (also known as a scatter plot, scatter graph, or correlation chart) is a tool for analyzing the relationship between two variables and determining how closely they are related.
• One variable is plotted on the horizontal axis and the other on the vertical axis. The pattern of the plotted points shows the relationship graphically.
 Rectangular coordinate
 Two quantitative variables
 One variable is called independent (X) and the
second is called dependent (Y)
 Points are not joined
 No frequency table
 Most common way for visualizing the association
between two quantitative variables
 What we need to see in scatter plot
i) Linearity (Straight line) ii) Spread
iii) Outliers iv) Correlation
[Figure: Scatter Diagram, with X on the horizontal axis and Y on the vertical axis]
Scatter Plots
The pattern of data is indicative of the type of relationship between
two variables:
[Figure panels: Positive Relationship, Negative Relationship, No Relationship]
• Positive correlation: The correlation is said to be positive if the values of the two variables change in the same direction.
Ex. Pub. Exp. and sales, height and weight, study time and grades.
• Negative correlation: The correlation is said to be negative if the values of the two variables change in opposite directions.
Ex. Price and quantity demanded, alcohol consumption and driving ability.
[Figure: scatter plots labelled Positive Relationship, Negative Relationship and No relationship; axis labels: Strength, Age of buildings]
Scatter Diagram (Graphic method)
+ve Correlation
Zero correlation
-ve correlation
Linear and Non- Linear correlation
Simple Correlation Coefficient
The most common measure of correlation; also called the Pearson coefficient of correlation.
• It is an index of the relationship between two variables.
• It reflects the degree of linear relationship between two variables.
• It is symmetric in nature (the correlation of x with y equals that of y with x).
• The value of r ranges between -1 and +1.
• The magnitude of r denotes the strength of the association and its sign denotes the direction, as summarized in the interpretation below.
Interpretation
Degree of correlation    Positive            Negative
Perfect correlation      +1                  -1
Very high degree         +0.90 or more       -0.90 or more
High degree              +0.75 to +0.89      -0.75 to -0.89
Moderate degree          +0.50 to +0.74      -0.50 to -0.74
Low degree               +0.25 to +0.49      -0.25 to -0.49
Very low degree          Less than +0.25     Less than -0.25
No correlation           0 (zero)            0 (zero)
(For negative values, "or more" and "less than" refer to the magnitude of r.)
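The bands in the table can be turned into a small lookup. The following is only a sketch: the function name is hypothetical, and it assumes that values falling in the gaps between bands (for example 0.895) belong to the lower band, which the table itself does not specify.

```python
def degree_of_correlation(r: float) -> str:
    """Return the verbal degree of correlation for a coefficient r.

    Assumption (not stated in the table): a value that falls in a gap
    between two bands, such as 0.895, is assigned to the lower band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0.0:
        return "No correlation"
    if size == 1.0:
        label = "Perfect correlation"
    elif size >= 0.90:
        label = "Very high degree"
    elif size >= 0.75:
        label = "High degree"
    elif size >= 0.50:
        label = "Moderate degree"
    elif size >= 0.25:
        label = "Low degree"
    else:
        label = "Very low degree"
    direction = "positive" if r > 0 else "negative"
    return f"{label} ({direction})"


print(degree_of_correlation(0.886))   # High degree (positive)
print(degree_of_correlation(-0.30))   # Low degree (negative)
```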
Assumptions:
Two variables should be measured at the interval or ratio level (i.e.,
continuous)
There is a linear relationship between two variables.
There should be no significant outliers.
Variables should be approximately normally distributed.
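How to compute the simple correlation coefficient (r)
$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$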
Calculation Example
Output (in no.), y, against years of experience, x:

  y     x     xy      y²     x²
 35     8    280    1225     64
 49     9    441    2401     81
 27     7    189     729     49
 33     6    198    1089     36
 60    13    780    3600    169
 21     7    147     441     49
 45    11    495    2025    121
 51    12    612    2601    144
Σy = 321   Σx = 73   Σxy = 3142   Σy² = 14111   Σx² = 713
[Figure: scatter plot of output, y, against years of experience, x]
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$
Calculation Example
(continued)
r = 0.886 → relatively strong positive
linear association between x and y
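The calculation above can be checked with a few lines of Python. This is a minimal sketch using the computational formula with raw sums; the function name `pearson_r` is chosen here for illustration and is not from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient from raw sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Data from the calculation example
experience = [8, 9, 7, 6, 13, 7, 11, 12]   # x
output = [35, 49, 27, 33, 60, 21, 45, 51]  # y

r = pearson_r(experience, output)
print(round(r, 3))  # 0.886
```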
Interpretation:
For example, if r = 0.7, then r² = 0.7 × 0.7 = 0.49, i.e. 49%.
About 49% of the total variation in one variable is explained by the other variable, and the remaining 51% is due to unknown factors.
Partial Correlation Coefficient
Partial correlation estimates the relationship between two variables
while removing the influence of a third variable from the
relationship.
Examples: The relationship between a guy and a girl after removing the influence of video games.
The relationship between unit sales of ice cream and profit after removing the influence of daily temperature.
Assumptions
• You have one (dependent) variable and one (independent) variable and these are
both measured on a continuous scale (i.e., they are measured on
an interval or ratio scale).
• You have one or more control variables, also known as covariates (i.e., control
variables are just variables that you are using to adjust the relationship between the
other two variables; that is, your dependent and independent variables). These control
variables are also measured on a continuous scale (i.e., they are continuous
variables).
• There needs to be a linear relationship between all three variables. That is, all possible
pairs of variables must show a linear relationship.
• There should be no significant outliers.
• Your variables should be approximately normally distributed.
How to compute the partial correlation coefficient
$$ r_{AB.C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{(1 - r_{AC}^2)(1 - r_{BC}^2)}} $$
Where,
• $r_{AB}$ = simple correlation coeff. between A and B
• $r_{AC}$ = simple correlation coeff. between A and C
• $r_{BC}$ = simple correlation coeff. between B and C
Note: $r_{AB} = r_{BA}$, $r_{AC} = r_{CA}$, $r_{BC} = r_{CB}$
With the above formula we calculate the partial correlation coefficient between variables A and B, holding variable C constant.
• What will be the formula if we wanted to calculate partial
correlation coefficient between B and C assuming A as constant?
• What will be the formula if we wanted to calculate partial
correlation coefficient between A and C assuming B as constant?
Note: a) The coefficient always lies between -1 and +1. b) $r_{BC.A} = r_{CB.A}$ and so on, i.e. the order of the subscripts to the left of the dot does not affect the value. c) The square of the partial correlation coefficient gives the coefficient of partial determination.
Interpretation: If the coefficient of partial determination is 0.25, then about 25% of the total variation in variable A is explained by variable B, holding variable C constant.
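A minimal sketch of the first-order partial correlation, assuming the standard formula r_AB.C = (r_AB - r_AC·r_BC) / sqrt((1 - r_AC²)(1 - r_BC²)). The function name and the three input correlations are illustrative values, not taken from the slides.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation between A and B, holding C constant (r_AB.C)."""
    denominator = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    if denominator == 0:
        raise ValueError("undefined when r_AC or r_BC equals +1 or -1")
    return (r_ab - r_ac * r_bc) / denominator


# Illustrative simple correlations (hypothetical numbers)
r_ab, r_ac, r_bc = 0.8, 0.6, 0.5

r_ab_c = partial_corr(r_ab, r_ac, r_bc)
partial_determination = r_ab_c ** 2

print(round(r_ab_c, 4))                 # 0.7217
print(round(partial_determination, 4))  # 0.5208
```

For r_BC.A or r_AC.B, the same function can be reused by rotating the arguments, for example `partial_corr(r_bc, r_ab, r_ac)` for B and C holding A constant.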
Serial no.   Sales Unit   Profit   Daily Temperature (°C)
1            70           120      25
2            60           80       20
3            80           120      30
4            50           100      27
5            60           115      21
6            90           135      32
1. Find the partial correlation coefficient between ice-cream sales units and profit, holding daily temperature constant.
2. Find the partial correlation coefficient between ice-cream sales units and daily temperature, holding profit constant.
3. Find the partial correlation coefficient between ice-cream profit and daily temperature, holding sales units constant.
4. Also calculate the coefficient of partial determination for questions 1, 2 and 3, and interpret the results.
Note: First calculate the simple correlation coefficients between sales units and daily temperature, profit and daily temperature, and sales units and profit.
Multiple Correlation Coefficient
• The multiple correlation coefficient denotes the correlation of one variable with several other variables taken together.
• It is denoted as $R_{A.BCDE \dots K}$,
• which denotes that A is correlated with the variables B, C, D, up to K.
• Its value lies between 0 and 1.
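How To Compute Multiple Correlation Coefficient (R)
For three variables, with A as the dependent variable:
$$ R_{A.BC} = \sqrt{\frac{r_{AB}^2 + r_{AC}^2 - 2\, r_{AB}\, r_{AC}\, r_{BC}}{1 - r_{BC}^2}} $$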
• What will be the formula if we wanted to calculate multiple correlation
coefficient assuming B as dependent variable?
• What will be the formula if we wanted to calculate multiple correlation
coefficient assuming C as dependent variable?
Note: a) The coefficient always lies between 0 and +1, i.e. it is always non-negative. b) $R_{A.BC} = R_{A.CB}$ and so on, i.e. the order of the subscripts to the right of the dot does not affect the value. c) The square of the multiple correlation coefficient gives the coefficient of multiple determination.
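A minimal sketch for three variables, assuming the standard formula R_A.BC = sqrt((r_AB² + r_AC² - 2·r_AB·r_AC·r_BC) / (1 - r_BC²)). The function name and the input correlations are illustrative, not taken from the slides.

```python
import math


def multiple_corr(r_ab, r_ac, r_bc):
    """Multiple correlation of A with B and C jointly (R_A.BC)."""
    denominator = 1 - r_bc ** 2
    if denominator <= 0:
        raise ValueError("undefined when r_BC equals +1 or -1")
    r_squared = (r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / denominator
    return math.sqrt(r_squared)


# Illustrative simple correlations (hypothetical numbers)
r_ab, r_ac, r_bc = 0.8, 0.6, 0.5

R_a_bc = multiple_corr(r_ab, r_ac, r_bc)
multiple_determination = R_a_bc ** 2

print(round(R_a_bc, 4))                  # 0.8327
print(round(multiple_determination, 4))  # 0.6933
```

To treat B as the dependent variable instead, the arguments can be rotated: `multiple_corr(r_ab, r_bc, r_ac)` gives R_B.AC.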
Coefficient of Multiple Determination
• The square of the multiple correlation coefficient is called the coefficient of multiple determination.
• It is denoted by $R^2_{1.23}$, $R^2_{2.13}$, $R^2_{3.12}$.
• Suppose the multiple correlation coefficient of the yield of wheat (X1) on the joint effect of fertilizer (X2) and quality of seeds (X3) is $R_{1.23} = 0.9$. Then $R^2_{1.23} = 0.81$, i.e. 81%. This is interpreted as: 81% of the variation in the yield of wheat is explained by fertilizer and quality of seeds, and the remaining 19% by unknown factors.
Limitations Of Correlation
• Only LINEAR relationships are considered.
• Correlation (r) is NOT resistant to outliers.
• There may be variables other than x which are not studied, yet do influence the response variable.
• A strong correlation does NOT imply a cause-and-effect relationship.
Serial no.   Sales Unit   Profit   Daily Temperature (°C)
1            70           120      25
2            60           80       20
3            80           120      30
4            50           100      27
5            60           115      21
6            90           135      32
1. Find the multiple correlation coefficient taking ice-cream sales units as the dependent variable and daily temperature and profit as the independent variables.
2. Find the multiple correlation coefficient taking daily temperature as the dependent variable and profit and unit sales of ice cream as the independent variables.
3. Find the multiple correlation coefficient taking profit as the dependent variable and daily temperature and unit sales as the independent variables.
A sample of 10 values of the variables X1, X2 and X3 gave the following sums:
ΣX1 = 10      ΣX2 = 20      ΣX3 = 30
ΣX1² = 20     ΣX2² = 68     ΣX3² = 170
ΣX1X2 = 10    ΣX1X3 = 15    ΣX2X3 = 64
Find
a) The partial correlation between X2 and X3 eliminating the effect of X1; also calculate and interpret the coefficient of partial determination.
b) The multiple correlation with X1 as the dependent variable and X2 and X3 as the independent variables; also calculate and interpret the coefficient of multiple determination.
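When only sums are given, as in this exercise, the simple correlation coefficients have to be computed from the sums rather than from raw data. The sketch below does this first step only; the helper name `r_from_sums` is hypothetical, and the partial and multiple coefficients are left to the reader.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Pearson r from n, the sums of x and y, their squares and products."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


n = 10  # sample size stated in the exercise

# Arguments: n, sum of first variable, sum of second, sums of squares, sum of products
r12 = r_from_sums(n, 10, 20, 20, 68, 10)
r13 = r_from_sums(n, 10, 30, 20, 170, 15)
r23 = r_from_sums(n, 20, 30, 68, 170, 64)

print(round(r12, 4), round(r13, 4), round(r23, 4))
```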
Regression Analysis
 Regression Analysis is a very powerful tool in the field of statistical analysis in
predicting the value of one variable, given the value of another variable, when
those variables are related to each other.
 It investigates the relationship between a dependent variable (target) and
independent variable(s) (predictor).
 Regression Analysis is mathematical measure of average relationship between two
or more variables.
It is a statistical tool used to predict the value of an unknown variable from a known variable.
 There are various kinds of regression techniques available to make predictions.
 These techniques are mostly driven by three metrics (number of independent
variables, type of dependent variables and shape of regression line).
 There are multiple benefits of using regression analysis. They are as follows:
• It indicates the significant relationships between dependent variable and
independent variable.
• It indicates the strength of impact of multiple independent variables on a
dependent variable.
• Regression can be used with many continuous and binary independent
variables (x).
Linear Regression
 It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics people pick up while learning predictive modeling.
 In this technique, the dependent variable is continuous, independent variable(s)
can be continuous or discrete, and nature of regression line is linear.
 Linear Regression establishes a relationship between dependent variable (Y) and
one or more independent variables (X) using a best fit straight line.
Regression Equation:
The algebraic expression of a regression line is called a regression equation.
For two variables, one dependent and one independent, there are two regression equations:
Regression equation of y on x: y = a + bx, in which y is the dependent variable and x is the independent variable (modern).
Regression equation of x on y: x = a + by, in which x is the dependent variable and y is the independent variable (classical).
Note: (Regression equation of y on x) ≠ (Regression equation of x on y).
What is Simple Linear Regression?
• Simple linear regression is a statistical method that allows us to
summarize and study relationships between two continuous
(quantitative) variables:
• One variable, denoted x, is regarded as the predictor, explanatory,
or independent variable.
• The other variable, denoted y, is regarded as the response, outcome,
or dependent variable.
• Simple linear regression gets its adjective "simple" because it concerns
the study of only one predictor variable
Dependent variables
• The single variable being explained/predicted by the regression model
• Denoted by y- variable
Independent variable
• The explanatory variable(s) used to predict the dependent variable
• Denoted by x- variables
Estimation of Coefficients using the Least Squares Method (OLS):
The regression equation of y on x is y = a + bx, in which y is the dependent variable and x is the independent variable.
The values of a and b are determined by the principle of least squares, i.e. by minimizing the error sum of squares.
Here,
error $e = y - \hat{y}$, so that $\sum e^2 = \sum (y - \hat{y})^2$
Differentiating $\sum e^2$ partially with respect to a and b and setting each derivative to zero gives two normal equations:
$\sum y = na + b\sum x$ ………….. (i)
$\sum xy = a\sum x + b\sum x^2$ …….. (ii)
 Shortcut Method:
Here, u = x − A and v = y − B, where A and B are conveniently chosen constants (assumed means). The regression equation of v on u is
v = a + bu
and the values of a and b are calculated from
$\sum v = na + b\sum u$ ………….. (i)
$\sum uv = a\sum u + b\sum u^2$ …….. (ii)
Then substitute u = x − A and v = y − B to get the equation in the form y = a + bx.
 Step Deviation Method:
Here, $u' = \dfrac{x - A}{h}$ and $v' = \dfrac{y - B}{h}$,
and the values of a and b are calculated from
$\sum v' = na + b\sum u'$ ………….. (i)
$\sum u'v' = a\sum u' + b\sum u'^2$ ….. (ii)
Then substitute the expressions for $u'$ and $v'$ to get the equation in the form y = a + bx.
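Or, using formulas, we can obtain the estimates of a and b directly as follows:
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \bar{y} - b\,\bar{x} $$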
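Solving normal equations (i) and (ii) for a and b gives b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n. A minimal sketch of that closed form follows; the function name `fit_line` and the four data points are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the regression line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_xx = sum(p * p for p in x)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b


# Illustrative points lying exactly on y = 2 + 3x
a, b = fit_line([1, 2, 3, 4], [5, 8, 11, 14])
print(a, b)  # 2.0 3.0
```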
Standard Error and Coefficient of Determination
Properties of Regression Coefficient:
 Correlation coefficient is geometric mean of two regression coefficient
$r = \pm\sqrt{b_{xy} \times b_{yx}}$
 If one of the regression coefficient is greater than unity then other must be less
than unity.
 Product of two regression coefficients must be less than or equal to 1.
$b_{xy} \times b_{yx} \le 1$
• Regression coefficients are independent of a change of origin but not of a change of scale.
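These properties can be checked numerically. The sketch below reuses the output and years-of-experience data from the correlation example; the helper name `slope` and the shift and scale constants are arbitrary choices made for illustration.

```python
import math


def slope(x, y):
    """Regression coefficient of y on x (b_yx)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_xx = sum(p * p for p in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


x = [8, 9, 7, 6, 13, 7, 11, 12]        # years of experience
y = [35, 49, 27, 33, 60, 21, 45, 51]   # output

b_yx = slope(x, y)   # regression coefficient of y on x
b_xy = slope(y, x)   # regression coefficient of x on y

# |r| is the geometric mean of the two regression coefficients
r_magnitude = math.sqrt(b_yx * b_xy)

# Change of origin (subtract constants): the slope is unchanged
b_shifted = slope([v - 9 for v in x], [v - 40 for v in y])

# Change of scale (halve x): the slope changes, here it doubles
b_scaled = slope([v / 2 for v in x], y)

print(round(r_magnitude, 3))   # 0.886
print(b_yx * b_xy <= 1)        # True
```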
Q1. The city council of Bowie, Maryland, has gathered data on the no. of minor traffic accidents and the no. of youth football games that occur in town over a weekend.

Football Games     20   30   10   12   15   25   34
Minor Accidents     6    9    4    5    7    8    9

• Plot these data.
• Develop the equation that best describes these data.
• Predict the no. of minor traffic accidents that will occur on a weekend during which 17 football games take place in Bowie.
• Calculate the coefficient of determination.
Q2. Ms. Patsy Knowlet, a water quality engineer, noted that there seemed to be a close connection between an important streamflow water quality parameter, y, and the flow, x (m³/s). She found that 9 pairs of observations yielded the following data: ∑x = 15.2; ∑x² = 57.6; ∑y = 45.6; ∑y² = 518.3; ∑xy = 172.6. She would like to develop an equation that would allow her to predict y knowing x.
• Find the best estimate of the linear regression line of y on x.
• Find the correlation coefficient between x and y.
Multiple Regression
• It is the logical extension of simple linear regression
• Multiple regression extends linear regression to allow for 2 or
more independent variables.
• There is still only one dependent variable.
The Multiple Regression Model
Idea: Examine the linear relationship between 1 dependent (Y) and 2 or more independent variables (Xi)
Multiple Regression Model with k Independent Variables:
$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + e_i $$
where $\beta_0$ is the Y-intercept, $\beta_1, \dots, \beta_k$ are the population slopes, and $e_i$ is the random error.
Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$$ \hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} $$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, a is the estimated intercept, and $b_1, \dots, b_k$ are the estimated slope coefficients.
We will always use software to obtain the regression slope coefficients and other regression summary measures.
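As a rough illustration of what such software does in the two-predictor case, the sketch below builds the three normal equations from the data and solves them by Gaussian elimination. It is a plain-Python sketch, not the method of any particular package; the function names and the five data points are made up, and the points were chosen to lie exactly on Y = 1 + 2·X1 − 3·X2.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs] as floats
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - known) / m[i][i]
    return z


def fit_two_predictors(x1, x2, y):
    """Estimate (a, b1, b2) in Y-hat = a + b1*X1 + b2*X2 via the normal equations."""
    n = len(y)
    s_x1, s_x2, s_y = sum(x1), sum(x2), sum(y)
    s_x1x1 = sum(p * p for p in x1)
    s_x2x2 = sum(q * q for q in x2)
    s_x1x2 = sum(p * q for p, q in zip(x1, x2))
    s_x1y = sum(p * t for p, t in zip(x1, y))
    s_x2y = sum(q * t for q, t in zip(x2, y))
    matrix = [
        [n,    s_x1,   s_x2],
        [s_x1, s_x1x1, s_x1x2],
        [s_x2, s_x1x2, s_x2x2],
    ]
    rhs = [s_y, s_x1y, s_x2y]
    a, b1, b2 = solve_linear_system(matrix, rhs)
    return a, b1, b2


# Made-up data lying exactly on Y = 1 + 2*X1 - 3*X2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * p - 3 * q for p, q in zip(x1, x2)]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```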
Multiple regression model
Two independent variables model:
$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 $$
[Figure: axes Y, X1 and X2]
Multiple Regression Equation
(continued)
Example:
2 Independent Variables
• A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
• Dependent variable: Pie sales (units per week)
• Independent variables: Price (in $) and Advertising ($100’s)
• Data are collected for 15 weeks
           Y     X1     X2     YX1     YX2    X1X2        Y²     X1²     X2²
         350    5.5    3.3    1925    1155   18.15    122500   30.25   10.89
         460    7.5    3.3    3450    1518   24.75    211600   56.25   10.89
         350    8      3      2800    1050   24       122500   64       9
         430    8      3      3440    1290   24       184900   64       9
         350    6.8    3      2380    1050   20.4     122500   46.24    9
         380    7.5    4      2850    1520   30       144400   56.25   16
         430    4.5    3      1935    1290   13.5     184900   20.25    9
         470    6.4    3.7    3008    1739   23.68    220900   40.96   13.69
         450    7      3.5    3150    1575   24.5     202500   49      12.25
         490    5      4      2450    1960   20       240100   25      16
         340    7.2    3.5    2448    1190   25.2     115600   51.84   12.25
         300    7.9    3.2    2370     960   25.28     90000   62.41   10.24
         440    5.9    4      2596    1760   23.6     193600   34.81   16
         450    5      3.5    2250    1575   17.5     202500   25      12.25
         300    7      2.7    2100     810   18.9      90000   49       7.29
Total   5990   99.2   50.7   39152   20442  333.46   2448500  675.26  173.75

Let the linear estimate equation be
$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 $$
The normal equations are as follows:
$$ \sum Y = na + b_1 \sum X_1 + b_2 \sum X_2 $$
$$ \sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 $$
$$ \sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2 $$
Substituting the totals from the table:
5990 = 15a + 99.2 b1 + 50.7 b2
39152 = 99.2a + 675.26 b1 + 333.46 b2
20442 = 50.7a + 333.46 b1 + 173.75 b2
Solving the above three simultaneous equations, we obtain the best estimates of a, b1 and b2:
a = 306.526, b1 = -24.975, and b2 = 74.131
The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
Using The Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100’s, so $350 means that X2 = 3.5
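The same prediction can be written as a small function. This sketch simply hard-codes the coefficients quoted in the slides; the function name and parameter names are illustrative. The advertising argument is given in dollars and converted to $100's inside the function, which avoids the unit slip the note above warns about.

```python
def predict_pie_sales(price_dollars, advertising_dollars):
    """Predicted weekly pie sales from the fitted equation quoted in the slides."""
    a, b1, b2 = 306.526, -24.975, 74.131   # coefficients as stated in the slides
    advertising_hundreds = advertising_dollars / 100  # equation uses $100's
    return a + b1 * price_dollars + b2 * advertising_hundreds


predicted = predict_pie_sales(5.50, 350)
print(round(predicted, 2))  # 428.62
```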
Q1. A household survey of monthly expenditure on food yielded the following data:

Monthly expenditure on food (Rs. 1000)   10   15   20   25   30   35   40   45
Monthly income (Rs. 1000)                20   40   60   50   70   60   80   85
Size of family (no.)                      3    4    5    6    8    7    5    9

a) Estimate the line of best fit.
b) Estimate the expenditure on food of a family with a monthly income of Rs. 75000 and 10 family members.

More Related Content

Similar to Correlation and Regression Analysis.pptx

Regression & correlation coefficient
Regression & correlation coefficientRegression & correlation coefficient
Regression & correlation coefficient
MuhamamdZiaSamad
 
Correlation and Regression
Correlation and Regression Correlation and Regression
Correlation and Regression
Dr. Tushar J Bhatt
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
Anusuya123
 
Unit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdfUnit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdf
Ravinandan A P
 
Regression
RegressionRegression
Regression
Sauravurp
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptx
CallplanetsDeveloper
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptx
CallplanetsDeveloper
 
correlation.ppt
correlation.pptcorrelation.ppt
correlation.ppt
NayanPatil59
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
Rohit77460
 
5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx
5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx
5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx
AnnMichelleJolo
 
Correlation Analysis PRESENTED.pptx
Correlation Analysis PRESENTED.pptxCorrelation Analysis PRESENTED.pptx
Correlation Analysis PRESENTED.pptx
HaimanotReta
 
Statistics
Statistics Statistics
Statistics
KafiPati
 
Correlation AnalysisCorrelation AnalysisCorrelation meas.docx
Correlation AnalysisCorrelation AnalysisCorrelation meas.docxCorrelation AnalysisCorrelation AnalysisCorrelation meas.docx
Correlation AnalysisCorrelation AnalysisCorrelation meas.docx
faithxdunce63732
 
Correlations
CorrelationsCorrelations
Regression
RegressionRegression
Regression
nandini patil
 
Correlation
CorrelationCorrelation
correlation-analysis.pptx
correlation-analysis.pptxcorrelation-analysis.pptx
correlation-analysis.pptx
SoujanyaLk1
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
nuwan udugampala
 

Similar to Correlation and Regression Analysis.pptx (20)

Regression & correlation coefficient
Regression & correlation coefficientRegression & correlation coefficient
Regression & correlation coefficient
 
Correlation and Regression
Correlation and Regression Correlation and Regression
Correlation and Regression
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Unit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdfUnit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdf
 
Regression
RegressionRegression
Regression
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptx
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptx
 
correlation.ppt
correlation.pptcorrelation.ppt
correlation.ppt
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
 
5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx
5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx
5.-SIMPLE-LINEAR-REGRESSION-MEASURES-OF-CORRELATION.pptx
 
Correlation Analysis PRESENTED.pptx
Correlation Analysis PRESENTED.pptxCorrelation Analysis PRESENTED.pptx
Correlation Analysis PRESENTED.pptx
 
Statistics
Statistics Statistics
Statistics
 
Correlation AnalysisCorrelation AnalysisCorrelation meas.docx
Correlation AnalysisCorrelation AnalysisCorrelation meas.docxCorrelation AnalysisCorrelation AnalysisCorrelation meas.docx
Correlation AnalysisCorrelation AnalysisCorrelation meas.docx
 
Correlations
CorrelationsCorrelations
Correlations
 
Regression
RegressionRegression
Regression
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation continued
Correlation continuedCorrelation continued
Correlation continued
 
correlation-analysis.pptx
correlation-analysis.pptxcorrelation-analysis.pptx
correlation-analysis.pptx
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
 
Chapter05
Chapter05Chapter05
Chapter05
 

Recently uploaded

哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
manasideore6
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
obonagu
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Soumen Santra
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
yokeleetan1
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
01-GPON Fundamental fttx ftth basic .pptx
01-GPON Fundamental fttx ftth basic .pptx01-GPON Fundamental fttx ftth basic .pptx
01-GPON Fundamental fttx ftth basic .pptx
benykoy2024
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.pptPROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
bhadouriyakaku
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 

Recently uploaded (20)

哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
01-GPON Fundamental fttx ftth basic .pptx
01-GPON Fundamental fttx ftth basic .pptx01-GPON Fundamental fttx ftth basic .pptx
01-GPON Fundamental fttx ftth basic .pptx
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.pptPROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 

Correlation and Regression Analysis.pptx

  • 1. Prepared by: Sudip Pokhrel Probability and Statistics
  • 2. Contents Unit 5: Correlation and Regression Analysis
  • 3. Correlation Definition, Scatter diagram, Karl Pearson’s correlation coefficient
  • 4. Definition:  Correlation analysis is a statistical tool which studies the association or relationship between two or more variables.  Correlation means association - more precisely it is a measure of the extent to which two or more variables are related.
  • 5.
  • 6. A scatter diagram (also known as a scatter plot, scatter graph, or correlation chart) is a tool for analyzing the relationship between two variables and determining how closely they are related. One variable is plotted on the horizontal axis and the other on the vertical axis. The pattern of the plotted points shows the relationship graphically.
  • 7. Scatter Diagram. Features: rectangular coordinates; two quantitative variables; one variable is called independent (X) and the other dependent (Y); points are not joined; no frequency table; the most common way to visualize the association between two quantitative variables. What to look for in a scatter plot: i) linearity (straight line), ii) spread, iii) outliers, iv) correlation. [Figure: scatter diagram with X on the horizontal axis and Y on the vertical axis]
  • 8. Scatter Plots: The pattern of the data indicates the type of relationship between two variables: positive relationship, negative relationship, or no relationship.
  • 9. Positive correlation: the correlation is positive if the values of the two variables change in the same direction. Ex. Pub. Exp. & sales, height & weight, study time and grades. Negative correlation: the correlation is negative if the values of the two variables change in opposite directions. Ex. Price & qty. demanded, alcohol consumption and driving ability.
  • 13. Scatter Diagram (graphic method). [Figure panels: +ve correlation, zero correlation, -ve correlation]
  • 14. Linear and Non- Linear correlation
  • 15. Simple Correlation Coefficient: The most common measure of correlation, also called the Pearson coefficient of correlation. It is an index of the relationship between two variables. It reflects the degree of linear relationship between two variables. It is symmetric in nature. The value of r ranges between -1 and +1. The value of r denotes the strength of the association, as illustrated by the following diagram.
  • 16. Interpretation
      Degree of correlation    Positive               Negative
      Perfect correlation      +1                     -1
      Very high degree         +0.9 or more           -0.9 or more
      High degree              +0.75 to +0.89         -0.75 to -0.89
      Moderate degree          +0.50 to +0.74         -0.50 to -0.74
      Low degree               +0.25 to +0.49         -0.25 to -0.49
      Very low degree          Between 0 and +0.25    Between 0 and -0.25
      No correlation           0 (zero)
  • 17. Assumptions: The two variables should be measured at the interval or ratio level (i.e., continuous). There is a linear relationship between the two variables. There should be no significant outliers. The variables should be approximately normally distributed.
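  • 18. How to compute the simple correlation coefficient (r): $$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$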
  • 19. Calculation Example (y = output, in number; x = years of experience)
      y       x      xy      y²       x²
      35      8      280     1225     64
      49      9      441     2401     81
      27      7      189     729      49
      33      6      198     1089     36
      60      13     780     3600     169
      21      7      147     441      49
      45      11     495     2025     121
      51      12     612     2601     144
      Σy=321  Σx=73  Σxy=3142  Σy²=14111  Σx²=713
  • 20. Calculation Example (continued): $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886$$ r = 0.886 → relatively strong positive linear association between x and y. [Figure: scatter plot of output y against years of experience x]
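The calculation on slides 19 and 20 can be checked with a few lines of Python. This is a minimal sketch; the function name `pearson_r` and the list names are illustrative choices, not part of the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from raw data, using the sum-based formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Data from the calculation example (slide 19)
years = [8, 9, 7, 6, 13, 7, 11, 12]        # x: years of experience
output = [35, 49, 27, 33, 60, 21, 45, 51]  # y: output

r = pearson_r(years, output)
print(round(r, 3))  # 0.886
```

The intermediate sums match the table: Σx = 73, Σy = 321, Σxy = 3142, Σx² = 713, Σy² = 14111.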
  • 21.
  • 22. Interpretation: For example, if r = 0.7, then r² = 0.7 × 0.7 = 0.49, i.e. 49%. About 49% of the total variation in variable 1 is explained by variable 2, and the remaining 51% is due to other, unknown factors.
  • 24. Partial correlation estimates the relationship between two variables while removing the influence of a third variable from the relationship. Examples: the relationship between a guy and a girl after removing the influence of video games; the relationship between unit sales of ice cream and profit after removing the influence of daily temperature.
  • 25. Assumptions • You have one (dependent) variable and one (independent) variable and these are both measured on a continuous scale (i.e., they are measured on an interval or ratio scale). • You have one or more control variables, also known as covariates (i.e., control variables are just variables that you are using to adjust the relationship between the other two variables; that is, your dependent and independent variables). These control variables are also measured on a continuous scale (i.e., they are continuous variables). • There needs to be a linear relationship between all three variables. That is, all possible pairs of variables must show a linear relationship. • There should be no significant outliers. • Your variables should be approximately normally distributed.
  • 26. How to compute the partial correlation coefficient (r): $$r_{AB.C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{(1 - r_{AC}^2)(1 - r_{BC}^2)}}$$
  • 27. Where: $r_{AB}$ = simple correlation coefficient between A and B; $r_{AC}$ = simple correlation coefficient between A and C; $r_{BC}$ = simple correlation coefficient between B and C. Note: $r_{AB} = r_{BA}$, $r_{AC} = r_{CA}$, $r_{BC} = r_{CB}$. The above formula gives the partial correlation coefficient between variables A and B, holding variable C constant.
  • 28. What will the formula be if we want the partial correlation coefficient between B and C, holding A constant? What will the formula be if we want the partial correlation coefficient between A and C, holding B constant? Note: a) The coefficient always lies between -1 and +1. b) $r_{BC.A} = r_{CB.A}$ and so on, i.e. the order of the subscripts to the left of the dot does not affect the value. c) The square of the partial correlation coefficient gives the coefficient of partial determination.
  • 29. Interpretation: For example, if $r_{AB.C}^2 = 0.25$, then about 25% of the total variation in variable A is explained by variable B, holding variable C constant.
  • 30. 1. Find the partial correlation coefficient between ice-cream sales units and profit, holding daily temperature constant.
      Serial no.   Sales unit   Profit   Daily temperature (°C)
      1            70           120      25
      2            60           80       20
      3            80           120      30
      4            50           100      27
      5            60           115      21
      6            90           135      32
  • 31. 2. Find the partial correlation coefficient between ice-cream sales units and daily temperature, holding profit constant. 3. Find the partial correlation coefficient between ice-cream profit and daily temperature, holding sales units constant. 4. Also calculate the coefficient of partial determination for questions 1, 2 and 3, and interpret the results. Note: first calculate the simple correlation coefficients for sales units vs daily temperature, profit vs daily temperature, and sales units vs profit.
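The ice-cream exercise can be worked through numerically as follows. This is a sketch only: it assumes the standard first-order partial correlation formula for three variables, and the function and variable names (`pearson_r`, `partial_corr`, `r_sp_t`, and so on) are illustrative, not taken from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient from raw data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation between A and B, holding C constant (r_AB.C)."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Data from the exercise table (slide 30)
temp = [25, 20, 30, 27, 21, 32]           # daily temperature (°C)
profit = [120, 80, 120, 100, 115, 135]
sales = [70, 60, 80, 50, 60, 90]          # sales units

# Simple correlations needed by all three questions
r_sp = pearson_r(sales, profit)
r_st = pearson_r(sales, temp)
r_pt = pearson_r(profit, temp)

r_sp_t = partial_corr(r_sp, r_st, r_pt)   # Q1: sales vs profit, temperature constant
r_st_p = partial_corr(r_st, r_sp, r_pt)   # Q2: sales vs temperature, profit constant
r_pt_s = partial_corr(r_pt, r_sp, r_st)   # Q3: profit vs temperature, sales constant

for name, r in [("r_SP.T", r_sp_t), ("r_ST.P", r_st_p), ("r_PT.S", r_pt_s)]:
    # The square is the coefficient of partial determination (Q4)
    print(name, round(r, 4), "partial determination:", round(r ** 2, 4))
```

Each squared value is read as the share of variation in one variable explained by the other once the third variable is held constant.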
  • 33. The multiple correlation coefficient measures the correlation of one variable with several other variables jointly. It is denoted $R_{A.BCDE \ldots K}$, meaning that A is correlated with the variables B, C, D, up to K. Its value lies between 0 and 1.
  • 34. How to compute the multiple correlation coefficient (R): $$R_{A.BC} = \sqrt{\frac{r_{AB}^2 + r_{AC}^2 - 2\, r_{AB}\, r_{AC}\, r_{BC}}{1 - r_{BC}^2}}$$
  • 35. What will the formula be if we want the multiple correlation coefficient with B as the dependent variable? What will the formula be if we want the multiple correlation coefficient with C as the dependent variable? Note: a) The coefficient always lies between 0 and +1, i.e. it is always non-negative. b) $R_{A.BC} = R_{A.CB}$ and so on, i.e. the order of the subscripts to the right of the dot does not affect the value. c) The square of the multiple correlation coefficient gives the coefficient of multiple determination.
  • 36. Coefficient of Multiple Determination: The square of the multiple correlation coefficient is called the coefficient of multiple determination. It is denoted by $R^2_{1.23}$, $R^2_{2.13}$, $R^2_{3.12}$. Suppose the multiple correlation coefficient of wheat yield (X1) on the joint effect of fertilizer (X2) and quality of seeds (X3) is $R_{1.23} = 0.9$. Then $R^2_{1.23} = 0.81$, i.e. 81%, which is interpreted as: 81% of the variation in wheat yield is explained by fertilizer and quality of seeds, and the remaining 19% by unknown factors.
  • 37. Limitations of Correlation: We are only considering LINEAR relationships. Correlation (r) is NOT resistant to outliers. There may be variables other than x which are not studied, yet do influence the response variable. A strong correlation does NOT imply a cause-and-effect relationship.
  • 38. 1. Find the multiple correlation coefficient of ice-cream sales units (dependent variable) on daily temperature and profit.
      Serial no.   Sales unit   Profit   Daily temperature (°C)
      1            70           120      25
      2            60           80       20
      3            80           120      30
      4            50           100      27
      5            60           115      21
      6            90           135      32
  • 39. Find the multiple correlation coefficient of daily temperature (dependent variable) on profit and unit sales of ice cream. Find the multiple correlation coefficient of profit (dependent variable) on daily temperature and unit sales.
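The multiple correlation exercises can be sketched in the same way. The code assumes the standard three-variable formula $R_{A.BC} = \sqrt{(r_{AB}^2 + r_{AC}^2 - 2 r_{AB} r_{AC} r_{BC}) / (1 - r_{BC}^2)}$; the names `pearson_r`, `multiple_corr` and the result variables are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient from raw data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def multiple_corr(r_ab, r_ac, r_bc):
    """Multiple correlation of A (dependent) on B and C jointly (R_A.BC)."""
    return sqrt((r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2))

# Data from the exercise table (slide 38)
temp = [25, 20, 30, 27, 21, 32]
profit = [120, 80, 120, 100, 115, 135]
sales = [70, 60, 80, 50, 60, 90]

r_sp = pearson_r(sales, profit)
r_st = pearson_r(sales, temp)
r_pt = pearson_r(profit, temp)

R_s_pt = multiple_corr(r_sp, r_st, r_pt)  # sales dependent
R_t_ps = multiple_corr(r_pt, r_st, r_sp)  # temperature dependent
R_p_st = multiple_corr(r_sp, r_pt, r_st)  # profit dependent

for name, R in [("R_S.PT", R_s_pt), ("R_T.PS", R_t_ps), ("R_P.ST", R_p_st)]:
    # The square is the coefficient of multiple determination
    print(name, round(R, 4), "multiple determination:", round(R ** 2, 4))
```

A quick sanity check: R can never be smaller in magnitude than either simple correlation involving the dependent variable, since adding a second explanatory variable cannot reduce the variation explained.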
  • 40. A sample of 10 values of the variables X1, X2 and X3 gave: ΣX1 = 10, ΣX2 = 20, ΣX3 = 30, ΣX1² = 20, ΣX2² = 68, ΣX3² = 170, ΣX1X2 = 10, ΣX1X3 = 15, ΣX2X3 = 64. Find: a) the partial correlation between X2 and X3 eliminating the effect of X1; also calculate and interpret the coefficient of partial determination. b) the multiple correlation of X1 (dependent variable) on X2 and X3; also calculate and interpret the coefficient of multiple determination.
  • 41.
  • 42.
  • 44. Regression Analysis: Regression analysis is a very powerful tool in statistical analysis for predicting the value of one variable, given the value of another variable, when those variables are related to each other. It investigates the relationship between a dependent variable (target) and one or more independent variables (predictors). Regression analysis is a mathematical measure of the average relationship between two or more variables. It is a statistical tool used to predict the value of an unknown variable from a known variable.
  • 45. There are various kinds of regression techniques available to make predictions. These techniques are mostly distinguished by three criteria: the number of independent variables, the type of dependent variable, and the shape of the regression line. There are multiple benefits of using regression analysis: it indicates the significant relationships between the dependent variable and the independent variables; it indicates the strength of the impact of multiple independent variables on a dependent variable; and it can be used with many continuous and binary independent variables (x).
  • 46.
  • 47. Linear Regression: It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics people pick up while learning predictive modeling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear. Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line.
  • 48. Regression Equation: The algebraic expression of a regression line is called a regression equation. For two variables, one dependent and one independent, there are two regression equations. The regression equation of y on x is y = a + bx, in which y is the dependent variable and x is the independent variable (modern). The regression equation of x on y is x = a + by, in which x is the dependent variable and y is the independent variable (classical). Note: (regression equation of y on x) ≠ (regression equation of x on y).
  • 49. What is Simple Linear Regression? • Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables: • One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. • The other variable, denoted y, is regarded as the response, outcome, or dependent variable. • Simple linear regression gets its adjective "simple" because it concerns the study of only one predictor variable
  • 50. Dependent variables • The single variable being explained/predicted by the regression model • Denoted by y- variable Independent variable • The explanatory variable(s) used to predict the dependent variable • Denoted by x- variables
  • 51. Estimation of Coefficients using the Least Squares Method (OLS): The regression equation of y on x is y = a + bx, in which y is the dependent variable and x is the independent variable. The values of a and b are determined by the principle of least squares, i.e. by minimizing the error sum of squares. Here, error $e = y - \hat{y}$, so that $\sum e^2 = \sum (y - \hat{y})^2$. Differentiating with respect to a and b and setting the derivatives to zero gives two equations: $\sum y = na + b\sum x$ ... (i) and $\sum xy = a\sum x + b\sum x^2$ ... (ii)
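The differentiation step that leads to equations (i) and (ii) can be written out explicitly. With $\hat{y} = a + bx$ and $S$ used here as a label for the error sum of squares (the symbol $S$ is introduced only for this derivation):

```latex
\begin{aligned}
S &= \sum e^2 = \sum (y - a - bx)^2 \\
\frac{\partial S}{\partial a} &= -2\sum (y - a - bx) = 0
  \;\Longrightarrow\; \sum y = na + b\sum x \\
\frac{\partial S}{\partial b} &= -2\sum x\,(y - a - bx) = 0
  \;\Longrightarrow\; \sum xy = a\sum x + b\sum x^2
\end{aligned}
```

Solving these two equations simultaneously for a and b gives the least squares line.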
  • 52. Shortcut Method: Let $u = x - A$ and $v = y - B$. The regression equation of v on u is $v = a + bu$, and the values of a and b are calculated from: $\sum v = na + b\sum u$ ... (i) and $\sum uv = a\sum u + b\sum u^2$ ... (ii). Then substitute the expressions for u and v to get the equation y = a + bx. Step Deviation Method: Let $u' = \frac{x - A}{h}$ and $v' = \frac{y - B}{h}$. The values of a and b are calculated from: $\sum v' = na + b\sum u'$ ... (i) and $\sum u'v' = a\sum u' + b\sum u'^2$ ... (ii). Then substitute the expressions for $u'$ and $v'$ to get the equation y = a + bx.
  • 53. Alternatively, the estimates of a and b can be obtained directly from the formulas $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$.
  • 54. Standard Error and Coefficient of Determination
  • 55. Properties of Regression Coefficients: The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{xy} \times b_{yx}}$, taking the common sign of the two coefficients. If one of the regression coefficients is greater than unity, the other must be less than unity. The product of the two regression coefficients must be less than or equal to 1: $b_{xy} \times b_{yx} \le 1$. Regression coefficients are independent of a change of origin but not of a change of scale.
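These properties can be checked numerically on the years-of-experience data from the earlier calculation example. This is a minimal sketch; the helper `slope` and the shift value 5 and scale value 10 are arbitrary illustrative choices.

```python
from math import sqrt, copysign

def slope(x, y):
    """Regression coefficient of y on x: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)

x = [8, 9, 7, 6, 13, 7, 11, 12]       # years of experience
y = [35, 49, 27, 33, 60, 21, 45, 51]  # output

b_yx = slope(x, y)  # regression coefficient of y on x
b_xy = slope(y, x)  # regression coefficient of x on y

# r is the geometric mean of the two coefficients, with their common sign
r = copysign(sqrt(b_xy * b_yx), b_yx)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3))

# Change of origin (subtract 5 from every x): the slope is unchanged
b_shifted = slope([v - 5 for v in x], y)

# Change of scale (divide every x by 10): the slope is multiplied by 10
b_scaled = slope([v / 10 for v in x], y)
print(round(b_shifted, 4), round(b_scaled, 4))
```

Here one coefficient is above 1 and the other below 1, and their product is r², which cannot exceed 1.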
  • 56. Q1. The city council of Bowie, Maryland, has gathered data on the no. of minor traffic accidents and the no. of youth football games that occur in town over a weekend.
      Football games     20   30   10   12   15   25   34
      Minor accidents    6    9    4    5    7    8    9
    Plot these data. Develop the equation that best describes these data. Predict the no. of minor traffic accidents that will occur on a weekend during which 17 football games take place in Bowie. Calculate the coefficient of determination.
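One way to work Q1 is to apply the least squares formulas for a and b directly to the raw data. The sketch below does this, treating football games as x and minor accidents as y; the variable names are illustrative, and the plotting step is left out.

```python
games = [20, 30, 10, 12, 15, 25, 34]  # x: youth football games
accidents = [6, 9, 4, 5, 7, 8, 9]     # y: minor traffic accidents

n = len(games)
sx, sy = sum(games), sum(accidents)
sxy = sum(x * y for x, y in zip(games, accidents))
sxx = sum(x * x for x in games)
syy = sum(y * y for y in accidents)

# Least squares estimates for y = a + b*x
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

# Prediction for a weekend with 17 games
pred_17 = a + b * 17

# Coefficient of determination: square of the simple correlation coefficient
r2 = (n * sxy - sx * sy) ** 2 / ((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(f"y = {a:.3f} + {b:.4f} x")
print(f"predicted accidents for 17 games: {pred_17:.2f}")
print(f"coefficient of determination: {r2:.3f}")
```

The fitted line predicts roughly six minor accidents for a 17-game weekend, and r² near 0.86 means most of the variation in accidents is explained by the number of games.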
  • 57. Q2. Ms. Patsy Knowlet, a water quality engineer, noted that there seemed to be a close connection between an important streamflow water quality parameter, y, and the flow, x (m³/s). She found that 9 pairs of observations yielded the following data: Σx = 15.2; Σx² = 57.6; Σy = 45.6; Σy² = 518.3; Σxy = 172.6. She would like to develop an equation that would allow her to predict y knowing x. Find the best estimate of the linear regression line of y on x. Find the correlation coefficient between x and y.
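When only the summary sums are given, as in Q2, the same formulas apply without the raw data. A minimal sketch follows; the function name `regression_from_sums` is a hypothetical helper introduced for illustration.

```python
from math import sqrt

def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (a, b, r) for the line y = a + b*x from summary sums."""
    num = n * sxy - sx * sy
    b = num / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    r = num / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return a, b, r

# Sums given in Q2 (9 pairs of observations)
a, b, r = regression_from_sums(n=9, sx=15.2, sy=45.6, sxx=57.6, syy=518.3, sxy=172.6)
print(f"y = {a:.4f} + {b:.4f} x, r = {r:.4f}")
```

The intercept comes out close to zero and r close to 1, which is consistent with the "close connection" described in the question.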
  • 58. Multiple Regression • It is the logical extension of simple linear regression • Multiple regression extends linear regression to allow for 2 or more independent variables. • There is still only one dependent variable.
  • 59.
  • 60.
  • 61.
  • 62. The Multiple Regression Model. Idea: examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi). Multiple regression model with k independent variables: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i$, where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $e_i$ is the random error.
  • 63. Multiple Regression Equation: The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables: $\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$, where $\hat{Y}_i$ is the estimated (or predicted) value of Y, a is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients. We will always use software to obtain the regression slope coefficients and other regression summary measures.
  • 65. Multiple Regression Equation (continued): Two independent variables model: $\hat{Y} = a + b_1 X_1 + b_2 X_2$. [Figure: axes Y, X1, X2]
  • 66. Example: 2 Independent Variables. A distributor of frozen dessert pies wants to evaluate factors thought to influence demand. Dependent variable: pie sales (units per week). Independent variables: price (in $) and advertising (in $100's). Data are collected for 15 weeks.
  • 67. Computation table (Y = pie sales, X1 = price, X2 = advertising)
      Y       X1      X2      YX1     YX2     X1X2     Y²        X1²      X2²
      350     5.5     3.3     1925    1155    18.15    122500    30.25    10.89
      460     7.5     3.3     3450    1518    24.75    211600    56.25    10.89
      350     8       3       2800    1050    24       122500    64       9
      430     8       4.5     3440    1935    36       184900    64       20.25
      350     6.8     3       2380    1050    20.4     122500    46.24    9
      380     7.5     4       2850    1520    30       144400    56.25    16
      430     4.5     3       1935    1290    13.5     184900    20.25    9
      470     6.4     3.7     3008    1739    23.68    220900    40.96    13.69
      450     7       3.5     3150    1575    24.5     202500    49       12.25
      490     5       4       2450    1960    20       240100    25       16
      340     7.2     3.5     2448    1190    25.2     115600    51.84    12.25
      300     7.9     3.2     2370    960     25.28    90000     62.41    10.24
      440     5.9     4       2596    1760    23.6     193600    34.81    16
      450     5       3.5     2250    1575    17.5     202500    25       12.25
      300     7       2.7     2100    810     18.9     90000     49       7.29
      Total:
      5990    99.2    52.2    39152   21087   345.46   2448500   675.26   185
  • 68. Let the linear estimate equation be $\hat{Y} = a + b_1 X_1 + b_2 X_2$. The normal equations are as follows: $\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$; $\sum X_1 Y = a\sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$; $\sum X_2 Y = a\sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$. Substituting the totals: $5990 = 15a + 99.2\,b_1 + 52.2\,b_2$; $39152 = 99.2\,a + 675.26\,b_1 + 345.46\,b_2$; $21087 = 52.2\,a + 345.46\,b_1 + 185\,b_2$.
  • 69. Solving the above three simultaneous equations, we obtain the best estimates of a, b1 and b2: a = 306.526, b1 = -24.975, and b2 = 74.131.
  • 70. The Multiple Regression Equation: Sales = 306.526 - 24.975(Price) + 74.131(Advertising), where Sales is in number of pies per week, Price is in $, and Advertising is in $100's. b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising. b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
  • 71. Using the Equation to Make Predictions: Predict sales for a week in which the selling price is $5.50 and advertising is $350. Sales = 306.526 - 24.975(Price) + 74.131(Advertising) = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62. Predicted sales is 428.62 pies. Note that Advertising is in $100's, so $350 means that X2 = 3.5.
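The pie-sales example can be reproduced by building the three normal equations from the data and solving them. The sketch below uses plain Python with a small Gauss-Jordan solver (`solve_linear`, an illustrative helper). One assumption is made loudly: week 4's advertising value is taken as 4.5, which is the value for which the normal equations yield the stated coefficients a = 306.526, b1 = -24.975, b2 = 74.131.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Y = pie sales, X1 = price ($), X2 = advertising ($100's); 15 weeks
Y = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
X1 = [5.5, 7.5, 8, 8, 6.8, 7.5, 4.5, 6.4, 7, 5, 7.2, 7.9, 5.9, 5, 7]
# Assumption: the 4th value is 4.5 (see the lead-in above)
X2 = [3.3, 3.3, 3, 4.5, 3, 4, 3, 3.7, 3.5, 4, 3.5, 3.2, 4, 3.5, 2.7]

n = len(Y)
s = sum
dot = lambda u, v: sum(p * q for p, q in zip(u, v))

# Normal equations in matrix form, unknowns ordered (a, b1, b2)
A = [
    [n,     s(X1),       s(X2)],
    [s(X1), dot(X1, X1), dot(X1, X2)],
    [s(X2), dot(X1, X2), dot(X2, X2)],
]
rhs = [s(Y), dot(X1, Y), dot(X2, Y)]

a, b1, b2 = solve_linear(A, rhs)
print(f"Sales = {a:.3f} + ({b1:.3f})(Price) + {b2:.3f}(Advertising)")

# Prediction for price $5.50 and advertising $350 (X2 = 3.5)
pred = a + b1 * 5.50 + b2 * 3.5
print(f"Predicted sales: {pred:.2f} pies")
```

In practice a statistics package would be used for this, as the slides note; solving the normal equations by hand or with a small solver is shown only to mirror the worked steps.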
  • 72. Q1. A household survey of monthly expenditure on food yielded the following data:
      Monthly expenditure on food (Rs. 1000)   10   15   20   25   30   35   40   45
      Monthly income (Rs. 1000)                20   40   60   50   70   60   80   85
      Size of family (no.)                     3    4    5    6    8    7    5    9
    a) Estimate the line of best fit. b) Estimate the expenditure on food of a family with a monthly income of Rs. 75000 and 10 family members.