A presentation on correlation and regression for engineering students studying probability and statistics. The presentation is designed according to the syllabus of the Institute of Engineering (IOE), Tribhuvan University, but the course content is similar to that of almost all engineering universities.
To get a copy of the slides for free, email me at: japhethmuthama@gmail.com
You can also support my PhD studies by donating 1 dollar to my PayPal.
PayPal ID is japhethmuthama@gmail.com
This presentation covers the following topics:
1. Definition of Correlation and Regression
2. Meaning of Correlation and Regression
3. Types of Correlation and Regression
4. Karl Pearson's methods of correlation
5. Bivariate Grouped data method
6. Spearman's Rank correlation Method
7. Scatter diagram method
8. Interpretation of correlation coefficient
9. Lines of Regression
10. Regression equations
11. Difference between correlation and regression
12. Related examples
Unit-I, BP801T. BIOSTATISTICS AND RESEARCH METHODOLOGY (Theory)
Correlation: Definition, Karl Pearson's coefficient of correlation, Multiple correlation - Pharmaceutical examples.
Correlation: is there a relationship between two variables?
Correlation Analysis
Correlation measures the relationship between two quantitative variables.
Linear correlation measures whether paired data follow a straight-line relationship between two quantitative variables.
The correlation coefficient (r) computed from the sample data measures the strength and the direction of a linear relationship between two variables.
The correlation coefficient ranges from -1 to +1. When there is no linear relationship between the two variables, or only a weak one, r is close to 0.
Things to Remember
Correlation coefficient cutoff points:
+0.70 to +1.00: strong positive association.
+0.50 to +0.69: medium positive association.
+0.30 to +0.49: weak positive association.
0 to +0.29: little or no association.
0 to -0.29: little or no association.
-0.30 to -0.49: weak negative association.
-0.50 to -0.69: medium negative association.
-0.70 to -1.00: strong negative association.
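The cutoff table can be expressed as a small lookup. This is a sketch under one assumption: values that fall in the gaps between the listed bands (for example 0.295) are classified by comparing |r| against the lower bounds 0.30, 0.50 and 0.70. The function name describe_association is hypothetical.

```python
def describe_association(r: float) -> str:
    """Label a correlation coefficient using the cutoff points listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude < 0.30:
        return "little or no association"
    if magnitude < 0.50:
        strength = "weak"
    elif magnitude < 0.70:
        strength = "medium"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} association"


for value in (0.886, -0.45, 0.12):
    print(value, describe_association(value))
```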
Relationships of Linear Correlation
As x increases, there is no definite shift in y: no correlation.
As x increases, there is a definite shift in y: correlation.
Positive correlation: as x increases, y increases.
Negative correlation: as x increases, y decreases.
If the points follow some other, nonlinear pattern: no linear relationship.
Example: No correlation.
As x increases, there is no definite shift in y.
Example: Positive/direct correlation.
As x increases, y also increases.
Example: Negative/indirect/inverse correlation.
As x increases, y decreases.
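Coefficient of linear correlation: r measures the strength and direction of the linear relationship between two variables.
Pearson correlation formula:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$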
Note:
r = +1: perfect positive correlation
r = -1 : perfect negative correlation
Use the calculated value of the coefficient of linear correlation, r, to make an inference about the population correlation coefficient, ρ (rho).
Example 1: Is there a relationship between the age of children and their score on the Child Medical Fear Scale (CMFS), using the data shown in Table 1?
H0: There is no significant relationship between the age of the children and their score on the CMFS
Or
H0: ρ = 0
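ID | Age (x) | CMFS (y)
1 | 8 | 31
2 | 9 | 25
3 | 9 | 40
4 | 10 | 27
5 | 11 | 35
6 | 9 | 29
7 | 8 | 25
8 | 9 | 34
9 | 8 | 44
10 | 11 | 19
11 | 7 | 28
12 | 6 | 47
13 | 6 | 42
14 | 8 | 37
15 | 9 | 35
16 | 12 | 16
17 | 15 | 12
18 | 13 | 23
19 | 10 | 26
20 | 10 | 36
Table 1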
Scattergram (Scatterplot)
Age (x) = Independent variable, CMFS (y)= Dependent variable
Correlation Coefficient
The Results:
a. Decision: Reject H0.
b. Conclusion: There is evidence to suggest that there is a significant linear relationship between the age of the child and the score on the CMFS.
The hypothesis test answers the question of whether or not there is a significant linear relationship.
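A minimal Python sketch of the test above, assuming the usual t-test for H0: ρ = 0 with n - 2 degrees of freedom and α = 0.05 (the slides do not state which test or significance level was used). The 20 data pairs are read from Table 1; the critical value 2.101 is the two-tailed 5% point of the t distribution with 18 degrees of freedom, taken from a t table.

```python
import math

# Table 1: age (x) and CMFS score (y) for the 20 children
age = [8, 9, 9, 10, 11, 9, 8, 9, 8, 11, 7, 6, 6, 8, 9, 12, 15, 13, 10, 10]
cmfs = [31, 25, 40, 27, 35, 29, 25, 34, 44, 19, 28, 47, 42, 37, 35, 16, 12, 23, 26, 36]


def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(age, cmfs)
n = len(age)
# Test statistic for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
T_CRITICAL = 2.101  # assumed: two-tailed, alpha = 0.05, df = 18
print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {abs(t) > T_CRITICAL}")
```

A negative r here means fear scores tend to fall as age rises; the decision to reject H0 matches the slide's result.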
Simple Linear Regression Analysis
Linear Regression Analysis
Linear regression analysis finds the equation of the line that predicts the dependent variable from the independent variable.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 10: Correlation and Regression
10.1: Correlation
4. Definition:
Correlation analysis is a statistical tool that studies the association or relationship between two or more variables.
Correlation means association; more precisely, it is a measure of the extent to which two or more variables are related.
6. • A scatter diagram (also known as a scatter plot, scatter graph, or correlation chart) is a tool for analyzing the relationship between two variables and determining how closely they are related.
• One variable is plotted on the horizontal axis and the other on the vertical axis. The pattern of the plotted points shows the relationship graphically.
7. Rectangular coordinates
Two quantitative variables
One variable is called independent (X) and the second is called dependent (Y)
Points are not joined
No frequency table
Most common way of visualizing the association between two quantitative variables
What we need to look for in a scatter plot:
i) Linearity (straight line) ii) Spread
iii) Outliers iv) Correlation
Scatter Diagram
(Figure: scatter diagram with Y on the vertical axis and X on the horizontal axis)
8. Scatter Plots
The pattern of the data indicates the type of relationship between two variables:
Positive relationship
Negative relationship
No relationship
9. • Positive correlation: The correlation is said to be positive if the values of the two variables change in the same direction.
Examples: publicity expenditure and sales, height and weight, study time and grades.
• Negative correlation: The correlation is said to be negative when the values of the variables change in opposite directions.
Examples: price and quantity demanded, alcohol consumption and driving ability.
15. Simple Correlation Coefficient
The most common measure of correlation; also called the Pearson coefficient of correlation
Is an index of the relationship between two variables
Reflects the degree of linear relationship between two variables
It is symmetric: the correlation of x with y equals that of y with x (rxy = ryx)
The value of r ranges between -1 and +1
The magnitude of r denotes the strength of the association, as illustrated by the following diagram.
16. Interpretation
Degree of correlation | Positive | Negative
Perfect correlation | +1 | -1
Very high degree | +0.90 or more | -0.90 or less
High degree | +0.75 to +0.89 | -0.75 to -0.89
Moderate degree | +0.50 to +0.74 | -0.50 to -0.74
Low degree | +0.25 to +0.49 | -0.25 to -0.49
Very low degree | below +0.25 | above -0.25
No correlation | 0 (zero) | 0 (zero)
17. Assumptions:
Both variables should be measured at the interval or ratio level (i.e., continuous).
There is a linear relationship between the two variables.
There should be no significant outliers.
The variables should be approximately normally distributed.
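18. How to compute the simple correlation coefficient (r):
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$$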
19. Calculation Example
Output (in No.), y | Years of experience, x | xy | y² | x²
35 | 8 | 280 | 1225 | 64
49 | 9 | 441 | 2401 | 81
27 | 7 | 189 | 729 | 49
33 | 6 | 198 | 1089 | 36
60 | 13 | 780 | 3600 | 169
21 | 7 | 147 | 441 | 49
45 | 11 | 495 | 2025 | 121
51 | 12 | 612 | 2601 | 144
Σy = 321 | Σx = 73 | Σxy = 3142 | Σy² = 14111 | Σx² = 713
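20. Calculation Example (continued)
(Figure: scatter plot of output, y, against years of experience, x)
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = \frac{1703}{\sqrt{375 \times 9847}} = 0.886$$
r = 0.886 → relatively strong positive linear association between x and y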
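The arithmetic above can be reproduced in a few lines of Python. This sketch evaluates the computational formula directly from the eight (x, y) pairs in the table; the function name r_from_sums is our own.

```python
import math

# Output (y) and years of experience (x) from the calculation example
y = [35, 49, 27, 33, 60, 21, 45, 51]
x = [8, 9, 7, 6, 13, 7, 11, 12]


def r_from_sums(x, y):
    """Pearson r via the computational (raw-sum) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


r = r_from_sums(x, y)
print(round(r, 3))       # 0.886
print(round(r ** 2, 3))  # 0.785, the coefficient of determination
```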
22. Interpretation:
The square of r is the coefficient of determination. For example, if r = 0.7, then r² = 0.7 × 0.7 = 0.49, i.e. 49%.
About 49% of the total variation in variable 1 is explained by variable 2, and the remaining 51% is due to unknown factors.
24. Partial correlation estimates the relationship between two variables while removing the influence of a third variable from the relationship.
Examples: the relationship between a guy and a girl, removing the influence of video games;
the relationship between unit sales of ice cream and profit, removing the influence of daily temperature.
25. Assumptions
• You have one dependent variable and one independent variable, both measured on a continuous scale (i.e., an interval or ratio scale).
• You have one or more control variables, also known as covariates. Control variables are simply variables used to adjust the relationship between the other two variables (the dependent and independent variables). These control variables are also measured on a continuous scale.
• There needs to be a linear relationship between all three variables, i.e. every possible pair of variables must show a linear relationship.
• There should be no significant outliers.
• Your variables should be approximately normally distributed.
27. Where:
• rAB = simple correlation coefficient between A and B
• rAC = simple correlation coefficient between A and C
• rBC = simple correlation coefficient between B and C
Note: rAB = rBA, rAC = rCA, rBC = rCB
From the above formula, we calculate the partial correlation coefficient between variables A and B, holding variable C constant.
28. • What will be the formula if we want to calculate the partial correlation coefficient between B and C, holding A constant?
• What will be the formula if we want to calculate the partial correlation coefficient between A and C, holding B constant?
Note: a) Its value always lies between -1 and +1; b) rBC.A = rCB.A and so on, i.e. the order of the subscripts to the left of the dot does not affect the value; c) the square of the partial correlation coefficient gives the coefficient of partial determination.
29. Interpretation: Of the total variation, about 25% of the variation in variable A is explained by variable B, holding variable C constant.
30. Serial no. | Sales units | Profit | Daily temperature (°C)
1 | 70 | 120 | 25
2 | 60 | 80 | 20
3 | 80 | 120 | 30
4 | 50 | 100 | 27
5 | 60 | 115 | 21
6 | 90 | 135 | 32
1. Find the partial correlation coefficient between ice-cream sales units and profit, holding daily temperature constant.
31. 2. Find the partial correlation coefficient between ice-cream sales units and daily temperature, holding profit constant.
3. Find the partial correlation coefficient between ice-cream profit and daily temperature, holding sales units constant.
4. Also calculate the coefficient of partial determination for questions 1, 2 and 3, and interpret the results.
Note: First calculate the simple correlation coefficients between sales units and daily temperature, profit and daily temperature, and sales units and profit.
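Questions 1 to 4 can be checked with the sketch below. It assumes the standard first-order partial correlation formula r_AB.C = (r_AB - r_AC·r_BC) / √((1 - r_AC²)(1 - r_BC²)); the helper names pearson_r and partial_r are our own. Squaring each result gives the coefficient of partial determination asked for in question 4.

```python
import math

# Ice-cream data from the table above
temperature = [25, 20, 30, 27, 21, 32]   # daily temperature, deg C
profit = [120, 80, 120, 100, 115, 135]
sales = [70, 60, 80, 50, 60, 90]         # sales units


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_AB.C (C held constant)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


r_sp = pearson_r(sales, profit)
r_st = pearson_r(sales, temperature)
r_pt = pearson_r(profit, temperature)

q1 = partial_r(r_sp, r_st, r_pt)  # sales vs profit, temperature constant
q2 = partial_r(r_st, r_sp, r_pt)  # sales vs temperature, profit constant
q3 = partial_r(r_pt, r_sp, r_st)  # profit vs temperature, sales constant

for label, value in (("Q1", q1), ("Q2", q2), ("Q3", q3)):
    # Squaring gives the coefficient of partial determination (question 4)
    print(f"{label}: partial r = {value:.3f}, partial determination = {value ** 2:.3f}")
```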
33. • The multiple correlation coefficient measures the correlation of one variable with several other variables taken together.
• The multiple correlation coefficient is denoted as RA.BCD…K,
• which denotes that A is correlated with B, C, D, up to K variables.
• Its value lies between 0 and 1.
35. • What will be the formula if we want to calculate the multiple correlation coefficient taking B as the dependent variable?
• What will be the formula if we want to calculate the multiple correlation coefficient taking C as the dependent variable?
Note: a) Its value always lies between 0 and +1, i.e. it is always non-negative; b) RA.BC = RA.CB and so on, i.e. the order of the subscripts to the right of the dot does not affect the value; c) the square of the multiple correlation coefficient gives the coefficient of multiple determination.
36. Coefficient of Multiple Determination
• The square of the multiple correlation coefficient is called the coefficient of multiple determination.
• It is denoted by $R^2_{1.23}$, $R^2_{2.13}$, $R^2_{3.12}$.
• For example, let the multiple correlation coefficient between the yield of wheat (X1) and the joint effect of fertilizer (X2) and quality of seeds (X3) be $R_{1.23} = 0.9$. Then $R^2_{1.23} = 0.81$, i.e. 81%: 81% of the variation in wheat yield is explained by fertilizer and seed quality, and the remaining 19% by unknown factors.
37. Limitations of Correlation
• We are only considering LINEAR relationships.
• Correlation (r) is NOT resistant to outliers.
• There may be variables other than x that are not studied, yet do influence the response variable.
• A strong correlation does NOT imply a cause-and-effect relationship.
38. Serial no. | Sales units | Profit | Daily temperature (°C)
1 | 70 | 120 | 25
2 | 60 | 80 | 20
3 | 80 | 120 | 30
4 | 50 | 100 | 27
5 | 60 | 115 | 21
6 | 90 | 135 | 32
1. Find the multiple correlation coefficient of ice-cream sales units on daily temperature and profit, taking sales units as the dependent variable.
39. • Find the multiple correlation coefficient of daily temperature on profit and ice-cream sales units, taking daily temperature as the dependent variable.
• Find the multiple correlation coefficient of profit on daily temperature and sales units, taking profit as the dependent variable.
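A matching sketch for these multiple correlation exercises, assuming the standard formula R_A.BC = √((r_AB² + r_AC² - 2·r_AB·r_AC·r_BC) / (1 - r_BC²)) with A as the dependent variable. The helper names pearson_r and multiple_r are hypothetical.

```python
import math

# Ice-cream data from the table above
temperature = [25, 20, 30, 27, 21, 32]
profit = [120, 80, 120, 100, 115, 135]
sales = [70, 60, 80, 50, 60, 90]


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_r(r_ab, r_ac, r_bc):
    """R_A.BC: correlation of A with B and C jointly (A is the dependent variable)."""
    r_squared = (r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2)
    return math.sqrt(r_squared)


r_st = pearson_r(sales, temperature)
r_sp = pearson_r(sales, profit)
r_tp = pearson_r(temperature, profit)

R_sales = multiple_r(r_st, r_sp, r_tp)        # sales on temperature and profit
R_temperature = multiple_r(r_st, r_tp, r_sp)  # temperature on sales and profit
R_profit = multiple_r(r_sp, r_tp, r_st)       # profit on sales and temperature

for label, value in (("sales", R_sales), ("temperature", R_temperature), ("profit", R_profit)):
    print(f"R with {label} dependent = {value:.3f}, R^2 = {value ** 2:.3f}")
```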
40. A sample of 10 values of the variables X1, X2 and X3 gave the following results:
ΣX1 = 10, ΣX2 = 20, ΣX3 = 30
ΣX1² = 20, ΣX2² = 68, ΣX3² = 170
ΣX1X2 = 10, ΣX1X3 = 15, ΣX2X3 = 64
Find
a) the partial correlation between X2 and X3, eliminating the effect of X1; also calculate and interpret the coefficient of partial determination.
b) the multiple correlation of X1 on X2 and X3, taking X1 as the dependent variable; also calculate and interpret the coefficient of multiple determination.
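Because only sums are given, the correlations must first be built from corrected sums, Sij = ΣXiXj - ΣXi·ΣXj / n. A sketch, assuming the same standard partial and multiple correlation formulas as in the ice-cream exercises; the helper name corrected is hypothetical.

```python
import math

n = 10
s1, s2, s3 = 10, 20, 30        # sums of X1, X2, X3
s11, s22, s33 = 20, 68, 170    # sums of squares
s12, s13, s23 = 10, 15, 64     # sums of cross-products


def corrected(sum_ij, sum_i, sum_j):
    """Corrected sum of products: sum(xi*xj) - sum(xi)*sum(xj)/n."""
    return sum_ij - sum_i * sum_j / n


c11, c22, c33 = corrected(s11, s1, s1), corrected(s22, s2, s2), corrected(s33, s3, s3)
c12, c13, c23 = corrected(s12, s1, s2), corrected(s13, s1, s3), corrected(s23, s2, s3)

r12 = c12 / math.sqrt(c11 * c22)
r13 = c13 / math.sqrt(c11 * c33)
r23 = c23 / math.sqrt(c22 * c33)

# a) partial correlation of X2 and X3 with X1 held constant
r23_1 = (r23 - r12 * r13) / math.sqrt((1 - r12 ** 2) * (1 - r13 ** 2))
# b) multiple correlation of X1 on X2 and X3
R1_23 = math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

print(f"r23.1 = {r23_1:.3f} (partial determination {r23_1 ** 2:.3f})")
print(f"R1.23 = {R1_23:.3f} (multiple determination {R1_23 ** 2:.3f})")
```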
44. Regression Analysis
Regression analysis is a very powerful statistical tool for predicting the value of one variable, given the value of another, when the two variables are related.
It investigates the relationship between a dependent variable (target) and one or more independent variables (predictors).
Regression analysis is a mathematical measure of the average relationship between two or more variables.
It is a statistical tool used to predict the value of an unknown variable from a known variable.
45. There are various kinds of regression techniques available for making predictions. These techniques are mostly distinguished by three metrics: the number of independent variables, the type of dependent variable, and the shape of the regression line.
Regression analysis offers several benefits:
• It indicates significant relationships between the dependent variable and the independent variable(s).
• It indicates the strength of the impact of multiple independent variables on a dependent variable.
• Regression can be used with many continuous and binary independent variables (x).
47. Linear Regression
It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics people pick up while learning predictive modeling.
In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear.
Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line.
48. Regression Equation:
The algebraic expressions of the regression lines are called regression equations. For two variables, one dependent and one independent, there are two regression equations.
The regression equation of y on x is given by y = a + bx, in which y is the dependent variable and x is the independent variable (modern).
The regression equation of x on y is given by x = a' + b'y, in which x is the dependent variable and y is the independent variable (classical).
Note: The regression equation of y on x ≠ the regression equation of x on y; the two lines coincide only when r = ±1.
49. What is Simple Linear Regression?
• Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:
• One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
• Simple linear regression gets its adjective "simple" because it concerns the study of only one predictor variable.
50. Dependent variable
• The single variable being explained/predicted by the regression model
• Denoted by y
Independent variable
• The explanatory variable(s) used to predict the dependent variable
• Denoted by x
51. Estimation of Coefficients using the Least Squares Method (OLS):
The regression equation of y on x is given by y = a + bx, in which y is the dependent variable and x is the independent variable.
The values of a and b are determined by the principle of least squares, i.e. by minimizing the error sum of squares.
Here,
$$e = y - \hat{y}, \qquad \sum e^2 = \sum (y - \hat{y})^2 = \sum (y - a - bx)^2$$
Differentiating $\sum e^2$ partially with respect to a and b and setting both derivatives to zero gives two normal equations:
$$\sum y = na + b\sum x \quad \text{(i)}$$
$$\sum xy = a\sum x + b\sum x^2 \quad \text{(ii)}$$
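As a sketch, the two normal equations (i) and (ii) form a 2 × 2 linear system in a and b, which Cramer's rule solves in closed form. The Python below applies this to the years-of-experience and output data from the calculation example; the function name solve_normal_equations is hypothetical.

```python
def solve_normal_equations(x, y):
    """Solve (i) sum y = n a + b sum x and (ii) sum xy = a sum x + b sum x^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx          # determinant of [[n, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for the intercept a
    b = (n * sxy - sx * sy) / det    # Cramer's rule for the slope b
    return a, b


# Years of experience (x) and output (y) from the calculation example
x = [8, 9, 7, 6, 13, 7, 11, 12]
y = [35, 49, 27, 33, 60, 21, 45, 51]
a, b = solve_normal_equations(x, y)
print(f"y = {a:.4f} + {b:.4f}x")
```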
52. Shortcut Method:
Here, u = x - A and v = y - B; then the regression equation of v on u is
v = a + bu
and the values of a and b are calculated from
$$\sum v = na + b\sum u \quad \text{(i)}$$
$$\sum uv = a\sum u + b\sum u^2 \quad \text{(ii)}$$
Then substitute the values of u and v to get the equation y = a + bx.
Step Deviation Method:
Here, $u' = \frac{x - A}{h}$ and $v' = \frac{y - B}{h}$, and the values of a and b are calculated from
$$\sum v' = na + b\sum u' \quad \text{(i)}$$
$$\sum u'v' = a\sum u' + b\sum u'^2 \quad \text{(ii)}$$
Then substitute the values of u' and v' to get the equation y = a + bx.
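53. Alternatively, the estimates of b and a can be obtained directly from the formulas
$$b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \bar{y} - b\bar{x}$$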
55. Properties of Regression Coefficients:
The correlation coefficient is the geometric mean of the two regression coefficients:
$$r = \pm\sqrt{b_{xy} \times b_{yx}}$$
where r takes the same sign as the regression coefficients.
If one of the regression coefficients is greater than 1 in absolute value, the other must be less than 1.
The product of the two regression coefficients must be less than or equal to 1:
$$b_{xy} \times b_{yx} \le 1$$
Regression coefficients are independent of a change of origin but not of a change of scale.
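The properties above can be checked numerically. This sketch reuses the experience and output data from the calculation example; the helper name slope and the origin shifts A = 9, B = 40 and scale h = 2 are arbitrary choices made only for illustration. It shows that the geometric mean of the two regression coefficients recovers r, that shifting the origin leaves the slope unchanged, and that dividing only x by h multiplies the slope of y on x by h.

```python
import math

x = [8, 9, 7, 6, 13, 7, 11, 12]      # years of experience
y = [35, 49, 27, 33, 60, 21, 45, 51]  # output


def slope(dep, indep):
    """Least-squares regression coefficient of dep on indep."""
    n = len(dep)
    mx, my = sum(indep) / n, sum(dep) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(indep, dep))
    sxx = sum((a - mx) ** 2 for a in indep)
    return sxy / sxx


b_yx = slope(y, x)  # regression coefficient of y on x
b_xy = slope(x, y)  # regression coefficient of x on y
# r is the geometric mean of b_yx and b_xy, with their common sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Change of origin (u = x - A, v = y - B) leaves the slope unchanged
b_vu = slope([v - 40 for v in y], [u - 9 for u in x])
# Change of scale on x only (u' = (x - A) / h) multiplies the slope by h
b_vu_scaled = slope([v - 40 for v in y], [(u - 9) / 2 for u in x])

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r = {r:.3f}")
print(f"after origin shift: {b_vu:.4f}, after scaling x by 1/2: {b_vu_scaled:.4f}")
```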
56. Football Games 20 30 10 12 15 25 34
Minor Accidents 6 9 4 5 7 8 9
Q1. The city council of Bowie, Maryland, has gathered data on the number of minor traffic accidents and the number of youth football games that occur in town over a weekend.
• Plot these data.
• Develop the equation that best describes these data.
• Predict the number of minor traffic accidents that will occur on a weekend during which 17 football games take place in Bowie.
• Calculate the coefficient of determination.
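A possible worked solution in Python, assuming that "the equation that best describes these data" means the least-squares line of minor accidents (y) on football games (x). The plot is left out; the code computes the line, the prediction for 17 games, and the coefficient of determination r².

```python
games = [20, 30, 10, 12, 15, 25, 34]  # x: youth football games
accidents = [6, 9, 4, 5, 7, 8, 9]     # y: minor traffic accidents

n = len(games)
sx, sy = sum(games), sum(accidents)
sxx = sum(g * g for g in games)
syy = sum(a * a for a in accidents)
sxy = sum(g * a for g, a in zip(games, accidents))

# Least-squares slope and intercept of y on x
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

predicted = a + b * 17  # expected accidents on a weekend with 17 games
r_squared = (n * sxy - sx * sy) ** 2 / ((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(f"y = {a:.3f} + {b:.4f}x")
print(f"prediction at 17 games: {predicted:.2f}, r^2 = {r_squared:.3f}")
```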
57. Q2. Ms. Patsy Knowlet, a water quality engineer, noted that there seemed to be a close connection between an important streamflow water quality parameter, y, and the flow, x (m³/s). She found that 9 pairs of observations yielded the following data: ∑x = 15.2; ∑x² = 57.6; ∑y = 45.6; ∑y² = 518.3; ∑xy = 172.6. She would like to develop an equation that would allow her to predict y knowing x.
• Find the best estimate of the linear regression line of y on x.
• Find the correlation coefficient between x and y.
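A sketch for Q2 that works from the summary sums alone: form the corrected sums Sxx, Syy and Sxy, then take b = Sxy / Sxx, a = ȳ - b·x̄ and r = Sxy / √(Sxx·Syy).

```python
import math

n = 9
sum_x, sum_x2 = 15.2, 57.6
sum_y, sum_y2 = 45.6, 518.3
sum_xy = 172.6

# Corrected sums of squares and products
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

b = sxy / sxx                    # slope of y on x
a = sum_y / n - b * sum_x / n    # intercept: a = y_bar - b * x_bar
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
print(f"y = {a:.4f} + {b:.4f}x, r = {r:.4f}")
```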
58. Multiple Regression
• It is the logical extension of simple linear regression
• Multiple regression extends linear regression to allow for 2 or
more independent variables.
• There is still only one dependent variable.
62. The Multiple Regression Model
Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).
Multiple regression model with k independent variables:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i$$
where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $e_i$ is the random error.
63. Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$$\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, a is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.
We will always use software to obtain the regression slope coefficients and other regression summary measures.
66. Example: 2 Independent Variables
• A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
• Dependent variable: Pie sales (units per week)
• Independent variables: Price (in $), Advertising ($100's)
• Data are collected for 15 weeks
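68. Let the linear estimating equation be
$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$
The normal equations are as follows:
$$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$$
$$\sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$$
$$\sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$$
Substituting the sample sums:
$$5990 = 15a + 99.2 b_1 + 50.7 b_2$$
$$39152 = 99.2a + 675.2 b_1 + 333.46 b_2$$
$$20442 = 50.7a + 333.46 b_1 + 173.75 b_2$$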
69. Solving the above three simultaneous equations, we obtain the best estimates of a, $b_1$ and $b_2$:
a = 306.526, $b_1$ = -24.975, and $b_2$ = 74.131
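The three normal equations form a 3 × 3 linear system, so any linear solver will do. The sketch below builds the system from raw data and solves it by Gaussian elimination with partial pivoting; the helper names solve_linear and fit_two_predictors are hypothetical. Because the 15 weeks of raw pie data are not listed in the slides, it is checked on a small made-up dataset generated exactly from Y = 1 + 2X1 + 3X2. In practice, as the slides note, software would be used.

```python
def solve_linear(m, v):
    """Gaussian elimination with partial pivoting for a small square system m @ x = v."""
    size = len(v)
    aug = [list(map(float, row)) + [float(val)] for row, val in zip(m, v)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        x[r] = (aug[r][size] - sum(aug[r][c] * x[c] for c in range(r + 1, size))) / aug[r][r]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares a, b1, b2 for Y-hat = a + b1*X1 + b2*X2 via the three normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(p * p for p in x1)
    s22 = sum(q * q for q in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s1y = sum(p * t for p, t in zip(x1, y))
    s2y = sum(q * t for q, t in zip(x2, y))
    return solve_linear([[n, s1, s2], [s1, s11, s12], [s2, s12, s22]], [sy, s1y, s2y])


# Hypothetical check data generated exactly from Y = 1 + 2*X1 + 3*X2
x1_demo = [1, 2, 3, 4, 5]
x2_demo = [2, 1, 4, 3, 5]
y_demo = [1 + 2 * p + 3 * q for p, q in zip(x1_demo, x2_demo)]
coeffs = fit_two_predictors(x1_demo, x2_demo, y_demo)
print([round(c, 6) for c in coeffs])
```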
70. The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where Sales is in number of pies per week, Price is in $, and Advertising is in $100's.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
71. Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
= 306.526 - 24.975(5.50) + 74.131(3.5)
= 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100's, so $350 means that X2 = 3.5.
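The prediction is a direct substitution. A tiny sketch with the slide's coefficients makes the unit conversion for advertising explicit; predict_sales is a hypothetical name.

```python
def predict_sales(price_dollars, advertising_dollars):
    """Predicted pie sales per week from the fitted multiple regression equation."""
    a, b1, b2 = 306.526, -24.975, 74.131  # coefficients from the slide
    x2 = advertising_dollars / 100        # advertising is measured in $100s
    return a + b1 * price_dollars + b2 * x2


print(round(predict_sales(5.50, 350), 2))  # 428.62
```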
72. Q1. A household survey of monthly expenditure on food yielded the following data:
Monthly expenditure on food (Rs. 1000) | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45
Monthly income (Rs. 1000) | 20 | 40 | 60 | 50 | 70 | 60 | 80 | 85
Size of family (No.) | 3 | 4 | 5 | 6 | 8 | 7 | 5 | 9
a) Estimate the line of best fit.
b) Estimate the expenditure on food of a family with a monthly income of Rs. 75000 and 10 family members.
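One way to work Q1 in Python, assuming the intended model is Ŷ = a + b1X1 + b2X2 with food expenditure as Y, income as X1 and family size as X2. The sketch solves the three normal equations by Gaussian elimination (helper name solve_linear is hypothetical). Because income is recorded in Rs. 1000, Rs. 75000 is entered as X1 = 75.

```python
def solve_linear(m, v):
    """Gaussian elimination with partial pivoting for a small square system m @ x = v."""
    size = len(v)
    aug = [list(map(float, row)) + [float(val)] for row, val in zip(m, v)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        x[r] = (aug[r][size] - sum(aug[r][c] * x[c] for c in range(r + 1, size))) / aug[r][r]
    return x


expenditure = [10, 15, 20, 25, 30, 35, 40, 45]  # Y, Rs. 1000
income = [20, 40, 60, 50, 70, 60, 80, 85]       # X1, Rs. 1000
family_size = [3, 4, 5, 6, 8, 7, 5, 9]          # X2, persons

n = len(expenditure)
s1, s2, sy = sum(income), sum(family_size), sum(expenditure)
s11 = sum(p * p for p in income)
s22 = sum(q * q for q in family_size)
s12 = sum(p * q for p, q in zip(income, family_size))
s1y = sum(p * t for p, t in zip(income, expenditure))
s2y = sum(q * t for q, t in zip(family_size, expenditure))

# a) Normal equations for Y-hat = a + b1*X1 + b2*X2
a, b1, b2 = solve_linear([[n, s1, s2], [s1, s11, s12], [s2, s12, s22]], [sy, s1y, s2y])

# b) Income Rs. 75000 -> X1 = 75, with 10 family members
estimate = a + b1 * 75 + b2 * 10
print(f"Y-hat = {a:.3f} + {b1:.4f} X1 + {b2:.4f} X2")
print(f"estimated food expenditure: {estimate:.2f} (Rs. 1000)")
```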