3. INTRODUCTION
• According to the Oxford dictionary, the word ‘regression’ means
‘stepping back’ or ‘returning to an average value’.
• The term was first used by the famous scientist Sir
Francis Galton in a study of hereditary characteristics.
• Regression Analysis is a statistical process for estimating the
relationship among variables, so that the unknown value of one
variable can be predicted from a known value of
another variable.
4. CONTENTS
• What is Regression Analysis
• Utility of Regression
• Difference between Regression and Correlation
• Types of Regression
• Simple, Multiple, Linear, Non-linear, Partial, Total
• Scatter Diagrams and Relationships
• Estimation Using a Regression Line (linear only)
• The Method of Least Squares
• A Solved Example
• Correlation Analysis
• Using Regression and Correlation Analysis: Limitations, Errors
and Caveats
• Conclusion
5. WHAT IS REGRESSION?
Regression: the process of predicting one variable from another by
statistical means, using previous data.
Regression Line: a line fitted to a set of data points to
estimate the relationship between two variables.
Regression Analysis attempts to create an estimating
equation that predicts new values from previous data.
6. UTILITY OF REGRESSION
1. Regression Analysis explains the nature of the relationship
between two variables.
2. It is one of the most commonly used tools for business
analysis.
3. It is widely used for quality control in the corporate sector.
4. It is useful for estimating statistical curves for demand,
supply, price, consumption and cost.
7. DIFFERENCE BETWEEN REGRESSION AND CORRELATION ANALYSIS
CORRELATION
1. Correlation is a relationship
between two or more
variables.
2. It is a measure of the degree of
relationship between two
variables.
3. The coefficient of correlation
is a relative measure. Its
range lies between -1 and +1.
4. It is of little use for
further algebraic treatment.
5. The correlation coefficient is
independent of change in
origin and scale.
REGRESSION
1. It is a mathematical relation
showing the average relation
between two variables.
2. It is a measure of the nature of the
relationship between
variables.
3. The regression coefficient is an
absolute figure. The value of a
dependent variable is found
from an independent variable.
4. It is very useful for
further algebraic treatment.
5. The regression coefficient is
independent of change of
origin but not of scale.
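Point 5 of the comparison can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names are hypothetical, and the data are borrowed from the five-truck example that appears later in the deck. It shifts X (change of origin) and doubles X (change of scale), then recomputes both coefficients.

```python
from math import sqrt

def centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy), the sums of products of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

def correlation(xs, ys):
    """Coefficient of correlation r between X and Y."""
    sxy, sxx, syy = centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def regression_coefficient(xs, ys):
    """Slope b of the regression line of Y on X."""
    sxy, sxx, _ = centered_sums(xs, ys)
    return sxy / sxx

# Assumed illustration data: the five-truck example from later in the deck.
xs = [3, 2, 7, 4, 8]
ys = [6, 1, 8, 5, 9]
shifted = [x + 10 for x in xs]  # change of origin
scaled = [x * 2 for x in xs]    # change of scale (positive factor)

print("r:", correlation(xs, ys), correlation(shifted, ys), correlation(scaled, ys))
print("b:", regression_coefficient(xs, ys),
      regression_coefficient(shifted, ys),
      regression_coefficient(scaled, ys))
```

Up to floating-point rounding, the three values of r agree, while b is unchanged by the shift and halved when X is doubled.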
8. USE IN BUSINESS
Regression is widely used in business.
Business people are interested in predicting future production, consumption,
investment, prices, profits, sales, etc., so the success of a business depends
on the accuracy of the various estimates they are required to
make.
It is also used in sociological studies and economic planning to project
population, birth rates, death rates, etc.
9. TYPES OF REGRESSION ANALYSIS
Simple and Multiple Regression
Linear and Non-Linear Regression
Partial and Total Regression
10. SIMPLE REGRESSION AND MULTIPLE REGRESSION
In Simple Regression Analysis we study only two variables
at a time, of which one is dependent and the other is
independent,
e.g. the functional relationship between income and expenditure.
In Multiple Regression Analysis we study more than two variables,
of which one is dependent and the others are independent,
e.g. the effect of rain and irrigation on the yield of wheat.
11. LINEAR REGRESSION AND NON-LINEAR REGRESSION
When one variable changes with another variable in a fixed ratio,
the relationship is known as linear regression.
Such a relationship is depicted on a graph by a straight line, i.e. a
first-degree equation.
When one variable changes with another variable in a changing
ratio, the relationship is referred to as curvilinear (non-linear) regression.
Such a relationship takes the form of a curve on a graph
and is represented, for example, by a second- or third-degree equation.
12. PARTIAL REGRESSION AND TOTAL REGRESSION
When two or more variables are studied for a functional relationship,
but the relationship between only two of them is examined at a time
while the other variables are held constant, it is called Partial Regression.
When, on the other hand, all the variables are studied at the same
time, it is called Total Regression.
13. SCATTER DIAGRAM AND RELATIONS
• Determines whether or not a relationship exists between two
variables
• Shows the basic nature of the relationship between two
variables: the independent (X) and the dependent (Y) variable
• Helps us to draw a regression line (a line fitted through the
scatter points to derive a relation between the two variables)
• An important note: the regression line is drawn so that
as many points as possible lie on or near it and roughly equal
numbers of points fall on either side of it.
16. Estimation Using Regression Line
[Figure: Mrs. Hudson’s Shopping. X-axis: No. of stores visited; Y-axis: Money spent (in K). Plotted points: (0, 3), (1, 5), (2, 7).]
Estimating equation for a straight line:
Y = a + bX
where,
Y = Dependent variable
a = Y intercept (constant)
b = slope of the line (constant)
X = Independent variable
Mrs. Hudson wants to determine how much money
she will end up spending if she visits 5 stores. (Let b
be 2.)
By graph: a = 3
b = 2
X = 5
Therefore: Y = a + bX
Y = 3 + 2(5)
Y = 13
Therefore Mrs. Hudson can expect to spend
about 13K if she visits 5
stores.
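The same estimate can be written as a small function. This is only an illustrative sketch: the function name is hypothetical, and the default values a = 3 and b = 2 are the ones read off Mrs. Hudson’s graph.

```python
def estimate(x, a=3.0, b=2.0):
    """Return the estimated Y for a given X on the line Y = a + bX."""
    return a + b * x

# Money (in K) Mrs. Hudson can expect to spend if she visits 5 stores.
print(estimate(5))  # 13.0
```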
17. Finding the Slope of Straight Line
[Figure: Mrs. Hudson’s Shopping. X-axis: No. of stores visited; Y-axis: Money spent (in K). Plotted points: (0, 3), (1, 5), (2, 7).]
Slope of a straight line:
b = (Y2 − Y1) / (X2 − X1)
Let us try determining the slope for Mrs. Hudson’s
Shopping graph, using the points (1, 5) and (2, 7).
Therefore,
b = (Y2 − Y1) / (X2 − X1)
b = (7 − 5) / (2 − 1)
b = 2 / 1 = 2
2 is the slope of the line.
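As a quick check, the slope formula can be coded directly. The sketch below uses a hypothetical helper name and the three points from Mrs. Hudson’s graph. Because those points lie exactly on one straight line, any pair of them gives the same slope.

```python
def slope(p1, p2):
    """Slope b = (Y2 - Y1) / (X2 - X1) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("slope is undefined when X1 equals X2")
    return (y2 - y1) / (x2 - x1)

points = [(0, 3), (1, 5), (2, 7)]   # (stores visited, money spent in K)
print(slope(points[1], points[2]))  # 2.0
```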
18. METHOD OF LEAST SQUARES
• How can we “fit” a line mathematically if none of the points lie on
the line?
• We shall see how to determine the equation of a line drawn
through a set of points.
• The Estimating Line Equation: Ŷ = a + bX
• Ŷ symbolizes the estimated points, i.e. the points that lie on the
regression line.
• Let’s take an example.
22. (D) IS A BETTER FIT, BUT…
[Figure: two panels, (C) and (D), each plotting the actual points against the estimated points; panel (D) also shows a linear trend line through the estimated points.]
The Absolute Error │Y – Ŷ│ for (C):
│4 – 4│ = 0
│7 – 3│ = 4
│2 – 2│ = 0
Total = 4
The Absolute Error │Y – Ŷ│ for (D):
│4 – 5│ = 1
│7 – 4│ = 3
│2 – 3│ = 1
Total = 5
The total absolute error is higher for (D) than for (C), even though
(D) is visibly the better fit. The absolute error alone is therefore
not a reliable way to find the most
accurate regression line,
which is why we shall use…
23. THE METHOD OF LEAST SQUARES
[Figure: two panels, (C) and (D), each plotting the actual points against the estimated points; panel (D) also shows a linear trend line through the estimated points.]
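The Squared Error (Y – Ŷ)² for (C):
(4 – 4)² = 0
(7 – 3)² = (4)² = 16
(2 – 2)² = 0
Total = 16
The Squared Error (Y – Ŷ)² for (D):
(4 – 5)² = 1
(7 – 4)² = (3)² = 9
(2 – 3)² = 1
Total = 11
(D) has the smaller total squared error, which agrees with the visual
impression that it is the better fit.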
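The two error measures can be compared side by side in a few lines. This is a minimal sketch with hypothetical function names. The actual and estimated values are the ones read from the error tables for (C) and (D), which is an assumption about how the slide pairs them.

```python
actual = [4, 7, 2]  # observed Y values, shared by both panels
est_c = [4, 3, 2]   # estimated Y values for line (C)
est_d = [5, 4, 3]   # estimated Y values for line (D)

def total_absolute_error(ys, y_hats):
    """Sum of |Y - Y_hat| over all points."""
    return sum(abs(y - y_hat) for y, y_hat in zip(ys, y_hats))

def total_squared_error(ys, y_hats):
    """Sum of (Y - Y_hat)^2 over all points."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))

for name, est in (("C", est_c), ("D", est_d)):
    print(name, total_absolute_error(actual, est), total_squared_error(actual, est))
# C 4 16
# D 5 11
```

By absolute error (C) looks better, but by squared error (D) wins, because squaring penalizes the single large miss in (C) more heavily than the three small misses in (D).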
24. THE METHOD OF LEAST SQUARES
What does it do?
• It magnifies, and so penalizes, the larger errors.
• Squaring makes every error non-negative, so positive and negative
errors cannot cancel each other out (no need for the modulus operator).
• Because we look for the estimating line that minimizes the sum
of the squares of the errors, we call this the least squares
method.
• Now let us take a look at how to calculate the constants for a
line estimated in this way.
25. Slope of the best-fitting regression line:
b = (∑XY − n·X̄·Ȳ) / (∑X² − n·X̄²)
where,
X̄ = mean of values of the independent variable
Ȳ = mean of values of the dependent variable
n = the number of data points
X = values of the independent variable
Y = values of the dependent variable
26. Y-intercept of the best-fitting regression line:
a = Ȳ − b·X̄
where,
X̄ = mean of values of the independent variable
Ȳ = mean of values of the dependent variable
b = slope from the previous equation
e.g. taking the data below:
Truck No.   Age of Truck (X)   Repair Expense (Y)   XY
101         5                  7                    35
102         3                  7                    21
103         3                  6                    18
104         1                  4                    4
∑XY = 78
X̄ = 3
Ȳ = 6
n = 4 (data points)
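The slope and intercept formulas can be applied to the four-truck data above as a short sketch. The function name is hypothetical and the code is only one straightforward way to evaluate the two formulas. For these data, ∑X² = 25 + 9 + 9 + 1 = 44, a value the slide does not list.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y_hat = a + bX using the slide's formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # slope
    a = y_bar - b * x_bar                                         # Y-intercept
    return a, b

ages = [5, 3, 3, 1]     # Age of Truck (X)
repairs = [7, 7, 6, 4]  # Repair Expense (Y)
a, b = least_squares_line(ages, repairs)
print(a, b)  # 3.75 0.75
```

So for this small data set the estimating equation would be Ŷ = 3.75 + 0.75X.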
28. The Director of the Beverly Hills Sanitation Department is interested
in the relationship between the age of a garbage truck and the
repair expenses she should expect to incur.
Analysts gather data on 5 trucks and present it as:
Truck No.   Age of Truck (X)   Repair Expenses (Y) (in $100)
101         3                  6
102         2                  1
103         7                  8
104         4                  5
105         8                  9
34. SECOND METHOD (2)
Normal equation to find ‘a’ is:
∑Y = na + b∑X
Normal equation to find ‘b’ is:
∑XY = a∑X + b∑X²
Substituting values from the table (n = 5, ∑X = 24, ∑Y = 29, ∑XY = 168, ∑X² = 142), we get:
29 = 5a + 24b ……………(i)
168 = 24a + 142b
Dividing by 2:
84 = 12a + 71b ………….(ii)
Multiplying equation (i) by 12 and equation (ii) by 5:
35. (when multiplied by 12 and 5 respectively)
348 = 60𝑎 + 288𝑏 ……………(iii)
420 = 60𝑎 + 355𝑏 ……………(iv)
Subtracting equation (iii) from (iv) gives 72 = 67b, so:
b = 1.07
a = 0.64
The Estimating Equation becomes:
Ŷ = 0.64 + 1.07X
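The two normal equations can also be solved programmatically. The sketch below uses a hypothetical function name and solves the 2×2 system with Cramer's rule, which is one of several equivalent ways to do it. Solved without intermediate rounding, the system gives a ≈ 0.64 and b ≈ 1.07. The value 0.66 arises only if b is rounded to 1.07 before it is substituted back into equation (i).

```python
def solve_normal_equations(xs, ys):
    """Solve  sum(Y)  = n*a      + b*sum(X)
              sum(XY) = a*sum(X) + b*sum(X^2)   for (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2  # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all X values are identical; no unique line exists")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

ages = [3, 2, 7, 4, 8]      # Age of Truck (X)
expenses = [6, 1, 8, 5, 9]  # Repair Expenses (Y), in $100
a, b = solve_normal_equations(ages, expenses)
print(round(a, 2), round(b, 2))  # 0.64 1.07
```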
37. EXTRA: CORRELATION ANALYSIS
After discovering the type of relationship between two variables,
what may be the next step?
Correlation analysis helps us to know the degree to which the two
variables are related.
It helps us to determine how reliable the estimate is, through measures
such as the coefficient of determination and the coefficient of correlation.
Since this topic is out of scope here, we do not discuss it further. The natural
next steps, however, are correlation analysis and multiple regression.
38. REGRESSION AND CORRELATION ANALYSIS: LIMITATIONS, ERRORS AND CAVEATS
• Extrapolation beyond the Range of the Observed Data
An estimating equation is valid only over the range of values from
which the sample was originally taken.
• Cause and Effect
Regression and correlation analysis do not show that a change in one
variable causes a change in another variable.
• Using Past Trends to Estimate Future Trends
Historical data must be reappraised before it is used to determine
estimating equations for the present.
• Finding Relationships when they do not Exist
It takes knowledge and common sense to judge which relationships are
meaningful and which are not.
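The extrapolation caveat can be enforced in code. The sketch below is illustrative only: the function name is hypothetical, and the coefficients are the unrounded solution of the five-truck normal equations (about 0.64 and 1.07). It refuses to estimate outside the range of the observed X values.

```python
def guarded_estimate(x, a, b, observed_xs):
    """Estimate Y = a + bX, but only inside the range of the observed X values."""
    lo, hi = min(observed_xs), max(observed_xs)
    if not lo <= x <= hi:
        raise ValueError(f"X = {x} lies outside the observed range [{lo}, {hi}]")
    return a + b * x

ages = [3, 2, 7, 4, 8]       # observed truck ages (X)
a, b = 86 / 134, 144 / 134   # assumed coefficients: unrounded least squares solution

print(round(guarded_estimate(5, a, b, ages), 2))  # inside the range 2..8
try:
    guarded_estimate(12, a, b, ages)              # outside the observed range
except ValueError as err:
    print("refused:", err)
```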
39. CONCLUSION
Thus we see why Regression Analysis is so widely used in
all kinds of fields, and not just business.
Both Regression and Correlation Analysis are powerful tools for
predicting outcomes and forecasting data from
previously collected samples.
Regression Analysis has its limitations, however, and should be
used with discretion.