2. Brief History
• This term was introduced by Francis Galton
• It was introduced in the year 1877
• It began with a study of the heights of parents and the heights of their
children.
• Initially it was regarded only as a biological phenomenon.
• Later it was extended by Udny Yule and Karl Pearson.
3. Introduction
• The word regression means going back or returning.
• It is the study of the effect of one variable on another variable.
• Regression can also be described as a statistical tool with which we can
estimate the unknown values of one variable from the known values of
another variable.
• In the modern world it plays a very important role in the field of Machine Learning.
4. Regression model
Yi = β1 + β2Xi + Ui
β1 : Intercept
β2 : Slope coefficient
(β1 and β2 are together called the parameters or regression coefficients)
Yi : Dependent/Explained/Regressed/Endogenous variable
Xi : Independent/Explanatory/Regressor/Exogenous variable
Ui : Disturbance term or error term
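To make the roles of the terms concrete, the model can be simulated. The following is a minimal sketch, assuming the disturbance term Ui is drawn from a normal distribution with mean 0; the function name `simulate`, the parameter values and the X values are illustrative assumptions, not part of the original slides.

```python
import random

def simulate(beta1, beta2, xs, sigma, seed=0):
    """Generate Yi = beta1 + beta2*Xi + Ui for each Xi in xs.

    Ui is assumed to be Normal(0, sigma); this distribution is an
    illustrative choice, not something the model itself requires.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    return [beta1 + beta2 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical values chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate(beta1=2.0, beta2=0.5, xs=xs, sigma=1.0)

for x, y in zip(xs, ys):
    print(f"X = {x:.1f}   Y = {y:.3f}")
```

With sigma set to 0 the disturbance vanishes and every Y lies exactly on the line β1 + β2X.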
5. Difference between correlation and regression
Correlation
• Used to determine the strength of the linear relationship between two
variables.
• No distinction is made between dependent and explanatory variables;
both are treated symmetrically.
Regression
• Used to predict the average value of one variable on the basis of
fixed values of other variables.
• The dependent variable is random and the explanatory variable is
assumed to be fixed.
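The symmetry of correlation and the asymmetry of regression can be checked numerically. This is a small sketch on made-up data; the helper names `pearson_r` and `slope` are assumptions for illustration. Swapping the two variables leaves the correlation unchanged, while the regression slope depends on which variable is treated as dependent.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_r(a, b):
    """Pearson correlation coefficient; symmetric in a and b."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / sqrt(saa * sbb)

def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    md, me = mean(dependent), mean(explanatory)
    cov = sum((e - me) * (d - md) for d, e in zip(dependent, explanatory))
    var = sum((e - me) ** 2 for e in explanatory)
    return cov / var

# Illustrative data, not taken from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print("r(x, y)      =", pearson_r(x, y))
print("r(y, x)      =", pearson_r(y, x))   # same value as r(x, y)
print("slope y on x =", slope(y, x))
print("slope x on y =", slope(x, y))       # differs from slope y on x
```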
6. Dependent variable: The variable whose value is influenced or predicted.
Explanatory variable: The variable that influences the dependent variable or is
used to predict it.
8. Scatter Plot Diagram
Scatter plots (also called scatter graphs) are
similar to line graphs. A line graph uses a
line on an X-Y axis to plot a continuous
function, while a scatter plot uses dots to
represent individual pieces of data. In
statistics, these plots are useful to see if
two variables are related to each other.
For example, a scatter chart can suggest
a linear relationship (i.e. the points lie
close to a straight line).
9. Line of best fit
A line of best fit (or "trend" line) is a straight line that best represents
the data on a scatter plot.
This line may pass through some of the points, none of the points, or all
of the points.
It is chosen so that the sum of the squared deviations of the actual values
of Y from their estimated values is a minimum (the least-squares criterion).
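The least-squares criterion has a closed-form solution for a single explanatory variable: the slope is Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and the intercept is Ȳ − slope · X̄. The sketch below applies these formulas to made-up data; the function names `fit_line` and `sum_sq_dev` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_sq_dev(xs, ys, intercept, slope):
    """Sum of squared deviations of actual Y from the line's estimated Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = fit_line(xs, ys)
print("intercept:", b1, "slope:", b2)
print("sum of squared deviations at the fit:", sum_sq_dev(xs, ys, b1, b2))
# Any other line gives a larger sum, for example a slightly steeper one:
print("with slope + 0.1:", sum_sq_dev(xs, ys, b1, b2 + 0.1))
```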
10. Where do we use regression analysis?
• Where we want to identify significant relationships between a dependent
variable and an independent variable, for example income and expenditure.
• To make predictions and forecasts.
• Where we want to find the dependency among different variables relating to a
phenomenon. For example, suppose we want to estimate the growth in sales of a
company based on current economic conditions, and recent company data
indicate that the growth in sales is around two and a half times the growth
in the economy. Using this insight, we can predict future sales of the company
based on current and past information.
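The sales example can be written as a one-line prediction rule. This is only a sketch: it takes the factor of 2.5 from the example and assumes a zero intercept, which the example does not state; the function name and the 2% input are hypothetical.

```python
# Factor from the example: sales grow about 2.5 times as fast as the economy.
GROWTH_MULTIPLIER = 2.5

def predict_sales_growth(economy_growth_pct):
    """Predicted sales growth (%) for a given economic growth (%).

    Assumes a zero intercept, a simplification made for this sketch.
    """
    return GROWTH_MULTIPLIER * economy_growth_pct

# Hypothetical input: the economy grows by 2%.
print(predict_sales_growth(2.0))
```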
11. R-Square
• It is a measure of “Goodness of Fit” for the regression model.
• It is the ratio of the Explained Sum of Squares (ESS) to the Total Sum of Squares (TSS):
R² = ESS / TSS.
• It tells us how well the regressors explain the variability of the
dependent variable in the model.
• Its value can never be negative.
• For a least-squares fit with an intercept, its value lies between 0 and 1.
• It is denoted by R².
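The ratio can be computed directly from a fitted line. Below is a minimal sketch for one explanatory variable fitted by least squares with an intercept; the function name `r_squared` and the data are illustrative assumptions.

```python
def r_squared(xs, ys):
    """R-square = ESS / TSS for a least-squares line with an intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Least-squares slope and intercept.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * x for x in xs]
    ess = sum((f - y_bar) ** 2 for f in fitted)  # explained sum of squares
    tss = sum((y - y_bar) ** 2 for y in ys)      # total sum of squares
    return ess / tss

# Illustrative data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print("R-square:", r_squared(xs, ys))
```

A value near 1 means the fitted line accounts for most of the variation in Y, and a value near 0 means it accounts for very little.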