The term regression was originally employed by Francis Galton while describing the laws of inheritance. Galton believed that these laws caused population extremes to “regress
toward the mean.” By this he meant that children of individuals having extreme values of a certain characteristic would tend to have less extreme values of this characteristic
than their parent.
2. Meaning
• In statistics, regression analysis is a statistical process
for estimating the relationships among variables.
• Regression analysis:There are several types of
regression:
– Linear regression
– Simple linear regression
– Logistic regression
– Nonlinear regression
– Nonparametric regression
– Robust regression
– Stepwise regression
3. Meaning: regression
Noun
1.the act of going back to a previous place or state;
return or reversion.
2.retrogradation; retrogression.
3.Biology. reversion to an earlier or less advanced
state or form or to a common or general type.
4.Psychoanalysis. the reversion to a chronologically
earlier or less
adapted pattern of behaviour and feeling.
5.a subsidence of a disease or its manifestations:
a regression of symptoms.
4. INTRODUCTION
• Many engineering and scientific problems
– concerned with determining a relationship between a
set of variables.
• In a chemical process, might be interested in
– the relationship between
• the output of the process, the temperature at which it
occurs, and
• the amount of catalyst employed.
• Knowledge of such a relationship would enable us
– to predict the output for various values of
temperature and amount of catalyst.
5. INTRODUCTION
• Situation-
– there is a single response variable Y , also called
the dependent variable
– depends on the value of a set of input, also called
independent, variables x1, . . . , xr
• The simplest type of relationship these
varibles is a linear relationship
6. INTRODUCTION
• If this was the linear relationship between Y and
the xi, i = 1, . . . , r, then it would be possible
– once the βi were learned
– to exactly predict the response for any set of input
values.
• In practice, such precision is almost never
attainable
• The most that one can expect is that Equation
would be valid subject to random error
10. • Suppose that the responses Yi corresponding to the input
values xi, i = 1, . . . , n are to be observed and used to
estimate α and β in a simple linear regression model.
• To determine estimators of α and β we reason as follows:
– If A is the estimator of α and B of β, then the estimator of the
response corresponding to the input variable xi would be A + Bxi.
• Since the actual response is Yi ,
– the squared difference is (Yi − A − Bxi)2, and so if A and B are the
estimators of α and β,
– then the sum of the squared differences between the estimated
responses
– the actual response values—call it SS — is given by
11.
12.
13.
14.
15.
16. DISTRIBUTION OF THE ESTIMATORS
• To specify the distribution of the estimators A
and B, it is necessary to make additional
assumptions about the random errors aside
from just assuming that their mean is 0.
• The usual approach is to assume that the
random errors are independent normal
random variables having mean 0 and variance
σ2.
27. EXAMPLE
• The following data relate
– X: the moisture of a wet mix of a certain product
– Y: the density of the finished product
• Fit a linear curve to these data. Also determine SSR.
36. Regression
The term regression was originally employed by Francis
Galton while describing the laws
of inheritance.
Galton believed that these laws caused population
extremes to “regress toward the mean.”
By this he meant that children of individuals having
extreme values of a certain characteristic would tend to
have less extreme values of this characteristic than
their parent.
41. EXAMPLE
• An individual claims that the fuel consumption of
his automobile does not depend on how fast the
car is driven.
• To test the plausibility of this hypothesis, the car
was tested at various speeds between 45 and 70
miles per hour. The miles per gallon attained at
each of these speeds was determined, with the
following data resulting is given.
• Do these data refute the claim that the mileage
per gallon of gas is unaffected by the speed
at which the car is being driven?
42. the mileage per gallon of gas is unaffected
by
the speed at which the car is being driven
60. ANALYSIS OF RESIDUALS: ASSESSING
THE MODEL
The figure shows that, as indicated both by its scatter diagram and the random
nature of its standardized residuals, appears to fit the straight-line model quite
well.
61. The figure of the residual plot shows a discernible pattern, in that the residuals
appear to be first decreasing and then increasing as the input level increases. This
often means that higher-order (than just linear) terms are needed to describe the
relationship between the input and response. Indeed, this is also indicated by the
scatter diagram in this case.
62. The standardized residual plot shows a pattern, in that the absolute value
of the residuals, and thus their squares, appear to be increasing, as the
input level increases. This often indicates that the variance of the response
is not constant but, rather, increases with the input level.
63. TRANSFORMING TO LINEARITY
• The mean response is not a linear function
• In such cases, if the form of the relationship
can be determined it is sometimes
possible, by a change of variables, to
transform it into a linear form.
• For instance, in certain applications it is known that
W(t), the amplitude of a signal a time t after its
origination, is approximately related to t by the
functional form
65. EXAMPLE
• The following table gives the percentages of a chemical that
were used up when an experiment was run at various
temperatures (in degrees celsius).
• Use it to estimate the percentage of the chemical that would
be used up if the experiment were to be run at 350 degrees.
66. EXAMPLE
• Let P(x) be the percentage of the chemical that is used up when the
experiment is run at 10x degrees.
• Even though a plot of P(x) looks roughly linear, we can improve upon the
fit by considering a nonlinear relationship between x and P(x).
• Specifically, let us consider a relationship of the form : 1 − P(x) ≈ c(1 − d)x