Powerpoint 2
REGRESSION, REGRESSION, REGRESSION!
REGRESSION / CORRELATION
Objective:
To measure the degree of association between variables and/or to predict the value of one variable from the knowledge of the values of (an)other variable(s).
Relationships:
(1) Functional
(2) Statistical
Functional Relationship:
Y = f(X), an exact relationship -- no "error".
e.g., Y = -25 + .10X, where Y = $ savings and X = $ spent at B&N during the year (after joining the Barnes & Noble book club).
Statistical Relationship:
(true only "on the average")
Example: Y = PRODUCTION vs. X = LABOR HOURS is linear.
Example: Y = PHYSICAL ABILITY vs. X = AGE is non-linear (an upside-down U-shape).
Consider the following data, which represent the sales of a product (adjusted for trend) over the last 8 sales periods:

Y = sales (millions) for the last 8 sales periods: 116, 109, 117, 112, 122, 113, 108, 115
Y-bar = 114 (the average of the 8 sales amounts)

What would (should) one predict for the next sales period? Probably, one would be hard pressed, in this case, to justify choosing other than Y-bar = 114. How good will this prediction be?
WE DON'T KNOW!!!!!
But -- we can get an idea by looking at how well we would have done, had we been using this 114 all along. The (Y - Y-bar) column holds the prediction errors/residuals:

   Y     Y-bar   (Y - Y-bar)   (Y - Y-bar)^2
  116     114         2              4
  109     114        -5             25
  117     114         3              9
  112     114        -2              4
  122     114         8             64
  113     114        -1              1
  108     114        -6             36
  115     114         1              1
 Sums:                0            144

So TSS = Σ (Yj - Y-bar)^2, summed over j = 1 to n, = 144.
TSS = Total Sum of Squares.
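A minimal Python sketch (not part of the original slides) that reproduces Y-bar = 114 and TSS = 144 from the table above:

```python
# Sales for the last 8 periods (from the slide).
y = [116, 109, 117, 112, 122, 113, 108, 115]

y_bar = sum(y) / len(y)                 # mean prediction: 114.0
residuals = [yj - y_bar for yj in y]    # prediction errors (Y - Y-bar)
tss = sum(e ** 2 for e in residuals)    # Total Sum of Squares

print(y_bar, tss)                       # 114.0 144.0
```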
Two ways to look at the "TSS":
1) A measure of the "mis-prediction" (prediction error) using Y-bar as predictor.
2) A measure of the "Total Variability in the System" (the amount by which the 8 data values aren't all the same).
When the TSS is larger (when the data vary more), you have more reason to investigate variables that might help predict Y.
Consider using X, advertising, to "help" predict Y:

   Y     X
  116    2
  109    1
  117    3
  112    1
  122    4
  113    2
  108    1
  115    2

Y-bar = 114, X-bar = 2

[Scatter diagram: the 8 (X, Y) points, with Y from 105 to 125 and X from 0 to 4, showing an upward linear trend.]
Consider a Linear or Straight Line Statistical relationship between the two variables, and then consider finding the "best fitting line" to the data. Call this line:
Yc = a + bX
Yc = "Computed Y" or "Predicted Y"
Y is called the Dependent Variable
X is called the Independent Variable
What do we mean by "best fitting"?
Answer:
The "Least Squares" line, i.e., the line which minimizes the sum of the squares of the distances between the "dots", Y, and the "line", Yc.
Hence, the MATH problem is to minimize

  Σ (Yj - Ycj)^2, summed over j = 1 to n.

[Figure: a scatter plot with a candidate line; one illustrated point has Y1 = 7 while the line gives Yc1 = 5 at X1.]
To find this Least Squares line, we theoretically need calculus. However, as a practical matter, every text gives the answer, and, more importantly, we will get the result using Excel, or SPSS, or other software - NOT "BY HAND."
(There is an arithmetic formula for "b" and "a" in terms of the sum of the X's, the sum of the Y's, the sum of the X•Y's, etc., but with software available, we never use it.)
Least squares line: Yc = 106 + 4X

[Figure: the scatter diagram again (Y from 105 to 125, X from 0 to 4) with the fitted line Yc = 106 + 4X drawn through the points.]
[Slides 14-22: the software output for this regression, with the intercept and slope highlighted.]
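For readers following along outside Excel/SPSS, here is a sketch of obtaining the intercept and slope with numpy (the data are the 8 (X, Y) pairs above; the slides themselves rely on packaged software):

```python
import numpy as np

x = np.array([2, 1, 3, 1, 4, 2, 1, 2], dtype=float)
y = np.array([116, 109, 117, 112, 122, 113, 108, 115], dtype=float)

# A degree-1 polynomial fit is ordinary least squares for a straight line;
# np.polyfit returns the coefficients highest power first: (slope, intercept).
b, a = np.polyfit(x, y, 1)
print(a, b)   # 106.0 4.0  ->  Yc = 106 + 4X
```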
So, using X in the best way, we have a prediction line of Yc = 106 + 4X. How good are the predictions we'll get using this line? Suppose we had been using it (e.g., for X = 2, Yc = 106 + 4(2) = 114):

   Y    X   (Y - Y-bar)  (Y - Y-bar)^2    Yc    Y - Yc   (Y - Yc)^2
  116   2        2             4          114      2          4
  109   1       -5            25          110     -1          1
  117   3        3             9          118     -1          1
  112   1       -2             4          110      2          4
  122   4        8            64          122      0          0
  113   2       -1             1          114     -1          1
  108   1       -6            36          110     -2          4
  115   2        1             1          114      1          1
 Sums:           0           144 (TSS)             0         16 (SSE)
So, SSE = Σ(Y - Yc)^2 = 16.
SSE = Sum of Squares "due to error"
That is, we use X in the best way possible, and still do not get perfect prediction. The amount of "mis-prediction" still remaining, measured by sum of squares, is 16. This must be due to factors other than advertising (X). (Perhaps: size of sales force, number of retail outlets, strategy of competition, interest rates, etc.)
We call all these other factors "ERROR". That is, "error" is the collective name of all variables (factors) not used in making the prediction.
SSE is also called "SUM OF SQUARED RESIDUALS" or "RESIDUAL SUM OF SQUARES".
We have TSS = 144 and SSE = 16.
TSS - SSE = 128
What happened to the other 128? We call this "SSA" ("SSR" in the text):
SSA = TSS - SSE = 128
SSA = Sum of squares "due to X" or "Attributed to X".
So, TSS = SSA + SSE:
Total Variability = Variability Attributed to X + Variability due to ERROR
We have

  r^2 = SSA / TSS = 128 / 144 = .89

r^2 is called the "Coefficient of Determination", and is interpreted as the "proportion of variability in Y explained by X" or "... explained by the relationship between Y and X expressed in the regression line".
Of course, 1 - r^2 = SSE / TSS = .11 is interpreted as the proportion of variability in Y unexplained by X (and still present).

  0 ≤ r^2 ≤ 1, where r^2 = SSA / TSS.

Define r = SQRT(r^2); r is the correlation (coefficient). Here r = SQRT(.89) = .943.
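A sketch tying the pieces together in Python (same data as above; the .89 and .943 on the slides are rounded):

```python
import numpy as np

x = np.array([2, 1, 3, 1, 4, 2, 1, 2], dtype=float)
y = np.array([116, 109, 117, 112, 122, 113, 108, 115], dtype=float)

yc = 106 + 4 * x                      # least squares predictions
tss = np.sum((y - y.mean()) ** 2)     # 144.0
sse = np.sum((y - yc) ** 2)           # 16.0
ssa = tss - sse                       # 128.0

r2 = ssa / tss                        # 0.888..., the coefficient of determination
r = np.sqrt(r2)                       # 0.9428..., taking the sign of b (+4 here)
print(tss, sse, ssa, round(r2, 2), round(r, 3))
```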
But, r can be + or - !!
SQRT(.89) = +.943 or -.943.
It takes on the sign of b in Yc = a + bX.
-1 ≤ r ≤ 1
A value of r near 1 or -1 is suggestive of a strong linear relationship between Y and X. A value of r near 0 is suggestive of no linear relationship between Y and X.
Note that the sign of r indicates the direction of the relationship (if any). A "+" indicates that Y and X move in the same direction; a "-" indicates that they move in opposite directions. Some people refer to a positive r as a "positive relationship" and a negative r as an "inverse relationship".
[Figure: six scatter diagrams illustrating r = +1, r = -1, r = +.8, r = -.65, and two patterns with r = 0.]
Note that a high r^2 does not necessarily mean CAUSE/EFFECT. Frequently we have "spurious correlations": two variables which are highly related in terms of r^2, but only because they are both "driven" by a third variable.
"Classic" example: the number of TEACHERS vs. the number of quarts of LIQUOR SOLD (both driven by a third variable, such as population size).
[Slides 34-35: regression output annotated with R and R^2, and with SSA, SSE, and TSS.]
THE MODEL
In order to get a measure of prediction error (e.g., confidence intervals, hypothesis testing), we must make some assumptions about the distribution of points scattered about the regression line. These assumptions are usually couched in what is called a "statistical model."
We specify

  µY•X = A + BX

where µY•X is the mean or average value of Y for a given X. We have a (true) slope of B and (true) intercept of A; A and B are parameters, the exact values of which we'll never know.
This says that if we set X = 1 (for example) and sample an infinite number of Y's (hence finding µY•1), and then set X = 2 and find µY•2, X = 3 and find µY•3, etc., all the µY•X fall exactly on a straight line.

[Figure: the (TRUE) average of Y, µY•X, plotted against X as a straight line.]
But, we never find µY•X. For a given X, we observe a value of Y which differs from µY•X, in the same way that when we observe any random variable value, it does not equal "µ" but is some point governed by some probability law.

[Figure: a probability density f(Y) centered at µY•X.]
The way we write this in a formal way is:

  Y = µY•X + ε = A + BX + ε

where ε is the difference between an individual Y and the mean Y, all given a specific X. ε is basically the impact of having a nonzero σ.
Example: Suppose that Y = weight, X = height, and µY•X=70" = 160 lbs.
Then a person 70" tall with a weight of 168 pounds has a "personal ε" of 8 lbs. If his/her weight were 158 lbs., his/her personal ε would be -2 pounds.
Of course, since ε = Y - µY•X, and we don't know µY•X, we don't really know anybody's personal ε.
We find the LS line, Yc = a + bX:
  a → estimates → A
  b → estimates → B
  Yc → estimates → µY•X, and Y itself.
We usually make the following assumptions,
which are called
“the standard assumptions.”
1) NORMALITY
2) HOMOSCEDASTICITY
3) INDEPENDENCE

Assumption 1:
Given a value of X, the probability distribution of Y is normal.
(e.g., with Y = weight and X = height, for any given height (say 70") the Y's are normal around µY•X=70" (say, 160 lbs.))

[Figure: a normal curve for Y centered at 160.]
Assumption 2:
The standard deviation of ε, σε (which we don't know), which is usually called σY•X, is constant for all values of X. The characteristic of having σY•X constant is referred to as "Homoscedasticity."
Combining assumptions 1 & 2, we have the Y's being normally distributed with µY•X as mean (and correspondingly, average error of 0) and constant standard deviation σY•X. Of course, as you know, neither µY•X nor σY•X is known.
µY•X is estimated by Yc = a + bX.
σY•X is estimated by "Sy•x".
Sy•x is called the "Standard Error of Estimate,"

  Sy•x = sqrt( SSE / (n - 2) )
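A quick numeric check in Python (SSE = 16 and n = 8 from our example; the result matches the Sy•x = 1.63 quoted on the later output slide):

```python
import math

sse, n = 16.0, 8
s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
print(round(s_yx, 2))             # 1.63
```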
The formula makes intuitive sense, in that SSE is a variability due to error. The [n-2] (instead of [n-1], the denominator of S in most previous applications) is really a degrees of freedom number. The df = n minus a degree of freedom for each parameter estimated from the data. Here, there are 2 such parameters, A and B (estimated by a and b, respectively).
Later, when we have a model of Y = A + B1X1 + B2X2 + ε, the df will be [n-3].
We usually get Sy•x from the computer output. Here, Sy•x = 1.63 (see output on next page).
[Slides 51-52: regression output with Sy•x highlighted.]
Assumption 3:
The Y values are independent of one another. (This is often a problem when the data form a time series.)
In the real world these assumptions may never be exactly true, but they are often close enough to true to make our statistical analysis (which follows) valid. Investigation has shown that moderate departures from assumptions 1 and 2 do not appreciably affect results (i.e., assumptions 1 and 2 are "Robust"). For large departures, there are ways to recognize them and do the appropriate (but more complex) analysis.
CONFIDENCE INTERVALS

[Slides 54-56: output showing 95% confidence intervals for A and B. The coefficient estimates appeared in the earlier output; the intervals are now added to it.]
Of greater interest (usually) is a confidence interval for the prediction of the next "period." This is done by:

  Yc ± t(1-α, n-2 df) • Sy•x      (recall: Yc = 106 + 4X)

This formula is an excellent approximation when n is "large" (virtually always in MK) and the value of X at which we are predicting isn't dramatically far from the center [X-bar] of our data.
For 95% confidence and X = 3, we have:

  118 ± 2.447(1.63), or 118 ± 3.99

where 2.447 = TINV(.05, 6).
(EXCEL COMMAND)
TINV(.05, 6) = 2.447
In general: TINV(α, df)
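A sketch of the same interval in Python; scipy's t.ppf stands in for Excel's TINV (TINV(α, df) is the two-tailed α critical point, i.e., t.ppf(1 - α/2, df)):

```python
from scipy import stats

alpha, df, s_yx = 0.05, 6, 1.63
t_crit = stats.t.ppf(1 - alpha / 2, df)    # 2.447, same as TINV(.05, 6)

yc = 106 + 4 * 3                           # prediction at X = 3: 118
half_width = t_crit * s_yx                 # about 3.99
print(yc - half_width, yc + half_width)    # roughly 114.01 to 121.99
```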
Hypothesis Testing
To test (in Y = A + BX + ε):

  H0: B = 0  vs.  H1: B ≠ 0     (Note: B = 0 is the same as X & Y NOT RELATED.)

we compute

  tcalc = (b - B_H0) / s_b = b / s_b   (since B_H0 = 0)

and accept H0 if |tcalc| < t(1-α, n-2 df); reject H0 if |tcalc| > t(1-α, n-2 df).
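A sketch of this t-test in Python. The slides read tcalc and s_b off the output; here s_b is computed from the standard formula s_b = Sy•x / sqrt(Σ(X - X-bar)^2), which for our data (Sxx = 8) reproduces the tcalc = 6.93 quoted on the next slide:

```python
import math
from scipy import stats

s_yx, sxx, b, n = 1.633, 8.0, 4.0, 8
s_b = s_yx / math.sqrt(sxx)                   # standard error of b: about 0.577
t_calc = (b - 0) / s_b                        # about 6.93
p_value = 2 * stats.t.sf(abs(t_calc), n - 2)  # two-tailed p-value

print(round(t_calc, 2), round(p_value, 4))    # p well below .05: reject H0
```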
In our problem, tcalc = 6.93 (see output on the next page). If α = .05, we have t.95 (6 df) = 2.447, and we reject H0.
(All we really need to do is to examine the p-value.)
We'll refer to this as the "t-test."
[Slides 61-62: output showing the p-value (called "significance" by SPSS).]
To test

  H0: all B's = 0  vs.  H1: not all B's = 0,

we have a different procedure. Here, where µY•X = A + BX, there's only one B, and thus the H's above are the same as the previous H0: B = 0 vs. H1: B ≠ 0.
However, for the future, where µY•X = A + B1X1 + B2X2, "all B's = 0" means B1 = B2 = 0, and there is a difference between "B = 0" and "all B's = 0," we introduce:

  H0: all B's = 0
  H1: not all B's = 0
To test the above, we determine Fcalc. We get Fcalc from the output!!! Yeah!!!!
And we accept H0 if Fcalc < F(1-α; 1, n-2 df), and reject H0 if Fcalc > F(1-α; 1, n-2 df), where F(1-α) is the appropriate value from the F table.
More easily: examine the p-value of the F-test (next page).

[Figure: an F distribution with the α = 0.05 critical value at 5.99.]
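The slides take Fcalc from the output; under the hood, for simple regression it is the ratio of explained to unexplained variability per degree of freedom, Fcalc = (SSA/1) / (SSE/(n-2)). A sketch with our numbers:

```python
from scipy import stats

ssa, sse, n = 128.0, 16.0, 8
f_calc = (ssa / 1) / (sse / (n - 2))   # 48.0 (note: 6.93^2, the t-test squared)
f_crit = stats.f.ppf(0.95, 1, n - 2)   # 5.99, as in the figure
p_value = stats.f.sf(f_calc, 1, n - 2)

print(round(f_calc, 1), round(f_crit, 2), round(p_value, 4))   # reject H0
```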
[Slides 67-68: output showing Fcalc and its p-value.]
MULTIPLE REGRESSION
When there is more than one independent variable (X), we call our regression analysis by the term "Multiple Regression." With a single independent variable, we call it "Simple Regression."
µY•X = A + B1X1 + B2X2 + ••• + Bk-1Xk-1
Y = µY•X + ε
Least Squares hyperplane ("line"):

  Yc = a + b1X1 + b2X2 + ••• + bk-1Xk-1

NOTE: k-1 = number of X's; k = number of parameters.
Example:
  Y = Job Performance, X1 = Score on (entrance) Test 1, X2 = Score on Test 2, X3 = Score on Test 3
or
  Y = Sales, X1 = Advertising, X2 = Number of sales people, X3 = Number of competitors
We assume that computer software gives us all (or nearly all) the numerical results.
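A sketch of fitting such a model with statsmodels (the six rows shown are the first rows of the job-performance data on a later slide, so these numbers will not match the full n = 25 output):

```python
import numpy as np
import statsmodels.api as sm

x1 = [100, 99, 101, 93, 95, 95]   # Test 1 scores
x2 = [95, 99, 103, 95, 102, 94]   # Test 2 scores
x3 = [87, 98, 101, 91, 88, 84]    # Test 3 scores
y = [88, 80, 96, 76, 80, 73]      # job performance

X = sm.add_constant(np.column_stack([x1, x2, x3]))   # intercept column + X's
model = sm.OLS(y, X).fit()

print(model.params)                  # a, b1, b2, b3
print(model.fvalue, model.f_pvalue)  # overall F-test
print(model.tvalues, model.pvalues)  # individual t-tests
```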
Typically, we wish to perform two types of hypothesis tests.

First: the F-test   (for Y = A + B1X1 + ••• + Bk-1Xk-1 + ε):

  H0: B1 = B2 = B3 = ... = Bk-1 = 0
  H1: not all B's = 0
H0: B1 = B2 = B3 = ... = Bk-1 = 0
H1: not all B's = 0
In "English":
  H0: The X's collectively do not help us predict Y.
  H1: At least one of the darn X's helps us predict Y!
We call this, reasonably so, a "TEST OF THE OVERALL MODEL".
If we accept H0 that the X's collectively do not help us predict Y, we probably discontinue formal statistical analysis. However, if we reject H0 (i.e., the "F is significant"), then we are likely to want a series of t-tests:

  H0: B1 = 0 vs. H1: B1 ≠ 0,
  H0: B2 = 0 vs. H1: B2 ≠ 0,
  ...
  H0: Bk-1 = 0 vs. H1: Bk-1 ≠ 0
These are called "Tests for individual X's." The test is answering (using B1 as an example):
  H0: Variable X1 is NOT helping us predict Y, above and beyond the other variables in the model.
  H1: X1 IS INDEED helping us predict Y, above and beyond the other variables in the model.
So, note: we're answering whether a variable gives us INCREMENTAL value. Sometimes a result looks "strange". With Y = weight, X1 = height, X2 = pant length:

  F-test: SIGNIFICANT
  t1:     NOT SIGNIFICANT
  t2:     NOT SIGNIFICANT
If I know a person's X1, height, do I get additional predictive value about Y, weight, from knowing X2, pant length? No; hence, we accept H0: B2 = 0 (t2 not significant).
If I know X2, pant length, do I get additional predictive value about Y from knowing height? (Also) No; hence we accept H0: B1 = 0 (t1 not significant).
When the X's themselves are highly interrelated (the fact that leads to the strange-looking, but not really strange, result), we call this MULTICOLLINEARITY.
Another "look" at this issue:

  Y vs. X1:     R^2 = .5
  Y vs. X2:     R^2 = .4
  Y vs. X1, X2: R^2 = ?

Ans: between .5 and .9. (In some unusual, "strange" cases, R^2 may exceed .9.) If X1 and X2 are not overlapping in the information provided, R^2 = .9; if X2 tells us a total subset of what X1 tells us, R^2 = .5.
If you have

  Y vs. X1:     R^2 = .70
  Y vs. X2:     R^2 = .72
  Y vs. X1, X2: R^2 = .73,

1) The F-test is significant, because the X's together tell us (an estimate of) 73% of what's going on with Y.
2) t1 is (likely) not significant, because the gain of .01 (.73 - .72 [with only X2]) is judged by the t-test as too easily due to the "luck of the draw". (Actually, it depends on the sample size.)
3) t2, similarly.
Example: Y = Job performance, X1 = Test 1 score, X2 = Test 2 score, X3 = Test 3 score; n = 25. The first rows of the data:

   X1    X2    X3    Y
  100    95    87    88
   99    99    98    80
  101   103   101    96
   93    95    91    76
   95   102    88    80
   95    94    84    73
  ...   ...   ...   ...
[Slides 83-86: the full X1, X2, X3, Y data listing and the regression output.]
LEAST SQUARES LINE
So, Yc = -106.34 + 1.02•X1 + .137•X2 + .87•X3
To test H0: B1 = B2 = B3 = 0 vs. H1: not all B's = 0, at α = .05:
From the output, Fcalc = 47.598. Since the p-value = .000000001528 < .05, we reject H0.
To test (each at α = .05, with 21 df = 25 - 4 and t(1-α) = 2.08):

  H0: B1 = 0 vs. H1: B1 ≠ 0:  tcalc1 = 3.65 (p = .0015)
  H0: B2 = 0 vs. H1: B2 ≠ 0:  tcalc2 = .80  (p = .4314)
  H0: B3 = 0 vs. H1: B3 ≠ 0:  tcalc3 = 3.57 (p = .0018)
For tests 1 and 3 we reject H0; for test 2 we accept H0.
Conclusion in practical terms? X1 (Test 1) and X3 (Test 3) each gives us incremental predictive value about PERFORMANCE, Y. X2 (Test 2) is either irrelevant or redundant.
An added benefit of the analysis was to indicate how the tests should be weighted: the best fit occurs if the tests are weighted 1.02, .137, .87 (assuming we retain Test 2). This is equivalent to weights of 1.02/2.027, .137/2.027, .87/2.027, or (.50, .07, .43), where 2.027 = 1.02 + .137 + .87. The present weights were (1/3, 1/3, 1/3).
"PROBLEM IN NOTES"
Consider the following model: Y = A + B1•X1 + B2•X2 + B3•X3 + ε
Y = Sales Revenue (in units of $100,000)
X1 = Expenditure on TV advertising (in units of $10,000)
X2 = Expenditure on Web advertising (in units of $10,000)
X3 = Expenditure on Newspaper advertising (in units of $10,000)
Refer to the computer output following the questions.
1. What is the least squares line (hyperplane)?
2. What revenue do I expect (in dollars) with no advertising in any of the three media?
3. If $10,000 more were allocated to advertising, which medium should receive it to generate the most additional revenue?
4) What percent of the variability in revenue is due to factors
other than the expenditures in the three advertising media?
5) If management decided to spend the same amount of money
on each of the three types of media, how much total money
would have to be spent to generate an expected revenue of
$40,000,000?
6) Test H0: B1 = B2 = B3 = 0 vs. H1: not all B’s = 0, at α = .05.
What is your conclusion in practical terms?
7) For each variable, test H0: B = 0 vs. H1: B ≠ 0, at α = .05.
What are your conclusions in practical terms?

Dummy Variables (Indicator) (Categorical)
Ex: Y = A + B1X1 + B2X2 + ε, where
  Y = $ spent on DVDs/mo.
  X1 = Disposable Income / yr.
  X2 = Sex (Male: X2 = 1; Female: X2 = 0)
We get Yc = a + b1X1 + b2X2.
For any given X1, income, we predict Y as follows:
  Male:   Yc = a + b1X1 + b2(1) = a + b1X1 + b2
  Female: Yc = a + b1X1 + b2(0) = a + b1X1 + 0
How is b2 to be interpreted?
Ans: The (estimated) amount spent by a Male, above that which would be spent by a Female, given the same X1 value (income). (Of course, if b2 is negative, it says that we estimate that Females spend more than Males, at equal incomes.)
If we had defined X2 = 1 for F's and X2 = 0 for M's, then b2 would reverse sign, and have the opposite meaning.
Remember that a variable is a "dummy" variable because of definition and interpretation. The computer treats a variable whose values are 0 and 1 just like any other variable.
Our data are, perhaps,

   Y    X1   X2
   20   50    1
   18   40    1
   33   65    0
   24   49    0
   21   62    1
  ...  ...  ...
Note that we have 2 categories (M, F), but only one dummy variable. This is necessary; in the general situation of C categories, we use (C-1) dummy variables. This is because of computation issues involved in matrix inversion.
Example: Yc = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5, where
  Y = Water Usage, X1 = Temp., X2 = Amount Produced, X3 = # People Employed

            X4   X5
  Plant 1    1    0
  Plant 2    0    1
  Plant 3    0    0
Let a + b1X1 + b2X2 + b3X3 = G.
Then we predict (for a given X1, X2, X3):
  FOR PLANT 1: G + b4(1) + b5(0) = G + b4
  FOR PLANT 2: G + b4(0) + b5(1) = G + b5
  FOR PLANT 3: G + b4(0) + b5(0) = G
How do we interpret b4? b5?
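(By the same logic as the Male/Female example, b4 estimates how much Plant 1 differs from Plant 3, and b5 how much Plant 2 differs from Plant 3, at equal X1, X2, X3.) A sketch of the coding, with hypothetical values of G, b4, and b5 purely for illustration:

```python
# Each plant's (X4, X5) pair, exactly as in the table above.
plant_dummies = {"Plant 1": (1, 0), "Plant 2": (0, 1), "Plant 3": (0, 0)}

def predict(g, b4, b5, plant):
    """G stands for a + b1*X1 + b2*X2 + b3*X3 at the given X1, X2, X3."""
    x4, x5 = plant_dummies[plant]
    return g + b4 * x4 + b5 * x5

g, b4, b5 = 100.0, 12.0, -5.0        # hypothetical numbers, not from the slides
for plant in plant_dummies:
    print(plant, predict(g, b4, b5, plant))
# Plant 1: G + b4 = 112; Plant 2: G + b5 = 95; Plant 3: G = 100
```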
STEPWISE REGRESSION
A "variation" of multiple regression to pick the "best" model.
Step 1: Y/X1, X2, X3, X4
Internal: Run separate simple regressions with each X; pick the best (best = highest R^2).
  Y/X1  R^2 = .45
  Y/X2  R^2 = .50
  Y/X3  R^2 = .48
  Y/X4  R^2 = .28
External: Y/X2, R^2 = .50
Step 2:
Internal:
  Y/X2, X1  R^2 = .59
  Y/X2, X3  R^2 = .68
  Y/X2, X4  R^2 = .70
External: Y/X2, X4, R^2 = .70
Step 3:
Internal:
  Y/X2, X4, X1  R^2 = .77
  Y/X2, X4, X3  R^2 = .73
External: Y/X2, X4, X1, R^2 = .77

NOTE: If at any stage the best variable to enter is not significant by the t-test, the ALGORITHM STOPS (and does not bring that variable in!!!). You select a p-value (pin), and if the p-value of the entering variable > pin (i.e., the variable is not significant), the variable does not enter and the algorithm stops.
Also, there's a step 3b (and 4b, 5b, etc.).
Step 3b) Now that we've entered our third variable, the software goes back and re-examines previously entered variables to see if any should be DELETED (specify a "p to go out", pout, so that if the p-value of a variable in our model > pout, the variable is deleted).
The algorithm continues until it stops!!!!
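A sketch of the forward ("entering") part of the algorithm in Python; the deletion pass of step 3b is omitted to keep it short, and the data are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

def forward_stepwise(y, X, p_in=0.05):
    remaining, chosen = list(range(X.shape[1])), []
    while remaining:
        # Internal: try adding each remaining variable; rank by R^2.
        trials = []
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, chosen + [j]])).fit()
            trials.append((fit.rsquared, j, fit.pvalues[-1]))  # p of entering X
        r2, best, p = max(trials)
        if p > p_in:            # entering variable not significant: STOP
            break
        chosen.append(best)     # External: keep the best model at this step
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)                  # simulated data for illustration
X = rng.normal(size=(50, 4))
y = 2 * X[:, 1] + X[:, 3] + rng.normal(size=50)
print(forward_stepwise(y, X))                   # e.g., [1, 3]
```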
[Slides 107-110: stepwise output for the example with three tests and job performance, including a KEY to reading the output.]
Variable 1: Y = GRADUATE GPA
Variable 2: X1 = UNDERGRAD GPA
Variable 3: X2 = QUANTITATIVE GMAT
Variable 4: X3 = VERBAL GMAT
Variable 5: X4 = COLLEGE SELECTIVITY (0 = Less Selective, 1 = More Selective)

   Y     X1      X2      X3     X4
  3.50  3.60   600.00  580.00  0.00
  3.90  3.60   680.00  670.00  1.00
   .     .       .       .      .
  3.20  2.90   440.00  430.00  1.00
Detailed Summary of Stepwise Analysis

  Step   Entering Variable     LS Line                                    R^2
  1      UNDERGRAD GPA (X1)    Yc = .85 + .73X1                           .609
  2      QUANT GMAT (X2)       Yc = .585 + .53X1 + .00165X2               .833
  3      COLLEGE SEL. (X4)     Yc = 1.197 + .309X1 + .00163X2 + .284X4    .915
  STOP!

If we bring in Verbal GMAT, R^2 = .919.
PRACTICE PROBLEM
Y = COMPANY ABC's SALES ($millions)
X1 = OVERALL INDUSTRY SALES ($billions)
X2 = COMPANY ABC's ADVERTISING ($millions)
X3 = SPECIAL PROMOTION BY CHIEF COMPETITOR: 0 = YES, 1 = NO
A STEPWISE REGRESSION WAS RUN WITH THESE RESULTS:
STEP 1: VARIABLE ENTERING: X1, Yc = 205 + 16•X1, R^2 = .48
STEP 2: VARIABLE ENTERING: X2, Yc = 183 + 11•X1 + 10•X2, R^2 = .64
STEP 3: VARIABLE ENTERING: X3, Yc = 180 + 10•X1 + 8•X2 + 65•X3, R^2 = .68

A) If ABC's advertising is to be the same next year as this year (i.e., X2 held constant), and we do not know (in advance) the value of X3, what would we predict to be the increase in ABC's sales if overall industry sales (X1) increase by $1 billion?
   a) 10   b) 11   c) 16
B) Based on the given information, we can conclude that the R^2 between Y and X2 (the exact value of which we cannot determine from the given information) is between:
   a) .16 and .48   b) .16 and .64   c) .48 and .64   d) none of these
C) Answer part B) if the regression results above were NOT part of a stepwise procedure, but simply a set of multiple regression results.

Editor's Notes

  1. Regression is used more than any other technique - it's the mode.
  2. Measure the relationship between variables (like how often you go to Wendy's vs. how many children you have), sometimes (not always) to predict the value of one variable based on the values of other variables. Functional relationships: an algebraic relationship. Statistical relationships: see the next note.
  3. Functional Relationship: an algebraic relationship (an exact relationship), meaning there is no error. Two people that spend the same X will save the same Y. It is an exact relationship.
  4. Statistical relationship: a relationship that is true only on the average. It is not exact. Example: labor hours vs. production. There is an upward tendency, so you make a trend line. It is not an EXACT relationship; production won't be EXACTLY what is revealed by the trend line. This relationship is a linear relationship, a straight line. If it is not a straight line, it is non-linear: age vs. physical ability, or time vs. knowledge of a product, follows the upside-down U-shape curve.
  5. Instead of basing it on Y, base it on advertising (X). When advertising is at its highest, Y is at its highest (117 & 122). When Y is at its lowest, so is X. There is a link. Sometimes it's better to use this method of demonstration, meaning the higher the advertising, the higher the sales. Scatter Diagram: when you graph X and Y data points, it is called a scatter diagram. The scatter diagram has a (linear) trend, so it's good to find the best fitting line (next slide).
  6. Yc is the best fitting line. b is the change in Y if you increase X by 1 (the change in Y per unit change in X). In this context, Y is called the dependent variable (the output variable); X is called the independent variable (the input variable).
  7. Best fitting is defined as the Least Squares line: the line which minimizes the sum of the squared differences (prediction errors). Minimize the sum of the prediction errors squared.
  8. To find the Least Squares line, use Excel or SPSS. Never do a regression by hand.
  9. Dependent (Y), Independent (X).
  10. SSE: we use X in the best way we can, but we still don't get perfect prediction. The amount of mis-prediction still there is the SSE. It is due to factors or variables OTHER than advertising: factors or variables affecting prediction that aren't what you're measuring.
  11. Error: the collective name of all variables not used in making the prediction - all the other variables.
  12. What happened to the 128? We reduced the prediction error dramatically - by 128 - by using X to help us predict. We call the 128 SSA: Sums of Squares attributed to X. We gained that 128 by using X to help us predict.
  13. The total variability in sales = variability due to X + variability due to ERROR. Why are sales not always the same? Because advertising (X) is not always the same, and because of the variability of the errors.