2. HEDONIC PRICING IN REAL ESTATE
• What is the relationship between square footage of a house and
its price?
• If one square foot of living space is added to a house, by how much will its
price increase?
• $50?
• $100?
• All of these questions can be answered with the help of a linear
regression. We will be using the "real estate" dataset to help us
answer these questions.
3. THE RELATIONSHIP BETWEEN SQUARE FOOTAGE OF A
HOUSE AND ITS PRICE
• Stata: twoway scatter price sqft
4. SIDENOTE – LABEL VAR
• To clean up graphs (and other output), use the command "label variable"
to give variables descriptive labels.
• label variable sqft "Square Feet"
• label variable price "Price (thousands)"
5. THE RELATIONSHIP BETWEEN SQUARE FOOTAGE OF A
HOUSE AND ITS PRICE
• Add a line of best fit
• twoway (scatter price sqft) (lfit price sqft)
6. THE EQUATION OF THE LINE
Price = 153.18 + 0.195 * sq ft
For a 2,000 sq ft house, the expected price
would be
= 153.18 + 0.19537 * 2,000
= 543.92 (price is in thousands, so about $543,920)
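As a quick check of the arithmetic above, here is a minimal Python sketch of the same prediction. It assumes price is measured in thousands of dollars, as in the variable label. The names INTERCEPT, SLOPE and predicted_price are illustrative and do not come from the original slides.

```python
# Fitted line from the slide: price (thousands of dollars) = 153.18 + 0.19537 * sqft
INTERCEPT = 153.18
SLOPE = 0.19537  # thousands of dollars per additional square foot


def predicted_price(sqft: float) -> float:
    """Return the fitted price, in thousands of dollars, for a house of `sqft` square feet."""
    return INTERCEPT + SLOPE * sqft


price_2000 = predicted_price(2000)
print(round(price_2000, 2))  # 543.92, i.e. about $543,920
```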
8. LINEAR REGRESSION MODEL
$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n$
• We have n observations, $(X_i, Y_i)$, $i = 1, \dots, n$.
• X is the independent variable, regressor, or explanatory variable
• Y is the dependent variable
• $\beta_0$ = intercept
• $\beta_1$ = slope (coefficient)
• $u_i$ = the regression error or disturbance term
• The model is not a deterministic equation: $u_i$ captures everything other than X that affects Y
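In Stata, the regress command estimates $\beta_0$ and $\beta_1$. For readers who want to see what happens inside, here is a minimal Python sketch of the textbook ordinary least squares (OLS) formulas for a single regressor. The slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the line pass through the point of means. The toy data are hypothetical and are not taken from the real estate dataset. The last line checks that the residuals average to zero.

```python
from statistics import mean


def ols_simple(x, y):
    """Estimate (b0, b1) in Y = b0 + b1*X + u by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared deviations of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data (square feet, price in thousands), for illustration only
sqft = [1000, 1500, 2000, 2500, 3000]
price = [350, 420, 510, 600, 640]

b0, b1 = ols_simple(sqft, price)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(sqft, price)]
print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")  # b0 = 200.00, b1 = 0.1520
print(abs(mean(residuals)) < 1e-9)      # True: OLS residuals average to zero
```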
9. EXAMPLE
• Name the components of the following linear
regression model
$\text{price}_i = \beta_0 + \beta_1\, \text{square feet}_i + u_i$
10. FITTED EQUATION AND SLOPE INTERPRETATION
• $\widehat{\text{price}} = 153.18 + 0.195 \times \text{square feet}$
• On average, a one-unit (what is it?) increase in house size is associated
with about a 0.195-unit (what is it?) increase in its price
• If 50 square feet of living space are added to a house, on average its price
will increase by about _______
• If 100 square feet of living space are added to a house, on average its price
will increase by about ______
11. INTERPRETING BETA FOR THE CHANGE IN X AND AT A
POINT
• $\widehat{\text{price}} = 153.18 + 0.195 \times \text{square feet}$
• Change in X:
• If square feet (X) changes by 10, by how much will the price change on average?
• At a point:
• If the size of a house (X) is 1,500 square feet, what is its predicted price?
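The two kinds of questions on this slide need different calculations. "At a point" plugs a value of X into the whole fitted equation. "Change in X" needs only the slope, because the intercept cancels when you subtract one prediction from another. Here is a minimal Python sketch using the fitted equation above. It assumes price is in thousands of dollars, and the function names are hypothetical.

```python
INTERCEPT = 153.18  # thousands of dollars
SLOPE = 0.195       # thousands of dollars per square foot


def fitted_price(sqft):
    """'At a point': plug a value of X into the fitted equation."""
    return INTERCEPT + SLOPE * sqft


def price_change(delta_sqft):
    """'Change in X': the intercept cancels, so only the slope matters."""
    return SLOPE * delta_sqft


print(round(price_change(10), 3))    # 1.95   -> about $1,950
print(round(fitted_price(1500), 2))  # 445.68 -> about $445,680
```

Note that fitted_price(1510) - fitted_price(1500) gives the same answer as price_change(10), which is why the intercept plays no role in "change" questions.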
12. MORE EXAMPLES
• $\widehat{\text{Test score}} = 520.4 - 5.82 \times \text{class size}$
• What is the regression’s prediction for a classroom of 20 students?
• What is the regression’s prediction for the change in the classroom average
test score if the classroom size increases by 4?
• $\widehat{\text{Weight}} = -99.41 + 3.94 \times \text{Height}$
• What is the regression’s weight prediction for someone who is 70 inches tall?
(weight is measured in pounds, height is measured in inches)
• If someone has a growth spurt of 1.5 inches over the course of a year, what is
the regression's prediction for the increase in this person's weight?
13. CORRELATION DOES NOT IMPLY CAUSATION
• Proper language: is associated/correlated with, suggests
*Check out the book "Spurious Correlations" by Tyler Vigen
https://www.tylervigen.com/spurious-correlations
15. MORE EXAMPLES
• Regress the price on the following variables, write out
the population regression equation, fitted equation, and
interpret the results:
• Number of bedrooms
• Number of bathrooms
• Age
• Year built
16. EVEN MORE EXAMPLES. DIY TIME.
• Use the California test score dataset
• Regress test score on the following variables, write out
the population regression equation, the fitted equation,
and interpret the results.
• Class size
• Percent of ESL students
• Computers per student
• Percent of students qualified for free lunches
17. MEANINGFUL INTERCEPT
• Pay attention to the following:
- Intuitively, can the X variable take the value zero?
- Is the zero value of the X variable actually in the sample?
Example: In the following two regressions, is the intercept
meaningful?
$\text{test score}_i = \beta_0 + \beta_1\, \text{percent ESL students}_i + u_i$
$\text{test score}_i = \beta_0 + \beta_1\, \text{average income}_i + u_i$
18. MEANINGFUL INTERCEPT
• Is the intercept meaningful in the following two
regressions?
$\text{test score}_i = \beta_0 + \beta_1\, \text{class size}_i + u_i$
$\text{test score}_i = \beta_0 + \beta_1\, \text{computers per student}_i + u_i$
19. MEANINGFUL SLOPE COEFFICIENT. INTERPRETING THE
MAGNITUDE
Is an increase in X associated with a large increase in Y? If so, the
coefficient is economically meaningful (large).
• We can't simply compare coefficients from two different regressions, since
the variables are measured in different units and have different standard deviations. We want to find out by how
much Y changes when X changes by one standard deviation.
• Steps to find out the magnitude:
1. Find the standard deviation of X
2. Multiply the standard deviation of X by the coefficient
3. Compare the product from (2) to the standard deviation of Y (often by
dividing it by the standard deviation of Y)
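Steps 2 and 3 reduce to two lines of arithmetic. Here is a minimal Python sketch, where the function name standardized_effect is hypothetical. It takes the standard deviation of X from step 1 as an input rather than computing it; in Stata, that value would come from a summary of the variable. The numbers plugged in below are the class-size values from the editor's notes.

```python
def standardized_effect(coef, sd_x, sd_y):
    """Return (change in Y for a 1-SD change in X, that change as a share of SD(Y))."""
    change_in_y = coef * sd_x            # step 2: coefficient times SD of X
    return change_in_y, change_in_y / sd_y  # step 3: compare with SD of Y


# Class size example (values from the editor's notes, California test score data)
effect, share = standardized_effect(coef=-2.28, sd_x=1.89, sd_y=19.05)
print(f"{effect:.2f} points, {share:.0%} of SD(test score)")
# -4.31 points, -23% of SD(test score)
```

Because the result is expressed in standard deviations of Y, it can be compared across regressions, unlike the raw coefficients.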
20. INTERPRETING THE MAGNITUDE. EXAMPLES
• Is the effect of class size on test scores large?
• Is the effect of the percent of ESL students on test scores large?
• If we compare the coefficients from the two regressions, what
conclusion might we arrive at?
• If class size increases by 1 standard deviation, by how much will the test
scores change?
• If the percent of ESL students increases by 1 standard deviation, by how
much will the test scores change?
• How do these changes compare to the standard deviation of test
scores?
21. INTERPRETING THE MAGNITUDE.
• In each of the following two regressions, is the effect of the independent
variable on the dependent variable economically meaningful?
• With a one-standard-deviation change in each of the independent
variables, by how much will the dependent variable change in
terms of its standard deviation?
$\text{test score}_i = \beta_0 + \beta_1\, \text{percent of students qualified for free lunches}_i + u_i$
$\text{test score}_i = \beta_0 + \beta_1\, \text{computers per student}_i + u_i$
22. REVIEW
• Please name the components of a linear regression (based on an example of
your own choosing)
• Why do we need to have an error term in the regression equation?
• What is a fitted equation? What is the difference between the fitted equation
and the population equation? Please give examples.
• How do you interpret the results of a regression at a point and based on a
change in X?
• Command in Stata to run a regression
• What does a meaningful intercept mean? Please give an example
• How do we interpret the magnitude of the coefficient and decide if it is
economically meaningful?
Editor's Notes
In general terms, before we know the equation of the line, we write: price = b0 + b1*sqft. However, very few of our observations will follow this formula exactly. For example (open Excel), the first house has 2,400 square feet and its price is 300. If we calculate its predicted price using the equation of the line, we get 153.18 + 0.19537*2,400 = 622.068, which differs from its actual price of 300. This means that the equation of the line does not perfectly describe the relationship between price and sqft; most of the time there is an error. For the first house, the error (actual minus predicted price) is 300 - 622.068 = -322.068. So we write our equation as price = b0 + b1*sqft + error. On average, the error term is 0.
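The error for a single house is its actual price minus the price predicted by the fitted line (slope 0.19537, intercept 153.18, price in thousands). Here is a minimal Python sketch of that calculation for the first house. The function name residual is hypothetical, and the house values (2,400 sq ft, price 300) are the ones quoted in the note.

```python
INTERCEPT, SLOPE = 153.18, 0.19537  # fitted line; price in thousands of dollars


def residual(actual_price, sqft):
    """Error (residual) = actual price minus the price predicted by the line."""
    return actual_price - (INTERCEPT + SLOPE * sqft)


print(round(residual(300, 2400), 3))  # -322.068: the house sold below the line
```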
Note: the equation of the line in the Stata graph was added with the user-written command "aaplot", which has to be installed separately.
We add a disturbance term to our model and write down its general form
What is n in our sample on Seattle real estate? Stata: count N=420
Deterministic equation – a model that defines an exact relationship between variables, no room for error
Before we add any of the other factors that might affect test scores, we essentially lump them all together in the error term.
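The price will change by about $1,950 (0.195 × 10 = 1.95 thousand dollars)
The predicted price of a house this size is about $445,680 (153.18 + 0.195 × 1,500 = 445.68 thousand dollars)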
Are the questions asking to interpret a change or “at a point”?
Answers: 404 points, -23.28 points (-5.82 × 4); 176.39 pounds, 5.91 pounds
OLS regression does not show which way causation goes: we were the ones who decided that test scores go on the left side of the equation and the student-teacher ratio goes on the right side. Could it be that districts with higher scores also have lower student-teacher ratios? Can you think of a reason why?
How do we show causation? Through economic theory, econometric techniques (including experiments), common sense, and intuition.
Price = 153.18+0.195*square feet
The coefficients are: -2.28 (class size); -0.671 (percent ESL); 79.4 (computers per student); -0.61 (percent free lunch)
No in the first one, yes in the second one
The coefficient on class size (STR) is -2.28 while the coefficient on the percent of ESL students is -0.67. At this point it seems that the effect of class size is larger.
The standard deviation of class size is 1.89 the standard deviation of percent of ESL students is 18.28.
-2.28*1.89=-4.3092 (class size)
-0.67*18.28=-12.25 (percent of ESL students)
The standard deviation of test scores is 19.05
With a one-standard-deviation increase in class size, test scores fall on average by about 4.31 points, roughly 23% of their standard deviation (4.31/19.05 ≈ 0.23)
With a one-standard-deviation increase in el_pct, test scores fall on average by about 12.25 points, roughly 64% of their standard deviation (12.25/19.05 ≈ 0.64)
They probably both have a meaningful effect
Meal_pct: -0.61 × 27 = -16.47; 16.47/19.05 ≈ 86%
Comp_stu: 79.4 × 0.065 = 5.161; 5.161/19.05 ≈ 27%