Logan Travis Lucas Co

Logan Travis
1 ECO 4313
P a g e 1 | 23
Section 1: Analysis for Lucas County, Ohio
Logan Travis
Economics 4313 Spatial Econometrics
Texas State University - San Marcos
lgtravis15@gmail.com
INTRODUCTION
PART 1 of this assignment involved fitting a least-squares regression of 200 observed home
selling prices from Lucas County, Ohio, on a constant term and the square foot living area of
the home. The selling price and living area are then logged to estimate the elasticity of
selling price with respect to living area in the 200 home sample.
PART 2 of this assignment involved fitting a least-squares regression of the 200 observed home
selling prices from Lucas County, Ohio, on a constant term and the 7 available characteristics
of the home. Some of the continuous variables are logged to estimate the elasticity of selling
price with respect to these variables in the 200 home sample.
PART 3 of this assignment involved a diagnostic test to determine whether to use the
log-transformed or the levels specification for the hedonic house price regression. Another
regression tested whether the age predictor should enter the model with a linear or a
non-linear relationship to selling price. Finally, a test was performed to explore the question
of outliers in our data.

The sample data come from a publicly available data set provided by LeSage as part of the
Spatial Econometrics Toolbox, described in LeSage and Pace (2004), containing over 25,000 home
sales for the years 1993 to 1998. The data employed here, labeled student24.data, contain a
sample of 200 nearby homes that sold, along with a number of characteristics of the homes
(house age, square foot living area, square foot lot size, number of rooms, number of full
baths, number of half baths, and number of bedrooms). The simple model
used here takes the form
$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n$
where $y_1, y_2, \ldots, y_n$ are the (n = 200 observed) selling prices, $x_1, x_2, \ldots, x_n$
are the (known/observed) values of the square foot living area for each of the 200 homes, and
$\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are unknown/unobserved disturbances/errors
for our sample of 200 homes.
The relationship can also be written as
Note: 𝛽 describes how changes in the square foot living area (x) are related to changes in
selling price (y). 𝛼 indicates the selling price of a vacant lot or a house with zero square
foot living area.
Part 1.1 Summary statistics
Summary statistics for the sample of 200 homes are shown in Figure 4: Table 1 below. These
include the mean, median and standard deviation as well as minimum and maximum values for
the selling price and all available characteristics. The table also shows summary statistics
for the total sample of 25,357 homes. Histograms and boxplots are used to describe the
distribution of age, sqft living area and selling price in the sample of 200 homes.
Figure 1 shows a histogram of age, which reveals the left skew of the distribution of home
ages. The skew is evident from the median line lying to the right of the mean line. The mean
is being pulled downward by the outliers below the first quartile. The boxplot shows the range
and the interquartile range, which contains the middle 50% of the ages in the 200 home sample.
The first quartile begins at approximately 38 years of age, and below it there are four
outliers in my sample that are less than 30 years of age.
Figure 1: Age Distribution
Figure 2 shows a histogram which indicates the right skew of the distribution of sqft living
area within the sample. The median is to the left of the mean, which indicates that there are
outliers pulling the mean upward. The living area distribution is skewed to the right because
of the presence of these outliers, as indicated by the boxplot.
Figure 2: Sqft TLA Distribution
Figure 3 shows a histogram which indicates a right skew in the distribution of selling price
in the sample of 200 homes, wherein the median lies to the left of the most frequent selling
price range but is not significantly different from the mean. The boxplot of the sample shows
the interquartile range of selling prices, with the first quartile beginning at $43,900 and the
third quartile ending around $68,000. There are no outliers present.
Figure 3: Selling Price Distribution

Tabular summary statistics for the sample of 200 homes and for the total sample of 25,357
homes are shown in Figure 4: Table 1 below. These include the mean, median and standard
deviation as well as minimum and maximum values for the selling price and all available
characteristics.
Figure 4: Table 1
The median age is older than the mean, suggesting a distribution skewed to the left.
The comparable mean and median for the selling price indicate a symmetric distribution of
values in the sample. This means that selling prices are similarly distributed above and
below the “typical” home.
The mean of sqft living area is above that of the median “typical” house in the sample, which
indicates a right skew that may be caused by extreme values above the median in my sample.
The number of rooms and bedrooms in the “typical” house in my sample is higher than
the mean, which suggests left skew. This indicates that there are more homes in my sample with
as many or more rooms than the “typical” home in my sample.

Both the number of full baths and half baths have a mean and median that are equal,
suggesting a symmetric distribution.
The “typical” house in my sample is 22 years older, smaller in lotsize and sqft living area,
and sold for less than the “typical” house in the full sample. The range of selling prices is
also much smaller in my sample than in the full sample. The distribution of homes in my
sample is more symmetric than in the full sample, as indicated by the closeness of the mean
and median in my sample compared with the larger differences in the full sample. The
standard deviation of the selling prices in my sample is much lower than in the full sample,
suggesting much less variation in selling prices. The full sample has a mean $13,518 above
the median, suggesting an asymmetric distribution of prices skewed to the right. The mean is
being influenced by the large maximum value of $875,000.
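The mean-versus-median comparisons used throughout this section can be sketched in a few lines
of Python. This is only an illustration: the function name skew_direction and the living area
values are hypothetical and are not taken from student24.data.

```python
from statistics import mean, median

def skew_direction(values, tol=0.0):
    """Classify skew from the mean-median gap: mean above median suggests right skew."""
    gap = mean(values) - median(values)
    if gap > tol:
        return "right"
    if gap < -tol:
        return "left"
    return "symmetric"

# Hypothetical living areas (sqft); one very large home pulls the mean upward.
living_area = [850, 900, 950, 1000, 1050, 1100, 2600]
print(mean(living_area), median(living_area), skew_direction(living_area))
```

A boxplot gives the same information visually; the gap between mean and median is only a quick
numerical check and does not replace a formal skewness statistic.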
Part 1.2 Univariate Regression
Results from the univariate regression are presented in Table 2. The slope, as represented by
𝛽, is 33.72, which indicates that an increase of one square foot in living area raises the
price by $33.72 for a home in my sample. The t-statistic indicates that the estimate is 11.5
standard errors away from zero, which suggests that sqft living area is a statistically
significant predictor of the selling price of a home in my sample at the 99% confidence level.
Table 2: Ordinary Least Squares Estimate (levels model)
The value of an empty lot, as indicated by the coefficient of the constant term, is
$17,176.22. The p-value and t-statistic for the value of an empty lot show statistical
significance at the 99% level in this model. R-squared shows that sqft living area explains
approximately 40% of the variation in the selling price in the sample of 200 homes. This could
be indicative of omitted variable bias since this naive model is not controlling for any other
predictors of the selling price.
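The quantities reported for the levels model (intercept, slope, R-squared and the slope
t-statistic) can be reproduced with a short least-squares routine. The sketch below is in
Python rather than the MATLAB toolbox used for the assignment, and the six observations are
hypothetical values chosen only for illustration.

```python
import math

def simple_ols(x, y):
    """Fit y = alpha + beta * x by least squares; return estimates and fit statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in resid)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = rss / (n - 2)              # error variance with 2 estimated coefficients
    se_beta = math.sqrt(sigma2 / sxx)   # standard error of the slope
    return {"alpha": alpha, "beta": beta, "r2": 1 - rss / tss, "t_beta": beta / se_beta}

# Hypothetical living areas (sqft) and selling prices ($).
area = [800, 1000, 1200, 1500, 1800, 2200]
price = [42000, 52000, 55000, 70000, 76000, 93000]
print(simple_ols(area, price))
```

Here beta plays the role of the dollars-per-square-foot slope and alpha the role of the empty
lot value discussed above.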
The results for a second regression are presented in Table 3. In this regression the Y and X
variables were transformed to their log form. The parameter estimate of 𝛽 therefore represents
the percentage response of selling price to a percentage change in sqft living area, that is,
the elasticity of selling price with respect to sqft living area. The positive slope of the
fitted line indicates that an increase in living area of one percent would lead to a price
increase of 0.79% on average over the 200 homes. The t-statistic for this slope estimate is
over 11 standard errors away from zero, and its p-value shows that this estimate of elasticity
is significant at the 99% confidence level. R-squared indicates that 40% of the variation in
the observed logged selling price is explained by the variation in logged sqft living area in
the homes in my sample.
Table 3: OLS Estimates (log-transformed model)
The figures below, one from each regression, show a scatter plot of the actual versus fitted
values of the (logged or unlogged) selling prices for the 200 homes, with the horizontal axis
showing (logged or unlogged) square foot living area. The scatterplot for Figure 5 exhibits an
error for one home that is much larger than for the rest of the homes. The majority of homes
sold for between $35,000 and $75,000. These houses are clustered in the bottom left quadrant
of the graph, demonstrating that their selling prices were low and their sizes were small, but
they are dispersed widely above and below the prediction line for the model. This reiterates
the R-squared value, which indicates that the univariate regression is a naive model that is a
poor estimator of the selling price of any particular house in my sample. The levels simple
linear model in Figure 6 had a tendency to overestimate the selling price for homes with
between 500 and 1,000 sqft of living area. The log-transformed simple linear model is a better
model since it is about as likely to overestimate as to underestimate selling price using sqft
living area.
Compared to the non-logged model, the log-transformed model is a better predictor of selling
price, including for the relatively large house in Figure 1. However, the large dispersion
below the fitted line suggests that it is a poor predictor for homes that sold relatively
cheaply compared to other homes in the sample. The same 7 homes with large errors in Figure 1
still have large errors in Figure 6.
Figure 6: Scatter plot of actual selling prices versus fitted values by sqft living area
Figure 5: Log-transformed regression actual prices versus fitted values

Part 2: Multivariate Regression
This second part of the assignment involves extending the regression model to include all 7
possible explanatory variables in the attempt to predict selling prices. As before, this model
will use both the level and log-transformed continuous variables. The log-transformed variables
are sqft living area, selling price and lotsize. The other five variables (house age and the
numbers of rooms, bedrooms, full baths and half baths) are not log-transformed. Table 4 and
Table 5 present the coefficient estimates for the levels regression and the log-transformed
regression, respectively.
The estimate for square foot living area points to a $16.87 increase in selling price
associated with a one square foot increase in living area, which is statistically significant
at the 99% level. Also statistically significant at the same level is the estimated effect of
lotsize: a one square foot increase in lotsize raises the selling price of a house in my sample
by an estimated $1.34. The estimate for an empty lot is $15,067.41, which is 2 standard errors
away from zero and is statistically different from zero at the 95% level. All other estimates
of the effects of other predictors in this extended model are not statistically different
from zero.
Our level model is therefore:
$E(\hat{y} \mid x_{sqft\,TLA}, x_{lotsize}) = 15067.41 + 16.87\, x_{sqft\,TLA} + 1.34\, x_{lotsize}$
Table 4: Multivariate OLS Estimates for levels regression

This states that the predicted selling price increases by $16.87 for each additional square
foot of living area, controlling for the size of the lot the house is built upon.
The R-bar squared (adjusted R-squared) is used to compare the simple levels model to the
extended one since it penalizes for the addition of predictors through its denominator. This
shows that the extended model explains a further 8% of the variation in the actual selling
prices in my sample, as indicated by an adjusted R-squared of 48.19% versus 40.23% for the
simple model. There is a noticeable reduction in errors using this extended levels linear
model.
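The penalty that R-bar squared applies for added predictors can be made concrete with a small
Python sketch. The formula is the standard adjusted R-squared; the raw R-squared of 0.50 used
in the example is a hypothetical value, not one reported in the tables of this assignment.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k estimated coefficients (constant included)."""
    # (n - k) in the denominator shrinks as predictors are added, raising the penalty.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Same hypothetical raw R-squared, n = 200 homes, 2 versus 8 estimated coefficients.
print(adjusted_r2(0.50, 200, 2))
print(adjusted_r2(0.50, 200, 8))
```

With the raw fit held fixed, the model with more coefficients receives the lower adjusted
value, so an extended model only scores higher when its added predictors explain enough
additional variation.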
Table 5: Multivariate Regression of log-transformed model
This log-transformed regression allows for the inclusion of logged continuous variables as
predictors of the change in logged selling price of homes in my sample. The variables that are
transformed into logs are lotsize, sqft living area and selling price. The statistically
significant coefficients on these variables are interpreted as elasticities, or the effect of
a marginal percentage change in the predictor on the percentage change in selling price.
The log-transformed model can be represented as
$E(\log \hat{y} \mid \log x_{sqft\,TLA}, \log x_{lotsize}) = 5.39 + 0.378 \log x_{sqft\,TLA} + 0.044 \log x_{lotsize}$
This can be interpreted as follows: a 10% increase in the sqft living area is associated with
an estimated 3.78% increase in the selling price of a home in my sample while controlling for
the effect of the lotsize. The lotsize effect is estimated to increase selling price by 3.09%
when lotsize is increased by 10%. The value of an empty lot, $219.20 (the antilog of the 5.39
intercept), is statistically significant at the 99% confidence level but is not economically
significant since it is numerically close to zero.
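The arithmetic behind these interpretations can be checked with a short Python sketch. It uses
the 0.378 living area coefficient and the 5.39 intercept from the equation above; the function
name pct_price_change is hypothetical. Multiplying the elasticity by the percentage change is a
first-order approximation, and the exact implied change is slightly smaller for a change as
large as 10%.

```python
import math

def pct_price_change(elasticity, pct_change_x):
    """Exact % change in y implied by a log-log coefficient for a given % change in x."""
    return ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0) * 100.0

# Linear approximation: elasticity times the % change in living area.
approx = 0.378 * 10
# Exact change implied by the log-log form.
exact = pct_price_change(0.378, 10)
print(round(approx, 2), round(exact, 2))

# Intercept of the log model converted back to dollars.
print(round(math.exp(5.39), 2))
```

The last line shows where the empty lot value comes from: it is the exponential of the
estimated constant term.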

There is a more pronounced increase in adjusted R-squared, to 52.59%, relative to the
previous simple log-transformed model once lotsize is controlled for. This model explains a
further 12.54% of the variation in the logged selling price and seems to indicate a better
fit. However, the R-squared statistics of a log and a levels regression model are not an
appropriate basis for comparing goodness of fit between the two, which requires more
sophisticated statistical analysis. The reduction in the unexplained share of the variation
(the errors) indicates this improvement in fit.
The scatterplots in Figure 6 show that both forms of the model tend to overestimate the
prices of houses that sold for less, and to underestimate the prices of homes that sold for
more, than the typical home in my sample.
Figure 6: Scatterplot of residuals of multivariate model
Part 3: Specification Tests
This part of the assignment is threefold. First, a test is made of whether the predictor age
has a linear or a non-linear relationship to the house selling price. Second, a determination
is made regarding which of the two extended regression models, levels versus logged, is more
appropriate for the hedonic house price regression for my sample of 200 homes. Lastly, there is
an investigation of the impact of outliers in my sample of 200 homes.
Part 3.1 Relationship of house age
This section uses the R-bar squared statistic to compare models that predict selling price
using a linear, quadratic and cubic house age variable. This adjusted form of R-squared
penalizes for the addition of explanatory variables in these three models and is therefore more
appropriate than R-squared. It is theorized by R. Kelley Pace, in the Journal of Real Estate
Finance and Economics, that the predictor age might not follow a linear relationship but is
more polynomial in its effect upon selling price. This is interpreted as an increase in home
age depressing the value of a home until it reaches an age old enough to add value to the
home’s selling price due to its perception as an antique or as historic. Figure 7 indicates
that the best model for my sample uses the quadratic house age predictor. This indicates that
house age decreases house selling price at an increasing rate.
Figure 7
Part 3.2 Test for log versus levels specification
This part of the assignment measures goodness of fit for the two forms of the model. The
null hypothesis being tested here is that both forms of the model are equal in their ability
to predict the selling price for my sample of 200 homes. It is the rejection of this hypothesis
that will allow the appropriate specification to be determined. This procedure originated with
Sargan (1964).
This section uses MATLAB to run regressions on a dependent variable that is rescaled by its
geometric mean, rather than on the raw levels or log-transformed variable. As already noted, we
cannot compare the fit of the two models using R-squared because the log transformation of y
changes the variation in y to variation in ln(y). However, we can follow the 4-step procedure
from Gujarati page 41. This procedure is for the case where all y and all x-variables are
logged (which is not exactly our case). There are other approaches set forth in the literature
that might be more appropriate here, but these are more complicated (e.g., see Aneuryn-Evans
and Deaton, 1980). Another common practice is to take the antilog (exponential) of the logged
predicted values and compute an R-squared statistic for the (anti)log-transformed model that
would be comparable to the untransformed model R-squared.
We will rely on the results from the previous section, which indicated that the appropriate
model specification should include age + age-squared, or quadratic, explanatory variables.
This 4-step procedure from Gujarati p. 41 is calculated with the following MATLAB
code:

This code retrieves the (vector of) residuals from the ‘result1’ and ‘result2’ structure variables
returned by the ols(log(ytilde),lnx) and ols(ytilde,xmatrix) function calls, then calculates the residual sum
of squares using the inner product vector multiplication. Finally, a formal chi-squared distributed
statistic is calculated. The numerator and denominator for this statistic depend on whether RSS1 or RSS2
is larger, which is why we use the MATLAB min() function to determine this.
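The four steps can be sketched as follows. This is a minimal illustration written in Python
rather than MATLAB and reduced to a single predictor; it is not the code used to produce the
results of this assignment. The helper names (rss_simple_ols, levels_vs_logs) and the eight
observations are hypothetical.

```python
import math

def rss_simple_ols(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))

def levels_vs_logs(x, y):
    """Return (RSS levels, RSS logs, lambda) for the geometric-mean scaled comparison."""
    n = len(y)
    # Step 1: geometric mean of y.
    gm = math.exp(sum(math.log(yi) for yi in y) / n)
    # Step 2: rescale y by its geometric mean.
    y_tilde = [yi / gm for yi in y]
    # Step 3: fit both forms on the rescaled dependent variable.
    rss_levels = rss_simple_ols(x, y_tilde)
    rss_logs = rss_simple_ols([math.log(xi) for xi in x], [math.log(v) for v in y_tilde])
    # Step 4: lambda = (n / 2) * ln(larger RSS / smaller RSS), compared with chi-squared(1).
    lam = 0.5 * n * math.log(max(rss_levels, rss_logs) / min(rss_levels, rss_logs))
    return rss_levels, rss_logs, lam

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-squared with 1 degree of freedom

# Hypothetical living areas (sqft) and selling prices ($).
area = [850, 1000, 1150, 1300, 1500, 1800, 2200, 2600]
price = [41000, 47500, 52000, 58500, 63000, 74000, 86000, 97500]
rss_lev, rss_log, lam = levels_vs_logs(area, price)
print(rss_lev, rss_log, lam, lam > CHI2_1DF_5PCT)
```

The form with the smaller residual sum of squares fits better, and lambda exceeding the
critical value indicates that the difference in fit is statistically significant at the 5%
level.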
The results of this test indicate that the log-transformed model produces a better fit.
However, lambda indicates that the improved fit is not significant at the 95% level since it
is less than the 5% critical value, which fails to provide enough evidence to reject the null
hypothesis. Nevertheless, a log-transformed model will be used as the most appropriate model. A
robust regression will be performed using this log-transformed model to predict selling prices
of homes in my sample, and estimates from the robust regression will be compared to a
log-transformed ordinary least squares regression having the same variables.

Figure 8: Evidence for supporting rejection of null hypothesis and appropriate use of log-transformed model
Part 3.3 Robust Regression
As a test for outliers, we carried out robust regressions using Bayesian MCMC estimates proposed by
Geweke (1993).
This regression uses the most appropriate model as determined by the two previous sections of
this part of the assignment. Specifically, it is the log-transformed linear model controlling
for lotsize and age-squared. Table 7 shows the results of the robust regression.
Table 7: Robust regression results
The adjusted r-squared shows that this robust model explains 52.38% of the variation in the
predictions of selling price. There are 9 variables in this model with the addition of the quadratic age
predictor.
Table 8: Robust Regression estimates
The results of Table 8 indicate that the quadratic house age predictor’s effect is not
statistically different from zero. The elasticities of lotsize and sqft living area are
significant at the 99% level. The estimated elasticities of lotsize and sqft living area
indicate that an increase in their sizes by 10% will increase house price by 3.0% and 3.97%,
respectively. The number of rooms is significant in this robust model at the 95% confidence
level. Its estimated effect upon predicted selling price is to increase the selling price by an
economically insignificant amount.
A comparison between the differences in the coefficient estimates of the robust and OLS
regression models is used as a test for outliers. If there are significant differences between these
estimates, then the implication is that outliers are impacting the results of the OLS regression model.
Table 9: OLS Regression Estimates
Table 9 indicates that there is less than a percentage point difference in the coefficient estimate
of logged sqft living area. All other variables do not indicate the presence of outliers impacting the
regression results.
Figure 7: vi plot of ordered residuals
Figure 7 is a plot of the residuals using a Geweke test that shows the weights of the
residuals. Even though there seem to be aberrations around observations 60 and 160, their vi
estimate values are not high enough to indicate an impactful effect of outliers.

Part 4: Conclusion
The best model for my sample of 200 homes in the Lucas County, Ohio area is an OLS regression
model controlling for logged lotsize, logged sqft living area and the quadratic form of house
age.
References
Aneuryn-Evans, G. and A. Deaton (1980) “Testing linear versus logarithmic regression
models,” Review of Economic Studies, 47, 275-91.
LeSage, J. P. and R. K. Pace (2004) “Models for Spatially Dependent Missing Data,”
Journal of Real Estate Finance and Economics, 29(2), 233-254.
Geweke, J. (1993) “Bayesian Treatment of the Independent Student t Linear Model,”
Journal of Applied Econometrics, 8, 19-40.
Gujarati, D. (2011) Econometrics by Example, Palgrave Macmillan, 5th Edition.
Ramsey, J. B. (1969) “Tests for Specification Errors in Classical Linear Least Squares
Regression Analysis,” Journal of the Royal Statistical Society, Series B, 31(2), 350-371.
JSTOR 2984219.
Sargan, J. D. (1964) “Wages and prices in the United Kingdom,” in Hart, P. E., Mills,
G. and Whitaker, J. K. (eds.) Econometric Analysis for National Economic Planning
(London: Butterworths).

INTRODUCTION
A hedonic price ordinary least squares regression was performed in SECTION 1 of this
assignment on 200 non-random house observations from a common geographic region. That
analysis compared the appropriateness of a levels versus a log-transformed model and also
investigated the nature of the relationship between a house’s age and selling price. It
concluded by comparing these OLS regression models against a robust model and testing for the
influence of outliers on the regression results. The conclusion was that the most appropriate
model was the log-transformed OLS model, with selling price explained by the percentage
change in sqft living area and the percentage change in lotsize, and that age exhibited a
significant quadratic relationship with the selling price. It also determined that outliers
were not influencing the regression results.
PART 2 of the assignment is aimed at testing the Gauss-Markov assumptions associated
with the ordinary least squares (OLS) regressions performed in PART 1. First, the collinearity
of the variables in the regression is examined by applying a singular value decomposition to
the variance-covariance matrix of the estimates. Collinearity is also assessed by investigating
the amount of upward bias present in the coefficients of two ridge regressions that use the
H-Kϴ and 4*H-Kϴ values to introduce increasing levels of bias into the model. Secondly, the
assumption of homoscedasticity is examined by comparing the statistical significance of the
OLS estimates against a semi-parametric White regression and a Newey-West regression. Thirdly,
the influence of spatial dependence upon the regression estimates is inspected, given that the
selection of the observations in the dataset used for this hedonic price regression was not
random but dependent upon their spatial adjacency to one another. This is determined by
comparing the OLS results against those of robust and non-robust Bayesian spatial error models.
Lastly, the influence of outliers is again examined and reconciled with the conclusion of
PART 1 of the assignment in order to arrive at the most appropriate model for the hedonic price
regression of the dataset after considering the Gauss-Markov assumptions.

COLLINEARITY
The presence of collinearity in the regression results is first explored using a Belsley-Kuh-
Welsch variance decomposition. This tabulates the variance proportions of the variables wherein
a possible near linear relationship is demonstrated when two proportions within the same
condition index are above the 0.50 threshold.
Figure 1: Belsley-Kuh-Welsch Variance Decomposition
According to Figure 1, there is possible collinearity between the age and age-squared
variables. There is also evidence of possible collinearity between the number of bedrooms and
the total number of rooms in the house. There is a possible degradation of the precision of the
OLS estimated coefficients due to this collinearity. Omitted variables bias may be present in
the estimates due to the degradation of precision in the OLS regression when two variables
become more correlated. Two ridge regressions with increasing amounts of bias, applied using
the H-Kϴ and (4*H-Kϴ) values, are compared against the OLS estimate results. Any change in
the statistical significance of the variables is indicative of a statistical problem arising
from the near linear relationship identified by the BKW diagnostics, since such relationships
tend to inflate the variance of the coefficient estimates.
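How ridge estimates move as the bias parameter grows can be sketched for the two-predictor
case, where the ridge system (X'X + ϴI)b = X'y has a closed-form solution. The sketch is in
Python, the data are hypothetical, nearly collinear and already centered, and the ϴ values 0.5
and 2.0 are arbitrary stand-ins for the H-Kϴ and 4*H-Kϴ values, which are not computed here.

```python
def ridge_2var(x1, x2, y, theta):
    """Ridge estimates for two centered predictors: solve (X'X + theta*I) b = X'y."""
    a11 = sum(v * v for v in x1) + theta
    a22 = sum(v * v for v in x2) + theta
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * v for u, v in zip(x1, y))
    c2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    # Closed-form inverse of the symmetric 2x2 matrix applied to X'y.
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)

# Hypothetical centered data; x2 is almost a copy of x1, so X'X is nearly singular.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.1, 0.9, 2.0]
y = [-4.1, -1.8, 0.2, 2.1, 3.6]

for theta in (0.0, 0.5, 2.0):  # theta = 0 reproduces OLS
    print(theta, ridge_2var(x1, x2, y, theta))
```

When the two predictors are nearly collinear the OLS coefficients (ϴ = 0) can be large and of
opposite sign, and small amounts of ϴ move them substantially; coefficients that barely change
across ϴ suggest collinearity is not degrading the estimates.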

Figure 2: OLS and Ridge Regression Results
Figure 2 reveals that the 𝛽, or coefficient estimates, show upward bias in the sqft living
area and lotsize between the three regressions. The significance for the variables remains
unchanged throughout the three regression results.
Figure 3: Values of Regression Coefficients as a Function of ϴ
Figure 3 shows a plot of estimates for the variables on the vertical axis, with increasing
amounts of bias as theta increases along the horizontal axis, starting from the unbiased OLS
𝛽 at the origin of the horizontal axis. The vertical line is the H-Kϴ value, at which a
moderate level of bias is introduced. Any sloping lines indicate variables exhibiting upward
bias. This is the case for sqft living area and lotsize, which further illustrates the
conclusions from Figure 2.
The conclusion of this part of the assignment is that there is no problem with collinearity
regardless of the implication of a near linear relationship from the BKW diagnostics.
HETEROSCEDASTICITY
The Gauss-Markov assumption of homoscedasticity, or homogeneous variance of the disturbances,
is investigated by comparing the significance of the OLS regression results against
White and Newey-West regressions. The Newey-West regression results show heteroscedasticity
and autocorrelation consistent (HAC) estimates that allow for both heteroscedasticity and
serial correlation. If the dataset observations are in order of size, then spatial correlation
may mimic serial correlation and provide a false positive. The presence of heteroscedasticity
is demonstrated in this test by a change in the significance level of a variable from one
regression to another.
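The reason significance levels can change is that the White estimator replaces the single
error variance in the OLS standard error with the squared residual of each observation. A
minimal Python sketch for a one-predictor regression follows. It uses the basic (HC0) form of
White's estimator, which is an assumption here since the toolbox routine may apply a different
scaling, and the data are hypothetical with a spread that widens as x grows.

```python
import math

def ols_and_white_se(x, y):
    """Slope of y on a constant and x, with conventional and White (HC0) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    beta = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Conventional OLS: one common error variance for every observation.
    se_ols = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # White HC0: each observation contributes its own squared residual.
    se_white = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return beta, se_ols, se_white

# Hypothetical data whose scatter around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.8, 6.3, 7.5, 10.9, 11.0, 15.6, 14.2]
beta, se_ols, se_white = ols_and_white_se(x, y)
print(beta, beta / se_ols, beta / se_white)
```

A t-statistic that crosses a critical value when the White standard error is substituted for
the OLS one is the kind of change in significance used as evidence of heteroscedasticity in
this section.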

Figure 4: Regression Results for Heteroscedasticity
Figure 4 shows that the number of rooms in a house becomes significant at the 90% level in
both the White and Newey-West regressions, having been insignificant in the OLS regression.
The linear age of a house likewise becomes significant at the 90% level in both the White and
Newey-West regressions after being insignificant in the OLS regression. The quadratic age of a
house increases its significance from 90% in the OLS regression to 95% in the White
regression. However, the Newey-West regression shows no change in its significance level from
the OLS regression. This indicates that there might be some slight heteroscedasticity in the
OLS model with regard to the linear and quadratic forms of house age and the number of rooms
in a house.
SPATIAL DEPENDENCE
The presence of spatial dependence is probable because the observations in the dataset were
selected as spatially adjacent houses. Spatial dependence is therefore studied because it is
logical that the selling price of a house is influenced by the disturbances of the houses
neighboring it. This logic is scrutinized by comparing the OLS regression results against
non-robust and robust Bayesian spatial error models. A lambda (λ) that is statistically
significant indicates that this logic holds true for the houses in the dataset.
Heteroscedasticity in the presence of spatial correlation is also investigated using the
change in the significance levels of the variables, while changes in 𝛽 across the three
regression models point to outliers.
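For reference, the spatial error model is usually written as shown below. The notation is an
assumption layered on this text: W denotes an n × n spatial weight matrix built from
neighboring houses, and the v_i are the variance scalars of the robust variant, with
V = I_n in the non-robust case.

```latex
y = X\beta + u, \qquad u = \lambda W u + \varepsilon, \qquad
\varepsilon \sim N(0, \sigma^2 V), \quad V = \mathrm{diag}(v_1, \ldots, v_n)
```

When λ = 0 the disturbances are no longer spatially linked and the model reduces to the
ordinary regression, which is why the significance of λ is used as the test for spatial
dependence.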

Figure 5: OLS and Bayesian SEM Models
First, the asymptotic t-statistic for lambda in both Bayesian models in Figure 5 is
significant at the 99% level. This indicates the presence of spatial correlation among the
houses in the dataset. The number of rooms in a house increases its significance from the
90% to the 95% level from the non-robust Bayesian to the robust Bayesian model, which is
indicative of heteroscedasticity in the presence of spatial correlation. The coefficient
estimates remain the same between the two Bayesian models. However, the quadratic age variable
loses its 90% significance level from OLS to the Bayesian models, and the log sqft living area
variable drops from the 99% to the 95% significance level. These results are indicative of
heteroscedasticity in the presence of spatial correlation in the OLS regression estimates.
OUTLIERS
The first assignment concluded that outliers were not influencing the fit of the OLS
regression model that was found to be most appropriate. It is again necessary to determine
whether outliers are influencing the results of this assignment. If outliers are found to have
a statistically significant effect on the models in this assignment, then the most appropriate
estimates are those provided by the Robust Spatial Error Model. Outliers are determined to have
an effect on the regression results if the estimated coefficients change between the
aforementioned robust model and the OLS regression.
In Figure 6, the number of rooms becomes significant at the 95% level and linear age
becomes significant at the 90% level in the robust model. There is variation in the coefficient
estimates across the spatial error models displayed in Figure 5. Thus it is concluded that
there could be a problem with outliers in the observations.

Figure 7 plots the residuals along a horizontal axis of house sizes ordered from
smallest to largest. Heteroscedasticity is indicated in this plot by a funnel shape in the
distribution of these points in two-dimensional space. This is also useful for viewing the
outliers in the dataset. Figure 7 shows a wide distribution of residuals with no discernible
pattern indicative of heteroscedasticity. A couple of outliers may be present around residuals
5 and 110, but the plot illustrates an even distribution of the residuals that does not
indicate the presence of many outliers.
Figure 6: Ordered House Size and Residual Values
Figure 7: OLS Vi plot for outliers and hetero

Figure 8: Robust Gibbs Vi Plot
Figures 8 and 9 show no funnel shape, and therefore no heteroscedasticity, but many large
spikes which could be indicative of outliers. However, the volatility of the plot indicates
that these spikes are common enough not to indicate a significant influence of outliers on the
fit of the OLS regression model. While Figure 6 seems to indicate an outlier problem, the
cause may instead be slight heteroscedasticity in the presence of spatial correlation in the
OLS regression results.
CONCLUSION
My previous conclusion was for an OLS regression using the log-transformed versions of
sqft living area and lotsize. However, an examination of the homogeneity of variance, spatial
correlation and collinearity of the variables in the regression concluded that the number of
rooms may be a statistically significant predictor of the selling price of a house but had a
degraded t-statistic due to violation of all of these Gauss-Markov assumptions for ordinary
least squares estimates. The quadratic form of the age of the house may also be a statistically
significant predictor, but its estimated coefficient is not economically different from zero.
The proper model to use is the Robust Spatial Error Model due to the presence of spatial
correlation and outliers in the observations.
