This document provides an overview of regression analysis and the steps involved in building a regression model to predict property values. It discusses the history of regression, uses of regression, and key components like the dependent and independent variables. The document walks through exploring the data, specifying the model, running regression, and interpreting the results. It shows how adding additional relevant variables like size, land size, age, and quality can improve the predictive power of the model. Transforming categorical variables is also discussed. The goal is to create a model that explains as much of the variation in sales prices as possible.
Getting started with regression
1. Getting Started with Regression
[Figure: scatter plot of Sales Prices against Predicted Values]
Presented By: Tim Wilmath, MAI
Prepared For: Florida IAAO
2. History of Regression
Francis Galton created Regression Analysis in 1885
when he was attempting to predict a person’s height
based on the height of his or her parents.
3. History of Regression
Galton found that children born to tall parents tended to be
shorter than their parents - and children born to short
parents tended to be taller than their parents. Both groups
of children regressed toward the mean height of all children.
7. What is Regression?
When Regression Analysis is used to predict sales
prices or establish assessments it becomes an
Automated Sales Comparison Approach
8. Steps in Regression
1. Data Exploration and cleanup
2. Specifying the model
3. Calibrating the model
4. Interpreting the results
9. Data Exploration & Cleanup
Is there a pattern suggesting a
relationship between variables?
[Figure: scatter plot of SALES PRICE against HEATED AREA]
Note the outliers. These will adversely affect our final values
if we don’t deal with them now.
Because of the potential for extreme values to influence the mean,
modelers often remove or “trim” extreme values.
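A minimal sketch of trimming, assuming a simple percentile rule. The 5th/95th percentile cut-offs and the price-per-square-foot figures are hypothetical and are not taken from the presentation.

```python
from statistics import quantiles

def trim_outliers(values, lower_pct=5, upper_pct=95):
    """Return the values lying between two percentile cut-offs (inclusive)."""
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if low <= v <= high]

# Hypothetical sale prices per square foot; 400 is an obvious outlier.
price_per_sf = [70, 72, 75, 78, 80, 82, 85, 88, 90, 400]
trimmed = trim_outliers(price_per_sf)
print(trimmed)
```

Other trimming rules are possible; the point is only that the extreme sales are set aside before the model is run.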
10. Model Specification
Specifying the model means picking the appropriate
equation and the variables that will be used.
Models can be:
• Additive - Most common for residential properties
• Multiplicative - Often used for land valuation
• Hybrid - Most advanced
We are going to use an Additive Model
in this presentation
12. Simple Regression
Simple Regression includes one Dependent
Variable (sales price) and only one Independent
Variable - such as Square Footage.
[Figure: scatter plot of SALES PRICE against HEATED AREA with the fitted regression line]
Using this model, a 1,000 sf home would be valued at $75,000
13. Simple Regression
Simple Regression using only size as the independent
variable will predict sales prices, however, it will
treat all homes with the same size equally.
1,000 square feet - $75,000 1,000 square feet - $75,000?
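As an illustration of simple regression, the sketch below fits a line by ordinary least squares with one independent variable. The five sales are hypothetical and were chosen to sell at exactly $75 per square foot, so the fitted line reproduces the $75,000 figure for a 1,000 sf home.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for one independent variable.

    Returns (constant, coefficient) for the line y = constant + coefficient * x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    coefficient = sxy / sxx
    constant = mean_y - coefficient * mean_x
    return constant, coefficient

# Hypothetical sales (not from the presentation's data set).
heated_area = [1000, 1500, 2000, 2500, 3000]
sale_price = [75000, 112500, 150000, 187500, 225000]

constant, per_sf = fit_simple_regression(heated_area, sale_price)
predicted = constant + per_sf * 1000  # value of a 1,000 sf home
print(constant, per_sf, predicted)
```

Every 1,000 sf home receives the same predicted value, which is the limitation described above.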
14. Multiple Regression
We know square footage is an important variable
but what other variables should we include
and how do we decide?
Effective Age
Actual Age
View
15. Correlation Analysis
Pearson’s Correlation tells you the degree of
relationship between variables.
Correlations
                                 SALEPRICE   BLDSIZE   BEDROOMS   DOCK
SALEPRICE   Pearson Correlation  1           0.855     0.557      0.142
            Sig. (2-tailed)      .           0         0          0
            N                    1367        1367      1367       1367
BLDSIZE     Pearson Correlation  0.855       1         0.659      0.062
            Sig. (2-tailed)      0           .         0          0.021
            N                    1367        1367      1367       1367
BEDROOMS    Pearson Correlation  0.557       0.659     1          0.037
            Sig. (2-tailed)      0           0         .          0.176
            N                    1367        1367      1367       1367
DOCK        Pearson Correlation  0.142       0.062     0.037      1
            Sig. (2-tailed)      0           0.021     0.176      .
            N                    1367        1367      1367       1367
Notice the high correlation between sales price and size.
Very little correlation between sales price and dock.
Correlation Analysis also helps identify “Collinearity”, which is a correlation between 2 independent variables. For example, the living area
of a home is highly correlated to the number of bedrooms. It would only be necessary to have one of these variables in the model.
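A small sketch of how Pearson's correlation can be computed and used to screen for collinearity. The six homes below are hypothetical, and the 0.6 screening threshold is an assumption for illustration, not a rule from the presentation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for six homes.
bldsize = [1000, 1400, 1800, 2200, 2600, 3000]
bedrooms = [2, 3, 3, 4, 4, 5]

r = pearson(bldsize, bedrooms)
# Assumed screening rule: flag pairs of independent variables above 0.6.
collinear = abs(r) > 0.6
print(round(r, 3), collinear)
```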
17. Running Regression
Statistical Software makes using Regression much easier,
performing the necessary calculations quickly and accurately.
Let’s Run
This!
18. Regression Results
Model 1
The Output tells us how well our model is working
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .855(a)   .732       .731                25406.53266545
a Predictors: (Constant), BLDSIZE
The closer the Adj. R-Square is to “1” the better
And - it gives us the coefficients (or adjustments)
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    6838.585     2195.717             3.115     .002
  BLDSIZE       75.068       1.231        .855    60.997    .000
a Dependent Variable: SALEPRIC
$6,838 + Bldsize x $75.07 = Property Value
The adjusted R2 statistic measures the share of total variation in sale prices explained by the Regression Model, adjusted for the number of variables
in the model. It typically ranges from 0.00 to 1.00 with 1.00 being the desired value. A high number, say 0.910, means that approximately 91% of the variation in sale prices can be explained by the model.
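For reference, the usual textbook adjustment to R2 can be sketched as below. The function name is illustrative, and the sample size of 1,367 is assumed from the N shown in the correlation output; the presentation does not state it for the regression runs.

```python
def adjusted_r2(r2, n_sales, n_predictors):
    """Adjusted R-square: penalizes R-square for each added predictor."""
    return 1 - (1 - r2) * (n_sales - 1) / (n_sales - n_predictors - 1)

# Assumption: 1,367 sales in the sample; an R-square of .870 with 7 predictors.
print(round(adjusted_r2(0.870, 1367, 7), 3))  # about 0.869
```

With a large sample and few variables the adjustment is small, which is why R Square and Adjusted R Square are nearly identical in these outputs.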
19. Regression Results
The output includes the coefficient and the “Constant”
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model B Std. Error Beta t Sig.
1 (Constant) 6838.585 2195.717 3.115 .002
BLDSIZE 75.068 1.231 .855 60.997 .000
a Dependent Variable: SALEPRIC
The “Constant” represents the portion of value that is not attributed
to any variable included in the model.
20. Running Regression
Let’s add another variable to the model - Say Land Size
Let’s run
this model and
see if results
improve.
21. Regression Results
Model 2
Our Adj. R2 went up from .731 to .801!
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .895(a)   .801       .801                21864.78975921
a Predictors: (Constant), LANDSF, BLDSIZE
We also have new coefficients (or adjustments)
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    6119.232     1889.914             3.238     .001
  BLDSIZE       72.660       1.065        .828    68.237    .000
  LANDSF        .382         .017         .266    21.887    .000
a Dependent Variable: SALEPRIC
$6,119 + Bldsize x $72.66 + Landsf x $0.382 = Property Value
22. Running Regression
Let’s add Age to the model
If Age is
significant
to value, the model
should improve.
Let’s run it.
23. Regression Results
Model 3
Our Adj. R2 went up from .801 to .832!
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .912(a)   .832       .832                20114.04445033
a Predictors: (Constant), AGE, LANDSF, BLDSIZE
Notice the age coefficient is negative
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    22855.587    2036.809             11.221    .000
  BLDSIZE       67.276       1.037        .767    64.856    .000
  LANDSF        .444         .017         .309    26.868    .000
  AGE           -630.763     39.991       -.189   -15.773   .000
a Dependent Variable: SALEPRIC
$22,855 + Bldsize x $67.28 + Landsf x $0.44 + Age x ($630.76) = Property Value
25. Regression Results
Model 4
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .924(a)   .854       .853                18784.15717760
a Predictors: (Constant), QUAL, LANDSF, AGE, BLDSIZE
Our Adj. R2 went up from .832 to .853 after adding quality, but
Notice the constant is now negative - that’s not good!
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    -45723.503   5199.675             -8.794    .000
  BLDSIZE       59.808       1.103        .681    54.234    .000
  LANDSF        .445         .015         .309    28.831    .000
  AGE           -605.886     37.388       -.182   -16.205   .000
  QUAL          26110.420    1842.475     .171    14.171    .000
a Dependent Variable: SALEPRIC
What do we do with this quality adjustment?
26. Regression Results
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model B Std. Error Beta t Sig.
1 (Constant) -45723.503 5199.675 -8.794 .000
BLDSIZE 59.808 1.103 .681 54.234 .000
LANDSF .445 .015 .309 28.831 .000
AGE -605.886 37.388 -.182 -16.205 .000
QUAL 26110.420 1842.475 .171 14.171 .000
a Dependent Variable: SALEPRIC
Quality          Resulting Adjustment
1 - Fair         = 1 x $26,110 = $26,110
2 - Average      = 2 x $26,110 = $52,220
3 - Good         = 3 x $26,110 = $78,330
4 - Excellent    = 4 x $26,110 = $104,440
5 - Superior     = 5 x $26,110 = $130,550
This doesn’t make sense because the codes 1,2,3, etc. were not meant
to be a rank
27. A Note about Data Types
There are 3 primary types of property Characteristics:
• Continuous: Based on a size or measurement.
Examples: Square Footage or Lot Size
• Discrete: Specific pre-defined value.
Examples: Roof Material, Building Quality
• Binary: Either the item is present or not
Examples: corner location, Lakefront Location
28. Transformations
To solve the problem we need to convert the “discrete”
variable Quality into individual “binary” variables
which allows Regression to distinguish each type:
“Quality” BECOMES:
Fair - Yes/No
Average - Yes/No
Good - Yes/No
Excellent - Yes/No
Superior - Yes/No
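The transformation can be sketched as follows. The function name is illustrative; the binary variable names (FAIR, GOOD, EXCEL, SUPERIOR) follow the regression output in this presentation, and "Average" is treated as the base level that receives no variable of its own.

```python
# Map each non-base quality level to its binary variable name.
BINARY_NAMES = {
    "Fair": "FAIR",
    "Good": "GOOD",
    "Excellent": "EXCEL",
    "Superior": "SUPERIOR",
}

def quality_to_binary(quality):
    """Convert one discrete quality label into 0/1 variables.

    "Average" is the base level, so it maps to all zeros.
    """
    return {name: int(quality == level) for level, name in BINARY_NAMES.items()}

print(quality_to_binary("Good"))
print(quality_to_binary("Average"))
```

Each coefficient on these variables is then read as an adjustment relative to the base level.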
29. Running Regression
Now that we have transformed the variable Quality
we can put it back in the model
Notice we left
“Average” out
30. Regression Results
Model 5
Our Adj. R2 went up from .832 to .869.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .933(a)   .870       .869                17717.09739523
a Predictors: (Constant), SUPERIOR, EXCEL, AGE, FAIR, GOOD, LANDSF, BLDSIZE
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    35633.753    1922.792             18.532    .000
  BLDSIZE       58.537       1.045        .667    56.031    .000
  LANDSF        .419         .016         .291    26.342    .000
  AGE           -625.742     35.363       -.188   -17.695   .000
  FAIR          -25511.289   8693.178     -.031   -2.935    .003
  GOOD          21095.623    1838.228     .127    11.476    .000
  EXCEL         75844.967    12720.934    .059    5.962     .000
  SUPERIOR      305671.839   18494.059    .169    16.528    .000
a Dependent Variable: SALEPRIC
These Quality adjustments are all relative to “Average”
31. Running Regression
Let’s transform Neighborhood into a binary and
add it to the model
Notice we left out the “Base” Neighborhood
(the most typical)
32. Regression Results
Model 6
Our Adj. R2 went up from .869 to .874.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .936(a)   .875       .874                17391.93018134
a Predictors: (Constant), NB211006, BLDSIZE, EXCEL, FAIR, SUPERIOR, NB211002,
NB211001, NB211005, AGE, LANDSF, GOOD, NB211003
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    40799.859    2299.668             17.742    .000
  BLDSIZE       56.000       1.143        .638    48.980    .000
  LANDSF        .423         .016         .294    25.753    .000
  AGE           -671.493     37.221       -.201   -18.041   .000
  FAIR          -33476.331   8602.963     -.041   -3.891    .000
  GOOD          17371.495    2023.937     .105    8.583     .000
  EXCEL         72617.618    12567.147    .057    5.778     .000
  SUPERIOR      313444.055   18313.237    .173    17.116    .000
  NB211001      14199.881    2321.457     .070    6.117     .000
  NB211002      -3514.034    1657.862     -.025   -2.120    .034
  NB211003      -1483.623    1244.877     -.015   -1.192    .234
  NB211005      4044.357     2266.186     .021    1.785     .075
  NB211006      1915.755     2601.773     .008    .736      .462
a Dependent Variable: SALEPRIC
These Neighborhood adjustments are all relative to our “Base” Neighborhood
33. Running Regression
Multiplicative Transformations combine two variables into one
Square Footage x Quality = SQFT1
Reflects the fact that quality may contribute greater value in larger homes and less value in
smaller homes. In other words, without combining these variables, all Good Quality homes get
the same adjustment regardless of their size. Let’s add this new combined variable to the model.
Since we combined SF
and Quality, we remove
them as stand-alone
variables
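One way to read this transformation is sketched below. It assumes quality is coded 1 to 5 (Fair to Superior) and that each combined variable SQFT1 to SQFT5 equals the building size when the home has that quality code and zero otherwise; the presentation does not spell out the exact construction.

```python
def sqft_by_quality(bldsize, quality_code):
    """Build the combined size-by-quality variables SQFT1..SQFT5.

    Assumption: SQFTk = bldsize if the home's quality code is k, else 0.
    """
    return {f"SQFT{k}": (bldsize if quality_code == k else 0) for k in range(1, 6)}

# A hypothetical 2,000 sf home of Good quality (code 3).
print(sqft_by_quality(2000, 3))
```

Under this reading, each quality level gets its own rate per square foot instead of a single fixed dollar adjustment.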
34. Regression Results
Model 7
Our Adj. R2 went up from .874 to .879.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .938(a)   .880       .879                17065.96846831
a Predictors: (Constant), SQFT5, SQFT4, AGE, NB211002, SQFT2, SQFT1, NB211006,
NB211001, NB211005, LANDSF, NB211003, SQFT3
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    43999.158    2299.663             19.133    .000
  LANDSF        .418         .016         .291    25.996    .000
  AGE           -660.473     36.505       -.198   -18.092   .000
  NB211001      10975.273    2335.844     .054    4.699     .000
  NB211002      -3611.418    1624.028     -.026   -2.224    .026
  NB211003      -1250.573    1221.119     -.013   -1.024    .306
  NB211005      6350.688     2243.206     .033    2.831     .005
  NB211006      1923.311     2554.324     .008    .753      .452
  SQFT1         21.119       8.533        .026    2.475     .013
  SQFT2         53.673       1.169        .723    45.916    .000
  SQFT3         63.139       1.074        .964    58.806    .000
  SQFT4         77.267       3.557        .210    21.720    .000
  SQFT5         108.100      2.941        .356    36.759    .000
a Dependent Variable: SALEPRIC
Notice the adjustments went from fixed dollar amounts to “per square foot”
35. Advanced Transformations
Exponential transformations - Raise variable to a power
Land Size ^ .75 = LAND75
Reflects the principle of diminishing returns. The unit price of land
tends to decrease as size increases. Without this transformation land
would get the same per-square-foot adjustment, regardless of size. Raising land size
to the power of .75 reflects the curve shown below.
[Figure: SINGLE FAMILY LOT PRICES - PRICE PER SF (about $2.40 to $2.85) against LOT SIZE]
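The effect of the power transformation can be checked with a few lines. The coefficient of 10.0 is hypothetical and serves only to show that the implied value per square foot falls as the lot gets larger.

```python
def land75(land_sf):
    """Raise land size to the power of .75."""
    return land_sf ** 0.75

HYPOTHETICAL_COEF = 10.0  # illustrative dollars per unit of LAND75

def value_per_sf(land_sf, coef=HYPOTHETICAL_COEF):
    """Implied land value per square foot after the transformation."""
    return coef * land75(land_sf) / land_sf

for size in (5000, 10000, 20000):
    print(size, round(value_per_sf(size), 2))
```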
39. Regression Results
Model 9
Our Adj. R2 went up from .881 to .895.
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .947(a)   .897       .895                15854.87728402
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model B Std. Error Beta t Sig.
1 (Constant) 29680.695 2885.532 10.286 .000
AGE -705.817 38.491 -.212 -18.337 .000
NB211001 12374.064 2176.815 .061 5.684 .000
NB211002 -1094.891 1527.977 -.008 -.717 .474
NB211003 -938.838 1136.671 -.010 -.826 .409
NB211005 12639.946 2139.489 .066 5.908 .000
NB211006 852.109 2535.266 .004 .336 .737
SQFT1 31.388 7.815 .039 4.016 .000
SQFT2 44.166 1.365 .595 32.349 .000
SQFT3 52.939 1.265 .808 41.857 .000
SQFT4 60.447 3.561 .164 16.974 .000
SQFT5 94.723 2.943 .312 32.186 .000
LAND75 11.788 .433 .303 27.240 .000
BATHS 7714.093 1338.204 .076 5.765 .000
POOL 13359.275 1184.469 .105 11.279 .000
GARAGE 10.750 3.137 .038 3.427 .001
a Dependent Variable: SALEPRIC
40. Regression Results
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model B Std. Error Beta t Sig.
1 (Constant) 35633.753 1922.792 18.532 .000
BLDSIZE 58.537 1.045 .667 56.031 .000
LANDSF .419 .016 .291 26.342 .000
AGE -625.742 35.363 -.188 -17.695 .000
FAIR -25511.289 8693.178 -.031 -2.935 .003
GOOD 21095.623 1838.228 .127 11.476 .000
EXCEL 75844.967 12720.934 .059 5.962 .000
SUPERIOR 305671.839 18494.059 .169 16.528 .000
a Dependent Variable: SALEPRIC
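The “Beta” value in column 4 is the standardized coefficient of the
variable: the coefficient that would result if every variable were
expressed in standard deviation units. It lets you compare the relative
influence of variables measured in different units. Stepwise regression
relies on a related statistic, the partial correlation, in deciding
which variable to add next.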
41. Regression Results
The significance of each variable to the model can be determined
by looking at the “t” values. Rule of Thumb: “t” scores should
be 2.0 or greater in absolute value.
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model           B            Std. Error   Beta    t         Sig.
1 (Constant)    29680.695    2885.532             10.286    .000
  AGE           -705.817     38.491       -.212   -18.337   .000
  NB211001      12374.064    2176.815     .061    5.684     .000
  NB211002      -1094.891    1527.977     -.008   -.717     .474
  NB211003      -938.838     1136.671     -.010   -.826     .409
  NB211005      12639.946    2139.489     .066    5.908     .000
  NB211006      852.109      2535.266     .004    .336      .737
  SQFT1         31.388       7.815        .039    4.016     .000
  SQFT2         44.166       1.365        .595    32.349    .000
  SQFT3         52.939       1.265        .808    41.857    .000
  SQFT4         60.447       3.561        .164    16.974    .000
  SQFT5         94.723       2.943        .312    32.186    .000
  LAND75        11.788       .433         .303    27.240    .000
  BATHS         7714.093     1338.204     .076    5.765     .000
  POOL          13359.275    1184.469     .105    11.279    .000
  GARAGE        10.750       3.137        .038    3.427     .001
a Dependent Variable: SALEPRIC
NB211002, NB211003 and NB211006 are insignificant
42. Regression Results
Coefficients(a)
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
Model B Std. Error Beta t Sig.
1 (Constant) 35633.753 1922.792 18.532 .000
BLDSIZE 58.537 1.045 .667 56.031 .000
LANDSF .419 .016 .291 26.342 .000
AGE -625.742 35.363 -.188 -17.695 .000
FAIR -25511.289 8693.178 -.031 -2.935 .003
GOOD 21095.623 1838.228 .127 11.476 .000
EXCEL 75844.967 12720.934 .059 5.962 .000
SUPERIOR 305671.839 18494.059 .169 16.528 .000
a Dependent Variable: SALEPRIC
The “t-statistic” is calculated by dividing the coefficient of
a variable by its standard error. For example: for the variable
BLDSIZE, the “t-statistic” is calculated as follows:
58.537 / 1.045 = 56.0
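The same division can be repeated for any row of the output. The sketch below uses coefficients and standard errors copied from the table above and applies the 2.0 rule of thumb to the absolute value of each t-statistic; the helper name is illustrative.

```python
def t_statistic(coefficient, std_error):
    """t-statistic: the coefficient divided by its standard error."""
    return coefficient / std_error

# (coefficient, standard error) pairs taken from the table above.
rows = {
    "BLDSIZE": (58.537, 1.045),
    "AGE": (-625.742, 35.363),
    "FAIR": (-25511.289, 8693.178),
}

for name, (b, se) in rows.items():
    t = t_statistic(b, se)
    significant = abs(t) >= 2.0  # rule of thumb
    print(name, round(t, 1), significant)
```

Small differences from the printed t values come from rounding in the displayed coefficients and standard errors.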
43. Regression Results
Model Summary(b)
Adjusted R Std. Error of the
Model R R Square Square Estimate
1 .947(a) .897 .895 15854.87728402
The “Standard Error of the Estimate” in the regression model tells us
how much, on average, a predicted sale price will vary from the actual sale price.
This number alone is meaningless unless related to the average
sales price in the sale sample. Dividing the Standard Error by
the Average Sales Price produces the Coefficient of Variation (COV)
$15,854 / $134,043 = 11.8% COV
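The calculation is a single division. The sketch below uses the standard error from the Model Summary and the average sales price quoted above; the function name is illustrative.

```python
def coefficient_of_variation(std_error, average_price):
    """COV: standard error of the estimate as a share of the average sales price."""
    return std_error / average_price

cov = coefficient_of_variation(15854.877, 134043)
print(f"{cov:.1%}")
```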
44. Regression Options
“Enter” is the default regression method in most statistical software
programs. This method includes all variables “entered” by the modeler.
“Stepwise” multiple regression automatically eliminates
redundant or insignificant variables.
Coefficients(a)
Model: 4
Unstandardized Coefficients (B, Std. Error) and Standardized Coefficients (Beta)
                B            Std. Error   Beta    t         Sig.
  (Constant)    28624.283    2584.025             11.077    .000
  AGE           -697.862     37.689       -.209   -18.516   .000
  NB211001      12794.553    2071.093     .063    6.178     .000
  NB211005      13302.885    1969.163     .069    6.756     .000
  SQFT1         31.406       7.797        .039    4.028     .000
  SQFT2         44.305       1.354        .597    32.723    .000
  SQFT3         53.134       1.249        .811    42.525    .000
  SQFT4         60.544       3.557        .164    17.023    .000
  SQFT5         94.884       2.924        .313    32.446    .000
  LAND75        11.891       .393         .305    30.243    .000
  BATHS         7732.836     1332.987     .076    5.801     .000
  POOL          13317.394    1179.165     .105    11.294    .000
  GARAGE        10.586       3.047        .037    3.474     .001
a Dependent Variable: SALEPRIC
Notice that Stepwise Regression “kicked out” the neighborhoods that had
low “t-scores”
45. Creating New Assessments
Once you have calibrated
your model, the Regression
software allows you to predict
the new values (or assessments)
using the coefficients
(or adjustments) you created.
46. Reviewing Ratio Statistics
Once the new assessments are created using our final model, we can
review the accuracy of our new values using traditional ratio statistics.
Ratio Statistics for ASSESS Unstandardized Predicted Value / SALEPRIC
Weighted Mean 1.000
Price Related Differential 1.008
Coefficient of Dispersion .079
Coefficient of Variation Mean Centered 11.1%
Median Centered 11.2%
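The ratio statistics above follow standard definitions, sketched here for three hypothetical sales. The Coefficient of Dispersion is returned as a fraction, matching the .079 shown in the output; the function name and the data are illustrative.

```python
from statistics import mean, median

def ratio_statistics(assessed, sale_prices):
    """Weighted mean ratio, Price Related Differential and Coefficient of Dispersion."""
    ratios = [a / s for a, s in zip(assessed, sale_prices)]
    median_ratio = median(ratios)
    weighted_mean = sum(assessed) / sum(sale_prices)
    # PRD: mean ratio divided by the weighted mean ratio.
    prd = mean(ratios) / weighted_mean
    # COD: average absolute deviation from the median ratio, relative to the median.
    cod = mean([abs(r - median_ratio) for r in ratios]) / median_ratio
    return {"weighted_mean": weighted_mean, "prd": prd, "cod": cod}

# Hypothetical predicted values and sale prices.
assessed = [95000, 100000, 210000]
sale_prices = [100000, 100000, 200000]
stats = ratio_statistics(assessed, sale_prices)
print({name: round(value, 3) for name, value in stats.items()})
```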
47. Valuing the Population
Valuing the population requires transforming the same variables
you used in the model, then applying the coefficients to those variables.
This can be done internally within some CAMA systems, using
Microsoft Excel or other spreadsheet software, or within the
regression software.
Valuing the population is one of the most difficult aspects
of regression modeling because changes in the physical attributes of
any one parcel often require re-running the entire model and
re-calculating values.
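A minimal sketch of the two steps, using the coefficients from the Stepwise output in this presentation. The parcel record and its field names are hypothetical, and several readings are assumptions: SQFTk is the building size for quality code k, GARAGE is garage area in square feet, BATHS is a count, and POOL is a yes/no indicator.

```python
CONSTANT = 28624.283
COEFFICIENTS = {
    "AGE": -697.862, "NB211001": 12794.553, "NB211005": 13302.885,
    "SQFT1": 31.406, "SQFT2": 44.305, "SQFT3": 53.134,
    "SQFT4": 60.544, "SQFT5": 94.884,
    "LAND75": 11.891, "BATHS": 7732.836, "POOL": 13317.394, "GARAGE": 10.586,
}

def transform(parcel):
    """Step 1: apply the same transformations that were used in the model."""
    return {
        "AGE": parcel["age"],
        "LAND75": parcel["land_sf"] ** 0.75,
        "BATHS": parcel["baths"],
        "POOL": int(parcel["pool"]),
        "GARAGE": parcel["garage_sf"],
        f"SQFT{parcel['quality_code']}": parcel["bldsize"],
        f"NB{parcel['neighborhood']}": 1,
    }

def predict(parcel):
    """Step 2: apply the coefficients to the transformed variables."""
    features = transform(parcel)
    # Variables without a coefficient (e.g. a dropped neighborhood) add nothing.
    return CONSTANT + sum(
        COEFFICIENTS.get(name, 0.0) * value for name, value in features.items()
    )

# Hypothetical parcel record.
parcel = {"age": 20, "land_sf": 10000, "baths": 2, "pool": False,
          "garage_sf": 400, "quality_code": 3, "bldsize": 1500,
          "neighborhood": "211003"}
print(round(predict(parcel)))
```

The same two functions could be applied to every parcel in the population, whether in a CAMA system, a spreadsheet, or the regression software.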
48. Conclusion
Predicting assessments using Regression requires the appraiser to:
• Explore data to determine relationships and clean up outliers
• Specify which model and variables will be used
• Transform variables and run regression
• Review Results, modify or add variables
• Create predicted assessments and review ratio statistics
• Value Population using final coefficients
49. The End
[Figure: scatter plot of SALE PRICES against Predicted Values]