Multiple linear regression for energy efficacy of residential buildings

FAHAD BIN MOSTAFA
TEXAS TECH UNIVERSITY
MAY 05, 2020
Texas Tech University
Multiple Linear Regression for
Cooling load efficiency of residential buildings
DEPARTMENT OF MATHEMATICS AND STATISTICS, TEXAS TECH UNIVERSITY 1

Outlines
 Background
 Methodology
 Results
 Conclusion

Background
 Primary aim of this regression analysis is to show statistical
significance of many statistical technique to analysis cooling load of
buildings.
 Perform energy analysis using 12 different building shapes simulated
in Ecotect. The buildings differ with respect to the glazing area, the
glazing area distribution, and the orientation, amongst other parameters.
 Simulate various settings as functions of the afore-mentioned
characteristics to obtain 768 building shapes.
 Dataset comprises 768 samples and 8 features, aiming to predict two
real valued responses. However, we only work with one response.

Variables Descriptions
𝑋1 Relative compactness
𝑋2 Surface area
𝑋3 Wall area
𝑋4 Roof area
𝑋5 Overall Hight
𝑋6 Orientation
𝑋7 Glazing area
𝑋8 Glazing area distribution
𝑌𝐶𝐿 Cooling Load
Nomenclature of predictors and response

Methodology
The multiple linear regression model is
𝑌𝑖~ 𝑁 β 0+ X 𝑖1β 1+ . . . + X 𝑖𝑝β 𝑝 , 𝜎2
1
Using Matrix Form, β 𝑜𝑙𝑠 = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑦
Hat matrix, 𝐻 = 𝑋(𝑋 𝑇 𝑋)−1 𝑋 𝑇
𝐻 involves the weights ℎ𝑖𝑖; 𝑖 = 1 … 𝑛 depends on predictors.
Cook distance, 𝐷𝑖 =
𝑒𝑖
2
𝑝𝑠2 [
ℎ𝑖𝑖
1−ℎ𝑖𝑖
2]
Box Cox Transformation, gλ y =
𝑦λ−1
λ
; ; λ ≠ 0
log λ ; λ = 0
(2)
The weighted least squares estimate, β 𝑊𝐿𝑆 = (𝑋 𝑇
𝑊𝑋)−1
𝑋 𝑇
𝑊𝑦
K-fold cross validation: Comparing RMSE with model RMSE

Exploratory Analysis

Model Selection

Based on AIC why not adj-R-square!!!
Table 03: summary statistics of the model 3
Coefficients Estimates Std. Error Pr(> |t|)
Intercept 97.336561 20.754252 3.24e-06
Relative compactness -70.787707 11.219992 4.76e-10
Surface area -0.088245 0.018620 < 2e-16
Overall height 4.283843 0.368557 1.15e-09
Wall area 0.044682 0.007249 1.15e-09
Orientation 0.121510 0.103269 0.24
Glazing area 14.817971 0.867239 < 2e-16

Result for Model 2
Table 05: Analysis of Variance Table (ANOVA)
Source DF Sum of Square Mean Square F value Pr(>F)
Relative compactness 1 27931.9 27931.9 2726.885 < 2.2e−16
Surface area 1 8254.2 8254.2 805.823 < 2.2e−16
Overall height 1 22046.5 22046.5 2152.309 < 2.2e−16
Wall area 1 389.0 389.0 37.973 1.162e−10
Glazing area 1 2988.9 2988.9 291.797 < 2.2e−16
Residuals 762 7805.3 10.2
Total 767
For this multiple linear regression, we have
𝐻0: 𝛽1 = 𝛽2 = 𝛽3 = 𝛽4 = 𝛽5 = 0
𝐻1: 𝐴𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝛽 𝑖𝑠 𝑛𝑜𝑡 𝑧𝑒𝑟𝑜
The null hypothesis claims that there is no significant correlation at all. That is, all of the
coefficients are zero and none of the variables belong in the model.
Test of heteroscedasticity for the model can be done by the following test
𝐻0: 𝜎2′
𝑠 𝑎𝑟𝑒 𝑒𝑞𝑢𝑎𝑙
Chi-square = 187.5252, Df = 1, p = 2.22 𝑒−16
. At
5% level of significance we can say that we do have
sufficient evidence to reject null hypothesis. So,
variances are not equal. So, it has a problem with
heteroscedasticity.

Result for Model 2

Exploring Cooling Load
λ=-0.8 and is extremely close to the maximum, which suggests a transformation of the form
𝐶𝑜𝑜𝑙𝑖𝑛𝑔_𝐿𝑜𝑎𝑑λ
− 1
λ
=
𝐶𝑜𝑜𝑙𝑖𝑛𝑔_𝐿𝑜𝑎𝑑−0.8
− 1
−0.8

Result for WLS model
Table 06: summary for WLS with box cox transformed response
Coefficients Estimates Std. Error Pr(>|t|)
Intercept 9.424e−01
6.312e−02
< 2e−16
Relative compactness 1.902e−02
3.416e−02
0.5777
Surface area 7.122e−05
5.674e−05 0.20977
Overall height 1.878e−02
1.13e−03
< 2e−16
Wall area 7.904e−05
2.236e−05 1.16e−09
Glazing area 5.863e−02
2.638e−03
< 2e−16
In the WLS model the Residual standard
error: 0.009118 on 762 degrees of freedom,
Multiple R-squared: 0.9137, Adjusted R-
squared: 0.9131.
F-statistic: 1613 on 5 and 762 DF, p −
value: < 2.2e−16
Table 05: Analysis of Variance Table (ANOVA) for WLS model
Source DF Sum of Square Mean Square F value Pr(>F)
Relative compactness 1 0.33606 0.33606 4042.453 < 2.2e−16
Surface area 1 0.04954 0.04954 595.887 < 2.2e−16
Overall height 1 0.24296 0.24296 2922.628 < 2.2e−16
Wall area 1 0.00103 0.00103 12.448 0.0004434
Glazing area 1 0.04106 0.04106 493.969 < 2.2e−16
Residuals 762 0.06335 0.00008
Total 767

Result for Model 2

Final WLS model

Output of final WLS model

Result for WLS model
Table 06 some extreme values from WLS model diagnosis
Data points Standardized Residual Hat CookD
22 -0.1372486 0.01772626 5.672951𝑒−05
24 -0.2019629 0.01773586 1.229031𝑒−04
45 -2.8854279 0.01386842 1.932885𝑒−02
48 -2.9225801 0.01387279 1.983057𝑒−02

Confidence Interval

5-fold CV

Conclusion
rate of change with respect to surface area, overall height, wall area and grazing area has a
positive effect on cooling load
however wall area as well as surface area is numerically small in case of rate of change.
MLR is not a good model to predict cooling load because we are loosing important predictors.
Although cross validation verified a good fit.
Elastic net could be a better model because we can use two different penalties with
regularization parameters which avail from CV.

References

Multiple linear regression for energy efficacy of residential buildings

Recommended

Recommended

More Related Content

Similar to Multiple linear regression for energy efficacy of residential buildings

Similar to Multiple linear regression for energy efficacy of residential buildings (20)

More from Fahad B. Mostafa

More from Fahad B. Mostafa (6)

Recently uploaded

Recently uploaded (20)

Multiple linear regression for energy efficacy of residential buildings