In this project, we use multiple linear regression to study how eight predictors (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution) affect the cooling load (CL) of residential buildings, a measure of their energy efficiency. We analyze and visualize the relationship between each predictor and the response using classical diagnostic tools for linear models, in order to identify the predictors most strongly related to cooling load. We first perform model selection by stepwise regression, comparing the AIC of the candidate models to identify the best among them. We then fit a classical linear regression to simulated data on 768 diverse residential buildings and show that CL can be predicted with low mean absolute error. We use ANOVA to examine the variation explained by the predictors and left in the residuals, and a non-constant variance test to check the residual variance. Furthermore, we check for leverage points, influential points, and outliers, and compute Cook's distance for the influential points. Using a Box-Cox transformation and weights, we also fit the model by weighted least squares (WLS) to obtain better results, and repeat the main analyses on that model. Finally, we use 5-fold cross-validation to verify our model.
Source of the data:
The dataset was created by Angeliki Xifara (angxifara '@' gmail.com, Civil/Structural Engineer) and was processed by Athanasios Tsanas (tsanasthanasis '@' gmail.com, Oxford Centre for Industrial and Applied Mathematics, University of Oxford, UK).
Multiple linear regression for energy efficacy of residential buildings
1. FAHAD BIN MOSTAFA
TEXAS TECH UNIVERSITY
MAY 05, 2020
Multiple Linear Regression for
Cooling load efficiency of residential buildings
DEPARTMENT OF MATHEMATICS AND STATISTICS, TEXAS TECH UNIVERSITY 1
3. Background
The primary aim of this regression analysis is to show the usefulness of several statistical techniques for analyzing the cooling load of buildings.
The energy analysis uses 12 different building shapes simulated in Ecotect. The buildings differ with respect to the glazing area, the glazing area distribution, and the orientation, among other parameters.
Various settings were simulated as functions of the aforementioned characteristics to obtain 768 building shapes.
The dataset comprises 768 samples and 8 features, with the aim of predicting two real-valued responses. However, we work with only one response, the cooling load.
Variables Descriptions
𝑋1 Relative compactness
𝑋2 Surface area
𝑋3 Wall area
𝑋4 Roof area
𝑋5 Overall height
𝑋6 Orientation
𝑋7 Glazing area
𝑋8 Glazing area distribution
𝑌𝐶𝐿 Cooling Load
Nomenclature of predictors and response
5. Methodology
The multiple linear regression model is
$$Y_i \sim N\left(\beta_0 + X_{i1}\beta_1 + \dots + X_{ip}\beta_p,\ \sigma^2\right) \qquad (1)$$
In matrix form, the OLS estimate is $\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$
Hat matrix: $H = X (X^T X)^{-1} X^T$
The diagonal elements $h_{ii}$, $i = 1, \dots, n$, of $H$ act as weights that depend only on the predictors.
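The OLS estimate and the hat matrix can be sketched numerically as follows. This is a minimal illustration on synthetic data, not the building dataset; the function name `ols_fit` and the simulated coefficients are assumptions made for the example.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and the hat matrix.

    X is assumed to already contain a leading column of ones (intercept).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y      # beta_ols = (X^T X)^{-1} X^T y
    H = X @ XtX_inv @ X.T         # hat matrix H = X (X^T X)^{-1} X^T
    return beta, H

# Synthetic illustration: 50 rows, intercept plus 2 predictors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=50)

beta, H = ols_fit(X, y)
leverages = np.diag(H)            # h_ii, which depend only on X
fitted = H @ y                    # H maps y to the fitted values
```

The diagonal of `H` sums to the number of columns of `X`, which gives a quick sanity check on the leverages.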
Cook's distance:
$$D_i = \frac{e_i^2}{p\,s^2}\left[\frac{h_{ii}}{(1 - h_{ii})^2}\right]$$
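A minimal sketch of the Cook's distance formula, assuming $e_i$ are the ordinary residuals, $s^2$ is the residual mean square, and $p$ counts all fitted coefficients including the intercept. The data are synthetic, with one point deliberately made influential; `cooks_distance` is a hypothetical helper name.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)                 # leverages h_ii
    e = y - H @ y                  # ordinary residuals
    s2 = e @ e / (n - p)           # residual mean square
    return (e**2 / (p * s2)) * (h / (1.0 - h) ** 2)

# Synthetic illustration with one planted influential point (index 0).
rng = np.random.default_rng(5)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[0, 1] = 5.0                      # far from the other x values: high leverage
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[0] += 10.0                       # large error in the response as well

D = cooks_distance(X, y)
most_influential = int(np.argmax(D))
```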
Box-Cox transformation:
$$g_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases} \qquad (2)$$
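The transformation can be written as a small function. This is a sketch only; the name `box_cox` and the positivity check are choices made for the example.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1) / lam for lam != 0, log(y) for lam == 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

# The lam -> 0 limit of the power branch approaches the log branch.
sample = np.array([1.0, 4.0, 9.0])
near_zero = box_cox(sample, 1e-6)
at_zero = box_cox(sample, 0)
```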
The weighted least squares estimate:
$$\hat{\beta}_{WLS} = (X^T W X)^{-1} X^T W y$$
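A minimal sketch of the WLS estimate, assuming $W$ is diagonal with one positive weight per observation. The weights used here (inverse of a variance that grows with the predictor) are an assumption for illustration; the slides do not state how the weights were chosen.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: solve (X^T W X) beta = X^T W y with W = diag(w)."""
    XtW = X.T * w                  # same as X.T @ np.diag(w), without the n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

# Synthetic illustration: the noise standard deviation grows with x.
rng = np.random.default_rng(2)
n = 100
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.2 * x)
w = 1.0 / x**2                     # assumed weights, proportional to 1 / variance

beta_wls = wls_fit(X, y, w)
```

With all weights equal to one, this reduces to the OLS estimate.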
K-fold cross-validation: compare the cross-validated RMSE with the RMSE of the fitted model.
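The cross-validation comparison can be sketched as below for an OLS fit with k = 5. The data are synthetic and the helper name `kfold_rmse` is hypothetical; the fold assignment by random permutation is one common choice, not necessarily the one used in this project.

```python
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Average held-out RMSE of an OLS fit over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    rmses = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        beta = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)[0]
        resid = y[test_idx] - X[test_idx] @ beta
        rmses.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(rmses))

# Synthetic illustration with noise standard deviation 0.3.
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(scale=0.3, size=n)

cv_rmse = kfold_rmse(X, y, k=5)

# RMSE of the model fitted to all the data, for comparison.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
model_rmse = float(np.sqrt(np.mean((y - X @ beta_full) ** 2)))
```

A cross-validated RMSE close to the in-sample RMSE suggests the model is not overfitting.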
8. Model selection based on AIC: why not adjusted R-squared?
Table 03: summary statistics of the model 3
Coefficients Estimates Std. Error Pr(> |t|)
Intercept 97.336561 20.754252 3.24e-06
Relative compactness -70.787707 11.219992 4.76e-10
Surface area -0.088245 0.018620 < 2e-16
Overall height 4.283843 0.368557 1.15e-09
Wall area 0.044682 0.007249 1.15e-09
Orientation 0.121510 0.103269 0.24
Glazing area 14.817971 0.867239 < 2e-16
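An AIC comparison between candidate models can be sketched as follows. This assumes Gaussian errors and uses the form n log(RSS/n) + 2k, which drops an additive constant that is the same for all models fitted to the same response. The data and candidate models are synthetic and hypothetical, not the models compared in this project.

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC of an OLS fit up to an additive constant: n*log(RSS/n) + 2*k."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

# Synthetic illustration: y depends on x1 only, x2 is pure noise.
rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
candidates = {
    "intercept only": np.column_stack([ones]),
    "x1": np.column_stack([ones, x1]),
    "x1 + x2": np.column_stack([ones, x1, x2]),
}
aic = {name: gaussian_aic(X, y) for name, X in candidates.items()}
best = min(aic, key=aic.get)       # the model with the lowest AIC is preferred
```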
9. Result for Model 2
Table 05: Analysis of Variance Table (ANOVA)
Source DF Sum of Square Mean Square F value Pr(>F)
Relative compactness 1 27931.9 27931.9 2726.885 < 2.2e−16
Surface area 1 8254.2 8254.2 805.823 < 2.2e−16
Overall height 1 22046.5 22046.5 2152.309 < 2.2e−16
Wall area 1 389.0 389.0 37.973 1.162e−10
Glazing area 1 2988.9 2988.9 291.797 < 2.2e−16
Residuals 762 7805.3 10.2
Total 767
For this multiple linear regression, the overall test has hypotheses
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_1$: at least one $\beta_j$ is not zero
The null hypothesis states that the response has no linear relationship with any of the predictors. That is, all of the slope coefficients are zero and none of the variables belong in the model.
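As a worked check, the overall F statistic for this null hypothesis can be computed from the ANOVA table above. The sketch assumes the table reports sequential sums of squares, so that the five predictor rows add up to the regression sum of squares.

```python
# Sums of squares copied from Table 05 (ANOVA).
ss_terms = {
    "Relative compactness": 27931.9,
    "Surface area": 8254.2,
    "Overall height": 22046.5,
    "Wall area": 389.0,
    "Glazing area": 2988.9,
}
ss_resid, df_resid = 7805.3, 762

df_model = len(ss_terms)                   # 5 predictors
ss_model = sum(ss_terms.values())          # regression sum of squares
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid             # about 10.2, as in the table

f_stat = ms_model / ms_resid               # overall F on (5, 762) degrees of freedom
r_squared = ss_model / (ss_model + ss_resid)
```

Under these assumptions the overall F statistic is roughly 1203 and the predictors account for about 89% of the variation in cooling load, so the null hypothesis is rejected at any conventional level.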
Heteroscedasticity in the model can be checked with the following test:
$H_0$: the error variances $\sigma_i^2$ are equal
Chi-square = 187.5252, Df = 1, p = 2.22e-16. At the 5% level of significance, we have sufficient evidence to reject the null hypothesis. So the variances are not equal, and the model has a problem with heteroscedasticity.
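One common non-constant variance test with a single degree of freedom is a score test of the Breusch-Pagan / Cook-Weisberg type, in which the scaled squared residuals are regressed on the fitted values. Whether this is exactly the test behind the result above is an assumption; the sketch below uses synthetic data and a hypothetical helper name.

```python
import math
import numpy as np

def ncv_score_test(X, y):
    """Score test for variance depending on the fitted values (1 df)."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    e = y - fitted
    u = e**2 / (e @ e / n)                   # scaled squared residuals
    Z = np.column_stack([np.ones(n), fitted])
    gamma = np.linalg.lstsq(Z, u, rcond=None)[0]
    u_hat = Z @ gamma
    chi2 = float(np.sum((u_hat - u.mean()) ** 2) / 2.0)   # half the regression SS
    p_value = math.erfc(math.sqrt(chi2 / 2.0))            # chi-square tail, 1 df
    return chi2, p_value

# Synthetic illustration: the noise standard deviation grows with x.
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

chi2, p_value = ncv_score_test(X, y)
```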
10. Result for Model 2
11. Exploring Cooling Load
λ = -0.8 is extremely close to the maximum, which suggests a transformation of the form
$$\frac{\text{Cooling\_Load}^{\lambda} - 1}{\lambda} = \frac{\text{Cooling\_Load}^{-0.8} - 1}{-0.8}$$
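Choosing λ by maximizing the Box-Cox profile log-likelihood over a grid can be sketched as follows. The data are synthetic, generated so that the log transform (λ = 0) is appropriate; the grid and the function name are assumptions for the example, and the value λ = -0.8 reported above comes from the building data, not from this code.

```python
import numpy as np

def boxcox_profile_loglik(X, y, lambdas):
    """Profile log-likelihood of lambda (up to a constant) for a linear model."""
    n = len(y)
    sum_log_y = np.sum(np.log(y))
    values = []
    for lam in lambdas:
        z = np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
        rss = np.sum((z - X @ beta) ** 2)
        # Gaussian log-likelihood term plus the Jacobian of the transformation
        values.append(-0.5 * n * np.log(rss / n) + (lam - 1.0) * sum_log_y)
    return np.array(values)

# Synthetic illustration: log(y) is linear in x with normal errors.
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(0, 4, size=n)
X = np.column_stack([np.ones(n), x])
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.05, size=n))

lambdas = np.round(np.linspace(-2.0, 2.0, 41), 1)
loglik = boxcox_profile_loglik(X, y, lambdas)
best_lambda = float(lambdas[np.argmax(loglik)])
```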
12. Result for WLS model
Table 06: Summary for WLS with Box-Cox transformed response
Coefficients Estimates Std. Error Pr(>|t|)
Intercept 9.424e-01 6.312e-02 < 2e-16
Relative compactness 1.902e-02 3.416e-02 0.5777
Surface area 7.122e-05 5.674e-05 0.20977
Overall height 1.878e-02 1.13e-03 < 2e-16
Wall area 7.904e-05 2.236e-05 1.16e-09
Glazing area 5.863e-02 2.638e-03 < 2e-16
For the WLS model, the residual standard error is 0.009118 on 762 degrees of freedom, with multiple R-squared 0.9137 and adjusted R-squared 0.9131.
The F-statistic is 1613 on 5 and 762 DF, with p-value < 2.2e-16.
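As a consistency check on these figures, the adjusted R-squared and the F-statistic can be recovered from the multiple R-squared, taking n = 768 observations and p = 5 predictors (so 762 residual degrees of freedom). Because R-squared is rounded to four decimals, the results are approximate.

```latex
\begin{align*}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
  = 1 - 0.0863 \times \frac{767}{762} \approx 0.9131, \\
F &= \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
  = \frac{0.9137 / 5}{0.0863 / 762} \approx 1613.
\end{align*}
```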
Table 05: Analysis of Variance Table (ANOVA) for WLS model
Source DF Sum of Square Mean Square F value Pr(>F)
Relative compactness 1 0.33606 0.33606 4042.453 < 2.2e−16
Surface area 1 0.04954 0.04954 595.887 < 2.2e−16
Overall height 1 0.24296 0.24296 2922.628 < 2.2e−16
Wall area 1 0.00103 0.00103 12.448 0.0004434
Glazing area 1 0.04106 0.04106 493.969 < 2.2e−16
Residuals 762 0.06335 0.00008
Total 767
13. Result for Model 2
15. Output of final WLS model
16. Result for WLS model
Table 06: Some extreme values from the WLS model diagnostics
Data points Standardized Residual Hat CookD
22 -0.1372486 0.01772626 5.672951e-05
24 -0.2019629 0.01773586 1.229031e-04
45 -2.8854279 0.01386842 1.932885e-02
48 -2.9225801 0.01387279 1.983057e-02
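The CookD column can be cross-checked against the other two columns. The sketch below assumes n = 768 observations and p = 6 fitted coefficients (intercept plus five predictors), and that the residual column holds externally studentized residuals. That last point is an assumption, since the table labels the column "Standardized Residual"; it is the reading under which the reported values are reproduced.

```python
N_OBS = 768
N_PARAMS = 6   # assumed: intercept + 5 predictors

# (data point, residual, hat value, reported Cook's D), copied from the table
rows = [
    (22, -0.1372486, 0.01772626, 5.672951e-05),
    (24, -0.2019629, 0.01773586, 1.229031e-04),
    (45, -2.8854279, 0.01386842, 1.932885e-02),
    (48, -2.9225801, 0.01387279, 1.983057e-02),
]

def cooks_d_from_studentized(t, h, n=N_OBS, p=N_PARAMS):
    """Cook's D from an externally studentized residual t and leverage h."""
    # Convert to the squared internally studentized residual r^2:
    # r^2 = t^2 (n - p) / (n - p - 1 + t^2)
    r2 = t**2 * (n - p) / (n - p - 1 + t**2)
    # D = r^2 / p * h / (1 - h), equivalent to e^2 / (p s^2) * h / (1 - h)^2
    return r2 * h / (p * (1.0 - h))

recomputed = {pt: cooks_d_from_studentized(t, h) for pt, t, h, _ in rows}
reported = {pt: d for pt, _, _, d in rows}
```

All four Cook's distances are far below 1, so none of these points dominates the fit on its own, even though points 45 and 48 have large residuals.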
19. Conclusion
The estimated rates of change with respect to surface area, overall height, wall area, and glazing area are positive, so each has a positive effect on cooling load.
However, the rates of change for wall area and surface area are numerically small.
MLR is not a good model for predicting cooling load because we are losing important predictors, although cross-validation verified a good fit.
Elastic net could be a better model because it uses two different penalties whose regularization parameters can be chosen by CV.