GEO6161: Intermediate Quantitative Methods for Geographers
Laboratory-1
MULTIPLE REGRESSION
Kalaivanan Murthy
I. PRELIMINARY ANALYSIS
1. Plot the Y’s vs individual X’s:
(Panels read anti-clockwise.) Y is widely scattered against X1 and close to linear against X2, and Y is higher for X3=1 than for X3=0. The X1-X3 and X2-X3 plots show little from which to draw inference.
2. Run Naïve Model: 𝑌̂ = β0 + β1X1 + β2X2 + β3X3 + β4A
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1082.3793 165.4371 -6.543 8.17e-08 ***
X1 0.2397 3.4967 0.069 0.946
X2 1.2993 0.1620 8.019 7.45e-10 ***
X3 67.9740 47.9893 1.416 0.164
A 2.8016 7.1449 0.392 0.697
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 120.4 on 40 degrees of freedom
Multiple R-squared: 0.8635, Adjusted R-squared: 0.8498
F-statistic: 63.24 on 4 and 40 DF, p-value: < 2.2e-16
Among the slope coefficients, only β2 is significant at the 95% confidence level. R²=86.35% and R²-adj=84.98% imply that the model explains about 85% of the variability in the data and is concise. The F-statistic p-value<0.05 implies that at least one of the β’s is significant at the 95% confidence level. In addition, AIC=565.61.
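The AIC quoted above can be cross-checked from the printed summary alone. The following is a minimal sketch, assuming the Gaussian log-likelihood convention that R uses for `lm` objects, in which the parameter count is the five coefficients plus the error variance. The helper name `aic_from_summary` is illustrative and is not part of the original script.

```r
# Reconstruct the AIC of a Gaussian linear model from summary() output.
# rse: residual standard error, df: residual degrees of freedom,
# n: sample size, p: number of estimated coefficients (incl. intercept).
aic_from_summary <- function(rse, df, n, p) {
  rss <- rse^2 * df                                  # residual sum of squares
  loglik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
  -2 * loglik + 2 * (p + 1)                          # +1 for the error variance
}

# Values read from the naive-model summary above
aic_naive <- aic_from_summary(rse = 120.4, df = 40, n = 45, p = 5)
print(round(aic_naive, 1))
```

The result agrees with the reported AIC to within the rounding of the residual standard error.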
3. Correlation Matrix
Y X1 X2 X3 A
Y 1.00 0.77 0.93 0.50 -0.51
X1 0.77 1.00 0.83 0.47 -0.48
X2 0.93 0.83 1.00 0.46 -0.57
X3 0.50 0.47 0.46 1.00 -0.32
A -0.51 -0.48 -0.57 -0.32 1.00
There is a strong correlation between Y and X2, and between X1 and X2. Rejection region for r: with t(α/2, n-2)=2.016, r_min=0.29, so any |r| above 0.29 is a significant correlation. By this criterion every pair in the matrix is significantly correlated.
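The threshold follows from inverting the t-statistic for a correlation coefficient, t = r·sqrt(n-2)/sqrt(1-r²), which gives r_min = t/sqrt(t² + n - 2). A short sketch, assuming a two-sided test at α=0.05 and n=45 (the sample size reported in Section III); the variable names are illustrative:

```r
n <- 45
alpha <- 0.05
df <- n - 2

t_crit <- qt(1 - alpha / 2, df)          # two-sided critical t value
r_min <- t_crit / sqrt(t_crit^2 + df)    # smallest significant |r|

print(round(c(t_crit = t_crit, r_min = r_min), 3))
```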
4. Assumptions to be met:
i. Normality of error terms
ii. Independence of error terms in space and time
iii. Constant variance (homoscedasticity) of error terms
iv. Error terms independent of (uncorrelated with) the predictors
The naïve model is found to violate most of these assumptions; the test procedure is given in the appendix. In addition, R²-adj and AIC can be improved: a higher R²-adj and a lower AIC are desired.
II. A BETTER MODEL
5. Improve R²-adj and AIC
The following methods are tried to see how each performs.
i. Adding interaction terms
ii. Double log model
iii. Polynomial function
The model explains the variability better when an interaction term is added. Log and polynomial terms were also attempted, but the improvement in accuracy was not significant. One of the better models is shown below; it is also simpler.
Call:
lm(formula = Y ~ X1 + X2 + X3 + X1:X2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.771e+03 4.499e+02 3.936 0.000322 ***
X1 -6.394e+01 1.037e+01 -6.168 2.75e-07 ***
X2 -1.078e+00 3.852e-01 -2.799 0.007855 **
X3 8.190e+01 3.386e+01 2.419 0.020217 *
X1:X2 5.275e-02 8.278e-03 6.372 1.42e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 85.01 on 40 degrees of freedom
Multiple R-squared: 0.932, Adjusted R-squared: 0.9252
F-statistic: 137 on 4 and 40 DF, p-value: < 2.2e-16
> AIC(y.reg)
[1] 534.2501
All β’s are significant at the 95% confidence level. R²=93.2% (↑ from 86%) and R²-adj=92.52% (↑ from 85%) have improved markedly. The F-statistic=137 (↑ from 63) with p-value<0.05 implies that at least one of the β’s is significant (≠0) at the 95% confidence level. In addition, AIC=534.25 (↓ from 565.61) has fallen, which is a good sign.
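The comparison of the three approaches listed in step 5 can be sketched as follows. Because the lab's CSV file is not reproduced here, the sketch runs on simulated stand-in data, so its numbers are not the report's results. The data-generating formula, the seed and all object names are assumptions made only for illustration.

```r
# Simulated stand-in data (NOT the lab data)
set.seed(1)
n <- 45
d <- data.frame(
  X1 = runif(n, 20, 80),      # income-like predictor
  X2 = runif(n, 800, 2500),   # area-like predictor
  X3 = rbinom(n, 1, 0.5)      # binary predictor
)
d$Y <- 200 + 0.01 * d$X1 * d$X2 + 60 * d$X3 + rnorm(n, sd = 50)

# Baseline plus one candidate per method (i)-(iii)
fits <- list(
  additive    = lm(Y ~ X1 + X2 + X3, data = d),
  interaction = lm(Y ~ X1 + X2 + X3 + X1:X2, data = d),
  loglog      = lm(log(Y) ~ log(X1) + log(X2) + X3, data = d),
  polynomial  = lm(Y ~ X1 + X2 + I(X2^2) + X3, data = d)
)

# Collect adjusted R-squared and AIC for each candidate
comparison <- sapply(fits, function(m)
  c(adj.r.squared = summary(m)$adj.r.squared, AIC = AIC(m)))
print(round(comparison, 3))
```

The AIC of the double log fit is computed on log(Y), so it should not be ranked directly against the AIC values of the models fitted to Y.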
Normality, an important assumption, is checked first. The Anderson-Darling test is performed because it is more robust than Shapiro-Wilk for large sample sizes. The result is p-value=0.006 < 0.05, hence H0: Normality is rejected.
As a remedy, a Box-Cox (power) transformation is applied. The estimated power for the dependent variable Y is −0.277.
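Let Y* denote the transformed Y’s. With power λ = −0.277, the appendix code applies `bcPower`, which returns the scaled Box-Cox form rather than the bare power:
$$Y^* = \frac{Y^{\lambda} - 1}{\lambda}, \qquad \lambda = -0.277$$
The transformed model is then
$$Y^* = 2.998 - 6.471\times10^{-3}\,X_1 - 5.618\times10^{-5}\,X_2 + 1.814\times10^{-2}\,X_3 - 2.139\times10^{-3}\,A + 6.259\times10^{-6}\,X_1 X_2$$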
Call:
lm(formula = Y.transform ~ X1 + X2 + X3 + X1:X2 + A)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.998e+00 9.894e-02 30.297 < 2e-16 ***
X1 -6.471e-03 2.197e-03 -2.945 0.005418 **
X2 -5.618e-05 8.309e-05 -0.676 0.502958
X3 1.814e-02 7.115e-03 2.549 0.014834 *
A -2.139e-03 1.069e-03 -2.001 0.052373 .
X1:X2 6.259e-06 1.755e-06 3.568 0.000973 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01782 on 39 degrees of freedom
Multiple R-squared: 0.9377, Adjusted R-squared: 0.9297
F-statistic: 117.3 on 5 and 39 DF, p-value: < 2.2e-16
[1] -227.1882
Though the β values for X2 and A are insignificant at the 95% confidence level, both terms are kept because they help the model conform to the normality assumption. Analysis shows that removing X2 and A yields a slightly better model but violates the normality assumption. R²-adj=92.97% is high, and the F-statistic p-value<0.05. AIC falls to −227.19; because Y has been transformed, this value is on a different scale and is not directly comparable with the AIC of the untransformed models.
The new model passes the normality test. The assumptions are detailed in the following section.
III. CHECK FOR ASSUMPTIONS
6. Test for Normality: The following methods can be used
i. Shapiro-Wilk
ii. Kolmogorov-Smirnov
iii. D’Agostino’s Battery of Tests
The sample size is relatively small (n=45), hence the Shapiro-Wilk test is performed; its p-value=0.2489 > 0.05.
‘Fail to reject’ H0: Distribution is Normal.
7. Test for independence: The following tests can be used to check whether the error terms are independent in space and time.
i. Runs test
ii. Durbin-Watson test
The runs test is performed here. In the figure above, the right-hand plot shows the distribution of residuals about the mean. No significant spatial dependence is observable. The p-value=0.453 > 0.05 implies Fail to Reject H0: the error terms are independent.
8. Test for homoscedasticity: Homoscedasticity, or homogeneity of variance, can be tested by the Bartlett, Levene and Fligner-Killeen tests. The Bartlett test is roughly valid only when the data are normally distributed. Levene’s test is performed here because it is less sensitive to departures from normality than Bartlett’s.
Levene’s test yields p-value=0.92 > 0.05. Hence fail to reject H0: Homogeneity of variance.
IV. RELIABILITY AND ROBUSTNESS
9. Coefficient of Determination (R²-adjusted): The transformed model has R²-adj=92.97%, which means it explains 92.97% of the variability after adjusting for the number of predictors.
10. Fisher–Snedecor Statistic: The transformed model has F-statistic=117.3 with p-value≈0, which means the predictors jointly explain a significant share of the variability.
11. Akaike Information Criterion (AIC): It is a measure of the relative quality of statistical models. The transformed model yields AIC=−227.19, far lower than the naïve model's, although the two values are computed on different response scales.
12. Modified Coefficient of Efficiency (E*): E* is less sensitive to large values because its terms are absolute differences rather than squares. This model has E*=0.7569.
There are other testing procedures available, but these four are the most powerful. With these checks the model can be finalized.
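The modified coefficient of efficiency in item 12 is E* = 1 − Σ|Yi − Ŷi| / Σ|Yi − Ȳ|, which is the quantity `MODEL.ACCURACY()` in the appendix computes. A small sketch with made-up numbers (the vectors `obs` and `pred` are hypothetical, not the lab data) contrasts it with its squared-deviation counterpart:

```r
obs  <- c(1, 2, 3, 4, 5)          # hypothetical observations
pred <- c(1.5, 2, 2.5, 4, 5.5)    # hypothetical fitted values

# Modified coefficient of efficiency: absolute deviations
e_star <- 1 - sum(abs(obs - pred)) / sum(abs(obs - mean(obs)))

# Squared-deviation counterpart, shown for comparison
e_sq <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

print(c(E.star = e_star, E.squared = e_sq))   # 0.750 and 0.925
```

For the same fit, E* is lower than the squared version here because squaring shrinks the small errors relative to the large deviations from the mean.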
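The best fit regression model, written for the Box-Cox transformed response that the appendix code computes with `bcPower` (λ = −0.277), is
$$\frac{Y^{-0.277} - 1}{-0.277} = 2.998 - 6.471\times10^{-3}\,X_1 - 5.618\times10^{-5}\,X_2 + 1.814\times10^{-2}\,X_3 - 2.139\times10^{-3}\,A + 6.259\times10^{-6}\,X_1 X_2$$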
Y: Monthly Mortgage Payment ($)
X1: Household Disposable Income (x1000 $)
X2: Square Footage of housing units
X3: Mortgage Type
A: Housing Unit’s Age
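To read the fitted model in dollars, a prediction on the transformed scale has to be back-transformed. The following is a minimal sketch, assuming the response was transformed with the scaled Box-Cox form (Y^λ − 1)/λ that `car::bcPower` computes, as in the appendix script. The function names and the input values are hypothetical and chosen only for illustration.

```r
lambda <- -0.277

# Scaled Box-Cox transform and its inverse (for lambda != 0 and y > 0)
bc <- function(y, lambda) (y^lambda - 1) / lambda
bc_inverse <- function(z, lambda) (lambda * z + 1)^(1 / lambda)

# Coefficients copied from the fitted transformed model
predict_transformed <- function(X1, X2, X3, A) {
  2.998 - 6.471e-3 * X1 - 5.618e-5 * X2 + 1.814e-2 * X3 -
    2.139e-3 * A + 6.259e-6 * X1 * X2
}

# Hypothetical unit: income 50 (x1000 $), 1500 sq ft, X3 = 1, age 10
z_hat <- predict_transformed(X1 = 50, X2 = 1500, X3 = 1, A = 10)
y_hat <- bc_inverse(z_hat, lambda)   # back on the dollar scale

print(c(transformed = z_hat, dollars = y_hat))
```

Because the transformation is non-linear, the back-transformed value is better read as a typical payment than as an estimate of the mean payment.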
V. LIMITATIONS
13. Some model-validation tests are not performed, although their outcome can be judged intuitively from the model. They are as follows:
i. Multicollinearity: A test for multicollinearity is not systematically performed. It requires regressing each independent variable on the remaining independent variables and finding the coefficient of determination for each regression. In this model, multicollinearity exists with X2, but removing X2 is found to violate the normality and other assumptions.
ii. Polynomial Interaction Terms: Since R²-adj is above 90%, polynomial interaction terms are not included in the final model.
iii. Outliers: Outliers are not identified in the model. They are the points with large deviations from the predicted values (usually those whose standardized residual satisfies |ε̂i| > 2σ). Two points lie outside ±2σ; removing them might give a better model.
iv. Since not many models are formulated, Mallows' Cp, which is used to identify the best model among a set of models, is not implemented here.
v. In MODEL.ACCURACY(), a function created to check reliability, the significance test for β (≠0) is not explicitly performed, but it is implied by the t-statistic of each β.
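The auxiliary-regression check described in item i can be approximated from the correlation matrix in Section I alone, because the variance inflation factor VIF_j = 1/(1 − R_j²) equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below relies on that identity and on the rounded correlations reported for the naïve model's predictors X1, X2, X3 and A. It does not cover the X1:X2 interaction term of the final model, and the object names are illustrative.

```r
predictors <- c("X1", "X2", "X3", "A")

# Predictor block of the reported correlation matrix (rounded to 2 dp)
R <- matrix(c( 1.00,  0.83,  0.47, -0.48,
               0.83,  1.00,  0.46, -0.57,
               0.47,  0.46,  1.00, -0.32,
              -0.48, -0.57, -0.32,  1.00),
            nrow = 4, byrow = TRUE,
            dimnames = list(predictors, predictors))

vif <- setNames(diag(solve(R)), predictors)   # VIF_j = [R^-1]_jj
r2_aux <- 1 - 1 / vif                         # R^2 of each auxiliary regression

print(round(rbind(VIF = vif, R2.aux = r2_aux), 2))
```

On these rounded inputs X2 has the largest inflation factor, which is consistent with the remark that multicollinearity is concentrated in X2.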
VI. APPENDIX
The following program, written in R, was used to produce the above results.
#read data
raw.data=read.csv("~/My R Codes/Data/LabDataGEO6161.csv",header=T)
attach(raw.data); length(Y)
#split window
dev.list()
mat=matrix(c(1,2,1,2,3,4,3,5),2,4)
layout(mat); layout.show(5)
#scatter plot for each variable
plot(X1,Y,main="Y - X1",ylab="Y",las=1)
plot(X2,Y,main="Y - X2",ylab="Y",las=1)
boxplot(Y~X3,main="Y - X3",ylab="Y",xlab="X3",las=1)
boxplot(X1~X3,main="X1 - X3",ylab="X1",xlab="X3",las=1)
boxplot(X2~X3,main="X2 - X3",ylab="X2",xlab="X3",las=1)
TEST.ASSUMPTIONS=function(reg.sample,Yi) {
  #standardized residuals of the fitted model
  error.sample=rstandard(reg.sample)
  mat=matrix(c(1,1,2,3),2,2);layout(mat)
  #anderson-darling normality
  qqnorm(error.sample,datax=TRUE); qqline(error.sample,datax=TRUE)
  p.norm=nortest::ad.test(error.sample)$p.value
  norm=ifelse(p.norm<=0.05,"Ha:Normality Violated","Ho:Normality Verified")
  #runs test independence
  p.ind=lawstat::runs.test(error.sample,plot.it=T,alternative="two.sided")$p.value
  ind=ifelse(p.ind<=0.05,"Ha:Independence Violated","Ho:Independence Verified")
  #levene's test: compares the spread of observed and fitted values
  group.levene=as.factor(c(rep(1,length(Yi)),rep(2,length(reg.sample$fitted.values))))
  y.combined=c(Yi,reg.sample$fitted.values)
  p.var=lawstat::levene.test(y.combined,group.levene)$p.value
  var=ifelse(p.var<=0.05,"Ha:Variance Violated","Ho:Homoscedastic Variance")
  #standardized residuals against fitted values
  plot(error.sample~fitted.values(reg.sample),xlab=expression(hat(y)),ylab="std res.",
       main="Homogeneity / Fit")
  abline(h=0)
  RESULTS=list("Normality"=c(round(p.norm,4),norm),
               "Independence:"=c(round(p.ind,4),ind),
               "Variance:"=c(round(p.var,4),var))
  return (RESULTS)
}
MODEL.ACCURACY=function(reg.sample,Yi) {
  r.sq.adj=summary(reg.sample)$adj.r.squared
  fstat=summary(reg.sample)$fstatistic
  #p-value of the overall F-test
  p.fstat=pf(fstat[1],fstat[2],fstat[3],lower.tail=F)
  #modified coefficient of efficiency E* (absolute, not squared, deviations)
  mod.coefVAR=1-(sum(abs(Yi-reg.sample$fitted.values))/sum(abs(Yi-mean(Yi))))
  RESULTS.B=list("R2-adj:"=r.sq.adj,"F-statistic:"=c(fstat,round(p.fstat,4)),
                 "AIC"=AIC(reg.sample),"Modified E*:"=mod.coefVAR)
  return(RESULTS.B)
}
#naive model
y.reg.naive=lm(Y~X1+X2+X3+A);summary(y.reg.naive);
MODEL.ACCURACY(y.reg.naive,Y)
#correlation matrix
cov.mat=round(cor(raw.data[c("Y","X1","X2","X3","A")]),2);cov.mat
#model improvement
y.reg=lm(Y~X1+X2+X3+X1:X2); summary(y.reg);
MODEL.ACCURACY(y.reg,Y)
TEST.ASSUMPTIONS(y.reg,Y)
#power transformation (powerTransform and bcPower come from the car package)
library(car)
power.transform=powerTransform(y.reg);
Y.transform=bcPower(Y,power.transform$lambda)
y.reg.transform=lm(Y.transform~X1+X2+X3+X1:X2+A); summary(y.reg.transform)
y.reg.transform$coefficients
MODEL.ACCURACY(y.reg.transform,Y.transform)
TEST.ASSUMPTIONS(y.reg.transform,Y.transform)
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 

Recently uploaded (20)

9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 

Intermediate Quantitative Methods Lab: Multiple Regression Analysis

  • 1. GEO6161: Intermediate Quantitative Methods for Geographers
    Laboratory-1
    MULTIPLE REGRESSION
    Kalaivanan Murthy
  • 2. I. PRELIMINARY ANALYSIS

    1. Plot Y against each individual X:
    (Panels read anti-clockwise.) Y is more widely scattered against X1, while Y against X2 is close to linear. Observations with X3=1 have higher Y than those with X3=0. The X1-X3 and X2-X3 plots do not reveal much.

    2. Run the naïve model:

    $$\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 A$$

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1082.3793   165.4371  -6.543 8.17e-08 ***
    X1              0.2397     3.4967   0.069    0.946
    X2              1.2993     0.1620   8.019 7.45e-10 ***
    X3             67.9740    47.9893   1.416    0.164
    A               2.8016     7.1449   0.392    0.697
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 120.4 on 40 degrees of freedom
    Multiple R-squared: 0.8635, Adjusted R-squared: 0.8498
    F-statistic: 63.24 on 4 and 40 DF, p-value: < 2.2e-16
  • 3. Among the slope coefficients, only β2 is significant at the 5% significance level (95% confidence). R² = 86% and R²-adj = 85% imply that the model explains about 85% of the variability in the data, and the small gap between the two indicates that the model is concise. The F-statistic p-value < 0.05 implies that at least one of the β's is significant (≠0) at the 95% confidence level. In addition, AIC = 565.61.

    3. Correlation Matrix

           Y     X1    X2    X3    A
    Y     1.00  0.77  0.93  0.50 -0.51
    X1    0.77  1.00  0.83  0.47 -0.48
    X2    0.93  0.83  1.00  0.46 -0.57
    X3    0.50  0.47  0.46  1.00 -0.32
    A    -0.51 -0.48 -0.57 -0.32  1.00

    There is a strong correlation between Y and X2 (0.93) and between X1 and X2 (0.83). Rejection region of r: with n = 45, t(α/2, n-2) = 2.016, so r_min = t/√(t² + n - 2) = 2.016/√(2.016² + 43) ≈ 0.29. Any correlation with |r| above 0.29 is significant, which here covers every pair in the matrix.

    4. Assumptions to be met:
    i. Normality of error terms
    ii. Independence of error terms in space and time
    iii. Constant variance (homoscedasticity) of error terms
    iv. Error terms independent of (uncorrelated with) the predictors

    The naïve model is found to violate most of these assumptions; see the appendix. In addition, R²-adj and AIC can be improved. A higher R²-adj and a lower AIC are desired.

    II. A BETTER MODEL

    5. Improve R²-adj and AIC
    The following methods are tried to see how each performs:
    i. Adding interaction terms
    ii. Double log model
    iii. Polynomial function

    The model explains the variability better when an interaction term is added. Log and polynomial terms were also attempted, but the improvement in accuracy was not significant. One of the better models, which is also simpler, is shown below.

    Call:
    lm(formula = Y ~ X1 + X2 + X3 + X1:X2)

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1.771e+03  4.499e+02   3.936 0.000322 ***
    X1          -6.394e+01  1.037e+01  -6.168 2.75e-07 ***
    X2          -1.078e+00  3.852e-01  -2.799 0.007855 **
    X3           8.190e+01  3.386e+01   2.419 0.020217 *
    X1:X2        5.275e-02  8.278e-03   6.372 1.42e-07 ***
  • 4. ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 85.01 on 40 degrees of freedom
    Multiple R-squared: 0.932, Adjusted R-squared: 0.9252
    F-statistic: 137 on 4 and 40 DF, p-value: < 2.2e-16

    > AIC(y.reg)
    [1] 534.2501

    All β's are significant at the 5% significance level (95% confidence). R² = 93.2% (↑ from 86%) and R²-adj = 92.52% (↑ from 85%) have improved considerably. The F-statistic = 137 (↑ from 63) and its p-value < 0.05 imply that at least one of the β's is significant (≠0) at the 95% confidence level. In addition, AIC = 534.25 (↓ from 565.61) has reduced, which is a good sign.

    As an important assumption, normality is checked first. The Anderson-Darling test is performed, as it is considered more robust than Shapiro-Wilk for large sample sizes. It is found that p-value = 0.006 < 0.05. Hence H0: Normality is rejected.

    As a remedy, a Box-Cox transformation (power transformation) is applied. The estimated power for the dependent variable Y is -0.277. The transformed Y's are, say, Y*:

    $$Y^{*} = Y^{-0.277}$$

    The transformed model would then be

    $$Y^{*} = 2.998 - 6.471\times10^{-3}\,X_1 - 5.618\times10^{-5}\,X_2 + 1.814\times10^{-2}\,X_3 - 2.139\times10^{-3}\,A + 6.259\times10^{-6}\,X_1 X_2$$

    Call:
    lm(formula = Y.transform ~ X1 + X2 + X3 + X1:X2 + A)

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  2.998e+00  9.894e-02  30.297  < 2e-16 ***
    X1          -6.471e-03  2.197e-03  -2.945 0.005418 **
    X2          -5.618e-05  8.309e-05  -0.676 0.502958
    X3           1.814e-02  7.115e-03   2.549 0.014834 *
    A           -2.139e-03  1.069e-03  -2.001 0.052373 .
    X1:X2        6.259e-06  1.755e-06   3.568 0.000973 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.01782 on 39 degrees of freedom
    Multiple R-squared: 0.9377, Adjusted R-squared: 0.9297
    F-statistic: 117.3 on 5 and 39 DF, p-value: < 2.2e-16

    [1] -227.1882

    Though the β values for X2 and A are insignificant at the 95% confidence level, we keep both terms because they help the model conform to the normality assumption. Analysis shows that removing X2 and A yields a slightly better model but violates the normality assumption.

    The R²-adj = 92.97% is high, and the F-statistic p-value < 0.05. AIC falls to -227.18, which is a good sign, although this value is computed on the transformed scale of Y and so is not directly comparable with the AIC of the untransformed models. The new model conforms to the normality test. The assumptions are detailed in the following section.
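The power-transformation step can be sketched in base R as follows. This is only an illustration on simulated data, not the lab data set: the helper names `box_cox` and `box_cox_loglik` are hypothetical, and the power is estimated by maximising the usual Box-Cox profile log-likelihood rather than by calling `car::powerTransform` as the appendix does. The sketch uses the scaled form (Y^λ - 1)/λ, which is, as far as can be told, the form `car::bcPower` returns; it differs from the plain power Y^λ only by a linear rescaling, so the fitted coefficients depend on which form is used.

```r
# Scaled Box-Cox transform: (y^lambda - 1) / lambda, or log(y) when lambda is 0
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for a linear model with design matrix X
box_cox_loglik <- function(lambda, y, X) {
  n   <- length(y)
  yt  <- box_cox(y, lambda)
  rss <- sum(lm.fit(X, yt)$residuals^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Simulated data (assumption: a positive, right-skewed response)
set.seed(1)
n  <- 45
x1 <- runif(n, 20, 80)
y  <- exp(1 + 0.03 * x1 + rnorm(n, sd = 0.2))
X  <- cbind(1, x1)

# Estimate the power by maximising the profile log-likelihood over [-2, 2]
lambda_hat <- optimize(box_cox_loglik, interval = c(-2, 2),
                       y = y, X = X, maximum = TRUE)$maximum

# Refit the regression on the transformed response
y_star <- box_cox(y, lambda_hat)
fit    <- lm(y_star ~ x1)
```

After the refit, the residual diagnostics would be repeated on `fit`, as is done for the transformed model in the text.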
  • 5. III. CHECK FOR ASSUMPTIONS

    6. Test for Normality: The following methods can be used
    i. Shapiro-Wilk
    ii. Kolmogorov-Smirnov
    iii. D’Agostino’s Battery of Tests

    The sample size is relatively small; n=45. Hence the Shapiro-Wilk test is performed, and its p-value = 0.2489 > 0.05. ‘Fail to reject’ H0: Distribution is Normal.

    7. Test for independence: The following tests can be used to test if error terms are independent of space and time.
    i. Runs test
    ii. Durbin-Watson test

    The runs test is performed here. In the figure, the plot on the right side shows the distribution of residuals about the mean. Spatial dependence is not noticeably observable. The p-value = 0.453 > 0.05 implies ‘Fail to reject’ H0: Error terms are independent.

    8. Test for homoscedasticity: Homoscedasticity, or homogeneity of variance, can be tested by the Bartlett, Levene and Fligner-Killeen tests. The Bartlett test is valid only when the data are approximately normally distributed. Levene's test is performed here; it does not rely on normality and is more robust than Bartlett's test. Levene's test yields p-value = 0.92 > 0.05. Hence fail to reject H0: Homogeneity of variance.
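The normality and independence checks described above can be sketched in base R as follows. The data are simulated (an assumption, since the lab data file is not reproduced here), and `runs_test` is a hypothetical hand-written Wald-Wolfowitz runs test about the median using the normal approximation; the appendix instead calls `lawstat::runs.test`, whose defaults may differ in detail.

```r
# Wald-Wolfowitz runs test about the median (normal approximation)
runs_test <- function(e) {
  m    <- median(e)
  s    <- as.integer(e[e != m] > m)      # 1 above the median, 0 below
  n1   <- sum(s == 1)
  n2   <- sum(s == 0)
  runs <- 1 + sum(diff(s) != 0)          # number of sign runs
  mu   <- 2 * n1 * n2 / (n1 + n2) + 1    # expected runs under independence
  v    <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
          ((n1 + n2)^2 * (n1 + n2 - 1))  # variance of the run count
  z    <- (runs - mu) / sqrt(v)
  list(runs = runs, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Simulated regression with independent normal errors
set.seed(7)
n   <- 45
x   <- runif(n, 0, 10)
y   <- 3 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
e   <- rstandard(fit)                    # standardized residuals

p_norm <- shapiro.test(e)$p.value        # H0: residuals are normal
p_ind  <- runs_test(e)$p.value           # H0: residuals are independent
```

In both tests a p-value above 0.05 means the null hypothesis (normality or independence) is not rejected, which is how the results in the text are read.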
  • 6. IV. RELIABILITY AND ROBUSTNESS

    9. Coefficient of Determination (R²-adjusted): The transformed model has R²-adj = 92.97%, which means it explains 92.97% of the variability after adjusting for the number of predictors.

    10. Fisher-Snedecor Statistic: The transformed model has F-statistic = 117.3 and its p-value ≈ 0, which means the model explains significantly more variability than an intercept-only model.

    11. Akaike Information Criterion (AIC): It is a measure of the relative quality of statistical models. The transformed model yields AIC = -227.18, which is much lower than that of the naïve model.

    12. Modified Coefficient of Efficiency (E*): E* is less sensitive to large values because its terms are absolute deviations rather than squared deviations. This model has E* = 0.7569.

    There are other testing procedures available, but these are among the most powerful. By this we can finalize our model. The best-fit regression model is

    $$Y^{-0.277} = 2.998 - 6.471\times10^{-3}\,X_1 - 5.618\times10^{-5}\,X_2 + 1.814\times10^{-2}\,X_3 - 2.139\times10^{-3}\,A + 6.259\times10^{-6}\,X_1 X_2$$

    Y: Monthly Mortgage Payment ($)
    X1: Household Disposable Income (x1000 $)
    X2: Square Footage of housing units
    X3: Mortgage Type
    A: Housing Unit’s Age

    V. LIMITATIONS

    13. Some model-validation tests are not performed explicitly but are implicit in the model. These are as follows:
    i. Multicollinearity: A test for multicollinearity is not systematically performed. It requires regressing each independent variable on the rest of the independent variables and finding the coefficient of determination (R²) for each regression. In this model, multicollinearity exists with X2, but it is found that removing X2 violates the normality and other assumptions.
    ii. Polynomial Interaction Terms: Since R²-adj is above 90%, polynomial interaction terms are not included in the final model.
    iii. Outliers: Outliers are not identified in the model. They are the observations with large deviations from the predicted values (usually those with standardized residuals |ε̂i| > 2σ). Two points can be seen to lie outside ±2σ. Removing them might give a better model.
    iv. Since not many models are formulated, Mallows' Cp, which is used to identify the best model among a set of models, is not implemented here.
    v. In MODEL.ACCURACY(), a function created to check the reliability, the significance test for β (≠0) is not explicitly performed, but it is implied by the t-statistic of each individual β.
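The multicollinearity check mentioned under the limitations, regressing each predictor on the remaining predictors, can be sketched in base R as follows. The helper `vif_by_hand` and the simulated predictors are hypothetical; each auxiliary R² is converted into a variance inflation factor, VIF = 1/(1 - R²), which is one common way to summarise such regressions and is not a step the report itself performs.

```r
# Variance inflation factor for each column of a predictor data frame
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    # Auxiliary regression: predictor v on all the other predictors
    aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Simulated predictors (assumption): x2 is built to correlate strongly with x1
set.seed(42)
n  <- 45
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)

vifs <- vif_by_hand(data.frame(x1, x2, x3))
print(round(vifs, 2))
```

A VIF near 1 indicates little overlap with the other predictors, while values well above 5 to 10 are commonly taken as a sign of problematic multicollinearity.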
  • 7. VI. APPENDIX

    The following program, written in R, is used to produce the above results.

    #packages used: nortest (ad.test), lawstat (runs.test, levene.test), car (powerTransform, bcPower)

    #read data
    raw.data=read.csv("~/My R Codes/Data/LabDataGEO6161.csv",header=T)
    attach(raw.data); length(Y)

    #split window into five panels
    dev.list()
    mat=matrix(c(1,2,1,2,3,4,3,5),2,4)
    layout(mat); layout.show(5)

    #scatter plot for each variable
    plot(X1,Y,main="Y - X1",ylab="Y",las=1)
    #the original plotted -log10(A) under the title "Y - X2"; X2 is plotted here to match the title
    plot(X2,Y,main="Y - X2",ylab="Y",las=1)
    boxplot(Y~X3,main="Y - X3",ylab="Y",xlab="X3",las=1)
    boxplot(X1~X3,main="X1 - X3",ylab="X1",xlab="X3",las=1)
    boxplot(X2~X3,main="X2 - X3",ylab="X2",xlab="X3",las=1)

    #tests normality, independence and constant variance of the standardized residuals
    TEST.ASSUMPTIONS=function(reg.sample,Yi)
    {
      error.sample=rstandard(reg.sample)
      mat=matrix(c(1,1,2,3),2,2); layout(mat)

      #anderson-darling normality
      qqnorm(error.sample,datax=TRUE); qqline(error.sample,datax=TRUE)
      p.norm=nortest::ad.test(error.sample)$p.value
      norm=ifelse(p.norm<=0.05,"Ha:Normality Violated","Ho:Normality Verified")

      #runs test independence
      p.ind=lawstat::runs.test(error.sample,plot.it=T,alternative="two.sided")$p.value
      ind=ifelse(p.ind<=0.05,"Ha:Independence Violated","Ho:Independence Verified")

      #levenes variance: observed and fitted values form the two groups
      group.levene=as.factor(c(rep(1,length(Yi)),rep(2,length(reg.sample$fitted.values))))
      y.combined=c(Yi,reg.sample$fitted.values)
      p.var=lawstat::levene.test(y.combined,group.levene)$p.value
      var=ifelse(p.var<=0.05,"Ha:Variance Violated","Ho:Homoscedastic Variance")
      plot(error.sample~fitted.values(reg.sample),xlab=expression(hat(y)),ylab="std res.",
           main="Homogeneity / Fit")
      abline(h=0)

      RESULTS=list("Normality"=c(round(p.norm,4),norm),
                   "Independence:"=c(round(p.ind,4),ind),
                   "Variance:"=c(round(p.var,4),var))
      return(RESULTS)
    }

    #reports R2-adj, F-statistic with its p-value, AIC and modified E*
    MODEL.ACCURACY=function(reg.sample,Yi)
    {
      r.sq.adj=summary(reg.sample)$adj.r.squared
      fstat=summary(reg.sample)$fstatistic
      p.fstat=pf(fstat[1],fstat[2],fstat[3],lower.tail=F)
      mod.coefVAR=1-(sum(abs(Yi-reg.sample$fitted.values))/sum(abs(Yi-mean(Yi))))
      RESULTS.B=list("R2-adj:"=r.sq.adj,"F-statistic:"=c(fstat,round(p.fstat,4)),
                     "AIC"=AIC(reg.sample),"Modified E*:"=mod.coefVAR)
      return(RESULTS.B)
    }

    #naive model
    y.reg.naive=lm(Y~X1+X2+X3+A); summary(y.reg.naive)
    MODEL.ACCURACY(y.reg.naive,Y)

    #correlation matrix
    cov.mat=round(cor(raw.data[c("Y","X1","X2","X3","A")]),2); cov.mat

    #model improvement
    y.reg=lm(Y~X1+X2+X3+X1:X2); summary(y.reg)
    MODEL.ACCURACY(y.reg,Y)
    TEST.ASSUMPTIONS(y.reg,Y)

    #power transformation
    power.transform=car::powerTransform(y.reg)
    Y.transform=car::bcPower(Y,power.transform$lambda)
    y.reg.transform=lm(Y.transform~X1+X2+X3+X1:X2+A); summary(y.reg.transform)
    y.reg.transform$coefficients
    MODEL.ACCURACY(y.reg.transform,Y.transform)
    TEST.ASSUMPTIONS(y.reg.transform,Y.transform)
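For reference, the quantity stored in `mod.coefVAR` inside `MODEL.ACCURACY()` can be written out as the formula below. The notation is an assumption read off the code: Y_i are the observed responses passed as `Yi`, Ŷ_i are the fitted values, and Ȳ is the mean of the observed responses.

```latex
E^{*} = 1 - \frac{\sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|}{\sum_{i=1}^{n} \left| Y_i - \bar{Y} \right|}
```

Because the deviations enter as absolute values rather than squares, a single large residual carries less weight than it does in R², which is why E* (0.7569 for the transformed model) comes out lower than R²-adj on the same fit.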