STAT7001
Computing for Practical Statistics
In-Course Assessment 2
TASK 1: PREDICTION OF THE ISTANBUL STOCK MARKET
TASK 2: THE RESISTANCE OF CONSTANTIN
APPENDIX A: TASK 1 R CODE
APPENDIX B: TASK 2 SAS CODE
APPENDIX C: REFERENCES
Task 1: Prediction of the Istanbul Stock Market
Main Question
The main task was to use different prediction strategies to predict the daily returns of the Istanbul Stock
Exchange (ISE) index, based on the data of ISE returns as well as the returns of 7 other stock indices, and to
compare the performance of these prediction methods by calculating error measures such as RMSE, MAE,
and their relative variants.
A 5% significance level is applied to all analyses in this report.
Summary
In the benchmarking experiments where ISE returns were predicted from same-day data, the models based
on the other stock indices were significantly better than taking the mean ISE return as a predictor, while the
inclusion of time did not significantly change the goodness of prediction.
In the benchmarking experiments where predictions were made only from previous data, the reverse was
observed: predictors based only on prior ISE returns performed significantly better than models based on the
previous returns of all the stock indices, suggesting that a non-linear relationship may exist between ISE
returns and those of previous days.
Exploratory Data Analysis
Figures 1 to 8. Scatter plots of stock index returns (y-axis) against number of days since earliest record (x-axis),
with respective correlation estimates and p-values.
From the scatter plots in Figures 1 to 8, it can be seen that there is no apparent association between the
returns of the stock indices and time, as the location of the index returns does not appear to change with time.
A correlation test was performed between each stock index's returns and time; the results indicate no
apparent linear association between the variables at the 5% significance level, as all p-values were greater than 0.05.
Figure 1. ISE: cor = -0.0499, p-value = 0.2485
Figure 2. S&P 500: cor = 0.0245, p-value = 0.5714
Figure 3. DAX: cor = 0.0299, p-value = 0.4891
Figure 4. FTSE 100: cor = 0.0190, p-value = 0.6615
Figure 5. Nikkei 225: cor = 0.00533, p-value = 0.9019
Figure 6. Ibovespa: cor = -0.0582, p-value = 0.1786
Figure 7. MSCI EU: cor = 0.0121, p-value = 0.7803
Figure 8. MSCI EM: cor = -0.0538, p-value = 0.2136
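For illustration, the Pearson correlation coefficient and the t statistic behind each of these tests can be computed from first principles. The sketch below is a Python illustration with made-up data (the report's own analysis used R's cor.test); for a sample of size n, the test statistic is t = r·sqrt(n-2)/sqrt(1-r^2) on n-2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    # Sample Pearson correlation coefficient of two equal-length samples.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cor_t_stat(r, n):
    # t statistic for testing H0: rho = 0, with n - 2 degrees of freedom.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up example data (not from the Istanbul data set):
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
t = cor_t_stat(r, len(x))
```

cor.test in R converts this t statistic into the p-values quoted above via the t distribution; near-zero correlations give small |t| and hence large p-values.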
Figures 9 to 16. Scatter plots of stock index returns for day N-1, N-2 and N-3 (y-axis) respectively (left to right)
against stock index returns for day N (x-axis), with respective correlation estimates and p-values.
The scatter plots in Figures 9 to 16 show that there are generally no patterns in the stock index returns
plotted against their returns one, two, and three days before. Correlation tests were carried out to confirm this,
and they suggest that there is no correlation for all scatter plots except the first panel of Figure 16, which shows
a slight positive association (cor = 0.149) between MSCI EM returns and its returns one day
earlier (p-value = 0.0005403 < 0.05).
In light of the above, it may be reasonable to suggest that, in general, stock index returns are not linearly
related to those of the recent past few days.
Correlation of each index's day-N returns with its returns on days N-1, N-2 and N-3 (cor, with p-value in parentheses):

Index        Day N-1                Day N-2                Day N-3
ISE          0.0188 (0.6651)        -0.0124 (0.7745)       -0.0337 (0.4369)     [Figure 9]
S&P 500      -0.0608 (0.1605)       -0.0300 (0.4888)       -0.00845 (0.8456)    [Figure 10]
DAX          0.00132 (0.9758)       -0.0262 (0.5453)       -0.0172 (0.6919)     [Figure 11]
FTSE 100     -0.00739 (0.8646)      -0.0276 (0.5248)       -0.0218 (0.6149)     [Figure 12]
Nikkei 225   -0.0782 (0.07085)      0.0261 (0.5479)        0.000953 (0.9825)    [Figure 13]
Ibovespa     -0.0485 (0.2626)       -0.0140 (0.7463)       -0.0457 (0.2921)     [Figure 14]
MSCI EU      0.00995 (0.8184)       -0.0420 (0.3323)       0.00254 (0.9533)     [Figure 15]
MSCI EM      0.149 (0.0005403)      -0.0141 (0.7449)       0.0489 (0.2599)      [Figure 16]
Results and Interpretation of Prediction from Same Day Indices (Part C)
Table 1. Table of error measures and their 95% confidence intervals for different prediction methods, under
respective validation set-ups.

Chronological 80-20 Split
  Mean:                      RMSE 0.0131 (0.0112, 0.0149);  MAE 0.0100 (0.0084, 0.0116);  Rel. RMSE 1.68 (1.13, 2.23);  Rel. MAE 1.22 (1.00, 1.44)
  Linear Model w/o Time:     RMSE 0.0108 (0.0094, 0.0122);  MAE 0.00855 (0.00729, 0.00980);  Rel. RMSE 3.35 (1.63, 5.07);  Rel. MAE 1.61 (1.05, 2.17)
  Linear Model w/ Time:      RMSE 0.0107 (0.0093, 0.0121);  MAE 0.00852 (0.00729, 0.00975);  Rel. RMSE 3.06 (1.63, 4.49);  Rel. MAE 1.56 (1.06, 2.06)
  Robust Linear Regression:  RMSE 0.0105 (0.0091, 0.0119);  MAE 0.00845 (0.00726, 0.00964);  Rel. RMSE 3.04 (1.33, 4.74);  Rel. MAE 1.53 (1.03, 2.03)
5-Fold Cross-Validation
  Mean:                      RMSE 0.0162 (0.0134, 0.0191);  MAE 0.0121 (0.0100, 0.0141);  Rel. RMSE 1.49 (1.03, 1.95);  Rel. MAE 1.14 (0.96, 1.32)
  Linear Model w/o Time:     RMSE 0.0120 (0.0100, 0.0140);  MAE 0.00920 (0.00773, 0.01067);  Rel. RMSE 7.43 (1.30, 13.55);  Rel. MAE 2.11 (0.76, 3.46)
  Linear Model w/ Time:      RMSE 0.0120 (0.0100, 0.0140);  MAE 0.00920 (0.00773, 0.01066);  Rel. RMSE 7.46 (1.30, 13.62);  Rel. MAE 2.11 (0.75, 3.47)
  Robust Linear Regression:  RMSE 0.0121 (0.0101, 0.0142);  MAE 0.00928 (0.00780, 0.01076);  Rel. RMSE 6.85 (1.36, 12.34);  Rel. MAE 2.02 (0.78, 3.26)

Table 2. Table of results (V statistic and p-value) for paired Wilcoxon signed-rank tests on absolute residuals
of different prediction methods.

Chronological 80-20 Split
  Mean vs. LM w/o Time:         V = 3695,  p-value = 0.0123
  Mean vs. LM w/ Time:          V = 3688,  p-value = 0.01307
  Mean vs. Robust LR:           V = 3612,  p-value = 0.02473
  LM w/o Time vs. LM w/ Time:   V = 3031,  p-value = 0.6601
  LM w/o Time vs. Robust LR:    V = 3135,  p-value = 0.4455
  LM w/ Time vs. Robust LR:     V = 3163,  p-value = 0.3953
5-Fold Cross-Validation
  Mean vs. LM w/o Time:         V = 96316, p-value = 1.121e-11
  Mean vs. LM w/ Time:          V = 96500, p-value = 7.849e-12
  Mean vs. Robust LR:           V = 96397, p-value = 9.587e-12
  LM w/o Time vs. LM w/ Time:   V = 72898, p-value = 0.7934
  LM w/o Time vs. Robust LR:    V = 68503, p-value = 0.3356
  LM w/ Time vs. Robust LR:     V = 68297, p-value = 0.3075

The similar error measures for the linear models with and without time, shown in Table 1, indicate that the
two models perform comparably under both the chronological and 5-fold cross-validation set-ups. This is
confirmed by the paired Wilcoxon signed-rank test results in Table 2, with p-values of 0.6601 and 0.7934
(for the chronological and 5-fold set-ups respectively) indicating no significant difference in the absolute
residuals of the two models.

These results agree with the preliminary conclusions drawn from the exploratory findings. Since no apparent
association was found between the stock index returns and time, the linear models with and without time
should predict ISE returns equally well, as the addition of the time variable does not provide significant
information.

Additionally, a robust linear regression (RLR) model was created, predicting the ISE return from the
same-day returns of the other stock indices, without time. This also produced error measures similar to those
of the ordinary least squares regression models, and the paired Wilcoxon signed-rank tests similarly indicated
no significant difference in absolute residuals (p-values of 0.4455 and 0.3953 for chronological; p-values of
0.3356 and 0.3075 for 5-fold).

However, these 3 models have RMSE and MAE values considerably smaller than those of the prediction
using the mean ISE return of the training data set, as seen in Table 1. This is confirmed by the paired
Wilcoxon signed-rank tests. For the chronological set-up, the p-values of 0.0123, 0.01307, and 0.02473 (for
tests of mean vs. LM w/o time, LM w/ time and RLR respectively) suggest some evidence of a difference in
the absolute residuals of the models. For the 5-fold set-up, the p-values of 1.121e-11, 7.849e-12 and
9.587e-12 respectively suggest strong evidence of a significant difference in the absolute residuals from the
models. This allows us to conclude at the 5% significance level that the mean ISE return is a worse
prediction method than any of the other 3 models, under both validation set-ups.
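All confidence intervals quoted in Table 1 use the usual normal approximation, estimate ± 1.96 × standard error, with the standard error of the RMSE obtained by the delta method (as in the rmse and rmseSE functions of Appendix A). A minimal Python sketch of the same computations, on made-up observed and fitted values:

```python
import math
from statistics import mean, stdev

def rmse(observed, fitted):
    # Root mean squared error.
    return math.sqrt(mean((o - f) ** 2 for o, f in zip(observed, fitted)))

def rmse_se(observed, fitted):
    # Delta-method standard error of the RMSE, mirroring rmseSE in Appendix A:
    # sd(squared errors)/sqrt(n), divided by the derivative term 2*RMSE.
    sq = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return stdev(sq) / math.sqrt(len(sq)) / (2 * math.sqrt(mean(sq)))

def ci95(estimate, se):
    # 95% confidence interval via the normal approximation.
    return estimate - 1.96 * se, estimate + 1.96 * se

# Made-up test-set returns and predictions:
observed = [0.021, -0.013, 0.008, -0.004, 0.016, -0.009]
fitted = [0.018, -0.010, 0.005, -0.001, 0.012, -0.005]
est = rmse(observed, fitted)
low, high = ci95(est, rmse_se(observed, fitted))
```

The MAE and the relative variants follow the same pattern, with the standard error of the MAE being simply the standard deviation of the absolute errors divided by sqrt(n).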
Results and Interpretation of Prediction from Previous Day Indices (Part D)
Validation set-up: 11 Consecutive Days

  Most Recent ISE Return:            RMSE 0.0221 (0.0203, 0.0240);  MAE 0.0170 (0.0158, 0.0182);  Rel. RMSE 16.31 (6.83, 25.78);  Rel. MAE 4.23 (2.88, 5.57)
  Mean ISE Return of Recent 5 Days:  RMSE 0.0173 (0.0159, 0.0187);  MAE 0.0130 (0.0120, 0.0140);  Rel. RMSE 4.50 (3.11, 5.89);  Rel. MAE 1.97 (1.63, 2.32)
  LM - Most Recent Day:              RMSE 0.724 (0.319, 1.129);  MAE 0.169 (0.110, 0.230);  Rel. RMSE 221.3 (91.4, 351.2);  Rel. MAE 43.1 (24.5, 61.7)
  LM - Most Recent 2 Days:           RMSE 2.59 (0.31, 4.86);  MAE 0.274 (0.054, 0.494);  Rel. RMSE 280.0 (134.7, 425.3);  Rel. MAE 49.1 (25.5, 72.6)
  Robust Linear Regression:          RMSE 0.0634 (0.0397, 0.0870);  MAE 0.0344 (0.0298, 0.0389);  Rel. RMSE 25.89 (17.98, 33.80);  Rel. MAE 8.79 (6.70, 10.87)

Table 3. Table of error measures and their 95% confidence intervals for different prediction methods.
Validation set-up: 11 Consecutive Days

  Most Recent ISE Return vs. Mean of Recent 5 Days:    V = 95496,  p-value = 5.857e-14
  Most Recent ISE Return vs. LM - Most Recent Day:     V = 15103,  p-value < 2.2e-16
  Most Recent ISE Return vs. LM - Most Recent 2 Days:  V = 12427,  p-value < 2.2e-16
  Most Recent ISE Return vs. Robust LR:                V = 100020, p-value < 2.2e-16
  Mean of Recent 5 Days vs. LM - Most Recent Day:      V = 10271,  p-value < 2.2e-16
  Mean of Recent 5 Days vs. LM - Most Recent 2 Days:   V = 7468,   p-value < 2.2e-16
  Mean of Recent 5 Days vs. Robust LR:                 V = 111260, p-value < 2.2e-16
  LM - Most Recent Day vs. LM - Most Recent 2 Days:    V = 65944,  p-value = 0.3359
  LM - Most Recent Day vs. Robust LR:                  V = 25768,  p-value < 2.2e-16
  LM - Most Recent 2 Days vs. Robust LR:               V = 31006,  p-value < 2.2e-16

Table 4. Table of results (V statistic and p-value) for paired Wilcoxon signed-rank tests on absolute residuals
of different prediction methods.
The error measures for the LM based on stock index returns from the most recent day are all smaller than
those for the LM based on the most recent 2 days. However, the large standard errors of these error measures
suggest that this difference might not be significant, and this is confirmed by the paired Wilcoxon
signed-rank test, with a p-value of 0.3359 indicating no significant difference in the absolute residuals of the
two models.
A robust linear regression model was also created, predicting the ISE return from the stock index returns of
the most recent day. These are the same covariates as in the LM for the most recent day; however, the
coefficients are estimated by a different, more robust method, and the robust regression shows lower error
values for all measures of prediction goodness. This is confirmed by the paired Wilcoxon signed-rank test,
with a p-value < 2.2e-16 indicating a significant difference in the absolute residuals of the two models.
However, the prediction method using the mean ISE return of the recent 5 days shows the lowest value for
each error measure. Furthermore, the upper bound of the 95% CI of each of its 4 error measures is lower than
the lower bound of the corresponding 95% CI for every other model. The p-values from the paired Wilcoxon
signed-rank tests of this method against all other methods (5.857e-14, <2.2e-16, <2.2e-16, <2.2e-16) also
support significant differences in the absolute residuals. Thus, there is strong evidence to suggest that the
mean ISE return of the recent 5 days is the best of the five prediction methods used in this benchmarking
experiment.
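For reference, the paired Wilcoxon signed-rank statistic V used throughout can be sketched as follows. This Python illustration uses made-up absolute residuals and the large-sample normal approximation, and it ignores ties for simplicity, so its p-values will differ slightly from R's wilcox.test, which uses average ranks, exact small-sample distributions, and continuity corrections.

```python
from statistics import NormalDist

def wilcoxon_signed_rank(a, b):
    # Paired Wilcoxon signed-rank test on two matched samples (e.g. the
    # absolute residuals of two prediction methods on the same test days).
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank  # ranks of |differences|; ties not averaged here
    V = sum(ranks[i] for i in range(n) if diffs[i] > 0)
    # Large-sample normal approximation to the null distribution of V.
    mu = n * (n + 1) / 4.0
    sigma = (n * (n + 1) * (2 * n + 1) / 24.0) ** 0.5
    z = (V - mu) / sigma
    p = min(1.0, 2.0 * (1.0 - NormalDist().cdf(abs(z))))
    return V, p

# Made-up absolute residuals of two methods on six test days:
abs_res_a = [0.012, 0.005, 0.020, 0.007, 0.015, 0.009]
abs_res_b = [0.010, 0.006, 0.014, 0.003, 0.011, 0.004]
V, p = wilcoxon_signed_rank(abs_res_a, abs_res_b)
```

Being rank-based, the test makes no normality assumption about the residuals, which is why it is used here in preference to a paired t-test.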
It should be noted that in the initial exploratory analysis, it was concluded that there seems to be no linear
association between ISE stock index returns and its returns one, two, and three days before (Figure 9).
However, the benchmark experiment in Part D suggests that the prediction method based on the recent 5
days appears to be the best prediction method, which is contradictory to the results of the exploratory
analysis. This may suggest that there might be non-linear associations that exist between ISE stock index
returns and its returns on the days before, thus allowing prediction to be made based on previous ISE returns.
Alternatively, this may have happened due to poorly designed prediction methods, with the mean ISE return
of the recent 5 days performing relatively better than the rest.
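The winning method, taking the mean ISE return of the 5 most recent days as the next day's prediction, is a simple rolling mean. A Python sketch with a made-up return series (the report's own implementation is in the R code of Appendix A):

```python
def mean_recent_k(returns, k=5):
    # One-step-ahead predictor: once k prior days exist, predict day i's
    # return as the mean of the k most recent observed returns.
    preds = []
    for i in range(k, len(returns)):
        preds.append(sum(returns[i - k:i]) / k)
    return preds

# Hypothetical daily ISE returns, oldest first:
ise = [0.012, -0.008, 0.004, 0.009, -0.003, 0.006, 0.001]
preds = mean_recent_k(ise, k=5)  # predictions for the 6th and 7th days
```

Averaging over a short window smooths out day-to-day noise, which is consistent with this predictor beating the single most-recent-day return in Table 3.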
Conclusion
To assess several prediction methods for the Istanbul stock market, two different benchmarking experiments
were performed in this task. In the first, ISE returns were predicted from the other indices on the same day;
in the second, they were predicted from the recent previous returns of all indices, including ISE itself.
The first benchmarking experiment showed that the prediction methods with least squares regression based
on other data obtained from the same day, generally performed better than ones using the mean ISE return as
a predictor, in terms of error measures such as RMSE and MAE. This was confirmed by the paired Wilcoxon
signed-rank tests on the absolute residuals from the different prediction methods. Additionally, the inclusion
of time as an additional covariate did not result in any significant changes in the goodness of prediction of
the models, concurring with the results of the exploratory analysis.
On the other hand, in the second benchmarking experiment, there was sufficient evidence to support the
opposite situation to the first benchmarking experiment, where prediction models based only on prior ISE
returns performed significantly better than models based on previous returns on all stock indices. This
contradicts the results of the exploratory analysis, where there was no significant linear association found
between ISE returns and its returns from days before, thus suggesting a possible non-linear relationship not
identified by the correlation test.
Comparing the benchmarking experiments in Parts C and D, it can be seen that the error measures calculated
in Part C tend to be smaller than those in Part D, in general. This might suggest that prediction models based
on data from the same day are better at predicting the ISE returns than prediction models based only on
recent previous data, indicating that data from the same day provides better information about, or has closer
associations with, the ISE returns.
Task 2: The Resistance of Constantin
Main Question
The data, from the 8th edition of the “Rubber Bible”, contain 16 data points of resistance of Constantin wire
at different diameters. The main task was to fit different regression models to explain resistance in terms of
diameter, and to investigate the goodness of fit of these models by obtaining estimates of error measures
such as RMSE and MAE.
A 5% significance level was applied to all analyses in this report.
Summary
Based on the investigation, regression models that involved logarithmic or reciprocal transformations
generally had a higher goodness of fit. It was found that the regression model on 1/d^2 best explained the
relationship between resistance and diameter, producing the simplest model with high goodness of fit and the
smallest residuals.
The model with 1/d^2 + 1/d performed similarly, but the covariate 1/d was found to be insignificant, while
the log-transformed model produced residuals that were larger than those of the 1/d^2 model. Meanwhile,
fitting resistance to a polynomial of degree 15 in diameter produced an over-fitted, rank-deficient model.
Exploratory Analysis
Figure 1 (a) and (b): Histograms of both variables R and d.
Both the resistance and diameter variables take positive values below 1. From the histograms, both can be
seen to be positively skewed, as confirmed by their positive skewness estimates of 2.9870 and 1.3166
respectively. To shrink the larger values more than the smaller ones, a power or log transformation can be
used to obtain a more symmetric distribution. However, as all the values lie between 0 and 1, a power
transformation might not achieve the desired effect.
From the scatter plot (Figure 2), we can see that as diameter increases, the resistance decreases. There is
clearly a decreasing non-linear relationship between the two variables. Suggested transformations of the data
could include logarithmic or reciprocal transformations, to straighten out the bivariate non-linear relationship.
Figure 2: Scatterplot of Resistance of Constantin wire and its diameter.
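To see why such transformations straighten the relationship, note that if R is proportional to 1/d^2 then log(R) = log(c) - 2·log(d), a straight line on the log-log scale. A Python sketch with synthetic data (the constant c is arbitrary, chosen only for illustration):

```python
import math

def ols_fit(x, y):
    # Closed-form simple linear regression: returns (slope, intercept).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Synthetic data following R = c / d^2 exactly; on the log-log scale this
# becomes log(R) = log(c) - 2*log(d), so the fitted slope should be -2.
c = 6.2e-5
d = [0.05, 0.1, 0.2, 0.4, 0.8]
R = [c / di ** 2 for di in d]
slope, intercept = ols_fit([math.log(di) for di in d], [math.log(Ri) for Ri in R])
```

A slope close to -2 on the log-log scale is exactly the behaviour a reciprocal-square model captures on the original scale.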
Regression Models
There are 4 suggested regression models fitted, namely:
- model 1, log(R) = log(d);
- model 2, R = d + d^2 + d^3 + d^4 + d^5 + d^6 + d^7 + d^8 + d^9 + d^10 + d^11 + d^12 + d^13 + d^14 + d^15;
- model 3, R = 1/d^2 + 1/d; and
- model 4, R = 1/d^2.
For model 1, the fit plot is negatively linear, as the logarithmic transformation of both variables gives
negative values. log(d) is a significant parameter, as its p-value is lower than 0.05, and the R-square of 1.00
indicates the model is a good fit.
Meanwhile, model 2 is a rank-deficient least squares model. As such, the least-squares solutions for the
parameters are not unique, producing biased estimates and some misleading statistics.
Parameter Estimates for Model 2 (R-Square = 0.9944)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   3.11362               0.21165           14.71      <.0001
d           -417.05833            43.70357          -9.54      <.0001
d^2         23289                 3277.49356        7.11       0.0004
d^3         -685537               119274            -5.75      0.0012
d^4         11541180              2347752           4.92       0.0027
d^5         -113141376            25879463          -4.37      0.0047
d^6         617364760             154399878         4.00       0.0071
d^7         -1539663710           412457801         -3.73      0.0097
d^8         0                     .                 .          .
d^9         5151630410            1517635765        3.39       0.0146
d^10        0                     .                 .          .
d^11        0                     .                 .          .
d^12        0                     .                 .          .
d^13        0                     .                 .          .
d^14        -4.28733E11           1.406149E11       -3.05      0.0225
d^15        0                     .                 .          .
Table 2: Parameter Estimates for Model 2.
Table 2 shows that the model of resistance as a polynomial of degree 15 in diameter is likely over-fitted. Its
R-square value of 0.9944 is high only because every predictor added to a model increases the R-squared,
supporting the argument that this model is not a good fit.
Parameter Estimates for Model 1 (R-Square = 1.0000)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   -9.68163              0.00659           -1469.6    <.0001
log(d)      -1.99987              0.00208           -963.01    <.0001
Table 1: Parameter Estimates for Model 1.

Parameter Estimates for Model 3 (R-Square = 1.0000)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   0.00024745            0.00061146        0.40       0.6923
1/d^2       0.00006279            2.538051E-7       247.39     <.0001
1/d         -0.00002805           0.00003085        -0.91      0.3798
Table 3: Parameter Estimates for Model 3.

Model 3 is a rather good fit, with an R-square of 1. However, the p-value of 0.3798 for the parameter 1/d
suggests that this parameter is not significant in the model. Based on this, model 4 was fitted with only 1/d^2,
as a simpler model is generally preferred.
Parameter Estimates for Model 4 (R-Square = 1.0000)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   -0.00020809           0.00034825        -0.60      0.5597
1/d^2       0.00006257            7.917185E-8       790.31     <.0001
Table 4: Parameter Estimates for Model 4.

In model 4, the transformed variable 1/d^2 is a significant parameter, with a p-value of less than 0.0001. The
fit plot is positively linear, with an R-square of 1.

Based on the analysis so far, models 1 and 4 give the best fit compared to the other 2 models. These results
agree with the potential models suggested by the exploratory findings. Further analysis has to be carried out
to establish a conclusion.

Cross Validation

Leave-one-out cross-validation was carried out to test the goodness of fit of the models. The models are
compared below.

Figure 3: Residual plots for the 4 models fitted.

First, we consider the residual plots of all 4 models. As shown, models 1, 3 and 4 are relatively better fitted
than model 2. Model 2 has one extreme residual value of over 20000, which causes the large scale of its
residual plot; moreover, all its residual values are relatively large compared to the other 3 models. Meanwhile,
the residuals in model 1 are randomly scattered, but have values that are larger than those of models 3 and 4.
The residual plots for models 3 and 4 have the smallest scales among all plots, suggesting that models 3 and
4 have the smallest residuals and could thus be better regression models. Next, we look at the estimates
obtained for out-of-sample RMSE and MAE.
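The leave-one-out scheme can be sketched in a few lines. The Python illustration below refits model 4 (R = a + b/d^2) with each point held out in turn and scores the held-out point; the data here are synthetic, generated to follow the model exactly, so every held-out error is essentially zero (the report's actual computations were done in SAS).

```python
def fit_inv_sq(d, R):
    # OLS fit of R = a + b * (1/d^2) via the closed-form simple-regression solution.
    x = [1.0 / di ** 2 for di in d]
    n = len(x)
    mx, my = sum(x) / n, sum(R) / n
    b = sum((xi - mx) * (Ri - my) for xi, Ri in zip(x, R)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b  # (intercept a, slope b)

def loocv_abs_errors(d, R):
    # Leave-one-out cross-validation: refit without point i, then record the
    # absolute prediction error on the held-out point.
    errors = []
    for i in range(len(d)):
        a, b = fit_inv_sq(d[:i] + d[i + 1:], R[:i] + R[i + 1:])
        errors.append(abs(a + b / d[i] ** 2 - R[i]))
    return errors

# Synthetic diameters and resistances generated from R = 2 + 3/d^2 exactly:
d = [0.1, 0.2, 0.3, 0.4, 0.5]
R = [2.0 + 3.0 / di ** 2 for di in d]
errors = loocv_abs_errors(d, R)
```

Out-of-sample RMSE and MAE are then simply the root mean square and the mean of these held-out errors across all data points.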
Table 5: Table of error measures and their confidence intervals for different regression models.
As expressed in Table 5, the second suggested model (R = 1/d^2), which transforms diameter to the power of
-2, shows the lowest value for both measures of prediction goodness. However, there is an overlap in the
95% confidence intervals of the error measures for R = 1/d^2 and R = 1/d^2 + 1/d, so we cannot yet
conclude that model 4 is the best model. Hence, paired Wilcoxon signed-rank tests have been done between
all 4 different models to confirm the results above.
Paired Wilcoxon signed-rank tests (S statistic and p-value) between the four models:

  Model 1 [log(R) = log(d)] vs. Model 2 [degree-15 polynomial]:  S = 65,  p-value = 0.0002
  Model 1 vs. Model 3 [R = 1/d^2 + 1/d]:                         S = 64,  p-value = 0.0002
  Model 1 vs. Model 4 [R = 1/d^2]:                               S = 68,  p-value < .0001
  Model 2 vs. Model 3:                                           S = 67,  p-value < .0001
  Model 2 vs. Model 4:                                           S = 67,  p-value < .0001
  Model 3 vs. Model 4:                                           S = 5,   p-value = 0.8209
Table 6: Results of paired Wilcoxon signed-rank tests on absolute residuals.
The tests support the deduction that the performances of the models are significantly different from one
another, as most p-values are less than 0.05, indicating significant differences in the absolute residuals from
each regression model. The exception is the pair of the 4th (R = 1/d^2) and 3rd (R = 1/d^2 + 1/d) models,
where the p-value of 0.8209 suggests no significant difference in model performance. Generally, a model
with fewer covariates is preferred when 2 models perform similarly. Thus, in this case, the 4th model is the
best model for explaining resistance in terms of diameter.
Conclusion
Based on the investigation, Model 2 (R = d + d^2 + … + d^15) is an extreme example of fitting an overly
complicated model to get a good fit. The model is too complex for the data, even though it appears to explain
a lot of variation in the response variable. Model 1 (log(R) = log(d)) is relatively good, but it does not have
the lowest RMSE and MAE, suggesting that the residuals it produces are relatively large. Meanwhile,
Model 3 (R = 1/d^2 + 1/d) has one insignificant covariate, which leads to the second suggested model.

In conclusion, Model 4 (R = 1/d^2) is the best model for explaining the resistance of Constantin wire in
terms of varying diameter, producing the simplest model with high goodness of fit and the smallest residuals,
as evidenced by the high R-squared value and the low RMSE and MAE error measures. The model can be
interpreted as follows: as the diameter of the wire decreases, the reciprocal of the squared diameter increases,
and so the resistance increases.
Model                                      RMSE                         MAE
Suggested (i):  log(R) = log(d)            0.0093 (0.0092, 0.0094)      0.0065 (0.0049, 0.0081)
Part (b) (i):   R = d + d^2 + … + d^15     5358.40 (2764.46, 7952.34)   1340.41 (0.91, 2679.91)
Part (b) (ii):  R = 1/d^2 + 1/d            0.0019 (0.0014, 0.0024)      0.0009 (0.0005, 0.0013)
Suggested (ii): R = 1/d^2                  0.0013 (0.0009, 0.0017)      0.0006 (0.0003, 0.0009)
Task 1 Code
Appendix A: Task 1 R Code
# Part (a). Data load and conversion of the date column.
# Load data from CSV file.
ISE_data=read.csv(file="C:/Documents/STAT7001/Istanbul.csv",
header=TRUE, sep=",")
# Convert the date column into a recognisable date format in R.
ISE_data$date=as.POSIXct(ISE_data$date, format="%d-%b-%Y")
# Find the difference in numbers of days, and round off any decimals.
ISE_data$date<-difftime(ISE_data$date,ISE_data$date[1], units="days")
ISE_data$date<-round(ISE_data$date,digits=0)
ISE_data$date=as.numeric(as.character(ISE_data$date))
# Part (b). Exploratory data analysis.
# Association between index and time.
plot(ISE_data[,1], ISE_data[,2], xlab="Days", ylab="ISE",
abline(lm(ISE~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,3], xlab="Days", ylab="S&P 500",
abline(lm(S.P.500~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,4], xlab="Days", ylab="DAX",
abline(lm(DAX~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,5], xlab="Days", ylab="FTSE 100",
abline(lm(FTSE~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,6], xlab="Days", ylab="Nikkei 225",
abline(lm(NIKKEI~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,7], xlab="Days", ylab="Ibovespa",
abline(lm(BOVESPA~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,8], xlab="Days", ylab="MSCI EU Index",
abline(lm(MSCI.EU~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,9], xlab="Days", ylab="MSCI EM Index",
abline(lm(MSCI.EM~date, ISE_data)))
cor.test(ISE_data[,1], ISE_data[,2])
cor.test(ISE_data[,1], ISE_data[,3])
cor.test(ISE_data[,1], ISE_data[,4])
cor.test(ISE_data[,1], ISE_data[,5])
cor.test(ISE_data[,1], ISE_data[,6])
cor.test(ISE_data[,1], ISE_data[,7])
cor.test(ISE_data[,1], ISE_data[,8])
cor.test(ISE_data[,1], ISE_data[,9])
# Association between ISE index and index the days before.
plot(ISE_data[c(2:536),2], ISE_data[c(1:535),2], xlab="ISE, Day N", ylab="ISE, Day N-1")
plot(ISE_data[c(3:536),2], ISE_data[c(1:534),2], xlab="ISE, Day N", ylab="ISE, Day N-2")
plot(ISE_data[c(4:536),2], ISE_data[c(1:533),2], xlab="ISE, Day N", ylab="ISE, Day N-3")
cor.test(ISE_data[c(2:536),2], ISE_data[c(1:535),2])
cor.test(ISE_data[c(3:536),2], ISE_data[c(1:534),2])
cor.test(ISE_data[c(4:536),2], ISE_data[c(1:533),2])
# Association between S&P 500 index and index the days before.
plot(ISE_data[c(2:536),3], ISE_data[c(1:535),3],
xlab="S&P 500, Day N", ylab="S&P 500, Day N-1")
plot(ISE_data[c(3:536),3], ISE_data[c(1:534),3],
xlab="S&P 500, Day N", ylab="S&P 500, Day N-2")
plot(ISE_data[c(4:536),3], ISE_data[c(1:533),3],
xlab="S&P 500, Day N", ylab="S&P 500, Day N-3")
cor.test(ISE_data[c(2:536),3], ISE_data[c(1:535),3])
cor.test(ISE_data[c(3:536),3], ISE_data[c(1:534),3])
cor.test(ISE_data[c(4:536),3], ISE_data[c(1:533),3])
# Association between DAX index and index the days before.
plot(ISE_data[c(2:536),4], ISE_data[c(1:535),4], xlab="DAX, Day N", ylab="DAX, Day N-1")
plot(ISE_data[c(3:536),4], ISE_data[c(1:534),4], xlab="DAX, Day N", ylab="DAX, Day N-2")
plot(ISE_data[c(4:536),4], ISE_data[c(1:533),4], xlab="DAX, Day N", ylab="DAX, Day N-3")
cor.test(ISE_data[c(2:536),4], ISE_data[c(1:535),4])
cor.test(ISE_data[c(3:536),4], ISE_data[c(1:534),4])
cor.test(ISE_data[c(4:536),4], ISE_data[c(1:533),4])
# Association between FTSE 100 index and index the days before.
plot(ISE_data[c(2:536),5], ISE_data[c(1:535),5],
xlab="FTSE 100, Day N", ylab="FTSE 100, Day N-1")
plot(ISE_data[c(3:536),5], ISE_data[c(1:534),5],
xlab="FTSE 100, Day N", ylab="FTSE 100, Day N-2")
plot(ISE_data[c(4:536),5], ISE_data[c(1:533),5],
xlab="FTSE 100, Day N", ylab="FTSE 100, Day N-3")
cor.test(ISE_data[c(2:536),5], ISE_data[c(1:535),5])
cor.test(ISE_data[c(3:536),5], ISE_data[c(1:534),5])
cor.test(ISE_data[c(4:536),5], ISE_data[c(1:533),5])
# Association between Nikkei 225 index and index the days before.
plot(ISE_data[c(2:536),6], ISE_data[c(1:535),6],
xlab="Nikkei 225, Day N", ylab="Nikkei 225, Day N-1")
plot(ISE_data[c(3:536),6], ISE_data[c(1:534),6],
xlab="Nikkei 225, Day N", ylab="Nikkei 225, Day N-2")
plot(ISE_data[c(4:536),6], ISE_data[c(1:533),6],
xlab="Nikkei 225, Day N", ylab="Nikkei 225, Day N-3")
cor.test(ISE_data[c(2:536),6], ISE_data[c(1:535),6])
cor.test(ISE_data[c(3:536),6], ISE_data[c(1:534),6])
cor.test(ISE_data[c(4:536),6], ISE_data[c(1:533),6])
# Association between Ibovespa index and index the days before.
plot(ISE_data[c(2:536),7], ISE_data[c(1:535),7],
xlab="Ibovespa, Day N", ylab="Ibovespa, Day N-1")
plot(ISE_data[c(3:536),7], ISE_data[c(1:534),7],
xlab="Ibovespa, Day N", ylab="Ibovespa, Day N-2")
plot(ISE_data[c(4:536),7], ISE_data[c(1:533),7],
xlab="Ibovespa, Day N", ylab="Ibovespa, Day N-3")
cor.test(ISE_data[c(2:536),7], ISE_data[c(1:535),7])
cor.test(ISE_data[c(3:536),7], ISE_data[c(1:534),7])
cor.test(ISE_data[c(4:536),7], ISE_data[c(1:533),7])
# Association between MSCI EU index and index the days before.
plot(ISE_data[c(2:536),8], ISE_data[c(1:535),8],
xlab="MSCI EU, Day N", ylab="MSCI EU, Day N-1")
plot(ISE_data[c(3:536),8], ISE_data[c(1:534),8],
xlab="MSCI EU, Day N", ylab="MSCI EU, Day N-2")
plot(ISE_data[c(4:536),8], ISE_data[c(1:533),8],
xlab="MSCI EU, Day N", ylab="MSCI EU, Day N-3")
cor.test(ISE_data[c(2:536),8], ISE_data[c(1:535),8])
cor.test(ISE_data[c(3:536),8], ISE_data[c(1:534),8])
cor.test(ISE_data[c(4:536),8], ISE_data[c(1:533),8])
# Association between MSCI EM index and index the days before.
plot(ISE_data[c(2:536),9], ISE_data[c(1:535),9],
xlab="MSCI EM, Day N", ylab="MSCI EM, Day N-1")
plot(ISE_data[c(3:536),9], ISE_data[c(1:534),9],
xlab="MSCI EM, Day N", ylab="MSCI EM, Day N-2")
plot(ISE_data[c(4:536),9], ISE_data[c(1:533),9],
xlab="MSCI EM, Day N", ylab="MSCI EM, Day N-3")
cor.test(ISE_data[c(2:536),9], ISE_data[c(1:535),9])
cor.test(ISE_data[c(3:536),9], ISE_data[c(1:534),9])
cor.test(ISE_data[c(4:536),9], ISE_data[c(1:533),9])
# Part (c). Benchmarking with all data.
# ----------------------------------------------------------------------------
# Creating functions for measures of prediction goodness and their std errors.
# ----------------------------------------------------------------------------
# (i) Root mean squared error (RMSE), with delta-method standard error.
rmse=function(observed, fitted){
  sqrt(mean((observed-fitted)^2))
}
rmseSE=function(observed, fitted){
  sd((observed-fitted)^2)/sqrt(length(observed))/(2*sqrt(mean((observed-fitted)^2)))
}
# (ii) Mean absolute error (MAE)
mae=function(observed, fitted){
  mean(abs(observed-fitted))
}
maeSE=function(observed, fitted){
  sd(abs(observed-fitted))/sqrt(length(observed))
}
# (iii) Relative RMSE
RELrmse=function(observed, fitted){
  sqrt(mean(((observed-fitted)/observed)^2))
}
RELrmseSE=function(observed, fitted){
  sd(((observed-fitted)/observed)^2)/sqrt(length(observed))/
    (2*sqrt(mean(((observed-fitted)/observed)^2)))
}
# (iv) Relative MAE
RELmae=function(observed, fitted){
  mean(abs((observed-fitted)/observed))
}
RELmaeSE=function(observed, fitted){
  sd(abs((observed-fitted)/observed))/sqrt(length(observed))
}
# ---------------------------------------------------------------------------------
# Comparison of prediction methods, using validation set-up (i).
# i.e. Chronologically first 80% of data (428.8 or 429 entries) as training sample;
# remaining data as test sample.
# ---------------------------------------------------------------------------------
# Prediction method (i): Mean
# -- Predictor
Chr.ISEmean=mean(ISE_data$ISE[c(1:429)])
# -- Predicted values
Chr.ISEmean
# -- Error measures
Chr.mean.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.rmse-1.96*Chr.mean.rmseSE; Chr.mean.rmse+1.96*Chr.mean.rmseSE
Chr.mean.mae = mae(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.mae-1.96*Chr.mean.maeSE; Chr.mean.mae+1.96*Chr.mean.maeSE
Chr.mean.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELrmse-1.96*Chr.mean.RELrmseSE; Chr.mean.RELrmse+1.96*Chr.mean.RELrmseSE
Chr.mean.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELmae-1.96*Chr.mean.RELmaeSE; Chr.mean.RELmae+1.96*Chr.mean.RELmaeSE
# Prediction method (ii): Linear model excluding time.
# -- Model
Chr.LMnoTime=lm(ISE ~ S.P.500 + DAX + FTSE + NIKKEI + BOVESPA + MSCI.EU + MSCI.EM,
data=ISE_data[c(1:429),])
summary(Chr.LMnoTime)
# -- Predicted values
Chr.LMnoTime.Pred=predict(Chr.LMnoTime, ISE_data[c(430:536),])
# -- Error measures
Chr.LMnoTime.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.rmse-1.96*Chr.LMnoTime.rmseSE; Chr.LMnoTime.rmse+1.96*Chr.LMnoTime.rmseSE
Chr.LMnoTime.mae = mae(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.mae-1.96*Chr.LMnoTime.maeSE; Chr.LMnoTime.mae+1.96*Chr.LMnoTime.maeSE
Chr.LMnoTime.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELrmse-1.96*Chr.LMnoTime.RELrmseSE;
Chr.LMnoTime.RELrmse+1.96*Chr.LMnoTime.RELrmseSE
Chr.LMnoTime.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELmae-1.96*Chr.LMnoTime.RELmaeSE;
Chr.LMnoTime.RELmae+1.96*Chr.LMnoTime.RELmaeSE
# Prediction method (iii): Linear model including time.
# -- Model
Chr.LMwithTime=lm(ISE ~ date+S.P.500+DAX+FTSE+NIKKEI+BOVESPA+MSCI.EU+MSCI.EM,
data=ISE_data[c(1:429),])
summary(Chr.LMwithTime)
# -- Predicted values
Chr.LMwithTime.Pred=predict(Chr.LMwithTime, ISE_data[c(430:536),])
# -- Error measures
Chr.LMwithTime.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.rmse-1.96*Chr.LMwithTime.rmseSE;
Chr.LMwithTime.rmse+1.96*Chr.LMwithTime.rmseSE
Chr.LMwithTime.mae = mae(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.mae-1.96*Chr.LMwithTime.maeSE;
Chr.LMwithTime.mae+1.96*Chr.LMwithTime.maeSE
Chr.LMwithTime.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELrmse-1.96*Chr.LMwithTime.RELrmseSE;
Chr.LMwithTime.RELrmse+1.96*Chr.LMwithTime.RELrmseSE
Chr.LMwithTime.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELmae-1.96*Chr.LMwithTime.RELmaeSE;
Chr.LMwithTime.RELmae+1.96*Chr.LMwithTime.RELmaeSE
# Comparison of prediction methods.
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.ISEmean),
abs(ISE_data$ISE[c(430:536)]-Chr.LMnoTime.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.ISEmean),
abs(ISE_data$ISE[c(430:536)]-Chr.LMwithTime.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.LMnoTime.Pred),
abs(ISE_data$ISE[c(430:536)]-Chr.LMwithTime.Pred), paired=TRUE)
# ---------------------------------------------------------------
# Comparison of prediction methods, using validation set-up (ii).
# i.e. Five-fold cross-validation with uniformly randomly sampled folds.
# ---------------------------------------------------------------
# Five-fold cross-validation data setup.
# Create random permutation of values.
set.seed(555)
randperm=sample(nrow(ISE_data))
# Create lists with test folds and their respective training folds.
trainfolds=list()
testfolds=list()
for(i in 1:5){
lower=floor((i-1)*nrow(ISE_data)/5)+1
upper=floor(i*nrow(ISE_data)/5)
testfolds[[i]]=randperm[lower:upper]
trainfolds[[i]]=setdiff(1:nrow(ISE_data),testfolds[[i]])
testfolds[[i]]=ISE_data[testfolds[[i]],]
trainfolds[[i]]=ISE_data[trainfolds[[i]],]
}
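A brief check (not required by the task) that the five test folds partition the data, with each row appearing in exactly one fold:

```r
# Sanity check: the five test folds should jointly cover every row of
# ISE_data exactly once.
sum(sapply(testfolds, nrow))  # should equal nrow(ISE_data)
```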
# ---------------------------------------------------------------
# Prediction method (i): Mean
# -- Predictor
Fol.ISEmean=list()
for(i in 1:5){
Fol.ISEmean[[i]]=mean(trainfolds[[i]][[2]])
}
# -- Predicted values
Fol.ISEmean
# -- Error measures
# *** RMSE ***
Fol.mean.rmse=list()
for(i in 1:5){
Fol.mean.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.rmse=mean(as.numeric(Fol.mean.rmse))
# Standard Error
Fol.mean.rmseSE=list()
for(i in 1:5){
Fol.mean.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.rmseSE=mean(as.numeric(Fol.mean.rmseSE))
# Confidence Interval
Fol.mean.rmse-1.96*Fol.mean.rmseSE; Fol.mean.rmse+1.96*Fol.mean.rmseSE
# *** MAE ***
Fol.mean.mae=list()
for(i in 1:5){
Fol.mean.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.mae=mean(as.numeric(Fol.mean.mae))
# Standard Error
Fol.mean.maeSE=list()
for(i in 1:5){
Fol.mean.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.maeSE=mean(as.numeric(Fol.mean.maeSE))
# Confidence Interval
Fol.mean.mae-1.96*Fol.mean.maeSE; Fol.mean.mae+1.96*Fol.mean.maeSE
# *** Relative RMSE ***
Fol.mean.RELrmse=list()
for(i in 1:5){
Fol.mean.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELrmse=mean(as.numeric(Fol.mean.RELrmse))
# Standard Error
Fol.mean.RELrmseSE=list()
for(i in 1:5){
Fol.mean.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELrmseSE=mean(as.numeric(Fol.mean.RELrmseSE))
# Confidence Interval
Fol.mean.RELrmse-1.96*Fol.mean.RELrmseSE; Fol.mean.RELrmse+1.96*Fol.mean.RELrmseSE
# *** Relative MAE ***
Fol.mean.RELmae=list()
for(i in 1:5){
Fol.mean.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELmae=mean(as.numeric(Fol.mean.RELmae))
# Standard Error
Fol.mean.RELmaeSE=list()
for(i in 1:5){
Fol.mean.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELmaeSE=mean(as.numeric(Fol.mean.RELmaeSE))
# Confidence Interval
Fol.mean.RELmae-1.96*Fol.mean.RELmaeSE; Fol.mean.RELmae+1.96*Fol.mean.RELmaeSE
# ---------------------------------------------------------------
# Prediction method (ii): Linear model excluding time.
# -- Models
Fol.LMnoTime=list()
for(i in 1:5){
Fol.LMnoTime[[i]]=lm(ISE~S.P.500 + DAX + FTSE + NIKKEI + BOVESPA + MSCI.EU + MSCI.EM,
data=trainfolds[[i]])
}
# -- Predicted values
Fol.LMnoTime.Pred=list()
for(i in 1:5){
Fol.LMnoTime.Pred[[i]]=predict(Fol.LMnoTime[[i]], testfolds[[i]])
}
# -- Error measures
# *** RMSE ***
Fol.LMnoTime.rmse=list()
for(i in 1:5){
Fol.LMnoTime.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.rmse=mean(as.numeric(Fol.LMnoTime.rmse))
# Standard Error
Fol.LMnoTime.rmseSE=list()
for(i in 1:5){
Fol.LMnoTime.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.rmseSE=mean(as.numeric(Fol.LMnoTime.rmseSE))
# Confidence Interval
Fol.LMnoTime.rmse-1.96*Fol.LMnoTime.rmseSE; Fol.LMnoTime.rmse+1.96*Fol.LMnoTime.rmseSE
# *** MAE ***
Fol.LMnoTime.mae=list()
for(i in 1:5){
Fol.LMnoTime.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.mae=mean(as.numeric(Fol.LMnoTime.mae))
# Standard Error
Fol.LMnoTime.maeSE=list()
for(i in 1:5){
Fol.LMnoTime.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.maeSE=mean(as.numeric(Fol.LMnoTime.maeSE))
# Confidence Interval
Fol.LMnoTime.mae-1.96*Fol.LMnoTime.maeSE; Fol.LMnoTime.mae+1.96*Fol.LMnoTime.maeSE
# *** Relative RMSE ***
Fol.LMnoTime.RELrmse=list()
for(i in 1:5){
Fol.LMnoTime.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELrmse=mean(as.numeric(Fol.LMnoTime.RELrmse))
# Standard Error
Fol.LMnoTime.RELrmseSE=list()
for(i in 1:5){
Fol.LMnoTime.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELrmseSE=mean(as.numeric(Fol.LMnoTime.RELrmseSE))
# Confidence Interval
Fol.LMnoTime.RELrmse-1.96*Fol.LMnoTime.RELrmseSE;
Fol.LMnoTime.RELrmse+1.96*Fol.LMnoTime.RELrmseSE
# *** Relative MAE ***
Fol.LMnoTime.RELmae=list()
for(i in 1:5){
Fol.LMnoTime.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELmae=mean(as.numeric(Fol.LMnoTime.RELmae))
# Standard Error
Fol.LMnoTime.RELmaeSE=list()
for(i in 1:5){
Fol.LMnoTime.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELmaeSE=mean(as.numeric(Fol.LMnoTime.RELmaeSE))
# Confidence Interval
Fol.LMnoTime.RELmae-1.96*Fol.LMnoTime.RELmaeSE;
Fol.LMnoTime.RELmae+1.96*Fol.LMnoTime.RELmaeSE
# ---------------------------------------------------------------
# Prediction method (iii): Linear model including time.
# -- Models
Fol.LMwithTime=list()
for(i in 1:5){
Fol.LMwithTime[[i]]=lm(ISE~date+S.P.500+DAX+FTSE+NIKKEI+BOVESPA+MSCI.EU+MSCI.EM,
data=trainfolds[[i]])
}
# -- Predicted values
Fol.LMwithTime.Pred=list()
for(i in 1:5){
Fol.LMwithTime.Pred[[i]]=predict(Fol.LMwithTime[[i]], testfolds[[i]])
}
# -- Error measures
# *** RMSE ***
Fol.LMwithTime.rmse=list()
for(i in 1:5){
Fol.LMwithTime.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.rmse=mean(as.numeric(Fol.LMwithTime.rmse))
# Standard Error
Fol.LMwithTime.rmseSE=list()
for(i in 1:5){
Fol.LMwithTime.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.rmseSE=mean(as.numeric(Fol.LMwithTime.rmseSE))
# Confidence Interval
Fol.LMwithTime.rmse-1.96*Fol.LMwithTime.rmseSE;
Fol.LMwithTime.rmse+1.96*Fol.LMwithTime.rmseSE
# *** MAE ***
Fol.LMwithTime.mae=list()
for(i in 1:5){
Fol.LMwithTime.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.mae=mean(as.numeric(Fol.LMwithTime.mae))
# Standard Error
Fol.LMwithTime.maeSE=list()
for(i in 1:5){
Fol.LMwithTime.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.maeSE=mean(as.numeric(Fol.LMwithTime.maeSE))
# Confidence Interval
Fol.LMwithTime.mae-1.96*Fol.LMwithTime.maeSE;
Fol.LMwithTime.mae+1.96*Fol.LMwithTime.maeSE
# *** Relative RMSE ***
Fol.LMwithTime.RELrmse=list()
for(i in 1:5){
Fol.LMwithTime.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELrmse=mean(as.numeric(Fol.LMwithTime.RELrmse))
# Standard Error
Fol.LMwithTime.RELrmseSE=list()
for(i in 1:5){
Fol.LMwithTime.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELrmseSE=mean(as.numeric(Fol.LMwithTime.RELrmseSE))
# Confidence Interval
Fol.LMwithTime.RELrmse-1.96*Fol.LMwithTime.RELrmseSE;
Fol.LMwithTime.RELrmse+1.96*Fol.LMwithTime.RELrmseSE
# *** Relative MAE ***
Fol.LMwithTime.RELmae=list()
for(i in 1:5){
Fol.LMwithTime.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELmae=mean(as.numeric(Fol.LMwithTime.RELmae))
# Standard Error
Fol.LMwithTime.RELmaeSE=list()
for(i in 1:5){
Fol.LMwithTime.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELmaeSE=mean(as.numeric(Fol.LMwithTime.RELmaeSE))
# Confidence Interval
Fol.LMwithTime.RELmae-1.96*Fol.LMwithTime.RELmaeSE;
Fol.LMwithTime.RELmae+1.96*Fol.LMwithTime.RELmaeSE
# ---------------------------------------------------------------
# Comparison of prediction methods.
# Vector of residuals for prediction method (i).
Fol.ISEmean.resid=list()
for(i in 1:5){
Fol.ISEmean.resid[[i]]=testfolds[[i]]$ISE-Fol.ISEmean[[i]]
}
Fol.ISEmean.resid=unlist(Fol.ISEmean.resid)
# Vector of residuals for prediction method (ii).
Fol.LMnoTime.resid=list()
for(i in 1:5){
Fol.LMnoTime.resid[[i]]=testfolds[[i]]$ISE-Fol.LMnoTime.Pred[[i]]
}
Fol.LMnoTime.resid=unlist(Fol.LMnoTime.resid)
# Vector of residuals for prediction method (iii).
Fol.LMwithTime.resid=list()
for(i in 1:5){
Fol.LMwithTime.resid[[i]]=testfolds[[i]]$ISE-Fol.LMwithTime.Pred[[i]]
}
Fol.LMwithTime.resid=unlist(Fol.LMwithTime.resid)
# Test for comparison of prediction methods.
wilcox.test(abs(Fol.ISEmean.resid), abs(Fol.LMnoTime.resid), paired=TRUE)
wilcox.test(abs(Fol.ISEmean.resid), abs(Fol.LMwithTime.resid), paired=TRUE)
wilcox.test(abs(Fol.LMnoTime.resid), abs(Fol.LMwithTime.resid), paired=TRUE)
# Part (d). Benchmarking with previous data.
#create a vector of errors for RMSE and MAE in (i)
ISE.error1=vector(mode="numeric", length=526)
result.index=0
for(n in 11:536){
result.index=result.index+1
error1=ISE_data[n,2]-ISE_data[n-1,2]
ISE.error1[result.index]=error1
}
#calculate RMSE for (i)
(RMSE1=sqrt(mean(ISE.error1^2)))
#calculate MAE for (i)
(MAE1=mean(abs(ISE.error1)))
#calculate standard error of RMSE for (i)
(SE.RMSE1=(sd(ISE.error1^2)/sqrt(526))/(2*sqrt(mean(ISE.error1^2))))
#calculate standard error of MAE for (i)
(SE.MAE1=sd(abs(ISE.error1))/sqrt(526))
#95% confidence interval for RMSE
RMSE1-1.96*SE.RMSE1; RMSE1+1.96*SE.RMSE1
#95% confidence interval for MAE
MAE1-1.96*SE.MAE1; MAE1+1.96*SE.MAE1
#create a vector of errors for relative RMSE and relative MAE in (i)
ISE.rerror1=ISE.error1/ISE_data[c(11:536),2]
#calculate relative RMSE for (i)
(rRMSE1=sqrt(mean(ISE.rerror1^2)))
#calculate relative MAE for (i)
(rMAE1=mean(abs(ISE.rerror1)))
#calculate standard error of relative RMSE for (i)
(SE.rRMSE1=(sd(ISE.rerror1^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror1^2))))
#calculate standard error of relative MAE for (i)
(SE.rMAE1=sd(abs(ISE.rerror1))/sqrt(526))
#95% confidence interval for relative RMSE
rRMSE1-1.96*SE.rRMSE1; rRMSE1+1.96*SE.rRMSE1
#95% confidence interval for relative MAE
rMAE1-1.96*SE.rMAE1; rMAE1+1.96*SE.rMAE1
############
#create a vector of errors for RMSE and MAE in (ii)
ISE.error2=vector(mode="numeric", length=526)
result.index=0
for(n in 11:536){
result.index=result.index+1
error2=ISE_data[n,2]-mean(ISE_data[c((n-5):(n-1)),2])
ISE.error2[result.index]=error2
}
#calculate RMSE for (ii)
(RMSE2=sqrt(mean(ISE.error2^2)))
#calculate MAE for (ii)
(MAE2=mean(abs(ISE.error2)))
#calculate standard error of RMSE for (ii)
(SE.RMSE2=(sd(ISE.error2^2)/sqrt(526))/(2*sqrt(mean(ISE.error2^2))))
#calculate standard error of MAE for (ii)
(SE.MAE2=sd(abs(ISE.error2))/sqrt(526))
#95% confidence interval for RMSE
RMSE2-1.96*SE.RMSE2; RMSE2+1.96*SE.RMSE2
#95% confidence interval for MAE
MAE2-1.96*SE.MAE2; MAE2+1.96*SE.MAE2
#create a vector of errors for relative RMSE and relative MAE in (ii)
ISE.rerror2=ISE.error2/ISE_data[c(11:536),2]
#calculate relative RMSE for (ii)
(rRMSE2=sqrt(mean(ISE.rerror2^2)))
#calculate relative MAE for (ii)
(rMAE2=mean(abs(ISE.rerror2)))
#calculate standard error of relative RMSE for (ii)
(SE.rRMSE2=(sd(ISE.rerror2^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror2^2))))
#calculate standard error of relative MAE for (ii)
(SE.rMAE2=sd(abs(ISE.rerror2))/sqrt(526))
#95% confidence interval for relative RMSE
rRMSE2-1.96*SE.rRMSE2; rRMSE2+1.96*SE.rRMSE2
#95% confidence interval for relative MAE
rMAE2-1.96*SE.rMAE2; rMAE2+1.96*SE.rMAE2
#################################################################################
#create a vector of errors for RMSE and MAE in (iii)
ISE_data.iii=ISE_data[-536,]
ISE_data.iii$ISE.predicted=ISE_data$ISE[2:536]
ISE.error3=vector(mode="numeric", length=526)
result.index=0
for(n in 10:535){
result.index=result.index+1
lmmodel3=lm(ISE.predicted~ISE+S.P.500+DAX+FTSE+NIKKEI+BOVESPA+MSCI.EU+MSCI.EM,
data=ISE_data.iii[(n-9):(n-1),])
error3=ISE_data.iii[n,10]-predict(lmmodel3, ISE_data.iii[n,])
ISE.error3[result.index]=error3
}
#calculate RMSE for (iii)
(RMSE3=sqrt(mean(ISE.error3^2)))
#calculate MAE for (iii)
(MAE3=mean(abs(ISE.error3)))
#calculate standard error of RMSE for (iii)
(SE.RMSE3=(sd(ISE.error3^2)/sqrt(526))/(2*sqrt(mean(ISE.error3^2))))
#calculate standard error of MAE for (iii)
(SE.MAE3=sd(abs(ISE.error3))/sqrt(526))
#95% confidence interval for RMSE
RMSE3-1.96*SE.RMSE3; RMSE3+1.96*SE.RMSE3
#95% confidence interval for MAE
MAE3-1.96*SE.MAE3; MAE3+1.96*SE.MAE3
#create a vector of errors for relative RMSE and relative MAE in (iii)
ISE.rerror3=ISE.error3/ISE_data.iii[c(10:535),10]
#calculate relative RMSE for (iii)
(rRMSE3=sqrt(mean(ISE.rerror3^2)))
#calculate relative MAE for (iii)
(rMAE3=mean(abs(ISE.rerror3)))
#calculate standard error of relative RMSE for (iii)
(SE.rRMSE3=(sd(ISE.rerror3^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror3^2))))
#calculate standard error of relative MAE for (iii)
(SE.rMAE3=sd(abs(ISE.rerror3))/sqrt(526))
#95% confidence interval for relative RMSE
rRMSE3-1.96*SE.rRMSE3; rRMSE3+1.96*SE.rRMSE3
#95% confidence interval for relative MAE
rMAE3-1.96*SE.rMAE3; rMAE3+1.96*SE.rMAE3
########################################################################################
#create a vector of errors for RMSE and MAE in (iv)
ISE_data.iv=ISE_data[-c(535,536),]
ISE_data.extracted=ISE_data[-c(1,536),-1]
ISE_data.iv=cbind(ISE_data.iv,ISE_data.extracted)
ISE_data.iv$ISE.predicted=ISE_data[-c(1,2),2]
names(ISE_data.iv)=c("date","ISE2","S.P.5002","DAX2","FTSE2",
"NIKKEI2","BOVESPA2","MSCI.EU2","MSCI.EM2",
"ISE1","S.P.5001","DAX1","FTSE1",
"NIKKEI1","BOVESPA1","MSCI.EU1","MSCI.EM1","ISE.predicted")
ISE.error4=vector(mode="numeric", length=526)
result.index=0
for(n in 9:534){
result.index=result.index+1
lmmodel4=lm(ISE.predicted~ISE2+S.P.5002+DAX2+FTSE2+NIKKEI2+BOVESPA2+MSCI.EU2+MSCI.EM2
+ISE1+S.P.5001+DAX1+FTSE1+NIKKEI1+BOVESPA1+MSCI.EU1+MSCI.EM1,
data=ISE_data.iv[(n-8):(n-1),])
error4=ISE_data.iv[n,18]-predict(lmmodel4, ISE_data.iv[n,])
ISE.error4[result.index]=error4
}
#calculate RMSE for (iv)
(RMSE4=sqrt(mean(ISE.error4^2)))
#calculate MAE for (iv)
(MAE4=mean(abs(ISE.error4)))
#calculate standard error of RMSE for (iv)
(SE.RMSE4=(sd(ISE.error4^2)/sqrt(526))/(2*sqrt(mean(ISE.error4^2))))
#calculate standard error of MAE for (iv)
(SE.MAE4=sd(abs(ISE.error4))/sqrt(526))
#95% confidence interval for RMSE
RMSE4-1.96*SE.RMSE4; RMSE4+1.96*SE.RMSE4
#95% confidence interval for MAE
MAE4-1.96*SE.MAE4; MAE4+1.96*SE.MAE4
#########
#create a vector of errors for relative RMSE and relative MAE in (iv)
ISE.rerror4=ISE.error4/ISE_data.iv[c(9:534),18]
#calculate relative RMSE for (iv)
(rRMSE4=sqrt(mean(ISE.rerror4^2)))
#calculate relative MAE for (iv)
(rMAE4=mean(abs(ISE.rerror4)))
#calculate standard error of relative RMSE for (iv)
(SE.rRMSE4=(sd(ISE.rerror4^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror4^2))))
#calculate standard error of relative MAE for (iv)
(SE.rMAE4=sd(abs(ISE.rerror4))/sqrt(526))
#95% confidence interval for relative RMSE
rRMSE4-1.96*SE.rRMSE4; rRMSE4+1.96*SE.rRMSE4
#95% confidence interval for relative MAE
rMAE4-1.96*SE.rMAE4; rMAE4+1.96*SE.rMAE4
#######################################################################################
#wilcoxon tests to compare the 4 different methods
wilcox.test(abs(ISE.error1),abs(ISE.error2), paired=TRUE)
wilcox.test(abs(ISE.error1),abs(ISE.error3), paired=TRUE)
wilcox.test(abs(ISE.error1),abs(ISE.error4), paired=TRUE)
wilcox.test(abs(ISE.error2),abs(ISE.error3), paired=TRUE)
wilcox.test(abs(ISE.error2),abs(ISE.error4), paired=TRUE)
wilcox.test(abs(ISE.error3),abs(ISE.error4), paired=TRUE)
# Part (e)-(c). Robust linear regression with Part (c) validation setups.
# ------------------------------
# Creating function for R(beta).
# ------------------------------
Rbeta=function(beta, covariates, observed){
sum(abs(as.matrix(covariates)%*%matrix(beta)-observed))
}
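Since Rbeta is the sum of absolute residuals, minimising it with nlm gives a least-absolute-deviations (L1) fit. A toy illustration on synthetic data (not the assignment data), assuming Rbeta above has been defined:

```r
# Toy L1-regression check using Rbeta with nlm (synthetic data).
set.seed(1)
toy.x=data.frame(a=rnorm(50), b=rnorm(50))
toy.y=2*toy.x$a - toy.x$b + rnorm(50, sd=0.1)
toy.fit=nlm(Rbeta, p=c(0,0), covariates=toy.x, observed=toy.y)
toy.fit$estimate  # should be close to c(2, -1)
```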
# ---------------------------------------------------------
# Validation set-up (i). Chronological 80-20 split of data.
# ---------------------------------------------------------
# Prediction method (iv). Robust linear regression
# -- Model
Chr.PartE=nlm(Rbeta, p=c(-1,-1,-1,-1,-1,-1,-1), observed=ISE_data$ISE[1:429],
covariates=ISE_data[1:429,3:9])
# -- Predicted values
Chr.PartE.Pred=as.matrix(ISE_data[430:536,3:9]) %*% matrix(Chr.PartE$estimate)
# -- Error measures
Chr.PartE.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.rmse-1.96*Chr.PartE.rmseSE; Chr.PartE.rmse+1.96*Chr.PartE.rmseSE
Chr.PartE.mae = mae(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.mae-1.96*Chr.PartE.maeSE; Chr.PartE.mae+1.96*Chr.PartE.maeSE
Chr.PartE.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELrmse-1.96*Chr.PartE.RELrmseSE; Chr.PartE.RELrmse+1.96*Chr.PartE.RELrmseSE
Chr.PartE.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELmae-1.96*Chr.PartE.RELmaeSE; Chr.PartE.RELmae+1.96*Chr.PartE.RELmaeSE
# Comparison of prediction methods.
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.ISEmean),
abs(ISE_data$ISE[c(430:536)]-Chr.PartE.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.LMnoTime.Pred),
abs(ISE_data$ISE[c(430:536)]-Chr.PartE.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.LMwithTime.Pred),
abs(ISE_data$ISE[c(430:536)]-Chr.PartE.Pred), paired=TRUE)
# ---------------------------------------------------
# Validation set-up (ii). Five-fold cross-validation.
# ---------------------------------------------------
# Prediction method (iv). Robust linear regression
# -- Models
Fol.PartE=list()
for(i in c(1,3,5)){
Fol.PartE[[i]]=nlm(Rbeta, p=c(-0.5,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5),
observed=trainfolds[[i]]$ISE, covariates=trainfolds[[i]][c(3:9)])
}
for(i in c(2,4)){
Fol.PartE[[i]]=nlm(Rbeta, p=c(-1,-1,-1,-1,-1,-1,-1), observed=trainfolds[[i]]$ISE,
covariates=trainfolds[[i]][c(3:9)])
}
# -- Predicted values
Fol.PartE.Pred=list()
for(i in 1:5){
Fol.PartE.Pred[[i]]=as.matrix(testfolds[[i]][c(3:9)])%*%matrix(Fol.PartE[[i]]$estimate)
}
# -- Error measures
# *** RMSE ***
Fol.PartE.rmse=list()
for(i in 1:5){
Fol.PartE.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.rmse=mean(as.numeric(Fol.PartE.rmse))
# Standard Error
Fol.PartE.rmseSE=list()
for(i in 1:5){
Fol.PartE.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.rmseSE=mean(as.numeric(Fol.PartE.rmseSE))
# Confidence Interval
Fol.PartE.rmse-1.96*Fol.PartE.rmseSE; Fol.PartE.rmse+1.96*Fol.PartE.rmseSE
# *** MAE ***
Fol.PartE.mae=list()
for(i in 1:5){
Fol.PartE.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.mae=mean(as.numeric(Fol.PartE.mae))
# Standard Error
Fol.PartE.maeSE=list()
for(i in 1:5){
Fol.PartE.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.maeSE=mean(as.numeric(Fol.PartE.maeSE))
# Confidence Interval
Fol.PartE.mae-1.96*Fol.PartE.maeSE; Fol.PartE.mae+1.96*Fol.PartE.maeSE
# *** Relative RMSE ***
Fol.PartE.RELrmse=list()
for(i in 1:5){
Fol.PartE.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELrmse=mean(as.numeric(Fol.PartE.RELrmse))
# Standard Error
Fol.PartE.RELrmseSE=list()
for(i in 1:5){
Fol.PartE.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELrmseSE=mean(as.numeric(Fol.PartE.RELrmseSE))
# Confidence Interval
Fol.PartE.RELrmse-1.96*Fol.PartE.RELrmseSE; Fol.PartE.RELrmse+1.96*Fol.PartE.RELrmseSE
# *** Relative MAE ***
Fol.PartE.RELmae=list()
for(i in 1:5){
Fol.PartE.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELmae=mean(as.numeric(Fol.PartE.RELmae))
# Standard Error
Fol.PartE.RELmaeSE=list()
for(i in 1:5){
Fol.PartE.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELmaeSE=mean(as.numeric(Fol.PartE.RELmaeSE))
# Confidence Interval
Fol.PartE.RELmae-1.96*Fol.PartE.RELmaeSE; Fol.PartE.RELmae+1.96*Fol.PartE.RELmaeSE
# Comparison of prediction methods.
# Vector of residuals for prediction method (iv).
Fol.PartE.resid=list()
for(i in 1:5){
Fol.PartE.resid[[i]]=testfolds[[i]]$ISE - Fol.PartE.Pred[[i]]
}
Fol.PartE.resid=unlist(Fol.PartE.resid)
# Test for comparison of prediction methods.
wilcox.test(abs(Fol.ISEmean.resid), abs(Fol.PartE.resid), paired=TRUE)
wilcox.test(abs(Fol.LMnoTime.resid), abs(Fol.PartE.resid), paired=TRUE)
wilcox.test(abs(Fol.LMwithTime.resid), abs(Fol.PartE.resid), paired=TRUE)
# Part (e)-(d). Robust linear regression with Part (d) validation setup.
#create a vector of errors for RMSE and MAE for 526 data splits
ISE.error5=vector(mode="numeric", length=526)
result.index=0
# Objective function: sum of absolute residuals for coefficient vector be.
Sum.residuals=function(be,x,y){
res=be%*%t(x)
SAR=sum(abs(res-y))
return(SAR)
}
for(n in 10:535){
result.index=result.index+1
beta=nlm(Sum.residuals, p=c(10,10,10,10,10,10,10,10),
x=ISE_data.iii[(n-9):(n-1),-c(1,10)],
y=ISE_data.iii$ISE.predicted[(n-9):(n-1)], iterlim=300)$estimate
error5=ISE_data.iii$ISE.predicted[n]-beta%*%t(ISE_data.iii[n,2:9])
ISE.error5[result.index]=error5
}
#calculate RMSE
(RMSE5=sqrt(mean(ISE.error5^2)))
#calculate MAE
(MAE5=mean(abs(ISE.error5)))
#calculate standard error of RMSE
(SE.RMSE5=(sd(ISE.error5^2)/sqrt(526))/(2*sqrt(mean(ISE.error5^2))))
#calculate standard error of MAE
(SE.MAE5=sd(abs(ISE.error5))/sqrt(526))
#95% confidence interval for RMSE
RMSE5-1.96*SE.RMSE5; RMSE5+1.96*SE.RMSE5
#95% confidence interval for MAE
MAE5-1.96*SE.MAE5; MAE5+1.96*SE.MAE5
#create a vector of errors for relative RMSE and relative MAE for part (e)
ISE.rerror5=ISE.error5/ISE_data[c(11:536),2]
#calculate relative RMSE
(rRMSE5=sqrt(mean(ISE.rerror5^2)))
#calculate relative MAE
(rMAE5=mean(abs(ISE.rerror5)))
#calculate standard error of relative RMSE
(SE.rRMSE5=(sd(ISE.rerror5^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror5^2))))
#calculate standard error of relative MAE
(SE.rMAE5=sd(abs(ISE.rerror5))/sqrt(526))
#95% confidence interval for relative RMSE
rRMSE5-1.96*SE.rRMSE5; rRMSE5+1.96*SE.rRMSE5
#95% confidence interval for relative MAE
rMAE5-1.96*SE.rMAE5; rMAE5+1.96*SE.rMAE5
#Wilcoxon tests to compare the part (e) method with the 4 methods from part (d)
wilcox.test(abs(ISE.error5),abs(ISE.error1), paired=TRUE)
wilcox.test(abs(ISE.error5),abs(ISE.error2), paired=TRUE)
wilcox.test(abs(ISE.error5),abs(ISE.error3), paired=TRUE)
wilcox.test(abs(ISE.error5),abs(ISE.error4), paired=TRUE)
Appendix B: Task 2 SAS Code
libname cps "C:/Users/User/Documents/STAT7001/cps";
data cps.Rd;
input R d;
datalines;
0.00093 0.2588
0.00148 0.2053
0.0024 0.1628
0.0037 0.1291
0.0059 0.1024
0.0095 0.08118
0.0150 0.06438
0.024 0.05106
0.038 0.04049
0.048 0.03606
0.061 0.03211
0.096 0.02546
0.153 0.02019
0.24 0.01601
0.39 0.01270
0.98 0.00799
run;
PROC print;
run;
*TASK 2(a);
*1(a);
*setting the font size to 12pt;
goptions device=gif
hsize=4in vsize=3in
border
ftext="sasfont" htext=12pt;
proc univariate data=cps.Rd;
var R d;
histogram;
qqplot / normal(mu=est sigma=est);
run;
title;
title2 "Resistance versus diameter";
symbol1 value=plus color=red;
axis1 label=("Diameter(cm)");
axis2 label=(angle=90 "Resistance(Ohm)");
proc gplot data=cps.Rd;
plot R*d /haxis=axis1 vaxis=axis2;
run;
proc reg data=cps.Rd;
model R=d;
run;
*TASK 2 (b);
data cps.Rd2;
set cps.Rd;
logR=log(R);
recd2=1/(d**2);
recd=1/d;
logd=log(d);
d2=d**2;
d3=d**3;
d4=d**4;
d5=d**5;
d6=d**6;
d7=d**7;
d8=d**8;
d9=d**9;
d10=d**10;
d11=d**11;
d12=d**12;
d13=d**13;
d14=d**14;
d15=d**15;
run;
PROC print;
run;
*suggested i;
PROC reg data=cps.Rd2;
model logR = logd;
run;
*TASK 2 (b) i;
proc reg data=cps.Rd2;
model R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
run;
*TASK 2 (b) ii;
PROC reg data=cps.Rd2;
model R = recd2 recd;
run;
*suggested ii;
PROC reg data=cps.Rd2;
model R = recd2;
run;
proc corr plots=(matrix);
with R logR;
var recd recd2 logd d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15 ;
run;
*LEAVE-ONE-OUT CROSS-VALIDATION;
*Suggested i model logR = logd;
*Generate the cross validation data;
data cps.cv4;
do replicate = 1 to datasize;
do rec = 1 to datasize;
set cps.Rd2 nobs=datasize point=rec;
if rec ^= replicate then new_R=logR; else new_R=.;
output;
end;
end;
stop;
run;
proc print;
run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv4;
model new_R=logd;
by replicate;
output out=out4a(where=(new_R=.)) predicted=R_hat;
run;
proc print;
run;
*and summarize the results;
data cps.out4b;
set out4a;
diff=logR-R_hat;
absd=abs(diff);
run;
title;
title2 "Residual Plot for Model logR = logd";
symbol1 value=plus color=red;
axis1 label=("logR");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out4b;
plot diff*logR /haxis=axis1 vaxis=axis2;
run;
proc summary data=cps.out4b;
var diff absd;
output out=out4c
std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print;
run;
data out4d;
set cps.out4b;
diff2=diff**2;
mse=0.009292428**2;
a=(diff2-mse)**2;
run;
proc summary data=out4d;
var a;
output out=out4e
sum(a)=b ;
run;
data out4f;
set out4e;
seRMSE=((b**0.5)/16)/(2*0.009292428);
seMAE=.006464840/4;
run;
proc print;
run;
*2(b)i model: R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
*Generate the cross validation data;
data cps.cv2;
do replicate = 1 to datasize;
do rec = 1 to datasize;
set cps.Rd2 nobs=datasize point=rec;
if rec ^= replicate then new_R=R; else new_R=.;
output;
end;
end;
stop;
run;
proc print;
run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv2;
model new_R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
by replicate;
output out=out2a(where=(new_R=.)) predicted=R_hat;
run;
*and summarize the results;
data cps.out2b;
set out2a;
diff=R-R_hat;
absd=abs(diff);
run;
title;
title2 "Residual Plot for Model R = d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15";
symbol1 value=plus color=red;
axis1 label=("R");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out2b;
plot diff*R /haxis=axis1 vaxis=axis2;
run;
proc summary data=cps.out2b;
var diff absd;
output out=out2c
std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print;
run;
data out2d;
set cps.out2b;
diff2=diff**2;
mse=5358.40**2;
a=(diff2-mse)**2;
run;
proc summary data=out2d;
var a;
output out=out2e
sum(a)=b;
run;
data out2f;
set out2e;
seRMSE=((b**0.5)/16)/(2*5358.40);
seMAE=5357.98/4;
run;
proc print;
run;
*2(b)ii model R = recd2 recd;
*Generate the cross validation data;
data cps.cv3;
do replicate = 1 to datasize;
do rec = 1 to datasize;
set cps.Rd2 nobs=datasize point=rec;
if rec ^= replicate then new_R=R; else new_R=.;
output;
end;
end;
stop;
run;
proc print;
run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv3;
model new_R=recd2 recd;
by replicate;
output out=out3a(where=(new_R=.)) predicted=R_hat;
run;
proc print;
run;
*and summarize the results;
data cps.out3b;
set out3a;
diff=R-R_hat;
absd=abs(diff);
run;
title;
title2 "Residual Plot for Model R = recd2 recd";
symbol1 value=plus color=red;
axis1 label=("R");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out3b;
plot diff*R /haxis=axis1 vaxis=axis2;
run;
proc print;
run;
proc summary data=cps.out3b;
var diff absd;
output out=out3c
std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print;
run;
data out3d;
set cps.out3b;
diff2=diff**2;
mse=0.001920634**2;
a=(diff2-mse)**2;
run;
proc summary data=out3d;
var a;
output out=out3e
sum(a)=b ;
run;
data out3f;
set out3e;
seRMSE=((b**0.5)/16)/(2*.001920634);
seMAE=.001679167/4;
run;
proc print;
run;
*Suggested ii model R = recd2;
*Generate the cross validation data;
data cps.cv5;
do replicate = 1 to datasize;
do rec = 1 to datasize;
set cps.Rd2 nobs=datasize point=rec;
if rec ^= replicate then new_R=R; else new_R=.;
output;
end;
end;
stop;
run;
proc print;
run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv5;
model new_R=recd2;
by replicate;
output out=out5a(where=(new_R=.)) predicted=R_hat;
run;
proc print;
run;
*and summarize the results;
data cps.out5b;
set out5a;
diff=R-R_hat;
absd=abs(diff);
run;
title;
title2 "Residual Plot for Model R = recd2";
symbol1 value=plus color=red;
axis1 label=("R");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out5b;
plot diff*R /haxis=axis1 vaxis=axis2;
run;
proc print;
run;
proc summary data=cps.out5b;
var diff absd;
output out=out5c
std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print;
run;
data out5d;
set cps.out5b;
diff2=diff**2;
mse=0.001314149**2;
a=(diff2-mse)**2;
run;
proc summary data=out5d;
var a;
output out=out5e
sum(a)=b ;
run;
data out5f;
set out5e;
seRMSE=((b**0.5)/16)/(2*0.001314149);
seMAE=.001131049/4;
run;
proc print ;
run;
*producing a table containing absd from all 4 models to carry out paired
wilcoxon signed rank test;
PROC SQL;
SELECT A.absd, B.absd, C.absd, D.absd
FROM cps.out4b AS A, cps.out2b AS B, cps.out3b AS C, cps.out5b AS D
WHERE A.replicate=B.replicate AND B.replicate=C.replicate
AND C.replicate=D.replicate;
QUIT;
data cps.absd;
input model1 model2 model3 model4;
datalines;
0.002509 21432.82 0.000178 0.000222
0.000547 8.835317 0.000144 0.000221
0.022544 3.601289 0.000052 0.000269
0.013488 0.548649 0.000112 0.000167
0.009526 0.158066 0.000069 0.000153
0.003606 0.060452 0.000078 0.000232
0.003878 0.043877 0.000043 0.000121
0.003001 0.016238 0.000236 0.000225
0.001622 0.024246 0.000159 0.000046
0.0004 0.023091 0.000266 0.000096
0.008668 0.009097 0.000809 0.00056
0.00284 0.031168 0.000013 0.000341
0.000346 0.043857 0.000134 0.000307
0.016335 0.003074 0.004498 0.004225
0.010083 0.072958 0.003586 0.002621
0.004023 0.278421 0.004755 0.000581
;
run;
data cps.diff;
set cps.absd;
AB=model1-model2;
AC=model1-model3;
AD=model1-model4;
BC=model2-model3;
BD=model2-model4;
CD=model3-model4;
run;
proc univariate data = cps.diff;
var AB AC AD BC BD CD ;
run;
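PROC UNIVARIATE above reports, for each column of pairwise differences, the signed-rank statistic (SAS reports S = V − n(n+1)/4, with a finite-sample p-value). For readers unfamiliar with the test, a rough Python sketch of the paired Wilcoxon signed-rank computation using the large-sample normal approximation (illustrative only; p-values will differ slightly from SAS's small-sample version; `wilcoxon_signed_rank` is a hypothetical helper):

```python
import math

def wilcoxon_signed_rank(d):
    """Paired Wilcoxon signed-rank: V = sum of ranks of positive differences,
    two-sided p-value from the normal approximation (zero differences dropped,
    ties get average ranks)."""
    d = [v for v in d if v != 0.0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j + 2) / 2.0          # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    v = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mean = n * (n + 1) / 4.0             # null mean of V
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (v - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2)) # = 2 * (1 - Phi(|z|))
    return v, p
```

Applying this to each difference column (AB, AC, ...) gives the V statistics and approximate p-values quoted in the report.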
Appendix C: References
1. Jeff Cartier. The Basics of Creating Graphs with SAS/GRAPH® Software. [online]. Available from: https://support.sas.com/rnd/datavisualization/papers/GraphBasics.pdf [Accessed 24 February 2016].
2. Steven M. LaLonde. 2012. Transforming Variables for Normality and Linearity – When, How, Why and Why Not's. [online]. Available from: http://support.sas.com/resources/papers/proceedings12/430-2012.pdf [Accessed 13 March 2016].
3. David L. Cassell. 2007. Don't Be Loopy: Re-Sampling and Simulation the SAS® Way. [online]. Available from: http://www2.sas.com/proceedings/forum2007/183-2007.pdf [Accessed 14 March 2016].
4. Michael J. Wieczkowski. Alternatives to Merging SAS Data Sets … But Be Careful. [online]. Available from: http://www.ats.ucla.edu/stat/sas/library/nesug99/bt150.pdf [Accessed 23 March 2016].

  • 2. A correlation test was performed on each of the stock index returns and time, with the results indicating no apparent linear association between the variables, as all p-values were greater than 0.05. Figure 1. ISE: cor=-0.0499, p-value=0.2485. Figure 2. S&P 500: cor=0.0245, p-value=0.5714. Figure 3. DAX: cor=0.0299, p-value=0.4891. Figure 4. FTSE 100: cor=0.0190, p-value=0.6615. Figure 5. Nikkei 225: cor=0.00533, p-value=0.9019. Figure 6. Ibovespa: cor=-0.0582, p-value=0.1786. Figure 7. MSCI EU: cor=0.0121, p-value=0.7803. Figure 8. MSCI EM: cor=-0.0538, p-value=0.2136.
  • 3. Figures 9 to 16. Scatter plots of stock index returns for day N-1, N-2 and N-3 (y-axis, left to right) against stock index returns for day N (x-axis), with respective correlation estimates and p-values.

The scatter plots in Figures 9 to 16 show that there are generally no patterns in stock index returns plotted against the returns one, two, and three days before. Correlation tests were carried out to confirm this, and they suggest no significant correlation for any of the scatter plots except the first panel of Figure 16, which shows a slight positive association (cor = 0.149) between MSCI EM returns and its returns one day earlier (p-value = 0.0005403 < 0.05). In light of the above, it may be reasonable to suggest that stock index returns are, in general, not linearly related to those of the recent past few days.

Figure 9. ISE: lag 1 cor = 0.0188 (p = 0.6651); lag 2 cor = -0.0124 (p = 0.7745); lag 3 cor = -0.0337 (p = 0.4369)
Figure 10. S&P 500: lag 1 cor = -0.0608 (p = 0.1605); lag 2 cor = -0.0300 (p = 0.4888); lag 3 cor = -0.00845 (p = 0.8456)
Figure 11. DAX: lag 1 cor = 0.00132 (p = 0.9758); lag 2 cor = -0.0262 (p = 0.5453); lag 3 cor = -0.0172 (p = 0.6919)
Figure 12. FTSE 100: lag 1 cor = -0.00739 (p = 0.8646); lag 2 cor = -0.0276 (p = 0.5248); lag 3 cor = -0.0218 (p = 0.6149)
Figure 13. Nikkei 225: lag 1 cor = -0.0782 (p = 0.07085); lag 2 cor = 0.0261 (p = 0.5479); lag 3 cor = 0.000953 (p = 0.9825)
Figure 14. Ibovespa: lag 1 cor = -0.0485 (p = 0.2626); lag 2 cor = -0.0140 (p = 0.7463); lag 3 cor = -0.0457 (p = 0.2921)
Figure 15. MSCI EU: lag 1 cor = 0.00995 (p = 0.8184); lag 2 cor = -0.0420 (p = 0.3323); lag 3 cor = 0.00254 (p = 0.9533)
Figure 16. MSCI EM: lag 1 cor = 0.149 (p = 0.0005403); lag 2 cor = -0.0141 (p = 0.7449); lag 3 cor = 0.0489 (p = 0.2599)
  • 4. Results and Interpretation of Prediction from Same Day Indices (Part C)

Table 1. Error measures (with 95% confidence intervals) for different prediction methods, under respective validation set-ups.

Chronological 80-20 Split:
  Mean:                     RMSE 0.0131 (0.0112, 0.0149); MAE 0.0100 (0.0084, 0.0116); Relative RMSE 1.68 (1.13, 2.23); Relative MAE 1.22 (1.00, 1.44)
  Linear Model w/o Time:    RMSE 0.0108 (0.0094, 0.0122); MAE 0.00855 (0.00729, 0.00980); Relative RMSE 3.35 (1.63, 5.07); Relative MAE 1.61 (1.05, 2.17)
  Linear Model w/ Time:     RMSE 0.0107 (0.0093, 0.0121); MAE 0.00852 (0.00729, 0.00975); Relative RMSE 3.06 (1.63, 4.49); Relative MAE 1.56 (1.06, 2.06)
  Robust Linear Regression: RMSE 0.0105 (0.0091, 0.0119); MAE 0.00845 (0.00726, 0.00964); Relative RMSE 3.04 (1.33, 4.74); Relative MAE 1.53 (1.03, 2.03)
5-Fold Cross-Validation:
  Mean:                     RMSE 0.0162 (0.0134, 0.0191); MAE 0.0121 (0.0100, 0.0141); Relative RMSE 1.49 (1.03, 1.95); Relative MAE 1.14 (0.96, 1.32)
  Linear Model w/o Time:    RMSE 0.0120 (0.0100, 0.0140); MAE 0.00920 (0.00773, 0.01067); Relative RMSE 7.43 (1.30, 13.55); Relative MAE 2.11 (0.76, 3.46)
  Linear Model w/ Time:     RMSE 0.0120 (0.0100, 0.0140); MAE 0.00920 (0.00773, 0.01066); Relative RMSE 7.46 (1.30, 13.62); Relative MAE 2.11 (0.75, 3.47)
  Robust Linear Regression: RMSE 0.0121 (0.0101, 0.0142); MAE 0.00928 (0.00780, 0.01076); Relative RMSE 6.85 (1.36, 12.34); Relative MAE 2.02 (0.78, 3.26)

Table 2. Results of paired Wilcoxon signed-rank tests on absolute residuals of different prediction methods.

Chronological 80-20 Split:
  Mean vs. Linear Model w/o Time: V = 3695, p-value = 0.0123
  Mean vs. Linear Model w/ Time: V = 3688, p-value = 0.01307
  Mean vs. Robust Linear Regression: V = 3612, p-value = 0.02473
  Linear Model w/o Time vs. Linear Model w/ Time: V = 3031, p-value = 0.6601
  Linear Model w/o Time vs. Robust Linear Regression: V = 3135, p-value = 0.4455
  Linear Model w/ Time vs. Robust Linear Regression: V = 3163, p-value = 0.3953
5-Fold Cross-Validation:
  Mean vs. Linear Model w/o Time: V = 96316, p-value = 1.121e-11
  Mean vs. Linear Model w/ Time: V = 96500, p-value = 7.849e-12
  Mean vs. Robust Linear Regression: V = 96397, p-value = 9.587e-12
  Linear Model w/o Time vs. Linear Model w/ Time: V = 72898, p-value = 0.7934
  Linear Model w/o Time vs. Robust Linear Regression: V = 68503, p-value = 0.3356
  Linear Model w/ Time vs. Robust Linear Regression: V = 68297, p-value = 0.3075

The similar error measures for the linear models with and without time, shown in Table 1, indicate that the two models perform about equally well under both the chronological and 5-fold cross-validation set-ups. This is confirmed by the paired Wilcoxon signed-rank test results in Table 2: p-values of 0.6601 and 0.7934 (chronological and 5-fold respectively) indicate no significant difference in the absolute residuals of the two models. These results agree with the preliminary conclusions of the exploratory analysis: since no apparent association was found between stock index returns and time, the linear models with and without time should predict ISE returns equally well, as the time variable adds no significant information.

Additionally, a robust linear regression model (RLR) was created, predicting the ISE return from the same-day returns of the other stock indices, without time. This also produced error measures similar to those of the ordinary least squares models, and paired Wilcoxon signed-rank tests likewise indicated no significant difference in absolute residuals (p-values of 0.4455 and 0.3953 for chronological; 0.3356 and 0.3075 for 5-fold). However, all 3 of these models have RMSE and MAE values considerably smaller than those of the prediction using the mean ISE return of the training data set, as seen in Table 1. This is confirmed by the paired
  • 5. Wilcoxon signed-rank tests. For the chronological set-up, the p-values of 0.0123, 0.01307, and 0.02473 (for tests of mean vs. LM w/o time, LM w/ time, and RLR respectively) suggest some evidence of a difference in the absolute residuals of the models. For the 5-fold set-up, the p-values of 1.121e-11, 7.849e-12, and 9.587e-12 respectively suggest strong evidence of a significant difference in the absolute residuals of the models. This allows us to conclude, at the 5% significance level, that the mean ISE return is a worse prediction method than any of the other 3 models, under both validation set-ups.

Results and Interpretation of Prediction from Previous Day Indices (Part D)

Table 3. Error measures (with 95% confidence intervals) for different prediction methods, under the 11-consecutive-days validation set-up.

  Most Recent ISE Return:           RMSE 0.0221 (0.0203, 0.0240); MAE 0.0170 (0.0158, 0.0182); Relative RMSE 16.31 (6.83, 25.78); Relative MAE 4.23 (2.88, 5.57)
  Mean ISE Return of Recent 5 Days: RMSE 0.0173 (0.0159, 0.0187); MAE 0.0130 (0.0120, 0.0140); Relative RMSE 4.50 (3.11, 5.89); Relative MAE 1.97 (1.63, 2.32)
  LM - Most Recent Day:             RMSE 0.724 (0.319, 1.129); MAE 0.169 (0.110, 0.230); Relative RMSE 221.3 (91.4, 351.2); Relative MAE 43.1 (24.5, 61.7)
  LM - Most Recent 2 Days:          RMSE 2.59 (0.31, 4.86); MAE 0.274 (0.054, 0.494); Relative RMSE 280.0 (134.7, 425.3); Relative MAE 49.1 (25.5, 72.6)
  Robust Linear Regression:         RMSE 0.0634 (0.0397, 0.0870); MAE 0.0344 (0.0298, 0.0389); Relative RMSE 25.89 (17.98, 7.98); Relative MAE 8.79 (6.70, 10.87)

Table 4. Results of paired Wilcoxon signed-rank tests on absolute residuals of different prediction methods (11-consecutive-days set-up).

  Most Recent ISE Return vs. Mean ISE Return of Recent 5 Days: V = 95496, p-value = 5.857e-14
  Most Recent ISE Return vs. LM - Most Recent Day: V = 15103, p-value < 2.2e-16
  Most Recent ISE Return vs. LM - Most Recent 2 Days: V = 12427, p-value < 2.2e-16
  Most Recent ISE Return vs. Robust Linear Regression: V = 100020, p-value < 2.2e-16
  Mean ISE Return of Recent 5 Days vs. LM - Most Recent Day: V = 10271, p-value < 2.2e-16
  Mean ISE Return of Recent 5 Days vs. LM - Most Recent 2 Days: V = 7468, p-value < 2.2e-16
  Mean ISE Return of Recent 5 Days vs. Robust Linear Regression: V = 111260, p-value < 2.2e-16
  LM - Most Recent Day vs. LM - Most Recent 2 Days: V = 65944, p-value = 0.3359
  LM - Most Recent Day vs. Robust Linear Regression: V = 25768, p-value < 2.2e-16
  LM - Most Recent 2 Days vs. Robust Linear Regression: V = 31006, p-value < 2.2e-16

The error measures for the LM of stock index returns from the most recent day are all smaller than those for the LM from the most recent 2 days. However, the large standard errors of these error measures suggest that this difference might not be significant, and this is confirmed by the paired Wilcoxon signed-rank test, with a p-value of 0.3359 indicating no significant difference in the absolute residuals of these two models. A robust linear regression model was also created, predicting the ISE return from the returns of stock indices on the most recent day. These are the same covariates as in the LM for the most recent day; however, the method of estimating the coefficients is more robust, and the robust regression shows lower values for all measures of prediction goodness. This is confirmed by the paired Wilcoxon signed-rank test, with a p-value < 2.2e-16 indicating a significant difference in the absolute residuals of the two models.

However, the prediction method using the mean ISE return of the recent 5 days shows the lowest value for every error measure. Furthermore, the upper bound of the 95% CI for all 4 of its error measures lies below the lower bound of the 95% CI for the corresponding error measures of every other model. The p-values from the paired Wilcoxon signed-rank tests of this method against all other methods (5.857e-14, <2.2e-16, <2.2e-16, <2.2e-16) also support significant differences in the absolute residuals obtained from the models. Thus, there is strong evidence that using the mean ISE return of the recent 5 days is the best of the five prediction methods in this benchmarking experiment.
  • 6. It should be noted that the initial exploratory analysis concluded that there seems to be no linear association between ISE stock index returns and its returns one, two, and three days before (Figure 9). However, the benchmarking experiment in Part D suggests that prediction based on the recent 5 days is the best prediction method, which contradicts the exploratory results. This may suggest that non-linear associations exist between ISE returns and its returns on preceding days, allowing predictions to be made from previous ISE returns. Alternatively, this may be an artefact of poorly designed prediction methods, with the mean ISE return of the recent 5 days merely performing relatively better than the rest.

Conclusion

To assess the several prediction methods for the Istanbul stock market, two different benchmarking experiments were performed in this task. In the first, ISE returns were predicted from the other indices on the same day; in the second, predictions were made from all indices, including the ISE itself, on recent previous days. The first benchmarking experiment showed that prediction methods using least squares regression on same-day data generally performed better than using the mean ISE return as a predictor, in terms of error measures such as RMSE and MAE. This was confirmed by the paired Wilcoxon signed-rank tests on the absolute residuals of the different prediction methods. Additionally, including time as a covariate did not significantly change the goodness of prediction, concurring with the results of the exploratory analysis.

In the second benchmarking experiment, on the other hand, there was sufficient evidence for the opposite situation: prediction models based only on prior ISE returns performed significantly better than models based on previous returns of all stock indices. This contradicts the exploratory analysis, where no significant linear association was found between ISE returns and its returns from days before, suggesting a possible non-linear relationship not identified by the correlation test. Comparing the benchmarking experiments in Parts C and D, the error measures calculated in Part C tend to be smaller than those in Part D. This might suggest that prediction models based on data from the same day are better at predicting ISE returns than models based only on recent previous data, indicating that same-day data provides better information on, or has closer associations with, ISE returns.
  • 7. Task 2: The Resistance of Constantin Task 2: The Resistance of Constantin Main Question The data from the 8th edition of the “Rubber Bible”, contains 16 data points of resistance of Constantin wire at different diameters. The main task was to fit different regression models to explain resistance in terms of diameter and investigate the goodness of fit of these models by obtaining estimates of error measures such as RMSE and MAE. The significance level of 5% was applied to all analyses for the following report. Summary Based on the investigation, regression models that involved logarithmic or reciprocal transformations generally had higher goodness of fit. It was found that the regression model of 1/d2 best explained the relationship between resistance and diameter, producing the simplest model with high goodness of fit and the smallest residuals. The model of 1/d2 + 1/d produced a similarly performing model, but the covariate 1/d was found to be insignificant; while the log-transformed model produced residuals that were larger than the 1/d2 model. Meanwhile, fitting resistance to a polynomial of degree 15 in diameter produced an over-fitted rank-deficient model. Explorative Analysis Figure 1 (a) and (b): Histograms of both variables R and d. Both resistance and diameter variables have positive values below 1. From the histograms, both can be deduced to be positively skewed, justified with the positive skewness, 2.9870 and 1.3166 respectively. To shrink the larger values more than the smaller ones, power or log transformation can be used to result in a distribution that is more symmetric. However, as all the values are between 0 and 1, power transformation might not achieve the desired effect. From the scatter plot (Figure 2), we can see that as diameter increases, the resistance decreases. There is clearly a decreasing non-linear relationship between the two variables. 
Logarithmic or reciprocal transformations of the data are therefore suggested to straighten out this non-linear bivariate relationship.

Figure 2: Scatter plot of the resistance of Constantin wire against its diameter.
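To illustrate why a log transformation is a natural candidate here, the following sketch uses illustrative data only (the diameters and the constant 6.26e-5 are assumed, not the Rubber Bible values): data following a reciprocal-square law become exactly linear on the log-log scale, with slope -2, since log(R) = log(c) - 2*log(d).

```r
# Illustrative sketch: an assumed power law R = c/d^2 straightens out
# under a log-log transformation (c and the diameters are made up).
d <- seq(0.05, 0.8, length.out = 16)   # hypothetical diameters
R <- 6.26e-5 / d^2                     # resistance under the assumed power law
fit <- lm(log(R) ~ log(d))             # straight-line fit on the log-log scale
coef(fit)[["log(d)"]]                  # slope very close to -2
```

The same logic motivates the reciprocal transformations: if R is proportional to 1/d^2, then regressing R directly on 1/d^2 is linear without transforming the response.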
Regression Models
There are 4 suggested regression models fitted, namely:
- model 1: log(R) = log(d);
- model 2: R = d + d^2 + d^3 + d^4 + d^5 + d^6 + d^7 + d^8 + d^9 + d^10 + d^11 + d^12 + d^13 + d^14 + d^15;
- model 3: R = 1/d^2 + 1/d; and
- model 4: R = 1/d^2.

For model 1, the fit plot is negatively linear; since both variables lie below 1, the logarithmic transformation of both variables gives negative values. Log(d) is a significant parameter, as its p-value is lower than 0.05, and the R-square of 1.00 indicates the model is a good fit.

Meanwhile, model 2 is a rank-deficient least-squares model. As such, the least-squares solutions for the parameters are not unique, producing biased estimates with some misleading statistics.

Parameter Estimates (R-Square = 0.9944)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   3.11362               0.21165           14.71      <.0001
d           -417.05833            43.70357          -9.54      <.0001
d^2         23289                 3277.49356        7.11       0.0004
d^3         -685537               119274            -5.75      0.0012
d^4         11541180              2347752           4.92       0.0027
d^5         -113141376            25879463          -4.37      0.0047
d^6         617364760             154399878         4.00       0.0071
d^7         -1539663710           412457801         -3.73      0.0097
d^8         0                     .                 .          .
d^9         5151630410            1517635765        3.39       0.0146
d^10        0                     .                 .          .
d^11        0                     .                 .          .
d^12        0                     .                 .          .
d^13        0                     .                 .          .
d^14        -4.28733E11           1.406149E11       -3.05      0.0225
d^15        0                     .                 .          .
Table 2: Parameter estimates for Model 2.

Table 2 shows that the model of resistance as a polynomial of degree 15 in diameter might be over-fitted. Its R-square value of 0.9944 is high only because R-squared increases every time a predictor is added to a model, supporting the argument that this model is not a good fit.

Model 3 is a rather good fit (Table 3), as it has an R-square of 1. However, the p-value of 0.3798 for the parameter 1/d suggests that this parameter is not significant in the model. Based on this, model 4 is fitted with only 1/d^2, as a simpler model is generally preferred.
Parameter Estimates (R-Square = 1.0000)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   -9.68163              0.00659           -1469.6    <.0001
Log(d)      -1.99987              0.00208           -963.01    <.0001
Table 1: Parameter estimates for Model 1.

Parameter Estimates (R-Square = 1.0000)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   0.00024745            0.00061146        0.40       0.6923
1/d^2       0.00006279            2.538051E-7       247.39     <.0001
1/d         -0.00002805           0.00003085        -0.91      0.3798
Table 3: Parameter estimates for Model 3.
Parameter Estimates (R-Square = 1.0000)
Variable    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   -0.00020809           0.00034825        -0.60      0.5597
1/d^2       0.00006257            7.917185E-8       790.31     <.0001
Table 4: Parameter estimates for Model 4.

In model 4, the transformed variable 1/d^2 is a significant parameter, with a p-value of less than 0.0001. The fit plot is positively linear, with an R-square of 1. Based on the analysis so far, models 1 and 4 give the best fit of the four models, in agreement with the potential models suggested by the exploratory findings. Further analysis has to be carried out to establish a conclusion.

Cross Validation
Leave-one-out cross-validation was carried out to test the goodness of fit of the models. The models are compared below.

Figure 3: Residual plots for the 4 models fitted.

First, we consider the residual plots of all 4 models. As shown, models 1, 3 and 4 are relatively better fitted than model 2. Model 2 has one extreme residual of over 20000, which causes the large scale of its residual plot; moreover, all of its residuals are relatively large compared to the other 3 models. Meanwhile, the residuals of model 1 are randomly scattered, but are larger than those of models 3 and 4.
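For reference, the leave-one-out procedure itself can be sketched in a few lines of R (the report's computations were done in SAS; the data frame `wire` and its values below are hypothetical stand-ins, not the Rubber Bible data):

```r
# Leave-one-out cross-validation sketch for a model of the form R ~ 1/d^2.
# 'wire' is a hypothetical stand-in for the resistance/diameter data.
wire <- data.frame(d = seq(0.1, 0.9, length.out = 16))
wire$R <- 6.26e-5 / wire$d^2               # assumed, exactly linear in 1/d^2
loo_resid <- numeric(nrow(wire))
for (i in seq_len(nrow(wire))) {
  fit <- lm(R ~ I(1/d^2), data = wire[-i, ])           # fit on all but point i
  loo_resid[i] <- wire$R[i] - predict(fit, wire[i, ])  # out-of-sample residual
}
loo_rmse <- sqrt(mean(loo_resid^2))   # out-of-sample RMSE
loo_mae  <- mean(abs(loo_resid))      # out-of-sample MAE
```

Each observation is held out once, the model is refitted on the remaining 15 points, and the held-out prediction error is recorded; RMSE and MAE are then computed from these out-of-sample residuals.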
The residual plots for models 3 and 4 have the smallest scales among all the plots. This suggests that models 3 and 4 have the smallest residuals, and thus could be the better regression models.

Next, we look at the estimates of out-of-sample RMSE and MAE obtained (Table 5). As the table shows, the second suggested model (R = 1/d^2), which transforms diameter to the power of -2, has the lowest value for both measures of prediction goodness. However, there is an overlap in the 95% confidence intervals of the error measures for R = 1/d^2 and R = 1/d^2 + 1/d, so we cannot yet conclude that model 4 is the best model. Hence, paired Wilcoxon signed-rank tests were carried out between all 4 models to confirm the results above.

Models               R = d + … + d^15    R = 1/d^2 + 1/d    R = 1/d^2
log(R) = log(d)      S=65, p=0.0002      S=64, p=0.0002     S=68, p<.0001
R = d + … + d^15                         S=67, p<.0001      S=67, p<.0001
R = 1/d^2 + 1/d                                             S=5, p=0.8209
Table 6: Results of paired Wilcoxon signed-rank tests on absolute residuals.

The tests support the deduction that the performances of the models are significantly different from one another: most p-values are less than 0.05, indicating significant differences in the absolute residuals of each pair of regression models. The exception is the comparison of the 4th (R = 1/d^2) and 3rd (R = 1/d^2 + 1/d) models, where the p-value of 0.8209 suggests no significant difference in performance. Generally, when two models perform similarly, the model with fewer covariates is preferred; thus, in this case, the 4th model is the best model for explaining resistance in terms of diameter.
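A paired Wilcoxon signed-rank test of this kind is straightforward to reproduce in R (the report's tests were run in SAS, hence the S statistics; the absolute-residual vectors below are invented purely for illustration):

```r
# Paired Wilcoxon signed-rank test on absolute out-of-sample residuals
# from two hypothetical models, A and B (values invented for illustration).
abs_res_A <- c(0.0100, 0.0080, 0.0120, 0.0090, 0.0110, 0.0130)
abs_res_B <- c(0.0020, 0.0010, 0.0030, 0.0025, 0.0015, 0.0025)
wt <- wilcox.test(abs_res_A, abs_res_B, paired = TRUE)
wt$p.value   # below 0.05 here, so the two models' residuals differ significantly
```

Because the test is paired, it compares the residuals observation by observation, which is appropriate when both models were evaluated on the same held-out points.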
Model                              RMSE                         MAE
Suggested (i): log(R) = log(d)     0.0093 (0.0092, 0.0094)      0.0065 (0.0049, 0.0081)
Part (b) (i): R = d + … + d^15     5358.40 (2764.46, 7952.34)   1340.41 (0.91, 2679.91)
Part (b) (ii): R = 1/d^2 + 1/d     0.0019 (0.0014, 0.0024)      0.0009 (0.0005, 0.0013)
Suggested (ii): R = 1/d^2          0.0013 (0.0009, 0.0017)      0.0006 (0.0003, 0.0009)
Table 5: Error measures (with 95% confidence intervals) for the different regression models.

Conclusion
Based on the investigation, Model 2 (R = d + d^2 + … + d^15) is an extreme example of fitting an overly complicated model to obtain a good fit: the model is too complex for the data, even though it appears to explain a lot of the variation in the response variable. Model 1 (log(R) = log(d)) is relatively good, but it does not have the lowest RMSE and MAE, suggesting that its residuals are relatively large. Meanwhile, Model 3 (R = 1/d^2 + 1/d) has one insignificant covariate, which leads to the second suggested model. In conclusion, Model 4 (R = 1/d^2) is the best model for explaining the resistance of Constantin wire in terms of varying diameter, producing the simplest model with high goodness of fit and the smallest residuals, as evidenced by its high R-squared value and its low RMSE and MAE error measures. The model can be interpreted as follows: as the diameter of the Constantin wire decreases, the squared diameter decreases, so the reciprocal of the squared diameter increases, and hence the resistance increases.
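As a numerical illustration of this interpretation, using the slope estimate from Table 4 (the insignificant intercept is dropped for simplicity):

```r
# Model 4 prediction sketch: R approximately 0.00006257 / d^2
# (slope taken from Table 4; intercept omitted as insignificant).
pred_R <- function(d) 0.00006257 / d^2
pred_R(0.2) / pred_R(0.4)   # halving the diameter quadruples the resistance
```

This inverse-square behaviour is exactly what the fitted 1/d^2 model encodes: thinner wire, sharply higher resistance.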
Appendix A: Task 1 R Code

# Part (a). Data load and conversion of the date column.

# Load data from CSV file.
ISE_data=read.csv(file="C:/Documents/STAT7001/Istanbul.csv", header=TRUE, sep=",")

# Convert the date column into a recognisable date format in R.
ISE_data$date=as.POSIXct(ISE_data$date, format="%d-%b-%Y")

# Find the difference in numbers of days, and round off any decimals.
ISE_data$date<-difftime(ISE_data$date,ISE_data$date[1], units="days")
ISE_data$date<-round(ISE_data$date,digits=0)
ISE_data$date=as.numeric(as.character(ISE_data$date))
# Part (b). Exploratory data analysis.

# Association between index and time.
plot(ISE_data[,1], ISE_data[,2], xlab="Days", ylab="ISE", abline(lm(ISE~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,3], xlab="Days", ylab="S&P 500", abline(lm(S.P.500~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,4], xlab="Days", ylab="DAX", abline(lm(DAX~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,5], xlab="Days", ylab="FTSE 100", abline(lm(FTSE~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,6], xlab="Days", ylab="Nikkei 225", abline(lm(NIKKEI~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,7], xlab="Days", ylab="Ibovespa", abline(lm(BOVESPA~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,8], xlab="Days", ylab="MSCI EU Index", abline(lm(MSCI.EU~date, ISE_data)))
plot(ISE_data[,1], ISE_data[,9], xlab="Days", ylab="MSCI EM Index", abline(lm(MSCI.EM~date, ISE_data)))

cor.test(ISE_data[,1], ISE_data[,2])
cor.test(ISE_data[,1], ISE_data[,3])
cor.test(ISE_data[,1], ISE_data[,4])
cor.test(ISE_data[,1], ISE_data[,5])
cor.test(ISE_data[,1], ISE_data[,6])
cor.test(ISE_data[,1], ISE_data[,7])
cor.test(ISE_data[,1], ISE_data[,8])
cor.test(ISE_data[,1], ISE_data[,9])

# Association between ISE index and index the days before.
plot(ISE_data[c(2:536),2], ISE_data[c(1:535),2], xlab="ISE, Day N", ylab="ISE, Day N-1")
plot(ISE_data[c(3:536),2], ISE_data[c(1:534),2], xlab="ISE, Day N", ylab="ISE, Day N-2")
plot(ISE_data[c(4:536),2], ISE_data[c(1:533),2], xlab="ISE, Day N", ylab="ISE, Day N-3")
cor.test(ISE_data[c(2:536),2], ISE_data[c(1:535),2])
cor.test(ISE_data[c(3:536),2], ISE_data[c(1:534),2])
cor.test(ISE_data[c(4:536),2], ISE_data[c(1:533),2])

# Association between S&P 500 index and index the days before.
plot(ISE_data[c(2:536),3], ISE_data[c(1:535),3], xlab="S&P 500, Day N", ylab="S&P 500, Day N-1")
plot(ISE_data[c(3:536),3], ISE_data[c(1:534),3], xlab="S&P 500, Day N", ylab="S&P 500, Day N-2")
plot(ISE_data[c(4:536),3], ISE_data[c(1:533),3], xlab="S&P 500, Day N", ylab="S&P 500, Day N-3")
cor.test(ISE_data[c(2:536),3], ISE_data[c(1:535),3])
cor.test(ISE_data[c(3:536),3], ISE_data[c(1:534),3])
cor.test(ISE_data[c(4:536),3], ISE_data[c(1:533),3])

# Association between DAX index and index the days before.
plot(ISE_data[c(2:536),4], ISE_data[c(1:535),4], xlab="DAX, Day N", ylab="DAX, Day N-1")
plot(ISE_data[c(3:536),4], ISE_data[c(1:534),4], xlab="DAX, Day N", ylab="DAX, Day N-2")
plot(ISE_data[c(4:536),4], ISE_data[c(1:533),4], xlab="DAX, Day N", ylab="DAX, Day N-3")
cor.test(ISE_data[c(2:536),4], ISE_data[c(1:535),4])
cor.test(ISE_data[c(3:536),4], ISE_data[c(1:534),4])
cor.test(ISE_data[c(4:536),4], ISE_data[c(1:533),4])

# Association between FTSE 100 index and index the days before.
plot(ISE_data[c(2:536),5], ISE_data[c(1:535),5], xlab="FTSE 100, Day N", ylab="FTSE 100, Day N-1")
plot(ISE_data[c(3:536),5], ISE_data[c(1:534),5], xlab="FTSE 100, Day N", ylab="FTSE 100, Day N-2")
plot(ISE_data[c(4:536),5], ISE_data[c(1:533),5], xlab="FTSE 100, Day N", ylab="FTSE 100, Day N-3")
cor.test(ISE_data[c(2:536),5], ISE_data[c(1:535),5])
cor.test(ISE_data[c(3:536),5], ISE_data[c(1:534),5])
cor.test(ISE_data[c(4:536),5], ISE_data[c(1:533),5])

# Association between Nikkei 225 index and index the days before.
plot(ISE_data[c(2:536),6], ISE_data[c(1:535),6], xlab="Nikkei 225, Day N", ylab="Nikkei 225, Day N-1")
plot(ISE_data[c(3:536),6], ISE_data[c(1:534),6], xlab="Nikkei 225, Day N", ylab="Nikkei 225, Day N-2")
plot(ISE_data[c(4:536),6], ISE_data[c(1:533),6], xlab="Nikkei 225, Day N", ylab="Nikkei 225, Day N-3")
cor.test(ISE_data[c(2:536),6], ISE_data[c(1:535),6])
cor.test(ISE_data[c(3:536),6], ISE_data[c(1:534),6])
cor.test(ISE_data[c(4:536),6], ISE_data[c(1:533),6])

# Association between Ibovespa index and index the days before.
plot(ISE_data[c(2:536),7], ISE_data[c(1:535),7], xlab="Ibovespa, Day N", ylab="Ibovespa, Day N-1")
plot(ISE_data[c(3:536),7], ISE_data[c(1:534),7], xlab="Ibovespa, Day N", ylab="Ibovespa, Day N-2")
plot(ISE_data[c(4:536),7], ISE_data[c(1:533),7], xlab="Ibovespa, Day N", ylab="Ibovespa, Day N-3")
cor.test(ISE_data[c(2:536),7], ISE_data[c(1:535),7])
cor.test(ISE_data[c(3:536),7], ISE_data[c(1:534),7])
cor.test(ISE_data[c(4:536),7], ISE_data[c(1:533),7])

# Association between MSCI EU index and index the days before.
plot(ISE_data[c(2:536),8], ISE_data[c(1:535),8], xlab="MSCI EU, Day N", ylab="MSCI EU, Day N-1")
plot(ISE_data[c(3:536),8], ISE_data[c(1:534),8], xlab="MSCI EU, Day N", ylab="MSCI EU, Day N-2")
plot(ISE_data[c(4:536),8], ISE_data[c(1:533),8], xlab="MSCI EU, Day N", ylab="MSCI EU, Day N-3")
cor.test(ISE_data[c(2:536),8], ISE_data[c(1:535),8])
cor.test(ISE_data[c(3:536),8], ISE_data[c(1:534),8])
cor.test(ISE_data[c(4:536),8], ISE_data[c(1:533),8])

# Association between MSCI EM index and index the days before.
plot(ISE_data[c(2:536),9], ISE_data[c(1:535),9], xlab="MSCI EM, Day N", ylab="MSCI EM, Day N-1")
plot(ISE_data[c(3:536),9], ISE_data[c(1:534),9], xlab="MSCI EM, Day N", ylab="MSCI EM, Day N-2")
plot(ISE_data[c(4:536),9], ISE_data[c(1:533),9], xlab="MSCI EM, Day N", ylab="MSCI EM, Day N-3")
cor.test(ISE_data[c(2:536),9], ISE_data[c(1:535),9])
cor.test(ISE_data[c(3:536),9], ISE_data[c(1:534),9])
cor.test(ISE_data[c(4:536),9], ISE_data[c(1:533),9])
# Part (c). Benchmarking with all data.

# ----------------------------------------------------------------------------
# Creating functions for measures of prediction goodness and their std errors.
# ----------------------------------------------------------------------------

# (i) Root mean squared error (RMSE)
rmse=function(observed, fitted){
  sqrt(mean((observed-fitted)^2))
}
rmseSE=function(observed, fitted){
  sd((observed-fitted)^2)/sqrt(length(observed))/(2*sqrt(mean((observed-fitted)^2)))
}

# (ii) Mean absolute error (MAE)
mae=function(observed, fitted){
  mean(abs(observed-fitted))
}
maeSE=function(observed, fitted){
  sd(abs(observed-fitted))/sqrt(length(observed))
}

# (iii) Relative RMSE
RELrmse=function(observed, fitted){
  sqrt(mean(((observed-fitted)/observed)^2))
}
RELrmseSE=function(observed, fitted){
  sd(((observed-fitted)/observed)^2)/sqrt(length(observed))/
    (2*sqrt(mean(((observed-fitted)/observed)^2)))
}

# (iv) Relative MAE
RELmae=function(observed, fitted){
  mean(abs((observed-fitted)/observed))
}
RELmaeSE=function(observed, fitted){
  sd(abs((observed-fitted)/observed))/sqrt(length(observed))
}

# ---------------------------------------------------------------------------------
# Comparison of prediction methods, using validation set-up (i).
# i.e. Chronologically first 80% of data (428.8, or 429 entries) as training sample;
#      remaining data as test sample.
# ---------------------------------------------------------------------------------

# Prediction method (i): Mean
# -- Predictor
Chr.ISEmean=mean(ISE_data$ISE[c(1:429)])
# -- Predicted values
Chr.ISEmean
# -- Error measures
Chr.mean.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.rmse-1.96*Chr.mean.rmseSE; Chr.mean.rmse+1.96*Chr.mean.rmseSE

Chr.mean.mae = mae(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.mae-1.96*Chr.mean.maeSE; Chr.mean.mae+1.96*Chr.mean.maeSE

Chr.mean.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELrmse-1.96*Chr.mean.RELrmseSE; Chr.mean.RELrmse+1.96*Chr.mean.RELrmseSE

Chr.mean.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.ISEmean)
Chr.mean.RELmae-1.96*Chr.mean.RELmaeSE; Chr.mean.RELmae+1.96*Chr.mean.RELmaeSE
# Prediction method (ii): Linear model excluding time.
# -- Model
Chr.LMnoTime=lm(ISE ~ S.P.500 + DAX + FTSE + NIKKEI + BOVESPA + MSCI.EU + MSCI.EM,
                data=ISE_data[c(1:429),])
summary(Chr.LMnoTime)
# -- Predicted values
Chr.LMnoTime.Pred=predict(Chr.LMnoTime, ISE_data[c(430:536),])
# -- Error measures
Chr.LMnoTime.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.rmse-1.96*Chr.LMnoTime.rmseSE; Chr.LMnoTime.rmse+1.96*Chr.LMnoTime.rmseSE

Chr.LMnoTime.mae = mae(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.mae-1.96*Chr.LMnoTime.maeSE; Chr.LMnoTime.mae+1.96*Chr.LMnoTime.maeSE

Chr.LMnoTime.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELrmse-1.96*Chr.LMnoTime.RELrmseSE; Chr.LMnoTime.RELrmse+1.96*Chr.LMnoTime.RELrmseSE

Chr.LMnoTime.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.LMnoTime.Pred)
Chr.LMnoTime.RELmae-1.96*Chr.LMnoTime.RELmaeSE; Chr.LMnoTime.RELmae+1.96*Chr.LMnoTime.RELmaeSE

# Prediction method (iii): Linear model including time.
# -- Model
Chr.LMwithTime=lm(ISE ~ date+S.P.500+DAX+FTSE+NIKKEI+BOVESPA+MSCI.EU+MSCI.EM,
                  data=ISE_data[c(1:429),])
summary(Chr.LMwithTime)
# -- Predicted values
Chr.LMwithTime.Pred=predict(Chr.LMwithTime, ISE_data[c(430:536),])
# -- Error measures
Chr.LMwithTime.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.rmse-1.96*Chr.LMwithTime.rmseSE; Chr.LMwithTime.rmse+1.96*Chr.LMwithTime.rmseSE

Chr.LMwithTime.mae = mae(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.mae-1.96*Chr.LMwithTime.maeSE; Chr.LMwithTime.mae+1.96*Chr.LMwithTime.maeSE

Chr.LMwithTime.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELrmse-1.96*Chr.LMwithTime.RELrmseSE; Chr.LMwithTime.RELrmse+1.96*Chr.LMwithTime.RELrmseSE

Chr.LMwithTime.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.LMwithTime.Pred)
Chr.LMwithTime.RELmae-1.96*Chr.LMwithTime.RELmaeSE; Chr.LMwithTime.RELmae+1.96*Chr.LMwithTime.RELmaeSE

# Comparison of prediction methods.
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.ISEmean),
            abs(ISE_data$ISE[c(430:536)]-Chr.LMnoTime.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.ISEmean),
            abs(ISE_data$ISE[c(430:536)]-Chr.LMwithTime.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.LMnoTime.Pred),
            abs(ISE_data$ISE[c(430:536)]-Chr.LMwithTime.Pred), paired=TRUE)
# ---------------------------------------------------------------
# Comparison of prediction methods, using validation set-up (ii).
# i.e. Five-fold cross-validation with uniformly randomly sampled folds.
# ---------------------------------------------------------------

# Five-fold cross-validation data setup.
# Create random permutation of values.
set.seed(555)
randperm=sample(nrow(ISE_data))
# Create lists with test folds and their respective training folds.
trainfolds=list()
testfolds=list()
for(i in 1:5){
  lower=floor((i-1)*nrow(ISE_data)/5)+1
  upper=floor(i*nrow(ISE_data)/5)
  testfolds[[i]]=randperm[lower:upper]
  trainfolds[[i]]=setdiff(1:nrow(ISE_data),testfolds[[i]])
  testfolds[[i]]=ISE_data[testfolds[[i]],]
  trainfolds[[i]]=ISE_data[trainfolds[[i]],]
}

# ---------------------------------------------------------------
# Prediction method (i): Mean
# -- Predictor
Fol.ISEmean=list()
for(i in 1:5){
  Fol.ISEmean[[i]]=mean(trainfolds[[i]][[2]])
}
# -- Predicted values
Fol.ISEmean
# -- Error measures
# *** RMSE ***
Fol.mean.rmse=list()
for(i in 1:5){
  Fol.mean.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.rmse=mean(as.numeric(Fol.mean.rmse))
# Standard Error
Fol.mean.rmseSE=list()
for(i in 1:5){
  Fol.mean.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.rmseSE=mean(as.numeric(Fol.mean.rmseSE))
# Confidence Interval
Fol.mean.rmse-1.96*Fol.mean.rmseSE; Fol.mean.rmse+1.96*Fol.mean.rmseSE

# *** MAE ***
Fol.mean.mae=list()
for(i in 1:5){
  Fol.mean.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.mae=mean(as.numeric(Fol.mean.mae))
# Standard Error
Fol.mean.maeSE=list()
for(i in 1:5){
  Fol.mean.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.maeSE=mean(as.numeric(Fol.mean.maeSE))
# Confidence Interval
Fol.mean.mae-1.96*Fol.mean.maeSE; Fol.mean.mae+1.96*Fol.mean.maeSE

# *** Relative RMSE ***
Fol.mean.RELrmse=list()
for(i in 1:5){
  Fol.mean.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELrmse=mean(as.numeric(Fol.mean.RELrmse))
# Standard Error
Fol.mean.RELrmseSE=list()
for(i in 1:5){
  Fol.mean.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELrmseSE=mean(as.numeric(Fol.mean.RELrmseSE))
# Confidence Interval
Fol.mean.RELrmse-1.96*Fol.mean.RELrmseSE; Fol.mean.RELrmse+1.96*Fol.mean.RELrmseSE

# *** Relative MAE ***
Fol.mean.RELmae=list()
for(i in 1:5){
  Fol.mean.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELmae=mean(as.numeric(Fol.mean.RELmae))
# Standard Error
Fol.mean.RELmaeSE=list()
for(i in 1:5){
  Fol.mean.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.ISEmean[[i]])
}
Fol.mean.RELmaeSE=mean(as.numeric(Fol.mean.RELmaeSE))
# Confidence Interval
Fol.mean.RELmae-1.96*Fol.mean.RELmaeSE; Fol.mean.RELmae+1.96*Fol.mean.RELmaeSE

# ---------------------------------------------------------------
# Prediction method (ii): Linear model excluding time.
# -- Models
Fol.LMnoTime=list()
for(i in 1:5){
  Fol.LMnoTime[[i]]=lm(ISE~S.P.500 + DAX + FTSE + NIKKEI + BOVESPA + MSCI.EU + MSCI.EM,
                       data=trainfolds[[i]])
}
# -- Predicted values
Fol.LMnoTime.Pred=list()
for(i in 1:5){
  Fol.LMnoTime.Pred[[i]]=predict(Fol.LMnoTime[[i]], testfolds[[i]])
}
# -- Error measures
# *** RMSE ***
Fol.LMnoTime.rmse=list()
for(i in 1:5){
  Fol.LMnoTime.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.rmse=mean(as.numeric(Fol.LMnoTime.rmse))
# Standard Error
Fol.LMnoTime.rmseSE=list()
for(i in 1:5){
  Fol.LMnoTime.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.rmseSE=mean(as.numeric(Fol.LMnoTime.rmseSE))
# Confidence Interval
Fol.LMnoTime.rmse-1.96*Fol.LMnoTime.rmseSE; Fol.LMnoTime.rmse+1.96*Fol.LMnoTime.rmseSE

# *** MAE ***
Fol.LMnoTime.mae=list()
for(i in 1:5){
  Fol.LMnoTime.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.mae=mean(as.numeric(Fol.LMnoTime.mae))
# Standard Error
Fol.LMnoTime.maeSE=list()
for(i in 1:5){
  Fol.LMnoTime.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.maeSE=mean(as.numeric(Fol.LMnoTime.maeSE))
# Confidence Interval
Fol.LMnoTime.mae-1.96*Fol.LMnoTime.maeSE; Fol.LMnoTime.mae+1.96*Fol.LMnoTime.maeSE

# *** Relative RMSE ***
Fol.LMnoTime.RELrmse=list()
for(i in 1:5){
  Fol.LMnoTime.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELrmse=mean(as.numeric(Fol.LMnoTime.RELrmse))
# Standard Error
Fol.LMnoTime.RELrmseSE=list()
for(i in 1:5){
  Fol.LMnoTime.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELrmseSE=mean(as.numeric(Fol.LMnoTime.RELrmseSE))
# Confidence Interval
Fol.LMnoTime.RELrmse-1.96*Fol.LMnoTime.RELrmseSE; Fol.LMnoTime.RELrmse+1.96*Fol.LMnoTime.RELrmseSE

# *** Relative MAE ***
Fol.LMnoTime.RELmae=list()
for(i in 1:5){
  Fol.LMnoTime.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELmae=mean(as.numeric(Fol.LMnoTime.RELmae))
# Standard Error
Fol.LMnoTime.RELmaeSE=list()
for(i in 1:5){
  Fol.LMnoTime.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.LMnoTime.Pred[[i]])
}
Fol.LMnoTime.RELmaeSE=mean(as.numeric(Fol.LMnoTime.RELmaeSE))
# Confidence Interval
Fol.LMnoTime.RELmae-1.96*Fol.LMnoTime.RELmaeSE; Fol.LMnoTime.RELmae+1.96*Fol.LMnoTime.RELmaeSE

# ---------------------------------------------------------------
# Prediction method (iii): Linear model including time.
# -- Models
Fol.LMwithTime=list()
for(i in 1:5){
  Fol.LMwithTime[[i]]=lm(ISE~date+S.P.500+DAX+FTSE+NIKKEI+BOVESPA+MSCI.EU+MSCI.EM,
                         data=trainfolds[[i]])
}
# -- Predicted values
Fol.LMwithTime.Pred=list()
for(i in 1:5){
  Fol.LMwithTime.Pred[[i]]=predict(Fol.LMwithTime[[i]], testfolds[[i]])
}
# -- Error measures
# *** RMSE ***
Fol.LMwithTime.rmse=list()
for(i in 1:5){
  Fol.LMwithTime.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.rmse=mean(as.numeric(Fol.LMwithTime.rmse))
# Standard Error
Fol.LMwithTime.rmseSE=list()
for(i in 1:5){
  Fol.LMwithTime.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.rmseSE=mean(as.numeric(Fol.LMwithTime.rmseSE))
# Confidence Interval
Fol.LMwithTime.rmse-1.96*Fol.LMwithTime.rmseSE; Fol.LMwithTime.rmse+1.96*Fol.LMwithTime.rmseSE

# *** MAE ***
Fol.LMwithTime.mae=list()
for(i in 1:5){
  Fol.LMwithTime.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.mae=mean(as.numeric(Fol.LMwithTime.mae))
# Standard Error
Fol.LMwithTime.maeSE=list()
for(i in 1:5){
  Fol.LMwithTime.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.maeSE=mean(as.numeric(Fol.LMwithTime.maeSE))
# Confidence Interval
Fol.LMwithTime.mae-1.96*Fol.LMwithTime.maeSE; Fol.LMwithTime.mae+1.96*Fol.LMwithTime.maeSE

# *** Relative RMSE ***
Fol.LMwithTime.RELrmse=list()
for(i in 1:5){
  Fol.LMwithTime.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELrmse=mean(as.numeric(Fol.LMwithTime.RELrmse))
# Standard Error
Fol.LMwithTime.RELrmseSE=list()
for(i in 1:5){
  Fol.LMwithTime.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELrmseSE=mean(as.numeric(Fol.LMwithTime.RELrmseSE))
# Confidence Interval
Fol.LMwithTime.RELrmse-1.96*Fol.LMwithTime.RELrmseSE; Fol.LMwithTime.RELrmse+1.96*Fol.LMwithTime.RELrmseSE

# *** Relative MAE ***
Fol.LMwithTime.RELmae=list()
for(i in 1:5){
  Fol.LMwithTime.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELmae=mean(as.numeric(Fol.LMwithTime.RELmae))
# Standard Error
Fol.LMwithTime.RELmaeSE=list()
for(i in 1:5){
  Fol.LMwithTime.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.LMwithTime.Pred[[i]])
}
Fol.LMwithTime.RELmaeSE=mean(as.numeric(Fol.LMwithTime.RELmaeSE))
# Confidence Interval
Fol.LMwithTime.RELmae-1.96*Fol.LMwithTime.RELmaeSE; Fol.LMwithTime.RELmae+1.96*Fol.LMwithTime.RELmaeSE

# ---------------------------------------------------------------
# Comparison of prediction methods.
# Vector of residuals for prediction method (i).
Fol.ISEmean.resid=list()
for(i in 1:5){
  Fol.ISEmean.resid[[i]]=testfolds[[i]]$ISE-Fol.ISEmean[[i]]
}
Fol.ISEmean.resid=unlist(Fol.ISEmean.resid)
# Vector of residuals for prediction method (ii).
Fol.LMnoTime.resid=list()
for(i in 1:5){
  Fol.LMnoTime.resid[[i]]=testfolds[[i]]$ISE-Fol.LMnoTime.Pred[[i]]
}
Fol.LMnoTime.resid=unlist(Fol.LMnoTime.resid)
# Vector of residuals for prediction method (iii).
Fol.LMwithTime.resid=list()
for(i in 1:5){
  Fol.LMwithTime.resid[[i]]=testfolds[[i]]$ISE-Fol.LMwithTime.Pred[[i]]
}
Fol.LMwithTime.resid=unlist(Fol.LMwithTime.resid)
# Test for comparison of prediction methods.
wilcox.test(abs(Fol.ISEmean.resid), abs(Fol.LMnoTime.resid), paired=TRUE)
wilcox.test(abs(Fol.ISEmean.resid), abs(Fol.LMwithTime.resid), paired=TRUE)
wilcox.test(abs(Fol.LMnoTime.resid), abs(Fol.LMwithTime.resid), paired=TRUE)
# Part (d). Benchmarking with previous data.

# Create a vector of errors for RMSE and MAE in (i).
ISE.error1=vector(mode="numeric", length=526)
result.index=0
for(n in 11:536){
  result.index=result.index+1
  error1=ISE_data[n,2]-ISE_data[n-1,2]
  ISE.error1[result.index]=error1
}
# Calculate RMSE for (i).
(RMSE1=sqrt(mean(ISE.error1^2)))
# Calculate MAE for (i).
(MAE1=mean(abs(ISE.error1)))
# Calculate standard error of RMSE for (i).
(SE.RMSE1=(sd(ISE.error1^2)/sqrt(526))/(2*sqrt(mean(ISE.error1^2))))
# Calculate standard error of MAE for (i).
(SE.MAE1=sd(abs(ISE.error1))/sqrt(526))
# 95% confidence interval for RMSE.
RMSE1-1.96*SE.RMSE1; RMSE1+1.96*SE.RMSE1
# 95% confidence interval for MAE.
MAE1-1.96*SE.MAE1; MAE1+1.96*SE.MAE1

# Create a vector of errors for relative RMSE and relative MAE in (i).
ISE.rerror1=ISE.error1/ISE_data[c(11:536),2]
# Calculate relative RMSE for (i).
(rRMSE1=sqrt(mean(ISE.rerror1^2)))
# Calculate relative MAE for (i).
(rMAE1=mean(abs(ISE.rerror1)))
# Calculate standard error of relative RMSE for (i).
(SE.rRMSE1=(sd(ISE.rerror1^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror1^2))))
# Calculate standard error of relative MAE for (i).
(SE.rMAE1=sd(abs(ISE.rerror1))/sqrt(526))
# 95% confidence interval for relative RMSE.
rRMSE1-1.96*SE.rRMSE1; rRMSE1+1.96*SE.rRMSE1
# 95% confidence interval for relative MAE.
rMAE1-1.96*SE.rMAE1; rMAE1+1.96*SE.rMAE1

############

# Create a vector of errors for RMSE and MAE in (ii).
ISE.error2=vector(mode="numeric", length=526)
result.index=0
for(n in 11:536){
  result.index=result.index+1
  error2=ISE_data[n,2]-mean(ISE_data[c((n-5):(n-1)),2])
  ISE.error2[result.index]=error2
}
# Calculate RMSE for (ii).
(RMSE2=sqrt(mean(ISE.error2^2)))
# Calculate MAE for (ii).
(MAE2=mean(abs(ISE.error2)))
# Calculate standard error of RMSE for (ii).
(SE.RMSE2=(sd(ISE.error2^2)/sqrt(526))/(2*sqrt(mean(ISE.error2^2))))
# Calculate standard error of MAE for (ii).
(SE.MAE2=sd(abs(ISE.error2))/sqrt(526))
# 95% confidence interval for RMSE.
RMSE2-1.96*SE.RMSE2; RMSE2+1.96*SE.RMSE2
# 95% confidence interval for MAE.
MAE2-1.96*SE.MAE2; MAE2+1.96*SE.MAE2

# Create a vector of errors for relative RMSE and relative MAE in (ii).
ISE.rerror2=ISE.error2/ISE_data[c(11:536),2]
# Calculate relative RMSE for (ii).
(rRMSE2=sqrt(mean(ISE.rerror2^2)))
# Calculate relative MAE for (ii).
(rMAE2=mean(abs(ISE.rerror2)))
# Calculate standard error of relative RMSE for (ii).
(SE.rRMSE2=(sd(ISE.rerror2^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror2^2))))
# Calculate standard error of relative MAE for (ii).
(SE.rMAE2=sd(abs(ISE.rerror2))/sqrt(526))
# 95% confidence interval for relative RMSE.
rRMSE2-1.96*SE.rRMSE2; rRMSE2+1.96*SE.rRMSE2
# 95% confidence interval for relative MAE.
rMAE2-1.96*SE.rMAE2; rMAE2+1.96*SE.rMAE2

#################################################################################

# Create a vector of errors for RMSE and MAE in (iii).
ISE_data.iii=ISE_data[-536,]
ISE_data.iii$ISE.predicted=ISE_data$ISE[2:536]
ISE.error3=vector(mode="numeric", length=526)
result.index=0
for(n in 10:535){
  result.index=result.index+1
  lmmodel3=lm(ISE.predicted~ISE+S.P.500+DAX+FTSE+NIKKEI+BOVESPA+MSCI.EU+MSCI.EM,
              data=ISE_data.iii[(n-9):(n-1),])
  error3=ISE_data.iii[n,10]-predict(lmmodel3, ISE_data.iii[n,])
  ISE.error3[result.index]=error3
}
# Calculate RMSE for (iii).
(RMSE3=sqrt(mean(ISE.error3^2)))
# Calculate MAE for (iii).
(MAE3=mean(abs(ISE.error3)))
# Calculate standard error of RMSE for (iii).
(SE.RMSE3=(sd(ISE.error3^2)/sqrt(526))/(2*sqrt(mean(ISE.error3^2))))
# Calculate standard error of MAE for (iii).
(SE.MAE3=sd(abs(ISE.error3))/sqrt(526))
# 95% confidence interval for RMSE.
RMSE3-1.96*SE.RMSE3; RMSE3+1.96*SE.RMSE3
# 95% confidence interval for MAE.
MAE3-1.96*SE.MAE3; MAE3+1.96*SE.MAE3

# Create a vector of errors for relative RMSE and relative MAE in (iii).
ISE.rerror3=ISE.error3/ISE_data.iii[c(10:535),10]
# Calculate relative RMSE for (iii).
(rRMSE3=sqrt(mean(ISE.rerror3^2)))
# Calculate relative MAE for (iii).
(rMAE3=mean(abs(ISE.rerror3)))
# Calculate standard error of relative RMSE for (iii).
(SE.rRMSE3=(sd(ISE.rerror3^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror3^2))))
# Calculate standard error of relative MAE for (iii).
(SE.rMAE3=sd(abs(ISE.rerror3))/sqrt(526))
# 95% confidence interval for relative RMSE.
rRMSE3-1.96*SE.rRMSE3; rRMSE3+1.96*SE.rRMSE3
# 95% confidence interval for relative MAE.
rMAE3-1.96*SE.rMAE3; rMAE3+1.96*SE.rMAE3
  • 22. Task 1 Code ######################################################################################## #create a vector of errors for RMSE and MAE in (iv) ISE_data.iv=ISE_data[-c(535,536),] ISE_data.extracted=ISE_data[-c(1,536),-1] ISE_data.iv=cbind(ISE_data.iv,ISE_data.extracted) ISE_data.iv$ISE.predicted=ISE_data[-c(1,2),2] names(ISE_data.iv)=c("date","ISE2","S.P.5002","DAX2","FTSE2", "NIKKEI2","BOVESPA2","MSCI.EU2","MSCI.EM2", "ISE1","S.P.5001","DAX1","FTSE1", "NIKKEI1","BOVESPA1","MSCI.EU1","MSCI.EM1","ISE.predicted") ISE.error4=vector(mode="numeric", length=526) result.index=0 for(n in 9:534){ result.index=result.index+1 lmmodel4=lm(ISE.predicted~ISE2+S.P.5002+DAX2+FTSE2+NIKKEI2+BOVESPA2+MSCI.EU2+MSCI.EM2 +ISE1+S.P.5001+DAX1+FTSE1+NIKKEI1+BOVESPA1+MSCI.EU1+MSCI.EM1, data=ISE_data.iv[(n-8):(n-1),]) error4=ISE_data.iv[n,18]-predict(lmmodel4, ISE_data.iv[n,]) ISE.error4[result.index]=error4 } #calculate RMSE for (iv) (RMSE4=sqrt(mean(ISE.error4^2))) #calculate MAE for (iv) (MAE4=mean(abs(ISE.error4))) #calculate standard error of RMSE for (iv) (SE.RMSE4=(sd(ISE.error4^2)/sqrt(526))/(2*sqrt(mean(ISE.error4^2)))) #calculate standard error of MAE for (iv) (SE.MAE4=sd(abs(ISE.error4))/sqrt(526)) #95% confidence interval for RMSE RMSE4-1.96*SE.RMSE4; RMSE4+1.96*SE.RMSE4 #95% confidence interval for MAE MAE4-1.96*SE.MAE4; MAE4+1.96*SE.MAE4 ######### #create a vector of errors for relative RMSE and relative MAE in (iv) ISE.rerror4=ISE.error4/ISE_data.iv[c(9:534),18] #calculate relative RMSE for (iv) (rRMSE4=sqrt(mean(ISE.rerror4^2))) #calculate relative MAE for (iv) (rMAE4=mean(abs(ISE.rerror4))) #calculate standard error of relative RMSE for (iv) (SE.rRMSE4=(sd(ISE.rerror4^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror4^2)))) #calculate standard error of relative MAE for (iv) (SE.rMAE4=sd(abs(ISE.rerror4))/sqrt(526)) #95% confidence interval for relative RMSE rRMSE4-1.96*SE.rRMSE4; rRMSE4+1.96*SE.rRMSE4 #95% confidence interval for relative MAE rMAE4-1.96*SE.rMAE4; 
rMAE4+1.96*SE.rMAE4
#######################################################################################
#wilcoxon tests to compare the 4 different methods
wilcox.test(abs(ISE.error1),abs(ISE.error2), paired=TRUE)
wilcox.test(abs(ISE.error1),abs(ISE.error3), paired=TRUE)
wilcox.test(abs(ISE.error1),abs(ISE.error4), paired=TRUE)
wilcox.test(abs(ISE.error2),abs(ISE.error3), paired=TRUE)
wilcox.test(abs(ISE.error2),abs(ISE.error4), paired=TRUE)
wilcox.test(abs(ISE.error3),abs(ISE.error4), paired=TRUE)
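The `wilcox.test(..., paired=TRUE)` calls compare two methods by ranking the paired differences of their absolute errors. To make the statistic concrete, here is a minimal pure-Python sketch of the paired Wilcoxon signed-rank test with a normal approximation (illustrative only: R's implementation additionally offers exact p-values, continuity correction, and more careful tie handling):

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.

    Returns (W, two-sided p), where W is the sum of the ranks of the
    positive differences x-y. Zero differences are dropped.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    # Rank |d|, using midranks for tied groups.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        mid = (i + j) / 2 + 1  # average rank over the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    w = sum(r for r, di in zip(ranks, d) if di > 0)
    # Null mean and sd of W, then a two-sided normal-approximation p-value.
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w, p

# Toy absolute-error vectors for two methods; method a is uniformly better.
a = [0.9, 1.1, 0.8, 1.3, 0.7, 1.0]
b = [1.4, 1.6, 1.2, 1.9, 1.1, 1.5]
w, p = wilcoxon_signed_rank(a, b)
```

Because the test uses only the signs and ranks of the paired differences, it does not assume the absolute errors are normally distributed, which is why it is a sensible choice for comparing prediction methods here.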
# Part (e)-(c). Robust linear regression with Part (c) validation setups.
# ------------------------------
# Creating function for R(beta).
# ------------------------------
Rbeta=function(beta, covariates, observed){
  sum(abs(as.matrix(covariates)%*%matrix(beta)-observed))
}
# ---------------------------------------------------------
# Validation set-up (i). Chronological 80-20 split of data.
# ---------------------------------------------------------
# Prediction method (iv). Robust linear regression
# -- Model
Chr.PartE=nlm(Rbeta, p=c(-1,-1,-1,-1,-1,-1,-1),
              observed=ISE_data$ISE[1:429],
              covariates=ISE_data[1:429,3:9])
# -- Predicted values
Chr.PartE.Pred=as.matrix(ISE_data[430:536,3:9]) %*% matrix(Chr.PartE$estimate)
# -- Error measures
Chr.PartE.rmse = rmse(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.rmseSE = rmseSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.rmse-1.96*Chr.PartE.rmseSE; Chr.PartE.rmse+1.96*Chr.PartE.rmseSE
Chr.PartE.mae = mae(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.maeSE = maeSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.mae-1.96*Chr.PartE.maeSE; Chr.PartE.mae+1.96*Chr.PartE.maeSE
Chr.PartE.RELrmse = RELrmse(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELrmseSE = RELrmseSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELrmse-1.96*Chr.PartE.RELrmseSE; Chr.PartE.RELrmse+1.96*Chr.PartE.RELrmseSE
Chr.PartE.RELmae = RELmae(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELmaeSE = RELmaeSE(ISE_data$ISE[c(430:536)], Chr.PartE.Pred)
Chr.PartE.RELmae-1.96*Chr.PartE.RELmaeSE; Chr.PartE.RELmae+1.96*Chr.PartE.RELmaeSE
# Comparison of prediction methods.
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.ISEmean),
            abs(ISE_data$ISE[c(430:536)]-Chr.PartE.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.LMnoTime.Pred),
            abs(ISE_data$ISE[c(430:536)]-Chr.PartE.Pred), paired=TRUE)
wilcox.test(abs(ISE_data$ISE[c(430:536)]-Chr.LMwithTime.Pred),
            abs(ISE_data$ISE[c(430:536)]-Chr.PartE.Pred), paired=TRUE)
# ---------------------------------------------------
# Validation set-up (ii). Five-fold cross-validation.
# ---------------------------------------------------
# Prediction method (iv). Robust linear regression
# -- Models
Fol.PartE=list()
for(i in c(1,3,5)){
  Fol.PartE[[i]]=nlm(Rbeta, p=c(-0.5,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5),
                     observed=trainfolds[[i]]$ISE,
                     covariates=trainfolds[[i]][c(3:9)])
}
for(i in c(2,4)){
  Fol.PartE[[i]]=nlm(Rbeta, p=c(-1,-1,-1,-1,-1,-1,-1),
                     observed=trainfolds[[i]]$ISE,
                     covariates=trainfolds[[i]][c(3:9)])
}
# -- Predicted values
Fol.PartE.Pred=list()
for(i in 1:5){
  Fol.PartE.Pred[[i]]=as.matrix(testfolds[[i]][c(3:9)])%*%matrix(Fol.PartE[[i]]$estimate)
}
# -- Error measures
# *** RMSE ***
Fol.PartE.rmse=list()
for(i in 1:5){
  Fol.PartE.rmse[[i]]=rmse(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.rmse=mean(as.numeric(Fol.PartE.rmse))
# Standard Error
Fol.PartE.rmseSE=list()
for(i in 1:5){
  Fol.PartE.rmseSE[[i]]=rmseSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.rmseSE=mean(as.numeric(Fol.PartE.rmseSE))
# Confidence Interval
Fol.PartE.rmse-1.96*Fol.PartE.rmseSE; Fol.PartE.rmse+1.96*Fol.PartE.rmseSE
# *** MAE ***
Fol.PartE.mae=list()
for(i in 1:5){
  Fol.PartE.mae[[i]]=mae(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.mae=mean(as.numeric(Fol.PartE.mae))
# Standard Error
Fol.PartE.maeSE=list()
for(i in 1:5){
  Fol.PartE.maeSE[[i]]=maeSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.maeSE=mean(as.numeric(Fol.PartE.maeSE))
# Confidence Interval
Fol.PartE.mae-1.96*Fol.PartE.maeSE; Fol.PartE.mae+1.96*Fol.PartE.maeSE
# *** Relative RMSE ***
Fol.PartE.RELrmse=list()
for(i in 1:5){
  Fol.PartE.RELrmse[[i]]=RELrmse(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELrmse=mean(as.numeric(Fol.PartE.RELrmse))
# Standard Error
Fol.PartE.RELrmseSE=list()
for(i in 1:5){
  Fol.PartE.RELrmseSE[[i]]=RELrmseSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELrmseSE=mean(as.numeric(Fol.PartE.RELrmseSE))
# Confidence Interval
Fol.PartE.RELrmse-1.96*Fol.PartE.RELrmseSE; Fol.PartE.RELrmse+1.96*Fol.PartE.RELrmseSE
# *** Relative MAE ***
Fol.PartE.RELmae=list()
for(i in 1:5){
  Fol.PartE.RELmae[[i]]=RELmae(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELmae=mean(as.numeric(Fol.PartE.RELmae))
# Standard Error
Fol.PartE.RELmaeSE=list()
for(i in 1:5){
  Fol.PartE.RELmaeSE[[i]]=RELmaeSE(testfolds[[i]]$ISE, Fol.PartE.Pred[[i]])
}
Fol.PartE.RELmaeSE=mean(as.numeric(Fol.PartE.RELmaeSE))
# Confidence Interval
Fol.PartE.RELmae-1.96*Fol.PartE.RELmaeSE; Fol.PartE.RELmae+1.96*Fol.PartE.RELmaeSE
# Comparison of prediction methods.
# Vector of residuals for prediction method (iv).
Fol.PartE.resid=list()
for(i in 1:5){
  Fol.PartE.resid[[i]]=testfolds[[i]]$ISE - Fol.PartE.Pred[[i]]
}
Fol.PartE.resid=unlist(Fol.PartE.resid)
# Test for comparison of prediction methods.
wilcox.test(abs(Fol.ISEmean.resid), abs(Fol.PartE.resid), paired=TRUE)
wilcox.test(abs(Fol.LMnoTime.resid), abs(Fol.PartE.resid), paired=TRUE)
wilcox.test(abs(Fol.LMwithTime.resid), abs(Fol.PartE.resid), paired=TRUE)
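The `Rbeta` objective passed to `nlm` is the least-absolute-deviations (LAD) criterion R(β) = Σᵢ |xᵢᵀβ − yᵢ|, which is what makes this regression "robust": a single outlier contributes its absolute error rather than its squared error. A tiny Python sketch of the same objective for an intercept-only model, where the LAD minimiser is known to be the sample median (all data values here are made up for illustration):

```python
def sum_abs_resid(beta, xs, ys):
    """R(beta) = sum |x_i * beta - y_i| -- the LAD objective that the
    R code passes to nlm (here with a single scalar coefficient)."""
    return sum(abs(x * beta - y) for x, y in zip(xs, ys))

# Intercept-only model (x = 1 for every observation): the minimiser of
# R(beta) is the sample median, so the outlier at 10 barely moves the fit.
ys = [1.0, 2.0, 2.5, 3.0, 10.0]
xs = [1.0] * len(ys)

# A coarse grid search stands in for nlm's quasi-Newton optimiser.
grid = [i / 100 for i in range(0, 1101)]
best = min(grid, key=lambda b: sum_abs_resid(b, xs, ys))
```

Note that R(β) is piecewise linear and not differentiable at the data points, which is why a smooth optimiser like `nlm` can be sensitive to its starting values; that presumably explains the different `p=` starting vectors used for different folds in the R code above.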
# Part (e)-(d). Robust linear regression with Part (d) validation setup.
#create a vector of errors for RMSE and MAE for 526 data splits
Sum.residuals=function(be,x,y){
  res=be%*%t(x)
  SAR=sum(abs(res-y))
  return(SAR)
}
ISE.error5=vector(mode="numeric", length=526)
result.index=0
for(n in 10:535){
  result.index=result.index+1
  beta=nlm(Sum.residuals, p=c(10,10,10,10,10,10,10,10),
           x=ISE_data.iii[(n-9):(n-1),-c(1,10)],
           y=ISE_data.iii$ISE.predicted[(n-9):(n-1)],
           iterlim=300)$estimate
  error5=ISE_data.iii$ISE.predicted[n]-beta%*%t(ISE_data.iii[n,2:9])
  ISE.error5[result.index]=error5
}
#calculate RMSE
(RMSE5=sqrt(mean(ISE.error5^2)))
#calculate MAE
(MAE5=mean(abs(ISE.error5)))
#calculate standard error of RMSE
(SE.RMSE5=(sd(ISE.error5^2)/sqrt(526))/(2*sqrt(mean(ISE.error5^2))))
#calculate standard error of MAE
(SE.MAE5=sd(abs(ISE.error5))/sqrt(526))
#95% confidence interval for RMSE
RMSE5-1.96*SE.RMSE5; RMSE5+1.96*SE.RMSE5
#95% confidence interval for MAE
MAE5-1.96*SE.MAE5; MAE5+1.96*SE.MAE5
#create a vector of errors for relative RMSE and relative MAE
ISE.rerror5=ISE.error5/ISE_data[c(11:536),2]
#calculate relative RMSE
(rRMSE5=sqrt(mean(ISE.rerror5^2)))
#calculate relative MAE
(rMAE5=mean(abs(ISE.rerror5)))
#calculate standard error of relative RMSE
(SE.rRMSE5=(sd(ISE.rerror5^2)/sqrt(526))/(2*sqrt(mean(ISE.rerror5^2))))
#calculate standard error of relative MAE
(SE.rMAE5=sd(abs(ISE.rerror5))/sqrt(526))
#95% confidence interval for relative RMSE
rRMSE5-1.96*SE.rRMSE5; rRMSE5+1.96*SE.rRMSE5
#95% confidence interval for relative MAE
rMAE5-1.96*SE.rMAE5; rMAE5+1.96*SE.rMAE5
#wilcoxon tests to compare the method of part (e) with the 4 methods from part (d)
wilcox.test(abs(ISE.error5),abs(ISE.error1), paired=TRUE)
wilcox.test(abs(ISE.error5),abs(ISE.error2), paired=TRUE)
wilcox.test(abs(ISE.error5),abs(ISE.error3), paired=TRUE)
wilcox.test(abs(ISE.error5),abs(ISE.error4), paired=TRUE)
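The parts (d) and (e)-(d) benchmarks share one evaluation scheme: refit the model on a short window of the most recent observations, predict the next day, record the error, and slide the window forward. A Python sketch of that rolling one-step-ahead loop, with a deliberately trivial predictor (the window mean) standing in for the `lm()`/`nlm()` refits; the series and window length are illustrative only:

```python
import math

def rolling_one_step_errors(series, window):
    """Refit on the previous `window` points, predict the next point,
    and collect the one-step-ahead errors. The 'model' here is just
    the window mean, standing in for the regression refits."""
    errors = []
    for t in range(window, len(series)):
        train = series[t - window:t]   # only data strictly before day t
        pred = sum(train) / window
        errors.append(series[t] - pred)
    return errors

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errs = rolling_one_step_errors(series, window=3)
rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
```

The key property, shared with the R loops above, is that each prediction uses only data available before the target day, so the error measures estimate genuine out-of-sample forecasting performance.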
Appendix B: Task 2 SAS Code
libname cps "C:/Users/User/Documents/STAT7001/cps";
data cps.Rd;
  input R d;
  datalines;
0.00093 0.2588
0.00148 0.2053
0.0024 0.1628
0.0037 0.1291
0.0059 0.1024
0.0095 0.08118
0.0150 0.06438
0.024 0.05106
0.038 0.04049
0.048 0.03606
0.061 0.03211
0.096 0.02546
0.153 0.02019
0.24 0.01601
0.39 0.01270
0.98 0.00799
run;
PROC print; run;

*TASK 2(a);
*1(a);
*setting the font size to 12pt;
goptions device=gif hsize=4in vsize=3in border ftext="sasfont" htext=12pt;
proc univariate data=cps.Rd;
  var R d;
  histogram;
  qqplot / normal(mu=est sigma=est);
run;
title;
title2 "Resistance versus diameter";
symbol1 value=plus color=red;
axis1 label=("Diameter(cm)");
axis2 label=(angle=90 "Resistance(Ohm)");
proc gplot data=cps.Rd;
  plot R*d /haxis=axis1 vaxis=axis2;
run;
proc reg data=cps.Rd;
  model R=d;
run;

*TASK 2 (b);
data cps.Rd2;
  set cps.Rd;
  logR=log(R);
  recd2=1/(d**2);
  recd=1/d;
  logd=log(d);
  d2=d**2; d3=d**3; d4=d**4; d5=d**5; d6=d**6; d7=d**7;
  d8=d**8; d9=d**9; d10=d**10; d11=d**11; d12=d**12; d13=d**13; d14=d**14; d15=d**15;
run;
PROC print; run;

*suggested i;
PROC reg data=cps.Rd2;
  model logR = logd;
run;
*TASK 2 (b) i;
proc reg data=cps.Rd2;
  model R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
run;
*TASK 2 (b) ii;
PROC reg data=cps.Rd2;
  model R = recd2 recd;
run;
*suggested ii;
PROC reg data=cps.Rd2;
  model R = recd2;
run;
proc corr plots=(matrix);
  with R logR;
  var recd recd2 logd d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
run;

*LEAVE-ONE-OUT CROSS VALIDATION;
*Suggested i model logR = logd;
*Generate the cross validation data;
data cps.cv4;
  do replicate = 1 to datasize;
    do rec = 1 to datasize;
      set cps.Rd2 nobs=datasize point=rec;
      if rec ^= replicate then new_R=logR;
      else new_R=.;
      output;
    end;
  end;
  stop;
run;
proc print; run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv4;
  model new_R=logd;
  by replicate;
  output out=out4a(where=(new_R=.)) predicted=R_hat;
run;
proc print; run;
*and summarize the results;
data cps.out4b;
  set out4a;
  diff=logR-R_hat;
  absd=abs(diff);
run;
title;
title2 "Residual Plot for Model logR = logd";
symbol1 value=plus color=red;
axis1 label=("logR");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out4b;
  plot diff*logR /haxis=axis1 vaxis=axis2;
run;
proc summary data=cps.out4b;
  var diff absd;
  output out=out4c std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print; run;
data out4d;
  set cps.out4b;
  diff2=diff**2;
  mse=0.009292428**2;
  a=(diff2-mse)**2;
run;
proc summary data=out4d;
  var a;
  output out=out4e sum(a)=b;
run;
data out4f;
  set out4e;
  seRMSE=((b**0.5)/16)/(2*0.009292428);
  seMAE=.006464840/4;
run;
proc print; run;

*2(b)i model: R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
*Generate the cross validation data;
data cps.cv2;
  do replicate = 1 to datasize;
    do rec = 1 to datasize;
      set cps.Rd2 nobs=datasize point=rec;
      if rec ^= replicate then new_R=R;
      else new_R=.;
      output;
    end;
  end;
  stop;
run;
proc print; run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv2;
  model new_R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15;
  by replicate;
  output out=out2a(where=(new_R=.)) predicted=R_hat;
run;
*and summarize the results;
data cps.out2b;
  set out2a;
  diff=R-R_hat;
  absd=abs(diff);
run;
title;
title2 "Residual Plot for Model R=d d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 d14 d15";
symbol1 value=plus color=red;
axis1 label=("R");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out2b;
  plot diff*R /haxis=axis1 vaxis=axis2;
run;
proc summary data=cps.out2b;
  var diff absd;
  output out=out2c std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print; run;
data out2d;
  set cps.out2b;
  diff2=diff**2;
  mse=5358.40**2;
  a=(diff2-mse)**2;
run;
proc summary data=out2d;
  var a;
  output out=out2e sum(a)=b;
run;
data out2f;
  set out2e;
  seRMSE=((b**0.5)/16)/(2*5358.40);
  seMAE=5357.98/4;
run;
proc print; run;

*2(b)ii model R = recd2 recd;
*Generate the cross validation data;
data cps.cv3;
  do replicate = 1 to datasize;
    do rec = 1 to datasize;
      set cps.Rd2 nobs=datasize point=rec;
      if rec ^= replicate then new_R=R;
      else new_R=.;
      output;
    end;
  end;
  stop;
run;
proc print; run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv3;
  model new_R=recd2 recd;
  by replicate;
  output out=out3a(where=(new_R=.)) predicted=R_hat;
run;
proc print; run;
*and summarize the results;
data cps.out3b;
  set out3a;
  diff=R-R_hat;
  absd=abs(diff);
run;
title;
title2 "Residual Plot for Model R = recd2 recd";
symbol1 value=plus color=red;
axis1 label=("R");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out3b;
  plot diff*R /haxis=axis1 vaxis=axis2;
run;
proc print; run;
proc summary data=cps.out3b;
  var diff absd;
  output out=out3c std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print; run;
data out3d;
  set cps.out3b;
  diff2=diff**2;
  mse=0.001920634**2;
  a=(diff2-mse)**2;
run;
proc summary data=out3d;
  var a;
  output out=out3e sum(a)=b;
run;
data out3f;
  set out3e;
  seRMSE=((b**0.5)/16)/(2*.001920634);
  seMAE=.001679167/4;
run;
proc print; run;

*Suggested ii model R = recd2;
*Generate the cross validation data;
data cps.cv5;
  do replicate = 1 to datasize;
    do rec = 1 to datasize;
      set cps.Rd2 nobs=datasize point=rec;
      if rec ^= replicate then new_R=R;
      else new_R=.;
      output;
    end;
  end;
  stop;
run;
proc print; run;
*get predicted values for the missing new_R in each replicate;
proc reg data=cps.cv5;
  model new_R=recd2;
  by replicate;
  output out=out5a(where=(new_R=.)) predicted=R_hat;
run;
proc print; run;
*and summarize the results;
data cps.out5b;
  set out5a;
  diff=R-R_hat;
  absd=abs(diff);
run;
title;
title2 "Residual Plot for Model R = recd2";
symbol1 value=plus color=red;
axis1 label=("R");
axis2 label=(angle=90 "Residual");
proc gplot data=cps.out5b;
  plot diff*R /haxis=axis1 vaxis=axis2;
run;
proc print; run;
proc summary data=cps.out5b;
  var diff absd;
  output out=out5c std(diff)=rmse mean(absd)=mae std(absd)=c;
run;
proc print; run;
data out5d;
  set cps.out5b;
  diff2=diff**2;
  mse=0.001314149**2;
  a=(diff2-mse)**2;
run;
proc summary data=out5d;
  var a;
  output out=out5e sum(a)=b;
run;
data out5f;
  set out5e;
  seRMSE=((b**0.5)/16)/(2*0.001314149);
  seMAE=.001131049/4;
run;
proc print; run;

*producing a table containing absd from all 4 models to carry out paired wilcoxon signed rank test;
PROC SQL;
  SELECT A.absd, B.absd, C.absd, D.absd
  FROM cps.out4b AS A, cps.out2b AS B, cps.out3b AS C, cps.out5b AS D
  WHERE A.replicate=B.replicate AND B.replicate=C.replicate AND C.replicate=D.replicate;
QUIT;
data cps.absd;
  input model1 model2 model3 model4;
  datalines;
0.002509 21432.82 0.000178 0.000222
0.000547 8.835317 0.000144 0.000221
0.022544 3.601289 0.000052 0.000269
0.013488 0.548649 0.000112 0.000167
0.009526 0.158066 0.000069 0.000153
0.003606 0.060452 0.000078 0.000232
0.003878 0.043877 0.000043 0.000121
0.003001 0.016238 0.000236 0.000225
0.001622 0.024246 0.000159 0.000046
0.0004 0.023091 0.000266 0.000096
0.008668 0.009097 0.000809 0.00056
0.00284 0.031168 0.000013 0.000341
0.000346 0.043857 0.000134 0.000307
0.016335 0.003074 0.004498 0.004225
0.010083 0.072958 0.003586 0.002621
0.004023 0.278421 0.004755 0.000581
run;
data cps.diff;
  set cps.absd;
  AB=model1-model2;
  AC=model1-model3;
  AD=model1-model4;
  BC=model2-model3;
  BD=model2-model4;
  CD=model3-model4;
run;
proc univariate data=cps.diff;
  var AB AC AD BC BD CD;
run;
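The SAS code builds leave-one-out cross-validation by replicating the data set once per observation and blanking a different response each time, so that `PROC REG` with `BY replicate` fits every leave-one-out model in a single pass (the re-sampling pattern from the Cassell reference in Appendix C). The same logic in a short Python sketch, with a mean predictor standing in for the regression fit (names and data are illustrative only):

```python
def loocv_abs_errors(ys):
    """Leave each observation out in turn, 'fit' on the remaining
    values (here simply their mean, standing in for a regression fit),
    predict the held-out value, and return the absolute errors."""
    errors = []
    for i, y in enumerate(ys):
        rest = ys[:i] + ys[i + 1:]      # training set without observation i
        pred = sum(rest) / len(rest)    # the stand-in 'model'
        errors.append(abs(y - pred))
    return errors

ys = [2.0, 4.0, 6.0]
errs = loocv_abs_errors(ys)
mae = sum(errs) / len(errs)             # cross-validated MAE
```

The absolute errors collected this way correspond to the `absd` column that the SAS code later feeds into the paired Wilcoxon comparisons of the four candidate models.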
Appendix C: References
1. Jeff Cartier. The Basics of Creating Graphs with SAS/GRAPH® Software. [online]. Available from: https://support.sas.com/rnd/datavisualization/papers/GraphBasics.pdf [Accessed 24 February 2016]
2. Steven M. LaLonde. 2012. Transforming Variables for Normality and Linearity – When, How, Why and Why Not's. [online]. Available from: http://support.sas.com/resources/papers/proceedings12/430-2012.pdf [Accessed 13 March 2016]
3. David L. Cassell. 2007. Don't Be Loopy: Re-Sampling and Simulation the SAS® Way. [online]. Available from: http://www2.sas.com/proceedings/forum2007/183-2007.pdf [Accessed 14 March 2016]
4. Michael J. Wieczkowski. Alternatives to Merging SAS Data Sets … But Be Careful. [online]. Available from: http://www.ats.ucla.edu/stat/sas/library/nesug99/bt150.pdf [Accessed 23 March 2016]