Self-made project: Predictive Modelling and
Multivariate Analysis using climate data
October 1, 2016
Harris Phan
1 Introduction/Aim
To find the significant factors that contribute to ice-free days in Kotzebue,
Alaska. The dataset can be found on this website:
http://www.multivariatestatistics.org/data.html
Many factors could contribute to icy weather in any country. To build a
predictive statistical model, one must consider data from the past: the data
that has been recorded will be used to predict how many icy days there will be
in the future. However, datasets sometimes contain factors that are so similar
to each other that some of them are redundant, hence the need for multivariate
analysis techniques such as principal components analysis and factor analysis.
The first part of this report consists of multivariate analysis techniques: the
multivariate test of normality, Hotelling’s $T^2$, multiple and partial
correlation, principal components analysis and factor analysis. The second part
consists of predictive modelling techniques such as time series regression and
possibly neural networks. SAS is used throughout this report.
2 Data and description
Data          Description
Year          The year in which the data was recorded, from 1981 to 2003.
AO            The Arctic Oscillation; see
              https://www.ncdc.noaa.gov/teleconnections/ao/
              for more information. AOsumm is the AO in summer and AOwint
              is the AO in winter.
NPI           The North Pacific Index, which is the area-weighted sea level
              pressure. NPIspring is the NPI in spring and NPIwinter is the
              NPI in winter. More information can be found here:
              https://climatedataguide.ucar.edu/climate-data/north-pacific-np-index-trenberth-and-hurrell-monthly-and-winter
Temp          The temperature in degrees Celsius. TempSummer is the
              temperature in summer and TempWinter is the temperature in
              winter.
Rain          The amount of rainfall. RainSumm is how much rain fell in
              summer and RainWint is how much rain fell in winter.
Ice           How much ice there was during the whole year. IceJanJul is how
              much ice there was on average between January and July, and
              IceOctDec is how much ice there was on average between October
              and December.
IceFreeDays   The number of ice-free days in Kotzebue, Alaska.
3 Multivariate analysis
3.1 Tests of multivariate normality
To check whether three or more variables are reasonably multivariate normal, two
test statistics are used here: Mardia’s multivariate skewness and kurtosis
measures. The hypotheses are:
$H_0 : (X_1, X_2, X_3, X_4, X_5)^T$ is multivariate normal
$H_1 : (X_1, X_2, X_3, X_4, X_5)^T$ is not multivariate normal.
Mardia’s multivariate skewness ($\kappa_1$) and kurtosis ($\kappa_2$) test
statistics are given by:
$$\kappa_1 = \frac{n \hat{\beta}_{1,p}}{6} \sim \chi^2_{p(p+1)(p+2)/6}$$
$$\kappa_2 = \frac{\hat{\beta}_{2,p} - p(p+2)}{[8p(p+2)/n]^{1/2}} \sim N(0, 1).$$
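For reference, the sample quantities that enter these two statistics can be
written out as follows. This is a sketch of the usual Mardia definitions, using
the covariance estimate with divisor n; the symbols $\bar{x}$, $S$ and $g_{ij}$
are notation introduced here and are not part of the original text.

```latex
\begin{align*}
S &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T, \qquad
g_{ij} = (x_i - \bar{x})^T S^{-1} (x_j - \bar{x}), \\
\hat{\beta}_{1,p} &= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} g_{ij}^{3}, \qquad
\hat{\beta}_{2,p} = \frac{1}{n}\sum_{i=1}^{n} g_{ii}^{2}.
\end{align*}
```

With p = 5 variables, the skewness statistic has 5(6)(7)/6 = 35 degrees of
freedom.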
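Since there is no default procedure for this, my SAS/IML code is given here:
proc iml;
/* Assumption: the data set is climatedata, as in the later proc corr calls */
use climatedata;
read all var {AO NPI Temp Rain Ice} into x;
close climatedata;
r = nrow(x);                        /* sample size n */
c = ncol(x);                        /* number of variables p */
dfc = c*(c+1)*(c+2)/6;              /* degrees of freedom of the skewness test */
q = i(r) - (1/r)*j(r,r,1);          /* centring matrix */
s = (1/r)*x`*q*x;  s_inv = inv(s);  /* covariance matrix (divisor n) and inverse */
g_matrix = q*x*s_inv*x`*q;
beta1hat = ( sum(g_matrix#g_matrix#g_matrix) )/(r*r);   /* skewness estimate */
beta2hat = trace( g_matrix#g_matrix )/r;                /* kurtosis estimate */
kappa1 = r*beta1hat/6;
kappa2 = (beta2hat - c*(c+2)) / sqrt(8*c*(c+2)/r);
pvalskew = 1 - probchi(kappa1,dfc);
pvalkurt = 2*( 1 - probnorm(abs(kappa2)) );
print s;
print s_inv;
print beta1hat;
print kappa1;
print pvalskew;
print beta2hat;
print kappa2;
print pvalkurt;
quit;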
Let X1 = AO, X2 = NPI, X3 = Temp, X4 = Rain and X5 = Ice. By our SAS output,
$\kappa_1 = 23.527265$, with a p-value of 0.930064, so at the 0.05 significance
level there is not enough evidence to reject the null hypothesis. For the
kurtosis, $\hat{\beta}_{2,p} = 29.913077$, which gives $\kappa_2 \approx -1.458$
and a p-value of 0.1448567, so at the 0.05 significance level there is again not
enough evidence to reject the null hypothesis. The multivariate normality
assumption is therefore in reasonable agreement with the data. Note that this
only considers the year-round factors; season has not been accounted for. We can
therefore proceed with Hotelling’s $T^2$.
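The kurtosis result can be checked by hand from the formula for $\kappa_2$. The
following is a minimal sketch, not the author's SAS code: it plugs the reported
kurtosis estimate 29.913077 into the formula with p = 5 and n = 23 (the years
1981 to 2003) and uses the standard normal tail for the p-value. The function
name is hypothetical.

```python
import math

def mardia_kurtosis_test(beta2_hat, p, n):
    """Return (kappa2, two-sided p-value) for Mardia's kurtosis statistic."""
    kappa2 = (beta2_hat - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    # Two-sided standard normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(kappa2) / math.sqrt(2))
    return kappa2, p_value

# Values taken from the text: kurtosis estimate, 5 variables, 23 years.
kappa2, p_value = mardia_kurtosis_test(29.913077, p=5, n=23)
skew_df = 5 * (5 + 1) * (5 + 2) // 6   # degrees of freedom of the skewness test
print(f"kappa2 = {kappa2:.4f}, p-value = {p_value:.4f}, skewness df = {skew_df}")
```

The p-value obtained this way agrees with the SAS value of 0.1448567, which is
above 0.05.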
3.2 Hotelling’s $T^2$
We shall compare the means of rainfall and temperature in the summer with the
means of rainfall and temperature in the winter. The primary reason for using
Hotelling’s $T^2$ is to see whether the seasons have an effect on both the
temperature and the rainfall. The idea is to test the hypothesis:
$$H_0 : E\begin{pmatrix} X_1 - X_3 \\ X_2 - X_4 \end{pmatrix} = E\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = 0.$$
The differences are obtained from $X = (X_1, X_2, X_3, X_4)^T$ by using the
matrix:
$$Y = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix} X.$$
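Applying the $T^2$ statistic to the transformed vector Y gives the quantity that
is computed in the code. The following is a sketch in which $C$ denotes the
2 × 4 matrix above, $\bar{X}$ the sample mean vector and $S$ the sample
covariance matrix of X; the symbols $C$ and $S_Y$ are introduced here for
convenience.

```latex
\begin{align*}
\bar{Y} &= C\bar{X}, \qquad S_Y = C S C^T, \\
T^2 &= n\, \bar{Y}^T \left( C S C^T \right)^{-1} \bar{Y}.
\end{align*}
```

The product $C S C^T$ corresponds to covyx in the SAS code, and the null value
of the mean of Y is zero.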
The SAS code is provided here:
/* Assumption: x1-x4 are the four seasonal variables in the data set climatedata */
proc corr data=climatedata cov noprob nocorr outp=outcovmatrix;
var x1-x4;
run;
proc iml;
use outcovmatrix where(_TYPE_="COV");
read all var _NUM_ into cov[colname=varNames];
use outcovmatrix where(_TYPE_="MEAN");
read all var _NUM_ into meansMatrix[colname=varNames];
close outcovmatrix;
print cov, meansMatrix;
/* Partitioned covariance blocks and mean vectors of the two groups */
Sigma_11 = cov[ {3 4}, {3 4}];
Sigma_22 = cov[ {1 2}, {1 2}];
Sigma_12 = cov[ {3 4}, {1 2}];
Sigma_21 = cov[ {1 2}, {3 4}];
mu_1 = meansMatrix[{3 4}];
mu_2 = meansMatrix[{1 2}];
SInv = inv(cov);
print SInv;
/* Contrast matrix: Y1 = X1 - X3, Y2 = X2 - X4 */
y = {1 0 -1 0, 0 1 0 -1};
covyx = y*cov*y`;
print covyx;
covyxinv = inv(covyx);
print covyxinv;
/* Mean differences under the same contrast */
mudiff = y*meansMatrix`;
print mudiff;
n = 23;   /* number of years, 1981-2003 */
p = 4;    /* number of variables x1-x4, as used for the reported critical value */
mu_vect = {27.396087,3.2408696};   /* mean differences entered by hand */
T2 = n * mu_vect` * covyxinv * mu_vect;
print T2;
critical_value_of_F = p*(n-1)/(n-p)*finv(.95, p, n-p);
print critical_value_of_F;
quit;
Note that the formula for Hotelling’s $T^2$ is given by:
$$T^2 = n(\bar{X} - \mu_0)^T S^{-1} (\bar{X} - \mu_0) \sim \frac{(n-1)p}{n-p} F_{p,n-p}.$$
By the SAS output, Hotelling’s $T^2$ is 5631.1594, which is greater than the
critical value of 13.408918. There is therefore enough evidence to reject the
null hypothesis: the means of the rainfall and the temperature in the summer
differ from the means of the rainfall and the temperature in the winter.
The same test is then repeated for the means of AO and rain in the summer
against the means of AO and rain in the winter. Hotelling’s $T^2$ is 5945.4702,
which is again greater than the critical value of 13.408918. Overall, summer
conditions therefore differ greatly from winter conditions.
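The critical value can also be cross-checked outside SAS. The sketch below is an
illustration rather than the author's method: it uses the closed-form quantile
of the F distribution that holds when the numerator degrees of freedom equal 2,
and it assumes that the test dimension is p = 2 (the number of components of Y)
with n = 23. The function names are hypothetical. Under these assumptions the
critical value is smaller than the reported 13.408918, so both reported $T^2$
values still lead to rejection.

```python
def f_quantile_df1_2(prob, df2):
    """Quantile of the F(2, df2) distribution.

    With 2 numerator degrees of freedom the survival function is
    (1 + 2*x/df2) ** (-df2/2), which can be inverted in closed form.
    """
    alpha = 1.0 - prob
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

def hotelling_critical_value(prob, n, p=2):
    """Scaled F critical value (n-1)p/(n-p) * F_{p, n-p}; only p = 2 is supported."""
    if p != 2:
        raise ValueError("the closed form used here requires p = 2")
    return (n - 1) * p / (n - p) * f_quantile_df1_2(prob, n - p)

critical = hotelling_critical_value(0.95, n=23)
reported_t2 = [5631.1594, 5945.4702]   # T^2 values reported in the text
print(f"critical value for p = 2, n = 23: {critical:.4f}")
print(all(t2 > critical for t2 in reported_t2))
```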
3.3 Simple, Partial and Multiple correlation
We now wish to find out the strength of the relationships between variables. To
do this, correlation procedures are used. Variables that are highly correlated
with each other are potential candidates for principal components analysis and
factor analysis, which will be done later. In this subsection, variables of the
same type are tested against each other: the AO-like, NPI-like,
Temperature-like, Rain-like and Ice-like variables. Then we test whether there
is a strong correlation between the number of ice-free days and the other
variables. First the AO variables are tested. Simple correlation in SAS gives:
proc corr data=climatedata;
var AO AO_wint AO_summ;
run;
          AO        AOwint    AOsumm
AO        1.00000   0.58909   0.56242
AOwint    0.58909   1.00000   0.93234
AOsumm    0.56242   0.93234   1.00000
So there is a strong correlation (0.93234) between the Arctic Oscillation in
winter and the Arctic Oscillation in summer. However, the overall Arctic
Oscillation is not that strongly correlated with either the winter or the
summer Arctic Oscillation (0.58909 and 0.56242). Now consider the NPI-like
variables:
proc corr data=climatedata;
var NPI NPI_spring NPI_winter;
run;
            NPI       NPIspring  NPIwinter
NPI         1.00000   0.18645    0.99790
NPIspring   0.18645   1.00000    0.19444
NPIwinter   0.99790   0.19444    1.00000
In this case, only the NPI in winter has a strong correlation with the overall
NPI (0.99790); the other pairs are weakly correlated with each other. Out of
curiosity, I also decided to try partial correlation:
proc corr data=climatedata;
var NPI NPI_spring;
partial NPI_winter;
run;
The partial correlation between NPI and NPIspring, with NPIwinter taken as the
partial variable, is -0.11919, which is negative but still weak. Making
NPIwinter the variable and NPIspring the partial variable instead gives a
correlation coefficient close to 1, so the winter NPI remains strongly
correlated with the overall NPI once the spring NPI is taken into account.
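The first-order partial correlation can be reproduced from the simple
correlations alone. The following sketch applies the standard formula
$r_{xy \cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$
to the values in the NPI correlation table. The function name is hypothetical,
and because the inputs are rounded to five decimals the result only
approximately matches the SAS output.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Simple correlations from the NPI table in the text.
r_npi_spring = 0.18645
r_npi_winter = 0.99790
r_spring_winter = 0.19444

# NPI vs NPIspring, controlling for NPIwinter.
spring_given_winter = partial_corr(r_npi_spring, r_npi_winter, r_spring_winter)
# NPI vs NPIwinter, controlling for NPIspring.
winter_given_spring = partial_corr(r_npi_winter, r_npi_spring, r_spring_winter)

print(f"{spring_given_winter:.3f} {winter_given_spring:.3f}")
```

The denominator of the first coefficient is small because NPI and NPIwinter are
almost perfectly correlated, so this value is sensitive to rounding of the
inputs.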
3.4 Principal components analysis
We now carry out principal components analysis. This is a variable reduction
procedure, and in the climate dataset there seem to be variables that are
correlated with each other. Principal components analysis is best used with a
large sample size, because correlations need a large sample before they
stabilise. The SAS princomp procedure performs principal components analysis.
Its output has three parts: the simple statistics and the correlation matrix;
the eigenvalues and eigenvectors of the correlation matrix; and the scree plot
with the variance explained. The eigenvalues are simply the variances of the
principal components themselves.
proc princomp data= climatedata;
var Ice Ice_JanJul Ice_OctDec;
run;
What is important here is the scree plot, which graphs the eigenvalue against
the component number. For the three candidate variables Ice, IceJanJul and
IceOctDec, only one of the eigenvalues is above 1 and, by the mineigen
criterion, it should be the only one that is retained. Components with an
eigenvalue of less than 1 are of little use. So according to principal
components analysis, seasonality does not matter much and only Ice is retained.
proc princomp data= climatedata;
var AO AO_wint AO_summ;
run;
Even though AOsumm and AOwint have a correlation above 0.9, I still include
both in the principal components analysis. As with the Ice-like variables,
there is only one eigenvalue above 1 for the AO-like variables. So again,
according to PCA, seasonality does not matter and we retain AO as the variable.
Using a similar princomp procedure, we see that only one NPI-like variable
should be used.
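The eigenvalue criterion can be cross-checked from the correlation tables of
Section 3.3. The sketch below is an illustration, not the princomp
implementation: it computes the eigenvalues of a symmetric 3 × 3 matrix with the
standard trigonometric closed form and counts how many exceed 1 for the AO-like
and NPI-like variables. The function names are hypothetical and the inputs are
the rounded correlations printed in that section.

```python
import math

def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:                       # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # b = (a - q*I) / p
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))   # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3              # the eigenvalues sum to the trace
    return [e1, e2, e3]

def components_to_retain(corr):
    """Number of eigenvalues above 1 (eigenvalue-greater-than-one rule)."""
    return sum(1 for ev in sym3_eigenvalues(corr) if ev > 1.0)

ao = [[1.0, 0.58909, 0.56242], [0.58909, 1.0, 0.93234], [0.56242, 0.93234, 1.0]]
npi = [[1.0, 0.18645, 0.99790], [0.18645, 1.0, 0.19444], [0.99790, 0.19444, 1.0]]
for name, corr in (("AO", ao), ("NPI", npi)):
    print(name, [round(ev, 4) for ev in sym3_eigenvalues(corr)], components_to_retain(corr))
```

For both groups exactly one eigenvalue exceeds 1, which agrees with the
conclusion drawn from the princomp output.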
proc princomp data= climatedata;
var Temp Temp_summ Temp_wint;
run;
However, the Temperature-like variables are trickier to handle. We now get
eigenvalues of 1.12191146, 1.02992761 and 0.84816093. It is clear from the
correlation matrix provided by princomp that the temperature in summer is only
weakly correlated with the yearly temperature, and the same goes for the winter
temperature. Seasonal variation should therefore be accounted for, and by the
eyeball test none of the Temperature-like variables should be discarded.
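To see why no single component dominates here, the reported eigenvalues can be
turned into proportions of variance explained. This is a small sketch that uses
only the three eigenvalues quoted above; for a correlation matrix of three
variables the eigenvalues sum to 3, so each proportion is the eigenvalue divided
by that total.

```python
eigenvalues = [1.12191146, 1.02992761, 0.84816093]   # reported by princomp for Temp
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Running total of the proportions, one entry per component.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

for k, (ev, share, cum) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"PC{k}: eigenvalue {ev:.4f}, proportion {share:.3f}, cumulative {cum:.3f}")
```

Each component accounts for roughly a third of the total variance, so dropping
any one of them would discard a substantial share of the information.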
3.5 Factor Analysis

Predictive Modelling of Ice Days in Alaska

  • 1. Self-made project: Predictive Modelling and Multivariate Analysis using climate data October 1, 2016 Harris Phan 1 Introduction/Aim To find the significant factors which contribute to Ice free days in Kotzbehue, Alaska. The dataset can be found on this website: http://www.multivariatestatistics.org/data.html There are many factors that could contribute to icy weather in any country. To be able to predict any statistical model, one must consider data from the past. The data that has been recorded will be used to predict how many icy days will there be in the future. However, sometimes datasets have factors that seem similar to each other, which means they could be made redundant thus the need for multivariate analysis techniques such as principal components analysis and factor analysis. The first part of this report will consist of multivariate analysis techniques, starting with the multivariate test of normality, Hotelling’s T2 , multiple and partial correlation, principal components analysis and factor analysis. The second part will consists of predictive modelling techniques such as time series regression (possibly neural networks). SAS will be used for this report. 1
  • 2. 2 Data and description Data Description Year The Year in which the data was recorded from 1981-2003. AO This stands for Arctic Oscillation. https://www.ncdc.noaa.gov/teleconnections/ao/ for more information. AOsumm represents the AO in the summer and AOwint represents the AO in the winter. NPI This stands for the North Pacific Index, which is the area-weighted sea level pressure. NPIspring represents the NPI in spring and NPIwinter represents the NPI in winter. More information can be found here: https://climatedataguide.ucar.edu/ climate-data/north-pacific-np-index -trenberth-and-hurrell-monthly-and-winter Temp Represents the temperature in degrees celsius. TempSummer rep- resents the temperature in summer and TempWinter represents the temperature in winter. Rain This represents the amount of rainfall. RainSumm represents how much rain fell in summer and RainWint represents how much rain fell in winter. Ice This represents how much ice was there during the whole year. Ice- JanJul represents how much ice was there between January and July on averageand IceOctDec represents how much ice was there on av- erage between October and December. IceFreeDays This represents how many days that were ice-free in Kotzbehue, Alaska. 3 Multivariate analysis 3.1 Tests of multivariate normality In order to assume that 3 or more variables have reasonable multivariate normal characteristics, 2 test statistics will be used here: Mardia’s multivariate skewness and kurtosis measures. A hypothesis test will be conducted: H0 : (X1, X2, X3, X4, X5)T is multivariate normal H1 : (X1, X2, X3, X4, X5)T is not multivariate normal. 2
  • 3. Now we use Mardia’s multivariate skewness (κ1) and kurtosis measures (κ2), which is given by: κ1 = nˆβ1,p/6 ∼ χ2 p(p+1)(p+2)/6 κ2 = [ˆβ2,p − p(p + 2)]/[8p(p + 2)/n] 1 2 ∼ N(0, 1). Since there is no default procedure, I will put down my code here: r = nrow(x); c = ncol(x); dfc = c*(c+1)*(c+2)/6; q = i(r) - (1/r)*j(r,r,1); s = (1/(r))*x‘*q*x ; s_inv = inv(s) ; g_matrix = q*x*s_inv*x‘*q ; beta1hat = ( sum(g_matrix#g_matrix#g_matrix) )/(r*r) ; beta2hat =trace( g_matrix#g_matrix )/r ; kappa1 = r*beta1hat/6 ; kappa2 = (beta2hat - c*(c+2) ) /sqrt(8*c*(c+2)/r) ; pvalskew = 1 - probchi(kappa1,dfc) ; pvalkurt = 2*( 1 - probnorm(abs(kappa2)) ) ; print s ; print s_inv ; print beta1hat ; print kappa1 ; print pvalskew ; print beta2hat ; print kappa2 ; print pvalkurt ; quit; Suppose we let X1 = AO, X2 = NPI, X3 = Temp, X4 = Rain and X5 = Ice. By our SAS output, κ1 = 23.527265, which has a respective p-value of 0.930064. This means at the 0.05 significance level, there is not enough evidence to reject the null hypothesis. However we also need κ2, which is 29.913077 and corresponds to the kurtosis of the multivariate normal distribution, which has a respective p-value of 0.1448567 hence at the 0.05 significance level there is enough evidence to reject the null hypothesis. This means that the multivariate normality assumption is not in reasonable agreement with the data. Note that this is only considering the year round factors, season has not been accounted for. Therefore we can proceed with Hotelling’s T2 . 3.2 Hotelling’s T2 We shall compare the means of rainfall and Temperature in the summer and the means of rainfall and Temperature in the winter. The primary reason for using 3
  • 4. Hotelling’s T2 is to see if the seasons have an effect on both the temperature and the rainfall. The big hint here is to use the hypothesis: H0 : E X1 − X2 X3 − X4 = E Y1 Y2 = 0. This hypothesis can be formulated by using the matrix: Y = 1 0 −1 0 0 1 0 −1 X. The SAS code is provided here: proc corr cov noprob nocorr outp = outcovmatrix; var x1-x4; quit; proc iml; use outcovmatrix where(_TYPE_="COV"); read all var _NUM_ into cov[colname=varNames]; use outcovmatrix where(_TYPE_="MEAN"); read all var _NUM_ into meansMatrix[colname=varNames]; print cov, meansMatrix; Sigma_11 = cov[ {3 4}, {3 4}]; Sigma_22 = cov[ {1 2}, {1 2}]; Sigma_12 = cov[ {3 4}, {1 2}]; Sigma_21 = cov[ {1 2}, {3 4}]; mu_1 = meansMatrix[{3 4}]; mu_2 = meansMatrix[{1 2}]; SInv = inv(cov); print SInv; y = {1 0 -1 0, 0 1 0 -1}; covyx = y*cov*y‘; print covyx; covyxinv = inv(covyx); print covyxinv; mudiff_1 = meansMatrix[{1}]-meansMatrix[{2}]; mudiff_2 = meansMatrix[{3}]-meansMatrix[{4}]; print mudiff_1; print mudiff_2; n = 23; p = 4; mu_vect = {27.396087,3.2408696}; T2 = 23* mu_vect‘ * covyxinv * mu_vect; print T2; critical_value_of_F = p*(n-1)/(n-p)*finv(.95, p, n-p); print critical_value_of_F; quit; 4
  • 5. Note that the formula for Hotelling’s T2 is given by: T2 = n(¯X − µ0) S−1 (¯X − µ0) ∼ (n − 1)p n − p Fp,n−p, By the SAS output, Hotelling’s T2 is 5631.1594, which is greater than the critical value of 13.408918. This means there is enough evidence to reject the null hypothesis and thus there is a difference between the means of the rainfall and the temperature in the summer and the means of the rainfall and the temperature in the winter. Now do something similar, except this time we compare the means of AO and Rain in the summer and the means of AO and rain in the winter. The Hotelling’s T2 value is 5945.4702, which is greater than the critical value of 13.408918. Therefore overall, summer conditions differ greatly from winter conditions. 3.3 Simple, Partial and Multiple correlation We now wish to find out the strength of the relationships between variables. To do this, correlation procedures are used. If the variables are highly correlated with each other, that makes them a potential candidate for principal components analysis and factor analysis, which will be done later. In thus subsection, the same types of variables will be tested. That is test the correlation of A0 like variables, NPI like variables, Temperature like variables, Rain like variables and Ice like variables. Then we test to see if there is a strong correlation between days with no ice and other variables. Firstly the AO variables will be tested. Simple correlation in SAS gives us, proc corr data=climatedata; var AO AO_wint AO_summ; run; AO AOwint AOsumm AO 1.00000 0.58909 0.56242 AOwint 0.58909 1.00000 0.93234 AOsumm 0.56242 0.93234 1.00000 So there is a strong correlation between Arctic Oscillation in Winter and Arctic Oscillation in Winter. However the overall Arctic Oscillation is not that strongly correlated between the Winter and Summer Arctic Oscillation. 
Now consider the NPI-like variables:

    proc corr data=climatedata;
        var NPI NPI_spring NPI_winter;
    run;

                 NPI       NPIspring   NPIwinter
    NPI          1.00000   0.18645     0.99790
    NPIspring    0.18645   1.00000     0.19444
    NPIwinter    0.99790   0.19444     1.00000
In this case, only the NPI in winter has a strong correlation with the overall NPI. The other pairs are weakly correlated with each other. However, out of curiosity I decided to try partial correlation.

    proc corr data=climatedata;
        var NPI NPI_spring;
        partial NPI_winter;
    run;

The correlation between NPI and NPIspring, with NPIwinter taken as the partial variable, is -0.11919, which makes it negative but still weak. However, making NPIwinter the variable and NPIspring the partial gives a correlation coefficient close to 1, which means that even after taking the spring NPI into account, the winter NPI remains strongly correlated with the overall NPI.

3.4 Principal components analysis

We now do principal components analysis. This is a variable reduction procedure, and in the climate dataset there seem to be variables that are correlated with each other. It is advised that principal components analysis be used with a large sample size, because correlations need a large sample before they stabilise. The SAS princomp procedure allows us to do principal components analysis. There are 3 parts to the princomp output: the simple statistics and the correlation matrix; the eigenvalues and eigenvectors of the correlation matrix; and finally the scree plot/variance explained. The eigenvalues are simply the variances of the principal components themselves.

    proc princomp data= climatedata;
        var Ice Ice_JanJul Ice_OctDec;
    run;

What is important here is the scree plot, as it graphs the eigenvalue against the component number. For the 3 candidate variables Ice, IceJanJul and IceOctDec, by the mineigen (eigenvalue greater than 1) criterion, only one of the eigenvalues is above 1, and its component should be the only one that is retained. Components with an eigenvalue of less than 1 are of little use. So according to principal components analysis, seasonality does not really matter much, and only Ice is retained.
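To inspect the eigenvalues and the component scores as data sets rather than only reading them from the printed output, the princomp call for the Ice-like variables could be extended along the following lines. This is a sketch only: the output data set names ice_scores and ice_eigen are hypothetical, while climatedata and the variable names are taken from the report.

```sas
/* Sketch only: ice_scores and ice_eigen are hypothetical data set names. */
ods graphics on;

proc princomp data=climatedata
              out=ice_scores       /* original data plus scores Prin1-Prin3 */
              outstat=ice_eigen    /* means, correlations, eigenvalues, eigenvectors */
              plots=scree;         /* scree and variance-explained plots */
    var Ice Ice_JanJul Ice_OctDec;
run;

/* Print only the row of eigenvalues, to compare each one against 1 */
proc print data=ice_eigen;
    where _TYPE_ = "EIGENVAL";
run;
```

Because the analysis is based on the correlation matrix, the three eigenvalues sum to 3, so an eigenvalue above 1 marks a component that explains more variance than a single standardised variable.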
    proc princomp data= climatedata;
        var AO AO_wint AO_summ;
    run;

Even though AOsumm and AOwint have a correlation above 0.9, I will still include both in the principal components analysis. As with the Ice-like variables, there is only one eigenvalue above 1 for the AO-like variables. So again it seems that, according to PCA, seasonality does not matter, and we retain AO as the variable. Using a similar princomp procedure, we see that only one NPI-like variable should be used.

    proc princomp data= climatedata;
        var Temp Temp_summ Temp_wint;
    run;

However, the Temperature-like variables are trickier to handle. We now get eigenvalues of 1.12191146, 1.02992761 and 0.84816093. It is clear from the correlation matrix provided by princomp that the temperature in summer is weakly correlated with the yearly temperature. The same goes for the winter temperature. Thus seasonal variance should be accounted for. Therefore, by the eyeball test, none of the Temperature-like variables should be discarded.

3.5 Factor Analysis