NATIONAL COLLEGE OF IRELAND
STATISTICS FOR DATA ANALYTICS
CA 2 - PROJECT
Analysis of statistical models
Submitted by,
SRIVATSAV KATTUKOTTAI MANI
X18145922
MSc in Data Analytics ‘B’
(MSCDAD_B)
MULTIPLE REGRESSION MODEL
Multiple regression is a method for analysing or predicting a dependent variable (also called
the outcome) using two or more independent variables (also called predictors). It can be used
to estimate how much of the variance in the dependent variable the model explains and the
relative contribution of each independent variable to that explained variance.
Objective of Analysis:
The main objective of applying multiple regression analysis to the collected data is to predict
the average daily traffic rate in various regions of New Zealand from other factors such as the
peak traffic rate and the percentage of cars and of light, medium and heavy commercial vehicles.
Context of data being analysed:
The cleaned data contains 7 columns: average daily traffic rate, peak traffic rate, and the
percentage of cars, light commercial, medium commercial, heavy commercial1 and heavy
commercial2 vehicles. All measures are taken at various co-ordinates and peak hours across
New Zealand.
Data Source used:
The dataset used in this model has been taken from the New Zealand Government data
depository:
https://www.data.govt.nz/
Fig.1 below shows a sample of the cleaned data. The raw data collected from the depository
contains about 9359 rows and 15 columns. To make the data suitable for multiple regression
analysis, null values were removed and the data was reduced to around 500 rows and 7 columns
for use in prediction.
Fig.1: Sample of data used for multiple regression analysis.
Measurement levels of all variables:
One dependent and six independent variables are used in our dataset. All the variables are
continuous (measured on an interval/ratio scale); no categorical variables are used in the analysis.
According to Tabachnick and Fidell (2007, p.123), the required sample size, taking the number of
independent variables into account, is N > 50 + 8m (where m = number of independent
variables). In our case m = 6, hence N > 98. Since the data obtained from the depository is large,
it was cleaned and 500 cases were taken for analysis.
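The sample-size rule quoted above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic; the function name `min_sample_size` is an illustrative choice and not part of the original analysis, which was carried out in SPSS.

```python
def min_sample_size(num_predictors: int) -> int:
    """Smallest whole number N that satisfies N > 50 + 8m."""
    return 50 + 8 * num_predictors + 1


m = 6              # independent variables in the cleaned dataset
n_available = 500  # approximate number of cleaned cases reported above

required = min_sample_size(m)
print("Minimum N:", required)
print("Sample is large enough:", n_available >= required)
```

With m = 6 the rule requires N > 98, so any sample of 99 or more cases satisfies it.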
Procedures for multiple regression analysis:
• After importing the data into SPSS software, select Analyze > Regression > Linear.
• Drag the dependent variable (Average daily traffic rate) and the independent variables
(peak traffic rate, percent of cars/light commercial/medium commercial/heavy commercial
vehicles) and drop them into their respective fields.
• Click the Statistics button and tick the Estimates, Confidence intervals (set at 95%), Part and
partial correlations, Model fit, Collinearity diagnostics and Descriptives check boxes.
• Click the Plots button, move *ZPRED into the X box and *ZRESID into the Y box. Under
Standardized Residual Plots, tick the Histogram and Normal probability plot check boxes.
• Click Continue and OK to view the results.
Assumptions for multiple regression model:
• The dependent variable should be measured on a continuous scale.
• Two or more independent variables should be used, which can be either continuous or
categorical.
• Multicollinearity should not be present. Multicollinearity occurs when two or more
independent variables are highly correlated with each other.
• Homoscedasticity should be present, i.e. the variance of the residuals along the line of best fit
remains similar as you move along the line.
• There should be a linear relationship between the dependent variable and each of the
independent variables, and between the dependent variable and the independent variables
collectively.
Checking that Assumptions used are not violated:
1. Multicollinearity should not be present:
Fig.2 below shows the correlations table for the analysed data. According to Pallant (2007,
p.155), the correlation between the dependent variable and the independent variables should
preferably be above 0.3 in magnitude. From the figure, we can see that the Peak traffic and
Percent heavy commercial2 variables correlate substantially with Average daily traffic rate
(0.690 and -0.313 respectively). The figure also shows that the correlations between the
independent variables are not too high: none of the independent variables has a correlation
above 0.7 with another (Pallant, 2007, p.155), hence all are retained.
Fig.2: Correlations table of the output
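The two screening rules above (a correlation with the outcome above 0.3 in magnitude, and correlations between predictors no higher than 0.7) can be sketched in Python as follows. The helper names and the toy data in the last lines are hypothetical; only the values 0.690 and -0.313 come from the report.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def related_to_outcome(r, threshold=0.3):
    """True if a predictor correlates with the outcome above the threshold."""
    return abs(r) > threshold


def too_collinear(r, threshold=0.7):
    """True if two predictors correlate too strongly with each other."""
    return abs(r) > threshold


# Correlations with Average daily traffic rate reported in the text.
for name, r in [("Peak traffic", 0.690), ("Percent heavy commercial2", -0.313)]:
    print(name, related_to_outcome(r))

# Hypothetical toy data, only to show pearson_r in use.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```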
According to Pallant (2007, p.156), multicollinearity can be detected using the Tolerance and
VIF (Variance Inflation Factor) values: a Tolerance value below 0.10 or a VIF above 10
indicates that a predictor is very highly correlated with the other predictors. From Fig.3 below,
we can see that the values are within these limits (Tolerance > 0.10 and VIF < 10), so the
multicollinearity assumption is not violated.
Fig.3: Coefficients table of the output
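Tolerance and VIF are two views of the same quantity: Tolerance is 1 minus the R-square obtained when one predictor is regressed on the remaining predictors, and VIF is the reciprocal of Tolerance. The sketch below illustrates the cut-offs quoted above; the R-square inputs are hypothetical values chosen for illustration, not figures from the SPSS output.

```python
def tolerance(r_squared_j: float) -> float:
    """Tolerance of predictor j, given the R-square of j regressed on the others."""
    return 1.0 - r_squared_j


def vif(tol: float) -> float:
    """Variance Inflation Factor, the reciprocal of Tolerance."""
    return float("inf") if tol == 0 else 1.0 / tol


def has_multicollinearity(tol: float) -> bool:
    """Apply the cut-offs Tolerance < 0.10 or VIF > 10."""
    return tol < 0.10 or vif(tol) > 10


# Hypothetical R-square values for three predictors.
for r2 in (0.50, 0.95, 1.00):
    tol = tolerance(r2)
    print(r2, vif(tol), has_multicollinearity(tol))
```

A Tolerance of exactly 0 means the predictor is completely determined by the other predictors, so its VIF is unbounded.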
2. Homoscedasticity and linear relationship should be present:
Figures 4, 5 and 6 below show the Histogram, Normal P-P Plot and Scatter Plot of the
standardized residuals respectively. The Histogram shows that the residuals are approximately
normally distributed, and the P-P Plot shows no major deviations from normality, which
supports a linear relationship between the dependent and independent variables. Finally, the
Scatter Plot indicates homoscedasticity, as the residuals are concentrated around the centre.
Fig.4: Histogram Plot of the output
Fig.5: Normal P-P Plot of the output
Fig.6: Scatter Plot of the output
3. Analysing the Results or output of the model:
According to Pallant (2007, p.158), the R-square value gives the proportion of variance in the
dependent variable (Average daily traffic) that is explained by the independent variables. From
Fig.7 below, we can see that R-square is 0.531 (53.1% of the variance explained) and Adjusted
R-square is 0.526 (52.6%), which gives a better estimate of the true population value. The
quality of prediction of the dependent variable (Average daily traffic) is given by the multiple
correlation coefficient R, which is 0.729, indicating a good level of prediction.
Fig.7: Model Summary of the model
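The three figures in the model summary are linked by simple formulas, which the following sketch reproduces. It assumes 5 predictors in the final model and N = 499 cases; both numbers are inferred from the ANOVA degrees of freedom (5 and 493) reported in this report rather than read from the SPSS output itself.

```python
from math import sqrt

r_squared = 0.531   # R-square reported in the model summary
n_predictors = 5    # assumption: regression degrees of freedom
n_cases = 499       # assumption: 5 + 493 + 1 from the ANOVA table

# Multiple correlation coefficient R is the square root of R-square.
multiple_r = sqrt(r_squared)

# Adjusted R-square penalises R-square for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)

print(round(multiple_r, 3))
print(round(adjusted_r_squared, 3))
```

Under these assumptions the computed values agree with the reported R of 0.729 and Adjusted R-square of 0.526 to three decimal places.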
4. Evaluation of independent variables:
From the ANOVA table in Fig.8, we can see that the significance (p-value) is less than 0.05,
with degrees of freedom 5 and 493 and an F value of 111.724 (i.e. F(5, 493) = 111.724), so the
model is statistically significant.
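As a rough consistency check, the F statistic can be recomputed from R-square and the two degrees of freedom. The sketch below uses the rounded R-square of 0.531 from the model summary, so a small difference from the SPSS value is expected.

```python
r_squared = 0.531    # rounded R-square from the model summary
df_regression = 5    # regression degrees of freedom from the ANOVA table
df_residual = 493    # residual degrees of freedom from the ANOVA table

# F = (explained variance per regression df) / (unexplained variance per residual df)
f_statistic = (r_squared / df_regression) / ((1.0 - r_squared) / df_residual)

print(round(f_statistic, 1))
```

The result is about 111.6, close to the reported F(5, 493) = 111.724; the gap is consistent with R-square having been rounded to three decimals.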
According to Pallant (2007, p.159), the beta values under Standardized Coefficients and their
significance (p-values) indicate whether a particular variable makes a significant contribution
to explaining the dependent variable. From Fig.8.1, it is clear that three variables (Peak
traffic, percent heavy commercial 1 and percent heavy commercial 2) make a unique,
statistically significant contribution to predicting the dependent variable (Average daily
traffic), with p-values < 0.05. The regression equation is formed from the Unstandardized B
values as below.
ADT => Average Daily Traffic
ADT = 58.181 + (3.186 * Peak traffic) - (0.813 * percent light commercial)
      + (0.221 * percent medium commercial) - (2.740 * percent heavy commercial1)
      - (9.281 * percent heavy commercial2)
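The regression equation above can be turned into a small prediction function. This is a sketch only: the coefficients are the Unstandardized B values quoted in the equation, while the function name, parameter names and the input values in the usage line are hypothetical.

```python
def predict_adt(peak_traffic, pct_light, pct_medium, pct_heavy1, pct_heavy2):
    """Predicted Average Daily Traffic from the fitted regression equation."""
    return (58.181
            + 3.186 * peak_traffic
            - 0.813 * pct_light
            + 0.221 * pct_medium
            - 2.740 * pct_heavy1
            - 9.281 * pct_heavy2)


# Hypothetical site: the inputs below are made up for illustration.
print(predict_adt(peak_traffic=100, pct_light=10, pct_medium=5,
                  pct_heavy1=3, pct_heavy2=2))
```

Each coefficient is the expected change in ADT for a one-unit increase in that predictor with the others held constant, so raising peak traffic by one unit raises the prediction by 3.186.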
Fig.8: ANOVA table
Fig.8.1: Coefficients table
The Percent Car variable has been excluded because it has Tolerance = 0, which means it is
completely predicted by the other independent variables and therefore adds no new information.
Fig.9: Excluded Variables table.
Conclusion:
Using the multiple regression model, it can be concluded that three variables (Peak traffic,
percent heavy commercial 1 and percent heavy commercial 2) make a unique, statistically
significant contribution to predicting the dependent variable (Average daily traffic). Of these,
Peak traffic makes the largest contribution to ADT, and the overall quality of prediction is
R = 0.729.
---------------------------------------------------------------------------------------------------------------------
BINARY LOGISTIC REGRESSION MODEL
Binomial (binary) logistic regression is a method of analysis used to predict the probability
that an observation falls into one of the two categories of a dichotomous dependent variable,
based on one or more independent variables which can be either categorical or continuous.
Data Source used:
The dataset used in this model has been taken from UN data depository:
http://data.un.org/
Fig.10 below shows a sample of the cleaned data. The raw data collected from the depository
contains about 3095 rows and 8 columns. To make the data suitable for logistic regression
analysis, null values were removed and the data was reduced to around 500 rows and 4 columns
for use in prediction.
Fig.10: Sample data of logistic regression model
Objective of Analysis:
The main objective of using the binary logistic regression model is to predict whether the
growth rate of a country (dependent variable) has increased or decreased, using the percentage
of employees in the services, industry and agriculture sectors (independent variables).
Context of data being analysed:
The cleaned data contains 4 columns (Growth rate, percent services, percent industry and
percent agriculture) for various countries, and is used to predict whether the change in growth
rate depends on the percentage of employees in each sector.
Measurement level of variables:
One dependent and three independent variables are used in our dataset. The independent
variables (percent in services, industry and agriculture) are continuous and the dependent
variable (Growth rate) is dichotomous. Since the data obtained from the depository is large, it
was cleaned and 500 cases were taken for analysis.
Procedures for binary logistic regression analysis:
• After importing the data into SPSS software, select Analyze > Regression > Binary
Logistic.
• Drag the dependent variable (Growth rate) into the Dependent field and the independent
variables (percent of services, industry and agriculture) into the Covariates box.
• Click the Options button and tick the CI for Exp(B), Casewise listing of residuals,
Classification plots and Hosmer-Lemeshow goodness-of-fit check boxes.
• Click Continue and OK to see the output.
Assumptions for Binary logistic regression model:
• The dependent variable should be a dichotomous (binary) categorical variable.
• Independent variables should be continuous or categorical.
• The categories of the dependent variable should be mutually exclusive.
• There should be a linear relationship between the continuous independent variables and the
logit (log-odds) of the dependent variable.
• High intercorrelation (multicollinearity) between the independent variables (predictors)
should not be present.
Analysing the Results or output of the model:
Fig.11 below shows the total number of cases used in this model, and Fig.12 shows how the
dichotomous dependent variable is encoded in SPSS: an increase in growth rate is encoded as 1
and a decrease as 0.
Fig.11: Case processing summary table
Fig.12: Dependent Variable encoding table
Fig.13 below (Block 0) shows the classification produced by SPSS without any of the
independent variables in the model; the overall percentage of correctly classified cases is 51.8%.
Fig.13: Classification table of output
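With no predictors, the Block 0 model simply assigns every case to the more frequent category, so its accuracy equals the share of that category. The sketch below illustrates this; the 259/241 split is an assumption inferred from 51.8% of roughly 500 cases and is not quoted directly in the report.

```python
def baseline_accuracy(n_increase: int, n_decrease: int) -> float:
    """Accuracy of always predicting the more frequent category."""
    total = n_increase + n_decrease
    return max(n_increase, n_decrease) / total


# Assumed class counts (inferred, not taken from the SPSS output).
print(round(baseline_accuracy(259, 241) * 100, 1))
```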
Fig.14 below (Block 1) shows the results of the logistic regression model with all the
independent variables (predictors) included. The Omnibus Tests of Model Coefficients indicate
whether this model fits better than the Block 0 model without predictors. In this case the
significance values are below 0.05, so the model with predictors is significantly better than
Block 0.
Fig.14: Omnibus tests output
The Hosmer and Lemeshow test is another test of goodness of fit. According to Pallant (2007,
p.174), poor fit is indicated by a significance value less than 0.05. From Fig.15 below, we can
see that the significance value is 0.803, which is well above 0.05 and indicates that the model
has a good fit.
Fig.15: Hosmer and Lemeshow Test output
Fig.16 below gives the Cox & Snell R-square and Nagelkerke R-square values, which estimate
how much of the variation in the dependent variable is explained by the predictors in this
model. The explained variation lies between 62% (Cox & Snell) and 82.7% (Nagelkerke).
Fig.16: Model summary table of output
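Both pseudo R-square values can be approximately reconstructed from the model chi-square reported in the conclusion (483.966 with N = 500). The sketch below uses the standard Cox & Snell and Nagelkerke formulas; the 259/241 class split used for the null model is an assumption inferred from the 51.8% baseline accuracy, not a figure taken from the SPSS output.

```python
from math import exp, log


def cox_snell_r2(model_chi_square: float, n: int) -> float:
    """Cox & Snell R-square: 1 - exp(-chi-square / N)."""
    return 1.0 - exp(-model_chi_square / n)


def null_neg2ll(n_ones: int, n_zeros: int) -> float:
    """-2 log-likelihood of the intercept-only (null) model."""
    n = n_ones + n_zeros
    p = n_ones / n
    return -2.0 * (n_ones * log(p) + n_zeros * log(1.0 - p))


def nagelkerke_r2(model_chi_square: float, n: int, neg2ll_null: float) -> float:
    """Cox & Snell R-square rescaled by its maximum attainable value."""
    max_r2 = 1.0 - exp(-neg2ll_null / n)
    return cox_snell_r2(model_chi_square, n) / max_r2


chi_square, n = 483.966, 500
cs = cox_snell_r2(chi_square, n)
nk = nagelkerke_r2(chi_square, n, null_neg2ll(259, 241))  # assumed split
print(round(cs, 3), round(nk, 3))
```

Under these assumptions the two values come out close to the reported 62% and 82.7%.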
From Fig.17 below, we can see that including the predictors improves the overall prediction of
the model compared with Block 0. The overall percentage of correctly classified cases
increases from 51.8% to 92.2%: the model correctly classifies 91.1% of the cases with an
increase in growth rate (sensitivity) and 93.4% of the cases with a decrease in growth rate
(specificity).
Fig.17: Classification table
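The percentages in a classification table follow directly from the four cell counts. The sketch below shows the calculation; the counts used (236 and 23 for actual increases, 225 and 16 for actual decreases) are hypothetical values chosen to be consistent with the reported percentages and about 500 cases, not numbers read from Fig.17.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and overall accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # correct among actual increases (1)
        "specificity": tn / (tn + fp),   # correct among actual decreases (0)
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical cell counts, consistent with the reported percentages.
metrics = classification_metrics(tp=236, fn=23, tn=225, fp=16)
for name, value in metrics.items():
    print(name, round(value * 100, 1))
```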
Considering the Variables in the Equation table in Fig.18, we can see that the percent industry
and percent services variables provide statistically significant results with p < 0.05. The B
values give the direction of the relationship with the dependent variable (Growth rate) on the
log-odds scale. Since all the B values are negative, a higher percentage of employees in
services, industry or agriculture is associated with lower log-odds of the category coded 1 (an
increase in growth rate), holding the other predictors constant. The regression equation,
expressed in log-odds, is as follows:
logit(P(Growth rate = increase)) = 16.409 - (0.075*percent_services) - (0.453*percent_industry)
- (0.163*percent_agriculture)
Fig.18: Variables in the Equation
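In binary logistic regression, a linear equation built from the B values is read as the log-odds of the category coded 1, and the logistic function converts it to a probability. The sketch below applies this to the B values quoted above; the function names and the input percentages are hypothetical.

```python
from math import exp


def log_odds_of_increase(pct_services, pct_industry, pct_agriculture):
    """Linear predictor (logit) built from the reported B values."""
    return (16.409
            - 0.075 * pct_services
            - 0.453 * pct_industry
            - 0.163 * pct_agriculture)


def probability_of_increase(pct_services, pct_industry, pct_agriculture):
    """Logistic transform of the log-odds into a probability between 0 and 1."""
    z = log_odds_of_increase(pct_services, pct_industry, pct_agriculture)
    return 1.0 / (1.0 + exp(-z))


# Hypothetical country profiles, for illustration only.
print(probability_of_increase(60, 20, 20))
print(probability_of_increase(60, 30, 20))
```

Because the B value for percent industry is negative, the second profile (with a higher industry share) receives a lower predicted probability of an increase than the first.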
Conclusion:
This model contains 3 independent variables (percent of employees in services, industry and
agriculture), and the full model is statistically significant, χ²(3, N = 500) = 483.966,
p < 0.05. Hence we can conclude that the percent of employees in industry and agriculture
contribute most to predicting whether the growth rate of a country is increasing or decreasing,
with an overall classification accuracy of 92.2%.
References:
[1] Pallant, J. (2007). SPSS Survival Manual, 3rd Edition.
[2] https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php
[3] https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
[4] Tabachnick, B. G. & Fidell, L. S. (2007). Using Multivariate Statistics, 5th Edition.

 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 

X18145922 statistics ca2 final

  • 1. NATIONAL COLLEGE OF IRELAND STATISTICS FOR DATA ANALYTICS CA 2 - PROJECT Analysis of statistical models Submitted by, SRIVATSAV KATTUKOTTAI MANI X18145922 MSc in Data Analytics ‘B’ (MSCDAD_B)
  • 2. MULTIPLE REGRESSION MODEL
    Multiple regression is a method for analysing or predicting a dependent variable (also called the outcome) using two or more independent variables (also called predictors). It can be used to estimate how much of the variance in the outcome the model explains and how much each independent variable contributes to that explained variance.
    Objective of Analysis:
    The main objective of applying multiple regression analysis to the collected data is to predict the Average daily traffic rate in various regions of New Zealand from other factors: peak traffic rate and the percentages of cars and of light, medium and heavy commercial vehicles.
    Context of data being analysed:
    The cleaned data contains 7 columns: Average daily traffic rate, peak traffic rate, and the percentages of cars, light commercial, medium commercial, heavy commercial 1 and heavy commercial 2 vehicles. All measures are recorded for various co-ordinates and peak hours across New Zealand.
    Data Source used:
    The dataset used in this model was taken from the New Zealand Government data depository: https://www.data.govt.nz/
    Fig.1 below shows a sample of the cleaned data. The raw data collected from the depository contains about 9359 rows and 15 columns. To make the data suitable for multiple regression, it was cleaned by removing null values and reduced to around 500 rows with 7 columns for making reliable predictions.
    Fig.1: Sample of data used for multiple regression analysis.
  • 3. Measurement levels of all variables:
    1 dependent and 6 independent variables are used in our dataset. All the variables are continuous; no categorical variables are used in the analysis.
    According to Tabachnick and Fidell (2007, p.123), the required sample size, taking the number of independent variables into account, is N > 50 + 8m (where m = number of independent variables). In our case m = 6, hence N > 98. Since the data obtained from the depository is large, it was cleaned and 500 samples were taken for analysis.
    Procedure for multiple regression analysis:
    - After importing the data into SPSS, select Analyze > Regression > Linear.
    - Drag the dependent variable (Average daily traffic rate) and the independent variables (peak traffic rate, percent of cars/light commercial/medium commercial/heavy commercial vehicles) into their respective fields.
    - Click the Statistics button and tick the Estimates, Confidence intervals (set at 95%), Part and partial correlations, Model fit, Collinearity diagnostics and Descriptives check boxes.
    - Click the Plots button, move *ZPRED into the X box and *ZRESID into the Y box. Under Standardized Residual Plots, tick the Histogram and Normal probability plot check boxes.
    - Click Continue and OK to view the results.
    Assumptions for multiple regression model:
    - The dependent variable should be measured on a continuous scale.
    - Two or more independent variables should be used, which can be either continuous or categorical.
    - Multicollinearity should not be present. Multicollinearity occurs when two or more independent variables are highly correlated with each other.
    - Homoscedasticity should be present, i.e. the variance of the residuals remains similar as you move along the line of best fit.
    - There should be a linear relationship between the dependent variable and each independent variable, and between the dependent variable and the independent variables collectively.
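The sample size rule quoted above can be checked with a few lines of Python. This is only a minimal sketch: the function name `min_sample_size` is a hypothetical helper, and the values m = 6 and 500 samples are the ones stated in the text.

```python
def min_sample_size(num_predictors: int) -> int:
    """Smallest integer N that satisfies N > 50 + 8m (Tabachnick and Fidell rule)."""
    return 50 + 8 * num_predictors + 1


m = 6               # number of independent variables, as stated in the text
n_available = 500   # cleaned sample size, as stated in the text

n_required = min_sample_size(m)
sample_is_large_enough = n_available >= n_required
print(n_required, sample_is_large_enough)
```

With m = 6 the threshold is N > 98, so the 500 cleaned rows comfortably satisfy the rule.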
    Checking that the assumptions are not violated:
    1. Multicollinearity should not be present:
    Fig.2 below shows the correlations table of the analysed data. According to Julie Pallant (2007, p.155), the independent variables should show some relationship with the dependent variable, preferably a correlation above 0.3 in magnitude. From the figure, Peak traffic and Percent heavy commercial 2 correlate substantially with Average daily traffic rate (0.690 and -0.313 respectively). The figure also shows that the correlations between the independent variables are not too high: no pair of independent variables has a correlation above 0.7 (Julie Pallant, 2007, p.155), hence all are retained.
  • 4. Fig.2: Correlations table of the output
    According to Julie Pallant (2007, p.156), multicollinearity can be detected using the Tolerance and VIF (Variance Inflation Factor) values: a Tolerance below 0.10 or a VIF above 10 indicates that an independent variable is very highly correlated with the others. From Fig.3 below, all retained variables have Tolerance > 0.10 and VIF < 10, so the multicollinearity assumption is not violated.
    Fig.3: Coefficients table of the output
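To make the Tolerance and VIF cut-offs concrete, the sketch below computes both for a small synthetic dataset using NumPy. It is not the SPSS procedure used in the report and does not use the traffic data: the columns `a`, `b` and `c` are invented for illustration, with `c` deliberately built as a near copy of `a`. It relies on the standard identity that the VIF of each predictor is the corresponding diagonal element of the inverse of the predictors' correlation matrix, and Tolerance = 1 / VIF.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (tolerance, vif) arrays, one entry per column of X."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # predictor correlation matrix
    vif = np.diag(np.linalg.inv(corr))    # VIF_j = 1 / (1 - R_j^2)
    tolerance = 1.0 / vif                 # Tolerance_j = 1 - R_j^2
    return tolerance, vif


# Synthetic predictors (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.05 * rng.normal(size=n)         # almost a duplicate of a

tol, vif = tolerance_and_vif(np.column_stack([a, b, c]))

# Apply the cut-offs quoted from Pallant: Tolerance < 0.10 or VIF > 10.
flagged = (tol < 0.10) | (vif > 10)
for name, t, v, f in zip("abc", tol, vif, flagged):
    print(f"{name}: tolerance={t:.4f} VIF={v:.2f} flagged={f}")
```

In this synthetic case `a` and `c` are flagged while `b` is not, which is the pattern the cut-offs are meant to catch.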
  • 5. 2. Homoscedasticity and linear relationship should be present:
    Figures 4, 5 and 6 below show the Histogram, Normal P-P Plot and Scatter Plot of the standardized residuals. The Histogram shows that the residuals are approximately normally distributed. The P-P Plot shows no major deviations from normality, which supports a linear relationship between the dependent and independent variables. Finally, the Scatter Plot indicates homoscedasticity, as the points are concentrated around the centre.
    Fig.4: Histogram Plot of the output
    Fig.5: Normal P-P Plot of the output
  • 6. Fig.6: Scatter Plot of the output
    3. Analysing the results of the model:
    According to Julie Pallant (2007, p.158), the R-square value gives the proportion of variance in the dependent variable (Average daily traffic) explained by the independent variables. From Fig.7 below, R-square is 53.1% and Adjusted R-square is 52.6%; the adjusted value gives a better estimate of the true population value. The multiple correlation coefficient R, which measures how well the dependent variable is predicted, is 0.729, indicating a good level of prediction.
    Fig.7: Model Summary of the model
    4. Evaluation of independent variables:
    From the ANOVA table in Fig.8, the significance (p-value) is less than 0.05 with 5 and 493 degrees of freedom and an F value of 111.724 (i.e. F(5, 493) = 111.724), so the model is statistically significant. According to Julie Pallant (2007, p.159), the Beta values under Standardized Coefficients and their significance (p-values) indicate how much each variable contributes to explaining the dependent variable. From Fig.8.1, three variables (Peak traffic, percent heavy commercial 1 and percent heavy commercial 2) make a unique, statistically significant contribution to predicting the dependent variable (Average daily traffic), with p-values < 0.05.
  • 7. To form the regression equation, we can use the Unstandardized B values as below, where ADT = Average Daily Traffic:
    ADT = 58.181 + (3.186*Peak traffic) - (0.813*percent light commercial) + (0.221*percent medium commercial) - (2.740*percent heavy commercial1) - (9.281*percent heavy commercial2)
    Fig.8: ANOVA table
    Fig.8.1: Coefficients table
    The Percent Car variable has been excluded since it has Tolerance = 0, which means it is fully predictable from the other independent variables and so adds no new information.
    Fig.9: Excluded Variables table.
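The reported model summary and regression equation can be sanity-checked in Python. The sketch below is an illustration, not the SPSS output itself. The coefficients, R-square = 0.531 and the degrees of freedom (5 and 493) are taken from the text; the sample size n = 499 is an assumption inferred from those degrees of freedom (5 + 493 + 1), and the input values passed to `predict_adt` are hypothetical.

```python
import math


def predict_adt(peak_traffic, pct_light, pct_medium, pct_heavy1, pct_heavy2):
    """Average Daily Traffic from the unstandardized B values reported in the text."""
    return (58.181
            + 3.186 * peak_traffic
            - 0.813 * pct_light
            + 0.221 * pct_medium
            - 2.740 * pct_heavy1
            - 9.281 * pct_heavy2)


def adjusted_r2(r2, n, k):
    """Adjusted R-square for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic implied by R-square, with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


r2 = 0.531   # reported R-square
k = 5        # predictors retained in the model (regression df)
n = 499      # assumption: inferred from the reported df, 5 + 493 + 1

print(round(math.sqrt(r2), 3))            # multiple correlation R
print(round(adjusted_r2(r2, n, k), 3))    # adjusted R-square
print(round(f_statistic(r2, n, k), 1))    # approximate F, from the rounded R-square

# Hypothetical input values, for illustration only.
print(predict_adt(peak_traffic=100, pct_light=5, pct_medium=3, pct_heavy1=2, pct_heavy2=1))
```

Because R-square is only reported to three decimals, the F value recovered this way is close to, but not exactly, the 111.724 shown in the ANOVA table.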
  • 8. Conclusion:
    Using the multiple regression model, it can be concluded that the Peak traffic, percent heavy commercial 1 and percent heavy commercial 2 variables make unique, statistically significant contributions to predicting the dependent variable (Average daily traffic). Of these, Peak traffic contributes the most to ADT, and the overall quality of prediction is R = 0.729.
    ---------------------------------------------------------------------------------------------------------------------
    BINARY LOGISTIC REGRESSION MODEL
    Binomial logistic regression is a method of analysis used to predict the probability that an observation falls into one of the two categories of a dichotomous dependent variable, based on one or more independent variables which can be either categorical or continuous.
    Data Source used:
    The dataset used in this model was taken from the UN data depository: http://data.un.org/
    Fig.10 below shows a sample of the cleaned data. The raw data collected from the depository contains about 3095 rows and 8 columns. To make the data suitable for logistic regression, it was cleaned by removing null values and reduced to around 500 rows with 4 columns for making reliable predictions.
    Fig.10: Sample data of logistic regression model
  • 9. Objective of Analysis:
    The main objective of using the binary logistic regression model is to predict whether the growth rate of a country (dependent variable) has increased or decreased, using the percentages of employees in the services, industry and agriculture sectors (independent variables).
    Context of data being analysed:
    The cleaned data contains 4 columns (Growth rate, percent services, percent industry and percent agriculture) for various countries, used to predict whether a change in growth rate depends on the percentage of employees in each sector.
    Measurement level of variables:
    1 dependent and 3 independent variables are used in our dataset. The independent variables (percent in services, industry and agriculture) are continuous and the dependent variable (Growth rate) is dichotomous. Since the data obtained from the depository is large, it was cleaned and 500 samples were taken for analysis.
    Procedure for binary logistic regression analysis:
    - After importing the data into SPSS, select Analyze > Regression > Binary Logistic.
    - Drag the dependent variable (Growth rate) into the Dependent field and the independent variables (percent of services, industry and agriculture) into the Covariates box.
    - Click the Options button and tick the CI for Exp(B), Casewise listing of residuals, Classification plots and Hosmer-Lemeshow goodness-of-fit check boxes.
    - Click Continue and OK to see the output.
    Assumptions for binary logistic regression model:
    - The dependent variable should be a dichotomous (binary) categorical variable.
    - Independent variables should be continuous or categorical.
    - The categories of the dependent variable should be mutually exclusive.
    - There should be a linear relationship between each continuous independent variable and the logit (log-odds) of the dependent variable.
    - High intercorrelation between the independent variables (predictors) should not be present.
    Analysing the results of the model:
    Fig.11 below shows the total number of cases used in this model, and Fig.12 shows how the dichotomous dependent variable has been encoded in SPSS. In this case, an increase in growth rate is encoded as 1 and a decrease as 0 (i.e. increase = 1 and decrease = 0).
  • 10. Fig.11: Case processing summary table
    Fig.12: Dependent Variable encoding table
    Fig.13 below (Block 0) shows the classification made by SPSS without including the independent variables, with an overall percentage correct of 51.8%.
    Fig.13: Classification table of output
    Fig.14 below (Block 1) shows the results of the logistic regression model with all the independent variables (predictors) included.
  • 11. The Omnibus Tests of Model Coefficients assess whether the model with predictors fits better than the Block 0 model without predictors. In this case the significance values of the omnibus tests are below 0.05, so the model with predictors is better than Block 0.
    Fig.14: Omnibus tests output
    The Hosmer and Lemeshow test is another test of goodness of fit. According to Julie Pallant (2007, p.174), a poor fit is indicated by a significance value less than 0.05. From Fig.15 below, the significance value is 0.803, which indicates that the model has a good fit.
    Fig.15: Hosmer and Lemeshow Test output
    Fig.16 below gives the Cox & Snell R-square and Nagelkerke R-square values, which estimate how much of the variance in the dependent variable is explained by the predictors in this model. The explained variance lies between 62% and 82.7%.
    Fig.16: Model summary table of output
    From Fig.17 below, the overall prediction of the model improves when the predictors are included, compared to Block 0 without predictors. The overall percentage correct increases from 51.8% to 92.2%, with a sensitivity of 91.1% (cases of increase in growth rate correctly predicted) and a specificity of 93.4% (cases of decrease in growth rate correctly predicted).
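The relationship between a classification table and the percentages quoted above can be sketched as follows. The cell counts here are illustrative only: they are not read from Fig.17, but were chosen so that, with N = 500, they reproduce the reported 51.8%, 92.2%, 91.1% and 93.4%. The function name `classification_summary` is a hypothetical helper.

```python
def classification_summary(tp, fn, tn, fp):
    """Summarise a 2x2 classification table where the positive class is 'increase' (coded 1)."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),               # increases correctly predicted
        "specificity": tn / (tn + fp),               # decreases correctly predicted
        "accuracy": (tp + tn) / total,               # overall percentage correct (Block 1)
        "baseline": max(tp + fn, tn + fp) / total,   # always predict the larger class (Block 0)
    }


# Illustrative counts (assumption, not taken from the SPSS output).
summary = classification_summary(tp=236, fn=23, tn=225, fp=16)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

The baseline entry shows why Block 0 reaches only about half of the cases: without predictors, every case is assigned to the more frequent category.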
  • 12. Fig.17: Classification table
    From the Variables in the Equation table (Fig.18), the percent industry and percent services variables give statistically significant results with p < 0.05. The B values give the direction of the relationship between each predictor and the log-odds of the outcome coded 1 (increase in growth rate). Since all the B values are negative, a higher percentage of employees in services, industry or agriculture is associated with lower odds of an increase in growth rate; for example, each additional percentage point in industry multiplies the odds of an increase by exp(-0.453), which is about 0.64. With p denoting the probability of an increase in growth rate, the regression equation is as follows:
    ln(p / (1 - p)) = 16.409 - (0.075*percent_services) - (0.453*percent_industry) - (0.163*percent_agriculture)
    Fig.18: Variables in the Equation
    Conclusion:
    This model contains 3 independent variables (percent of employees in services, industry and agriculture) and is statistically significant, with χ2 (3, N=500) = 483.966, p < 0.05. Hence we can conclude that the percentages of employees in industry and agriculture, which have the largest coefficients in magnitude, contribute most to predicting whether the growth rate of a country is increasing or decreasing, with an overall prediction accuracy of 92.2%.
    References:
    [1] Julie Pallant, SPSS Survival Manual, 3rd Edition (2007)
    [2] https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php
    [3] https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
    [4] Tabachnick & Fidell, Using Multivariate Statistics, 5th Edition (2007)