National College of Ireland
Project Submission Sheet – 2018/2019
Student Name: SHANTANU DESHPANDE
Student ID: X18125514
Programme: MSc. In Data Analytics (Cohort B) Year: Jan 2019
Module: Statistics for Data Analytics
Lecturer: Prof. Tony Delaney
Submission Due Date: 7th January 2019
Project Title: CA2 – Regression
Word Count: 2,329
I hereby certify that the information contained in this (my submission) is information
pertaining to research I conducted for this project. All information other than my own
contribution will be fully referenced and listed in the relevant bibliography section at
the rear of the project.
ALL internet material must be referenced in the references section. Students are
encouraged to use the Harvard Referencing Standard supplied by the Library. To use
other author's written or electronic work is illegal (plagiarism) and may result in
disciplinary action. Students may be required to undergo a viva (oral examination) if
there is suspicion about the validity of their submitted work.
Signature: ………………………………………………………………………………………………………………
Date: ………………………………………………………………………………………………………………
PLEASE READ THE FOLLOWING INSTRUCTIONS:
1. Please attach a completed copy of this sheet to each project (including multiple copies).
2. Projects should be submitted to your Programme Coordinator.
3. You must ensure that you retain a HARD COPY of ALL projects, both for your own
reference and in case a project is lost or mislaid. It is not sufficient to keep a copy on
computer. Please do not bind projects or place in covers unless specifically requested.
4. You must ensure that all projects are submitted to your Programme Coordinator on or
before the required submission date. Late submissions will incur penalties.
5. All projects must be submitted and passed in order to successfully complete the year.
Any project/assignment not submitted will be marked as a fail.
Office Use Only
Signature:
Date:
Penalty Applied (if applicable):
MULTIPLE LINEAR REGRESSION
Objective of the study: The objective is to analyse the chosen dataset in IBM SPSS using multiple linear
regression to model the relationship between a dependent variable and two independent variables.
Problem Analysis:
The relevant datasets have been taken from the below web links:
1) Adult Mortality rate (probability of dying between 15 and 60 years per 1000 population) by country-
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000004,
2) Adult Obesity Rate (adults aged >= 20 years who are obese (%))-
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000010
3) Alcohol Consumption among adults aged >= 15 years (litres of alcohol per person per year)
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000011
Description of the dataset:
The dataset consists of three variables: mortality, obesity and alcohol. The mortality column is treated as the
dependent variable, while obesity and alcohol are treated as independent (predictor) variables. Multiple
regression analysis is used to examine the possible drivers of adult mortality via these two predictors. Because
the dependent variable is continuous, a multiple linear regression model is appropriate for this data. We use
the independent variables to see how well adult mortality can be predicted.
Description of Analysis:
The analysis is carried out to observe the significance of the different independent variables in predicting the
dependent variable.
1. b-Value:
The b-value indicates how much the outcome changes for a one-unit change in each independent variable,
holding all other independent variables constant.
2. Durbin Watson Method:
The Durbin-Watson statistic should ideally be close to 2, which indicates that the residuals are independent;
values below 1 or above 3 suggest problematic autocorrelation in the residuals.
3. ANOVA / F-test:
The F-test assesses whether the variance explained by the proposed model is considerably greater than the
error within the model. It tells us whether the multiple regression model as a whole is a significant predictor
of the outcome.
4. Collinearity Test:
This test checks whether the chosen predictor variables are closely related to each other. It is assessed with
two statistics: the tolerance, which should not fall below 0.1, and the VIF, which should lie in the range
1-10.
Assumptions:
There are several key assumptions of multi linear regression. They are: homoscedasticity, linearity, normality and
multicollinearity.
 In the normal P-P plot, we hope that all data points lie on a reasonably straight diagonal line from bottom
left to top right (Pallant, 2016), which would suggest no major deviation from normality.
 In our plot the points lie on or close to the diagonal line, so there is no major deviation from normality.
 In the scatter plot of the residuals, we hope the points are roughly rectangularly distributed, with most of
the scores concentrated in the centre (Pallant, 2016).
 Our data fulfils this criterion and no particular pattern is visible in the scatter plot.
 Hence, we can conclude that the residuals are homoscedastic.
 Outliers can also be identified from the scatter plot. Tabachnick and Fidell define outliers as cases with a
standardised residual above 3.3 or below -3.3 (Pallant, 2016).
 Here we can identify three potential outliers located at the top centre of the scatter plot.
 The histogram of residuals shows that the model fits the normal distribution curve well.
Multicollinearity:
Correlations

                       mortalityrate   alcohol    obese
Pearson Correlation
  mortalityrate             1.000       -.336     -.453
  alcohol                   -.336       1.000      .240
  obese                     -.453        .240     1.000
Sig. (1-tailed)
  mortalityrate               .          .000      .000
  alcohol                    .000         .        .001
  obese                      .000        .001        .
N (all variables)            178         178       178
 The correlation table gives an idea of the correlation between the independent variables and the
dependent variable. Correlation values always lie in the range -1 to 1.
 There should be some relationship between each independent variable and the dependent variable
(preferably above 0.3 in absolute value).
 Here, the dependent variable 'mortalityrate' has a moderate negative correlation with both independent
variables, 'alcohol' and 'obese', with values of -0.336 and -0.453 respectively.
 Conversely, the correlation between the independent variables should not be too high (preferably below
0.7).
 Our independent variables have only a weak correlation of 0.24 with each other. As this is well below the
0.7 threshold, we can conclude that there is no multicollinearity between the independent variables.
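As a cross-check outside SPSS, a Pearson correlation can be computed directly from its definition. The sketch below is a minimal illustration; the variable names mirror the SPSS columns, but the sample values are made up, not the actual UN data:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical country rows (illustrative values only)
alcohol = [2.0, 5.5, 9.1, 12.3, 1.1]
obese = [8.0, 15.0, 22.0, 25.0, 4.0]
mortality = [310.0, 220.0, 150.0, 120.0, 340.0]

print(pearson(mortality, alcohol))  # negative, as in the SPSS table
print(pearson(mortality, obese))
```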
IBM SPSS Analysis Interpretation:
1) b-Value:
Coefficients(a)

Model          Unstandardized B   Std. Error   Standardized Beta        t    Sig.
1 (Constant)          271.961       13.704                          19.845   .000
  alcohol              -5.996        1.669           -.241          -3.592   .000
  obese                -3.258         .553           -.395          -5.894   .000

a. Dependent Variable: mortalityrate
 The table above gives the constant and the slopes that form the regression equation.
 From these figures, the equation of our regression line is:
Y = 271.961 - 5.996(alcohol) - 3.258(obese)
 The coefficient (B value) of an independent variable in a multiple regression model tells us the amount by
which the dependent variable changes when that independent variable increases by one unit and all other
independent variables in the model are held constant.
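The fitted equation can be applied by hand. The sketch below plugs hypothetical predictor values (5 litres of alcohol per year and 20% obesity; these inputs are illustrative, not taken from the dataset) into the coefficients reported above:

```python
def predict_mortality(alcohol, obese):
    """Predicted adult mortality from the fitted SPSS coefficients."""
    return 271.961 - 5.996 * alcohol - 3.258 * obese

# Hypothetical country: 5 litres of alcohol per person per year, 20% obese
print(predict_mortality(5, 20))  # ~176.821 deaths per 1000 population
```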
2) Durbin-Watson Method:
Model Summary(b)

Model    R     R Square   Adjusted    Std. Error of   R Square   F Change   df1   df2   Sig. F    Durbin-
                          R Square    the Estimate    Change                            Change    Watson
1      .513a     .263       .255         87.622         .263      31.273      2   175    .000      2.011

a. Predictors: (Constant), obese, alcohol
b. Dependent Variable: mortalityrate
 The observed Durbin-Watson value is 2.011, which is very close to 2 and tells us that the residuals are
independent (no autocorrelation), so this assumption is satisfied.
 To see how well our model predicts the values of the dependent variable, we refer to the R value. Here,
R = 0.513, indicating a moderate level of prediction.
 The R Square value shows how much of the variance in the response variable is explained by the predictor
variables. In this case, about 26% of the variance in adult mortality is explained by the two predictors.
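These summary figures are internally consistent. The sketch below recomputes R Square from R, and the Adjusted R Square from the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), using n = 178 cases and k = 2 predictors from the output above:

```python
n, k = 178, 2   # cases and predictors from the SPSS output
r = 0.513       # multiple correlation R

r2 = r ** 2                                       # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # Adjusted R Square
print(round(r2, 3), round(adj_r2, 3))             # .263 and .255, as in the table
```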
3) ANOVA / F-Test:
ANOVA(a)

Model           Sum of Squares    df    Mean Square       F     Sig.
1 Regression       482647.118       2    241323.559   30.678   .000b
  Residual        1376609.134     175      7866.338
  Total           1859256.253     177

a. Dependent Variable: mortalityrate
b. Predictors: (Constant), obese, alcohol
 From the Sum of Squares column of the ANOVA table, 482647.118 of the total 1859256.253 in the response
is explained by the predictors (the regression row), while the remaining 1376609.134 is residual variation
left unexplained.
 The obtained F value is 30.678 with (2, 175) degrees of freedom, and it is significant (p < .001), so the
model as a whole predicts the outcome better than chance.
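The F statistic and the share of explained variance follow directly from the sums of squares in the ANOVA table. A quick arithmetic check:

```python
ss_regression = 482647.118
ss_residual = 1376609.134
ss_total = 1859256.253
df_regression, df_residual = 2, 175

ms_regression = ss_regression / df_regression   # mean square for the model
ms_residual = ss_residual / df_residual         # mean square error
f = ms_regression / ms_residual
print(round(f, 3))                              # 30.678, matching the table
print(round(ss_regression / ss_total, 2))       # ~0.26 of the variance explained
```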
4) Collinearity:
Coefficients(a)

Model              B    Std. Error   Sig.   95% CI Lower   95% CI Upper   Zero-order   Partial    Part   Tolerance    VIF
1 (Constant)   270.366      13.464   .000        243.792        296.939
  alcohol       -5.909       1.649   .000         -9.164         -2.654        -.336     -.261   -.232        .942  1.061
  obese         -3.249        .543   .000         -4.321         -2.177        -.457     -.412   -.388        .942  1.061

a. Dependent Variable: mortalityrate
 Tolerance is an indicator of how much of the variability of the specified independent is not explained by the
other independent variables in the model and is calculated using the formula 1– R squared for each variable.
If this value is very small (less than .10) it indicates that the multiple correlation with other variables is high,
suggesting the possibility of multicollinearity. The other value given is the VIF (Variance inflation factor),
which is just the inverse of the Tolerance value (1 divided by Tolerance). VIF values above 10 would be a
concern here, indicating multicollinearity. (Pallant, 2016)
 The VIF under Collinearity Statistics is 1.061 for both predictors, far below 10, so we can conclude there
is no multicollinearity.
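With only two predictors, the tolerance reduces to 1 minus the squared correlation between them, and the VIF is its reciprocal. The sketch below reproduces the .942 and 1.061 reported above from the r = 0.240 correlation between alcohol and obese:

```python
r_predictors = 0.240                 # correlation between alcohol and obese
tolerance = 1 - r_predictors ** 2    # with two predictors, tolerance = 1 - r^2
vif = 1 / tolerance                  # VIF is the inverse of tolerance
print(round(tolerance, 3), round(vif, 3))  # .942 and 1.061, as in the table
```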
Conclusion:
All of the tests above show a clear association between the independent variables and the dependent
variable: changes in the independent variables are accompanied by significant changes in the dependent
variable.
On the basis of the figures above, we can observe how closely the dependent and independent variables are
related: a change in the value of either independent variable corresponds to a change in the predicted value
of the dependent variable.
LOGISTIC REGRESSION
Objective of the study: To analyse the chosen dataset in IBM SPSS using binary logistic regression to relate a
dichotomous dependent variable to two independent variables.
Source of dataset:
The same data used for the multiple regression is used for the logistic regression. The predictor and
response variables are also the same; however, the response variable has been converted into a dichotomous
variable such that values of 177.81 or above are assigned 1 and values below 177.81 are assigned 0.
Description of the analysis:
The analysis is performed to observe the significance of the independent variables in predicting the
dichotomous dependent variable at the 95% confidence level.
1) The Hosmer and Lemeshow Test:
In this test, a significance value below 0.05 indicates a poor fit, so we want a significance value above 0.05
for our model. If so, we can say that our model is a good fit.
2) Cox & Snell and Nagelkerke R square:
These statistics give an indication of the amount of variation in the dependent variable explained by the
model. They are termed pseudo R square statistics rather than true R square values (Pallant, 2016).
A value close to 1 indicates a very good fit, whereas a value close to 0 indicates little to no relationship.
3) Variables in equation table:
The Wald value in this table is analogous to the t statistic in linear regression. Similarly, the B value in
this table is analogous to the b value in linear regression.
4) Percentage Accuracy in Classification:
The classification table shows what percentage of cases the model classifies correctly, both overall and
within each category of the dependent variable.
5) The Omnibus Test:
The omnibus test compares the fitted model against the Block 0 (constant-only) model using the significance
values reported in both tables.
IBM SPSS Analysis Interpretation:
1) Hosmer and Lemeshow Test:
Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 11.605 8 .170
 As stated by Hosmer and Lemeshow [1], for the model to fit properly the significance value should ideally
exceed 0.05. Since our significance value is 0.170, we can say that our model is a good fit.
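The reported significance can be recovered from the chi-square distribution. For an even number of degrees of freedom the chi-square survival function has a closed form (a Poisson tail sum), so the p value for chi-square = 11.605 on 8 df can be checked with the standard library alone; the helper below is a generic sketch, not part of SPSS:

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), df even, via the Poisson tail identity."""
    assert df % 2 == 0, "closed form shown only for even df"
    k = df // 2
    term = 1.0
    total = 1.0
    for i in range(1, k):           # accumulate (x/2)^i / i! for i = 0..k-1
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

print(round(chi2_sf(11.605, 8), 3))  # ~0.170, matching the Sig. column
```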
2) Cox & Snell and Nagelkerke R square:
Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1            174.266a              .310                   .418

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
 The Cox & Snell R Square and Nagelkerke R Square values indicate how much of the variation in the
response variable is explained by the model [1].
 In our model these values are 0.310 and 0.418 respectively.
 So the proportion of variability in the response explained by the model lies roughly between 0.310 and 0.418.
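Both pseudo R² values follow from the log-likelihoods in the output. With -2LL = 174.266 for the fitted model and a model chi-square of 65.960 (from the omnibus test), the null model has -2LL = 240.226, and the Cox & Snell and Nagelkerke formulas reproduce the .310 and .418 above:

```python
import math

n = 178
neg2ll_model = 174.266
model_chi2 = 65.960                      # from the omnibus test
neg2ll_null = neg2ll_model + model_chi2  # -2LL of the constant-only model

# Cox & Snell: 1 - exp(-chi2/n); Nagelkerke rescales it so its maximum is 1
cox_snell = 1 - math.exp(-model_chi2 / n)
max_cs = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cs
print(round(cox_snell, 3), round(nagelkerke, 3))  # ~.310 and ~.418
```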
3) Variables in equation model:
Variables in the Equation

                      B     S.E.     Wald   df   Sig.   Exp(B)   95% C.I. for Exp(B)
                                                                  Lower      Upper
Step 1a  alcohol   -.154    .051    9.148    1   .002     .858     .776       .947
         obese     -.108    .020   28.192    1   .000     .898     .863       .934
         Constant  2.274    .430   28.014    1   .000    9.720

a. Variable(s) entered on step 1: alcohol, obese.
 The table above shows the contribution of each individual x variable.
 Both predictors contribute significantly; the coefficient for alcohol (-0.154) is larger in magnitude per unit
than that for obese (-0.108), suggesting alcohol consumption should be addressed first.
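The Exp(B) column is simply e raised to the coefficient, i.e. the odds ratio per one-unit increase in the predictor. A quick check against the table (small differences arise because SPSS exponentiates the unrounded B values):

```python
import math

b = {"alcohol": -0.154, "obese": -0.108, "constant": 2.274}
odds_ratios = {name: math.exp(v) for name, v in b.items()}
for name, oratio in odds_ratios.items():
    print(name, round(oratio, 3))
# alcohol ~0.857, obese ~0.898, constant ~9.718 -- matching Exp(B) to rounding
```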
4) Percentage Accuracy in classification:
Classification Table(a)

                                 Predicted mortality    Percentage
Observed                            0         1           Correct
Step 1   mortality    0            91        15             85.8
                      1            19        53             73.6
         Overall Percentage                                 80.9

a. The cut value is .500
 We can observe that 85.8% of cases with value 0 and 73.6% of cases with value 1 are correctly classified
by the model.
 Overall, 80.9% of the cases are classified correctly, leaving the remaining 19.1% of cases misclassified.
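The percentages in the classification table are plain counting: each observed row's hit rate plus the overall accuracy. The arithmetic below reproduces them from the four cell counts:

```python
# Cell counts from the classification table (rows = observed, cols = predicted)
correct_0, wrong_0 = 91, 15   # observed 0, predicted 0 / predicted 1
wrong_1, correct_1 = 19, 53   # observed 1, predicted 0 / predicted 1

pct_0 = 100 * correct_0 / (correct_0 + wrong_0)
pct_1 = 100 * correct_1 / (wrong_1 + correct_1)
overall = 100 * (correct_0 + correct_1) / (correct_0 + wrong_0 + wrong_1 + correct_1)
print(round(pct_0, 1), round(pct_1, 1), round(overall, 1))  # 85.8 73.6 80.9
```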
5) The Omnibus Test:
Classification Table(a,b)

                                 Predicted mortality    Percentage
Observed                            0         1           Correct
Step 0   mortality    0           106         0            100.0
                      1            72         0               .0
         Overall Percentage                                 59.6

a. Constant is included in the model.
b. The cut value is .500

In this Block 0 (constant-only) classification table, the model simply predicts the predominant category, 0,
for every case, which classifies 59.6% of the values correctly; the remaining 40.4% of cases, all of which
belong to category 1, are misclassified.

Variables in the Equation

                      B     S.E.    Wald   df   Sig.   Exp(B)
Step 0   Constant  -.387    .153   6.414    1   .011    .679
Omnibus Tests of Model Coefficients

                  Chi-square   df   Sig.
Step 1   Step        65.960     2   .000
         Block       65.960     2   .000
         Model       65.960     2   .000

In the omnibus test the p value is reported as .000 (i.e. p < .001), satisfying the condition p < 0.05. With a
chi-square value of 65.960 on 2 degrees of freedom, we can conclude that this model is significantly better
than the Block 0 constant-only model.
Conclusion:
The conclusion from all the above tests is that the independent variables form a meaningful model of the
dichotomous dependent variable. A change in the value of an independent variable is associated with a
significant change in the predicted value of the dichotomous dependent variable, showing how closely the
independent and dependent variables are related.
References:
 Pallant, J. (2016). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS.
 Laerd Statistics. Multiple Regression using SPSS Statistics. https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
 Laerd Statistics. Binomial Logistic Regression using SPSS Statistics. https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

X18125514 CA2: Statistics for Data Analytics

• 2. MULTIPLE LINEAR REGRESSION

Objective of the study: The objective is to analyse the chosen dataset with the help of the statistical software IBM SPSS, using multiple linear regression to predict a dependent variable from two independent variables.

Problem Analysis: The datasets were taken from the following sources:
1) Adult mortality rate (probability of dying between 15 and 60 years per 1000 population), by country: http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000004
2) Adult obesity rate (percentage of adults aged 20 years and over who are obese): http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000010
3) Alcohol consumption among adults aged 15 years and over (litres of alcohol per person per year): http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000011

Description of the dataset: The dataset consists of three variables: mortality, obesity and alcohol. Mortality is taken as the dependent variable, while obesity and alcohol are the independent variables. Multiple regression is used to examine the drivers of adult mortality with the help of the two independent variables; because the dependent variable is continuous, a multiple linear regression model is appropriate. We use the independent variables to assess how well adult mortality can be predicted.

Description of Analysis: The analysis examines the significance of each independent variable in predicting the dependent variable.
1. B-value: the B-value gives the amount by which the dependent variable changes when that independent variable increases by one unit, with all other independent variables held constant.
2. Durbin-Watson statistic: the value should ideally be close to 2; values below 1 or above 3 indicate that successive residuals are correlated and the model's results may be unreliable.
3.
ANOVA / F-test: the F-test assesses whether the variance explained by the proposed model is substantially greater than the residual (error) variance within the model. It tells us whether the regression model as a whole is significantly better than chance at predicting the outcome.
• 3. 4. Collinearity Test: this test checks whether the chosen predictor variables are closely related to each other. It is assessed with two statistics: Tolerance, which should not fall below 0.1, and the VIF, which should lie in the range 1-10.

Assumptions: Multiple linear regression relies on several key assumptions: homoscedasticity, linearity, normality and the absence of multicollinearity.
 In the normal P-P plot, we hope that all the data points lie along a reasonably straight diagonal line from bottom left to top right (Pallant, 2016).
 This would suggest that there is no major deviation from normality.
 In our plot the points lie on or close to the diagonal line, so there is no major deviation from normality.
• 4.  In the scatter plot, we hope that the residuals are roughly rectangularly distributed, with most of the scores concentrated in the centre (Pallant, 2016).
 Our data meet this criterion, and no particular pattern is visible in the scatter plot.
 We can therefore conclude that the assumption of homoscedasticity holds.
 Outliers can also be identified from the scatter plot: Tabachnick and Fidell define outliers as cases with a standardised residual of more than 3.3 or less than -3.3 (Pallant, 2016).
 Here we can identify three potential outliers located at the top centre of the scatter plot.
 The histogram shows that the residuals fit the normal distribution curve well.
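The Tabachnick and Fidell outlier rule quoted above can be sketched in a few lines of Python; the residual values below are illustrative, not taken from the SPSS output.

```python
def flag_outliers(std_residuals, cutoff=3.3):
    """Return indices of cases whose standardised residual exceeds +/-cutoff,
    per the Tabachnick & Fidell rule cited in the text."""
    return [i for i, r in enumerate(std_residuals) if abs(r) > cutoff]

# Illustrative standardised residuals: cases 2 and 4 breach the 3.3 threshold
residuals = [0.4, -1.2, 3.6, 0.9, -3.5, 2.1]
print(flag_outliers(residuals))  # -> [2, 4]
```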
• 5. Multicollinearity:

Correlations (N = 178 for all variables)
                            mortalityrate   alcohol   obese
Pearson r   mortalityrate        1.000       -.336    -.453
            alcohol              -.336       1.000     .240
            obese                -.453        .240    1.000
Sig. (1-tailed): all correlations significant at p < .01

 The correlation table shows the correlations between the independent variables and the dependent variable; correlation values always lie between -1 and 1.
 Each independent variable should have some relationship with the dependent variable (preferably above 0.3 in absolute value).
 Here the dependent variable, 'mortalityrate', has moderate negative correlations with both independent variables, 'alcohol' (-0.336) and 'obese' (-0.453).
 Conversely, the correlation between the independent variables themselves should not be too high (below 0.7).
 The two independent variables correlate only weakly with each other (0.240). As this is well below the 0.7 threshold, multicollinearity between the independent variables is not a concern.
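The Pearson correlations in the table come straight from the product-moment formula. As a minimal sketch (on toy data, not the WHO figures), the statistic can be computed by hand:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: y falls by 2 whenever x rises by 1, a perfect negative
# correlation (result is approximately -1.0)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```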
• 6. IBM SPSS Analysis Interpretation: 1) B-test:

Coefficients (dependent variable: mortalityrate)
             B        Std. Error   Beta    t        Sig.
(Constant)   271.961  13.704               19.845   .000
alcohol      -5.996   1.669        -.241   -3.592   .000
obese        -3.258   .553         -.395   -5.894   .000

 From this table we obtain the constant and the slopes that form the regression line equation.
 The fitted regression line is: Y = 271.961 - 5.996(alcohol) - 3.258(obese)
 The coefficient (B value) of an independent variable in a multiple regression model tells us the amount by which the dependent variable changes when that independent variable increases by one unit, with all other independent variables held constant.

2) Durbin-Watson statistic:

Model Summary (predictors: (Constant), obese, alcohol; dependent variable: mortalityrate)
R = .513, R Square = .263, Adjusted R Square = .255, Std. Error of the Estimate = 87.622,
F Change = 31.273 (df1 = 2, df2 = 175, Sig. F Change = .000), Durbin-Watson = 2.011

 The observed Durbin-Watson value of 2.011 is very close to the ideal value of 2, indicating that the residuals are independent and the model fits the data well.
 The R value indicates how well the model predicts the dependent variable; an R of 0.51 corresponds to a moderate level of prediction.
 The R Square value tells us how much of the variance in the response variable is explained by the predictors; here the predictor variables explain about 26% of that variance.
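The fitted equation can be applied directly to hypothetical predictor values, and the Durbin-Watson statistic reported above is straightforward to compute from a residual series. This is a sketch only: the country values are invented for illustration, not drawn from the dataset.

```python
def predict_mortality(alcohol, obese):
    """Fitted SPSS equation: Y = 271.961 - 5.996*alcohol - 3.258*obese."""
    return 271.961 - 5.996 * alcohol - 3.258 * obese

# Hypothetical country: 5 litres of alcohol per capita and 20% adult obesity
estimate = predict_mortality(5, 20)  # about 176.8 deaths per 1000 adults
print(estimate)

def durbin_watson(residuals):
    """DW = sum of squared successive residual differences / residual sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(r * r for r in residuals)
    return num / den
```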
• 7. 3) ANOVA / F-Test:

ANOVA (dependent variable: mortalityrate; predictors: (Constant), obese, alcohol)
             Sum of Squares   df    Mean Square   F        Sig.
Regression   482647.118       2     241323.559    30.678   .000
Residual     1376609.134      175   7866.338
Total        1859256.253      177

 From the Sum of Squares column we can see that 482647.118 of the total 1859256.253 in the response variable is explained by the predictors, which also means about 1376609.134 is left unexplained.
 The F statistic is 30.678 with p < .001, so the model predicts the outcome significantly better than chance.

4) Collinearity:

Coefficients (dependent variable: mortalityrate)
             B        Std. Error  Sig.   95% CI for B         Zero-order  Partial  Part   Tolerance  VIF
(Constant)   270.366  13.464      .000   [243.792, 296.939]
alcohol      -5.909   1.649       .000   [-9.164, -2.654]     -.336       -.261    -.232  .942       1.061
obese        -3.249   .543        .000   [-4.321, -2.177]     -.457       -.412    -.388  .942       1.061

 Tolerance indicates how much of the variability of a given independent variable is not explained by the other independent variables in the model; it is calculated as 1 - R squared for each variable. If this value is very small (less than .10), the multiple correlation with the other variables is high, suggesting possible multicollinearity. The other value given is the VIF (Variance Inflation Factor), which is simply the inverse of Tolerance (1 divided by Tolerance); VIF values above 10 would be a concern, indicating multicollinearity. (Pallant, 2016)
 The VIF under the collinearity statistics is 1.061, far below 10, so we can be confident there is no multicollinearity.
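The Tolerance and VIF figures in the table can be reproduced by hand. With only two predictors, the R squared of one predictor regressed on the other is just the square of their correlation (r = 0.240 from the correlation table), so Tolerance = 1 - r squared and VIF is its reciprocal:

```python
# r = 0.240 is the correlation between 'alcohol' and 'obese' reported
# in the correlation table; with two predictors, R-squared between them is r**2.
r = 0.240
tolerance = 1 - r ** 2    # 1 - R squared, per the formula in the text
vif = 1 / tolerance       # VIF is the reciprocal of Tolerance

# Reproduces the SPSS output: Tolerance .942 and VIF 1.061
print(round(tolerance, 3), round(vif, 3))
```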
• 8. Conclusion: All of the tests above indicate a close association between the independent variables and the dependent variable. Under the influence of the independent variables there is a significant amount of change in the dependent variable: from the figures above, even a small change in the value of an independent variable is accompanied by a change in the predicted value of the dependent variable.
• 9. LOGISTIC REGRESSION

Objective of the study: To analyse the chosen dataset in IBM SPSS using binary logistic regression, in order to model a dichotomous dependent variable from two independent variables.

Source of dataset: The same data used for the multiple regression is reused here, with the same predictor and response variables. However, the response variable has been converted into a dichotomous variable: values of 177.81 or above were coded 1, and values below 177.81 were coded 0.

Description of the analysis: The analysis assesses the significance of the independent variables in predicting the dichotomous dependent variable at a 95% confidence level.
1) The Hosmer and Lemeshow Test: if the significance value is below 0.05, the model is a poor fit. We therefore want a significance value above 0.05, in which case the model can be considered a good fit.
2) Cox & Snell and Nagelkerke R Square: these statistics indicate the amount of variation in the dependent variable explained by the model. Rather than true R square values, they are termed pseudo R square statistics (Pallant, 2016). A value close to 1 indicates a near-perfect fit, whereas a value close to 0 indicates little or no relationship.
3) Variables in the Equation table: the Wald value in this table is equivalent to the t statistic in linear regression, and the B value is analogous to the b coefficient in linear regression.
4) Percentage Accuracy in Classification: the classification table shows what percentage of cases the model classifies correctly overall.
5) The Omnibus Test: the omnibus test compares the fitted model against the Block 0 (constant-only) model using the Sig. value present in both tables.
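The dichotomisation described above (1 for mortality rates of 177.81 or above, 0 otherwise) can be sketched as a one-line transformation; the rates below are illustrative values, not the actual data:

```python
CUTOFF = 177.81  # threshold stated in the text

def dichotomize(mortality_rates, cutoff=CUTOFF):
    """Map each rate to 1 if it is at or above the cutoff, else 0."""
    return [1 if rate >= cutoff else 0 for rate in mortality_rates]

print(dichotomize([150.0, 177.81, 300.5]))  # -> [0, 1, 1]
```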
• 10. IBM SPSS Analysis Interpretation: 1) Hosmer and Lemeshow Test:

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      11.605       8    .170

 As stated by Hosmer and Lemeshow [1], for a model to fit properly the significance value should exceed 0.05. As our significance value is 0.170, the model is a good fit.

2) Cox & Snell and Nagelkerke R Square:

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      174.266a            .310                   .418
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

 The Cox & Snell R Square and Nagelkerke R Square values indicate the variation in the response variable explained by the model [1].
 For our model these values are 0.310 and 0.418 respectively.
 The proportion of explained variability in the response variable therefore lies between 0.310 and 0.418.

3) Variables in the Equation:

Variables in the Equation
           B       S.E.   Wald     df   Sig.   Exp(B)   95% C.I. for Exp(B)
alcohol    -.154   .051   9.148    1    .002   .858     [.776, .947]
obese      -.108   .020   28.192   1    .000   .898     [.863, .934]
Constant   2.274   .430   28.014   1    .000   9.720
Variable(s) entered on step 1: alcohol, obese.

 This table shows the contribution of each individual predictor.
 In absolute terms, the B coefficient for alcohol (-0.154) is larger than that for obese (-0.108), suggesting that, per unit, alcohol consumption has the stronger effect on the odds of high mortality and should be addressed first; note, however, that the Wald statistic is larger for obesity.

4) Percentage Accuracy in Classification:

Classification Table
• 11. (Step 1; cut value = .500)
Observed             Predicted 0   Predicted 1   Percentage Correct
mortality = 0        91            15            85.8
mortality = 1        19            53            73.6
Overall Percentage                               80.9

 The model correctly classifies 85.8% of the observed 0 cases and 73.6% of the observed 1 cases.
 Overall, 80.9% of cases are classified correctly, and the remaining 19.1% are misclassified.

5) The Omnibus Test:

Block 0 Classification Table (constant only; cut value = .500)
Observed             Predicted 0   Predicted 1   Percentage Correct
mortality = 0        106           0             100.0
mortality = 1        72            0             .0
Overall Percentage                               59.6

In the Block 0 classification table, the constant-only model simply predicts the predominant category (0) for every case, which covers 59.6% of the cases; the remaining 40.4% of cases, which fall under 1, are misclassified.

Variables in the Equation (Block 0)
           B      S.E.   Wald    df   Sig.   Exp(B)
Constant   -.387  .153   6.414   1    .011   .679

Omnibus Tests of Model Coefficients
         Chi-square   df   Sig.
Step     65.960       2    .000
Block    65.960       2    .000
Model    65.960       2    .000

In the omnibus test the p value is .000, satisfying the condition p < 0.05. With a chi-square of 65.960 on 2 degrees of freedom, the fitted model is a significant improvement over the Block 0 model.
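The fitted logit from the Variables in the Equation table can be turned into a predicted probability via the logistic function (Exp(B) is simply e raised to B), and the classification-table percentages follow directly from the four cell counts. The country values below are hypothetical, used only to exercise the formula:

```python
import math

def predicted_probability(alcohol, obese):
    """P(high mortality) from the fitted logit reported above:
    2.274 - 0.154*alcohol - 0.108*obese."""
    logit = 2.274 - 0.154 * alcohol - 0.108 * obese
    return 1 / (1 + math.exp(-logit))

# Exp(B) for alcohol is e**B: each extra litre multiplies the odds by about 0.86
odds_ratio_alcohol = math.exp(-0.154)

# Hypothetical country: 5 litres per capita, 20% adult obesity
p = predicted_probability(5, 20)

# Classification-table percentages from the four Step 1 cell counts
tn, fp = 91, 15   # observed 0: correctly / incorrectly classified
fn, tp = 19, 53   # observed 1: incorrectly / correctly classified
specificity = 100 * tn / (tn + fp)               # % of observed 0s correct
sensitivity = 100 * tp / (fn + tp)               # % of observed 1s correct
overall = 100 * (tn + tp) / (tn + fp + fn + tp)  # overall % correct

print(round(odds_ratio_alcohol, 3), round(p, 3))
print(round(specificity, 1), round(sensitivity, 1), round(overall, 1))
```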
• 12. Conclusion: The conclusion from all the above tests is that the independent variables and the dichotomous dependent variable together form a meaningful model. A change in the value of an independent variable produces a significant change in the predicted value of the dichotomous dependent variable, showing how closely the independent and dependent variables are related.

References:
 Pallant, J. (2016). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS.
 Laerd Statistics. Multiple Regression Analysis using SPSS Statistics. https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
 Laerd Statistics. Binomial Logistic Regression using SPSS Statistics. https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php