Statistics for Data Analytics CA2 – Regression Project
Multiple Regression and Logistic Regression
MSc Data Analytics
Group A
x18134599
Mansi Atul Chowkkar
MULTIPLE LINEAR REGRESSION
Dataset and Analysis:
I have taken four datasets from http://data.un.org: birth rate, abortion rate, antenatal care and births attended by a skilled doctor.
1. Total female/male birth rate for all countries from 1995 to 2005:
http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a9
2. Total abortion rate for all countries from 1995 to 2005:
http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a12
3. Number of births attended by a skilled doctor for all countries from 1995 to 2005:
http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a47
4. Antenatal care rate for at least one visit, which shows how many women receive care during pregnancy, from 1995 to 2005:
http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a45
I merged these four datasets into a single dataset for multiple regression using R. Only the data for the year 2003 are considered for the regression.
For the multiple linear regression, birth rate is the dependent variable; abortion rate, births attended by a specialist and antenatal care are the independent variables.
The objective of the project is to determine whether the male/female birth ratio (in percent) depends on the overall male/female abortion rate, the births attended by a skilled doctor and the antenatal care rate.
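The merge described above can be sketched as follows. The report did this step in R; this sketch uses Python with the standard library only, and the record layout (keys `country`, `year`, `value`) and the sample rows are hypothetical, not taken from the UN files.

```python
def merge_indicators(datasets, year):
    """Inner-join several indicator tables on country for one year.

    datasets maps an indicator name to a list of row dicts with the
    (assumed) keys 'country', 'year' and 'value'.
    """
    per_indicator = {}
    for name, rows in datasets.items():
        per_indicator[name] = {
            r["country"]: float(r["value"]) for r in rows if int(r["year"]) == year
        }
    # Keep only the countries that have a value for every indicator.
    common = set.intersection(*(set(v) for v in per_indicator.values()))
    return [
        {"country": c, **{name: vals[c] for name, vals in per_indicator.items()}}
        for c in sorted(common)
    ]

# Made-up rows, for illustration only.
sample = {
    "BirtRate": [
        {"country": "A", "year": 2003, "value": "95.0"},
        {"country": "B", "year": 2003, "value": "90.0"},
    ],
    "AbortionRate": [
        {"country": "A", "year": 2003, "value": "8.5"},
        {"country": "B", "year": 2002, "value": "12.0"},
    ],
}
merged = merge_indicators(sample, 2003)
print(merged)
```

Country B is dropped here because it has no abortion-rate value for 2003; an inner join of this kind is one plausible reason a merged dataset ends up with fewer rows than any single source file.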
Understanding Data
The data consist of 4 variables, of which 3 are independent and 1 is dependent.
a) BirthAttendendedBySpecialist is an independent, continuous variable. Its maximum value is 100, meaning that all births are attended by a skilled doctor, and its minimum value is 43.4, meaning that only 43.4% of all births are attended by a skilled doctor.
b) AntenatalCare is an independent, continuous variable. Its maximum value is 100, meaning that all women received proper care before or during their pregnancy, and its minimum value is 33, meaning that only 33% of women received proper care before or during pregnancy.
c) AbortionRate is an independent, continuous variable. Its maximum value is 25.6, meaning that 25.6% of women aborted their child, and its minimum value is 1.2, meaning that only 1.2% of women aborted their child.
d) BirtRate is the dependent, continuous variable. Its maximum value is 100, meaning a female/male birth ratio of 100% in that country, and its minimum value is 76, meaning a female/male birth ratio of 76%.
Objectives and Assumptions on Which the Data Are Analysed:
Assumption 1:
The dependent variable should be measured on a continuous scale:
BirtRate is a continuous variable: it is a percentage that can take any value within its range (76 to 100 in this sample). It also contains no missing values.
Descriptive Statistics
Variable                        Mean      Std. Deviation   N
BirtRate                        93.692     5.4980          53
BirthAttendendedBySpecialist    86.747    14.6630          53
AntenatalCare                   81.43     15.819           53
AbortionRate                     9.687     6.283           53
Assumption 2: Sample size
• The first output box provides three descriptive statistics (mean, standard deviation and N) for each variable.
• The mean of BirtRate is 93.692, i.e. the mean female/male birth ratio is 93.692%.
• The mean abortion rate is 9.69, i.e. on average 9.69% of women abort their child; this value is expected to be low.
• BirthAttendendedBySpecialist has a mean of 86.747 and AntenatalCare a mean of 81.43. Both are percentages and both are expected to be high.
• The standard deviation is larger for BirthAttendendedBySpecialist and AntenatalCare (14.663 and 15.819), which means that their values are more widely spread around the mean.
• N is 53, which is above 30, so any violation of normality or equality of variance that may exist should not affect the results much.
Assumption 3: The data must not show multicollinearity:
• I am using BirthAttendendedBySpecialist, AntenatalCare and AbortionRate as the three independent variables, all of which are continuous. Multicollinearity among them is checked with the correlation matrix and with the Tolerance and VIF values in the SPSS output.
Assumption 4: Independence of observations, checked with the Durbin-Watson statistic:
• The Durbin-Watson value is 1.903, which lies between 1.5 and 2.5, so the residuals are not autocorrelated.
• This statistic concerns the independence of the residuals, not of the predictors: antenatal care, births attended by a specialist and abortion rate are correlated with one another, as the correlation table shows.
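As a rough illustration of what the Durbin-Watson value measures, the statistic can be computed directly from a sequence of residuals. The residuals below are made up for illustration; the report's own residuals are not available in this document.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals,
    divided by the sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]       # strong positive autocorrelation
print(durbin_watson(alternating))  # 3.0
print(durbin_watson(constant))     # 0.0
```

The statistic ranges from 0 to 4, and values near 2 indicate little autocorrelation, which is why 1.903 is read as acceptable.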
Assumption 5: No significant outliers, high leverage points or highly influential points
• This can be checked with the Normal Probability Plot (P-P) of the regression standardised residuals and the scatterplot that were requested as part of the analysis.
• Both plots are taken from the SPSS output.
• In the P-P plot the points are expected to lie reasonably close to a straight line. In this case they deviate slightly from it, which indicates a slight deviation from normality.
• In the scatterplot of the standardised residuals (the second plot displayed), most of the points are expected to be scattered around the centre, with very few outlying points.
• In this case the points are shifted slightly to the right, i.e. the majority lie in the right-hand side of the plot rather than being spread evenly around the centre.
SPSS Output Explanation:
Correlations
                                                     BirtRate   BirthAttendendedBySpecialist   AntenatalCare   AbortionRate
Pearson Correlation   BirtRate                          1.000                          0.677           0.729         -0.498
                      BirthAttendendedBySpecialist      0.677                          1.000           0.740         -0.628
                      AntenatalCare                     0.729                          0.740           1.000         -0.584
                      AbortionRate                     -0.498                         -0.628          -0.584          1.000
Sig. (1-tailed)       BirtRate                              .                          0.000           0.000          0.000
                      BirthAttendendedBySpecialist      0.000                              .           0.000          0.000
                      AntenatalCare                     0.000                          0.000               .          0.000
                      AbortionRate                      0.000                          0.000           0.000              .
N                     BirtRate                             53                             53              53             53
                      BirthAttendendedBySpecialist         53                             53              53             53
                      AntenatalCare                        53                             53              53             53
                      AbortionRate                         53                             53              53             53
• AbortionRate, BirthAttendendedBySpecialist and AntenatalCare all correlate substantially with BirtRate (-0.498, 0.677 and 0.729 respectively).
• The correlations between the independent variables should not be too high. Two of the three pairs are below 0.7 in absolute value (-0.628 and -0.584). BirthAttendendedBySpecialist and AntenatalCare correlate at 0.740, which is only slightly above 0.7, so I am retaining both in my model.
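A minimal sketch of the screening step described above: a plain Pearson correlation function, followed by a check of the predictor-to-predictor correlations printed in the Correlations table against the 0.7 rule of thumb. The function is generic and is not the SPSS implementation; only the three reported coefficients come from the report.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Predictor-to-predictor correlations as printed in the Correlations table.
reported = {
    ("BirthAttendendedBySpecialist", "AntenatalCare"): 0.740,
    ("BirthAttendendedBySpecialist", "AbortionRate"): -0.628,
    ("AntenatalCare", "AbortionRate"): -0.584,
}
# Flag pairs whose absolute correlation exceeds the 0.7 rule of thumb.
flagged = [pair for pair, r in reported.items() if abs(r) > 0.7]
print(flagged)
```

Only the BirthAttendendedBySpecialist and AntenatalCare pair is flagged, and only narrowly; the Tolerance and VIF values in the Coefficients table give the more formal check.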
Coefficients
Model 1                         B        Std. Error   Beta     t        Sig.    Zero-order   Partial   Part     Tolerance   VIF
(Constant)                      70.213   4.846                 14.489   0.000
BirthAttendendedBySpecialist     0.111   0.056         0.296    1.987   0.052    0.677        0.273     0.185   0.394       2.539
AntenatalCare                    0.173   0.050         0.497    3.486   0.001    0.729        0.446     0.325   0.429       2.332
AbortionRate                    -0.020   0.108        -0.023   -0.183   0.855   -0.498       -0.026    -0.017   0.574       1.742
B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, Zero-order, Partial and Part are correlations, and Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: BirtRate
• The results are presented in the table labelled Coefficients. Two collinearity values are given: Tolerance and VIF.
• Tolerance indicates how much of the variability of the specified independent variable is not explained by the other independent variables in the model. It is calculated as 1 - R squared for each variable, where R squared comes from regressing that variable on the other independent variables.
• Tolerance is 0.574 for AbortionRate, 0.394 for BirthAttendendedBySpecialist and 0.429 for AntenatalCare. All three values are well above 0.10, which indicates that the independent variables are not too highly correlated with one another.
• The VIF (variance inflation factor) is the inverse of the Tolerance value (1 divided by Tolerance). VIF values should not exceed 10.
• In this example the VIF of every independent variable is below 3, well under 10, so the multicollinearity assumption has not been violated.
• The equation for our regression line can be written as:
y = 70.213 - 0.020(abortionRate) + 0.111(BirthAttendedBySpecialist) + 0.173(AntenatalRate)
• The B value tells us by how much y changes for a one-unit increase in the corresponding x variable, with the other variables held constant.
• Only the coefficient of AntenatalCare is significant at the 0.05 level (p = 0.001). BirthAttendendedBySpecialist is marginal (p = 0.052) and AbortionRate is not significant (p = 0.855).
• The Beta value of 0.497 for AntenatalCare is the highest, so this variable makes the strongest contribution to explaining y (BirtRate), followed by BirthAttendendedBySpecialist (0.296) and AbortionRate (-0.023).
• Because the Sig. value of AbortionRate is 0.855, far above 0.05, that variable could be removed from the model.
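The relationship between Tolerance and VIF stated above can be verified against the printed table with a few lines of Python. The numbers are the ones in the Coefficients table; the small differences in the last digit come from SPSS computing with unrounded tolerances.

```python
# predictor: (Tolerance, VIF) as printed in the Coefficients table
reported = {
    "BirthAttendendedBySpecialist": (0.394, 2.539),
    "AntenatalCare": (0.429, 2.332),
    "AbortionRate": (0.574, 1.742),
}

def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of Tolerance."""
    return 1.0 / tolerance

for name, (tol, vif) in reported.items():
    computed = vif_from_tolerance(tol)
    print(f"{name}: 1/{tol} = {computed:.3f} (table: {vif})")
```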
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .757   .573       .547                3.6995                       1.903
a. Predictors: (Constant), AbortionRate, AntenatalCare, BirthAttendendedBySpecialist
b. Dependent Variable: BirtRate
• How well the regression line fits the data can be judged from the Model Summary.
• R is the multiple correlation coefficient; it gives an idea of how well the model predicts the values of the dependent variable. The value of R, 0.757, indicates a good level of prediction.
• The value under the heading R Square is 0.573. This tells us how much of the variance in the dependent variable (BirtRate) is explained by the model (which includes AbortionRate, AntenatalCare and BirthAttendendedBySpecialist).
• Expressed as a percentage, the model explains 57.3 per cent of the variance in the female/male birth ratio.
• The adjusted R square corrects R square for the number of predictors in the model, so it rises only when a new variable improves the model more than would be expected by chance. Here the adjusted R square is 0.547, which is close to the R square.
ANOVA
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression    901.262          3   300.421       21.951   .000
Residual      670.615         49    13.686
Total        1571.877         52
• In the ANOVA table, the Sum of Squares column shows that about 901.26 of the total variation of 1571.877 in the response variable is explained by the predictor variables, which leaves about 670.6 unexplained.
• The model uses 3 of the 52 total degrees of freedom.
• The significance value is reported as .000 (p < 0.0005). As this is significant, we can reject the null hypothesis that all slopes are zero: F(3, 49) = 21.951.
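The figures in the Model Summary and ANOVA tables are linked arithmetically. A short sketch, using only the sums of squares, N = 53 and the 3 predictors reported above, reproduces R Square, the adjusted R Square, F and the standard error of the estimate.

```python
import math

ss_regression = 901.262   # from the ANOVA table
ss_residual = 670.615     # from the ANOVA table
n, k = 53, 3              # cases and predictors

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_statistic = ms_regression / ms_residual
std_error_of_estimate = math.sqrt(ms_residual)

print(f"R squared          = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
print(f"F(3, 49)           = {f_statistic:.3f}")
print(f"Std. error of est. = {std_error_of_estimate:.4f}")
```

The results agree with the SPSS output to the printed precision (.573, .547, 21.951 and 3.6995).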
Results
Based on the regression analysis, the following regression equation was obtained:
BirthRate = 70.213 - 0.020(abortionRate) + 0.111(BirthAttendedBySpecialist) + 0.173(AntenatalRate)
The coefficient of an independent variable in a multiple regression model is the amount by which the dependent variable changes when that independent variable increases by one unit and the others are held constant.
Here we can see that if AbortionRate increases, BirthRate decreases, as expected, and if births attended by a specialist or antenatal care increase, BirthRate increases.
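To make the equation concrete, the sketch below evaluates it at the sample means from the Descriptive Statistics table. The function and argument names are illustrative; the coefficients and means are the reported ones.

```python
def predict_birth_rate(abortion_rate, birth_attended, antenatal_care):
    """Fitted value from the reported regression equation."""
    return (70.213
            - 0.020 * abortion_rate
            + 0.111 * birth_attended
            + 0.173 * antenatal_care)

# Sample means: AbortionRate, BirthAttendendedBySpecialist, AntenatalCare.
at_means = predict_birth_rate(9.687, 86.747, 81.43)
print(f"Predicted BirthRate at the predictor means: {at_means:.2f}")

# Effect of one extra percentage point of antenatal care, others fixed.
delta = predict_birth_rate(9.687, 86.747, 82.43) - at_means
print(f"Change for +1 in antenatal care: {delta:.3f}")
```

The prediction at the means is about 93.7, close to the reported mean BirtRate of 93.692 (a least-squares line passes through the point of means; the small gap is due to the rounded coefficients), and the one-unit change equals the B value of 0.173.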
This multiple linear regression analysed whether the three independent variables in the model (AbortionRate, BirthAttendedBySpecialist, AntenatalRate) were significantly predictive of BirthRate, the dependent variable.
First, the assumptions necessary for multiple linear regression were examined, and the analysis was then performed with the data, which were thought to satisfy the assumptions.
AntenatalCare makes the biggest contribution to the model, with the highest standardized coefficient (Beta = .497). A variable with a significance value below .05 is said to be statistically significant. AntenatalCare is significant (p = 0.001) and BirthAttendedBySpecialist falls just short of significance (p = 0.052), whereas AbortionRate, although negatively related to BirthRate, is not significant (p = 0.855).
LOGISTIC REGRESSION
Problem Analysis
The data used for the logistic regression are the same as those used for the multiple regression, and the predictor variables are the same as above. One independent variable, abortion rate, is converted into a dichotomous variable: an abortion rate above 9.6% is coded 1 and a value below 9.6% is coded 0.
The response variable is also the same as above (BirthRate), but here it is converted into a dichotomous variable: a birth rate above 93.6% is coded 1 and a value below 93.6% is coded 0.
Objective: To evaluate whether a female/male BirthRate ratio above 93.6% depends on the dichotomised abortion rate, antenatal care and births attended by a skilled doctor.
Understanding Data
BirthRate is the dependent variable, and a threshold of 93.6% is used to convert it into dichotomous data.
Of the three independent variables, one (abortion rate) has been converted into a dichotomous variable.
The cleaning and the conversion into dichotomous data were done using R code and Excel.
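The conversion described above amounts to a threshold rule. The report did this in R and Excel; the Python sketch below shows the same idea, with made-up input values. How a value exactly equal to the threshold was coded is not stated in the report, so coding it as 0 here is an assumption.

```python
def dichotomise(values, threshold):
    """Code values above the threshold as 1 and all others as 0.

    Assumption: a value exactly equal to the threshold is coded 0.
    """
    return [1 if v > threshold else 0 for v in values]

# Made-up values, for illustration only.
birth_rates = [76.0, 93.0, 94.2, 100.0]
abortion_rates = [1.2, 9.0, 10.5, 25.6]

print(dichotomise(birth_rates, 93.6))    # [0, 0, 1, 1]
print(dichotomise(abortion_rates, 9.6))  # [0, 0, 1, 1]
```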
Assumptions:
Sample Size:
Case Processing Summary
Unweighted Cases (a)                       N    Percent
Selected Cases     Included in Analysis    53   100.0
                   Missing Cases            0      .0
                   Total                   53   100.0
Unselected Cases                            0      .0
Total                                      53   100.0
a. If weight is in effect, see classification table for the total number of cases.
The sample size is 53, which is not particularly small.
Multicollinearity:
The binomial logistic regression procedure provides no test for multicollinearity, so I checked it using the multiple regression procedure.
Correlations
                                                     Birth   BirthAttendendedBySpecialist   AntenatalCare   Abortion
Pearson Correlation   BirthAttendendedBySpecialist    .585                          1.000            .740      -.487
                      AntenatalCare                   .613                           .740           1.000      -.498
                      Abortion                       -.184                          -.487           -.498      1.000
                      Birth                          1.000                           .585            .613      -.184
Sig. (1-tailed)       Birth                              .                           .000            .000       .094
                      BirthAttendendedBySpecialist    .000                              .            .000       .000
                      AntenatalCare                   .000                           .000               .       .000
                      Abortion                        .094                           .000            .000          .
N                     Birth                             53                             53              53         53
                      BirthAttendendedBySpecialist      53                             53              53         53
                      AntenatalCare                     53                             53              53         53
                      Abortion                          53                             53              53         53
From the values above, Abortion is not strongly related to the other two independent variables (-.487 and -.498). BirthAttendendedBySpecialist and AntenatalCare correlate at .740, which is slightly above 0.7.
Abortion is a categorical variable, obtained by converting abortion rate into a dichotomous variable.
Dependent Variable Encoding
Original Value   Internal Value
0                0
1                1
Here a birth rate greater than 93.6, which is the average birth rate, is coded as 1 and a value below 93.6 is coded as 0.
Outliers:
Casewise List
Case   Selected Status   Observed Birth   Predicted   Predicted Group   Resid    ZResid   SResid
1      S                 0**              .781        1                 -.781    -1.888   -1.818
7      S                 0**              .779        1                 -.779    -1.875   -1.841
12     S                 0**              .694        1                 -.694    -1.504   -1.765
17     S                 1                .579        1                  .421      .852    1.142
28     S                 1**              .072        0                  .928     3.578    2.474
29     S                 1                .521        1                  .479      .958    1.289
33     S                 1                .521        1                  .479      .959    1.256
41     S                 0**              .684        1                 -.684    -1.470   -1.613
42     S                 0                .328        0                 -.328     -.699   -1.023
43     S                 0**              .580        1                 -.580    -1.174   -1.735
49     S                 0**              .502        1                 -.502    -1.005   -1.279
a. S = Selected, U = Unselected cases, and ** = Misclassified cases.
b. Cases with studentized residuals greater than 1.000 are listed.
Cases whose studentized residual exceeds 1 in absolute value are listed as potential outliers.
Only one case with an observed birth rate of 1 (greater than 93.6) is misclassified: case 28, which also has the largest residual in the list (ZResid = 3.578).
Dependent and Independent variables:
Birth rate is the dichotomous dependent variable; two continuous independent variables plus one dichotomous independent variable are considered for the logistic regression.
The three independent variables are not so strongly correlated with one another that any of them has to be dropped.
SPSS Output Prediction:
Categorical Variables Codings
                Frequency   Parameter coding (1)
Abortion   0    26          1.000
           1    27           .000
The two categories contain nearly equal numbers of cases (26 and 27), so neither group is very small.
Block 0: Beginning Block
Classification Table
                               Predicted Birth
Observed                       0      1      Percentage Correct
Step 0   Birth   0             0      10       0.0
                 1             0      43     100.0
         Overall Percentage                   81.1
a. Constant is included in the model.
b. The cut value is .500
This is the beginning block, which does not contain the independent variables. The overall percentage of correctly classified cases is 81.1. We expect this percentage to increase once all independent variables are included in the model.
Variables in the Equation
                    B       S.E.   Wald     df   Sig.   Exp(B)
Step 0   Constant   1.459   .351   17.261   1    .000   4.300
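The Block 0 figures follow directly from the class counts in the Step 0 classification table (10 cases coded 0 and 43 coded 1). A short check, using only those two counts:

```python
import math

n_zero, n_one = 10, 43  # observed Birth = 0 and Birth = 1

odds = n_one / n_zero                        # Exp(B) of the constant
constant = math.log(odds)                    # B of the constant
baseline_accuracy = n_one / (n_zero + n_one) # predict 1 for every case

print(f"Exp(B) = {odds:.3f}")
print(f"B      = {constant:.3f}")
print(f"Overall percentage = {100 * baseline_accuracy:.1f}")
```

With no predictors, the model assigns every case to the larger group, so its accuracy is simply the share of cases coded 1.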
Variables not in the Equation
                                                    Score    df   Sig.
Step 0   Variables   BirthAttendendedBySpecialist   18.114   1    .000
                     AntenatalCare                  19.948   1    .000
                     Abortion(1)                     1.791   1    .181
         Overall Statistics                         23.727   3    .000
Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step     23.209       3    .000
         Block    23.209       3    .000
         Model    23.209       3    .000
• The significance value tells us whether the model with the independent variables fits significantly better than the Block 0 model, which contains only the constant.
• In this case the value is .000 (p < .0005), so the set of predictors significantly improves the model.
• The chi-square value, which we will need to report in our results, is 23.209 with 3 degrees of freedom.
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      9.981        8    .266
The Hosmer and Lemeshow chi-square value is 9.981 with 8 degrees of freedom, and the significance value is 0.266. A value greater than 0.05 indicates support for the model.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      28.127              .355                   .572
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
• The Cox & Snell R square of 0.355 and the Nagelkerke R square of 0.572 are analogous to the R2 measure in linear regression.
• In this example, the two values suggest that between 35.5% and 57.2% of the variability is explained by this set of variables.
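Both pseudo R square values can be recovered from numbers printed elsewhere in the output: the -2 log likelihood of the fitted model (28.127), the model chi-square from the Omnibus Tests table (23.209) and N = 53. The sketch below uses the standard Cox & Snell and Nagelkerke definitions; it is a cross-check, not SPSS's own code.

```python
import math

n = 53
neg2ll_model = 28.127      # Model Summary table
model_chi_square = 23.209  # Omnibus Tests of Model Coefficients

# -2 log likelihood of the constant-only model.
neg2ll_null = neg2ll_model + model_chi_square

cox_snell = 1 - math.exp(-model_chi_square / n)
# Largest value Cox & Snell can reach for these data.
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(f"Cox & Snell R square = {cox_snell:.3f}")
print(f"Nagelkerke R square  = {nagelkerke:.3f}")
```

The results match the tabulated .355 and .572, which confirms that the Nagelkerke value in the table is 0.572.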
Classification Table
                               Predicted Birth
Observed                       0      1      Percentage Correct
Step 1   Birth   0             4      6      40.0
                 1             1      42     97.7
         Overall Percentage                  86.8
a. The cut value is .500
• With the independent variables included, the model classifies 86.8% of cases correctly, an improvement of 5.7 percentage points over the 81.1% in Block 0.
• The model has a sensitivity of 97.7% and a specificity of 40%.
• That is, 40% of the cases with BirthRate 0 (below 93.6%) and 97.7% of the cases with BirthRate 1 (above 93.6%) are predicted correctly.
• Conversely, 60% of the cases with BirthRate 0 and 2.3% of the cases with BirthRate 1 are misclassified.
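The percentages in the Step 1 classification table can be recomputed from its four cell counts. The variable names below are illustrative; the counts are the reported ones.

```python
# Cell counts from the Step 1 classification table.
true_neg, false_pos = 4, 6    # observed 0: predicted 0, predicted 1
false_neg, true_pos = 1, 42   # observed 1: predicted 0, predicted 1

total = true_neg + false_pos + false_neg + true_pos
sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
accuracy = (true_pos + true_neg) / total
baseline = (true_pos + false_neg) / total  # Block 0: predict 1 for all

print(f"Sensitivity = {100 * sensitivity:.1f}%")
print(f"Specificity = {100 * specificity:.1f}%")
print(f"Accuracy    = {100 * accuracy:.1f}%")
print(f"Gain over Block 0 = {100 * (accuracy - baseline):.1f} points")
```

The low specificity means the model still assigns most of the ten low-birth-rate countries to the high group, so the gain in overall accuracy over Block 0 is modest.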
Variables in the Equation
Step 1                           B        S.E.    Wald    df   Sig.    Exp(B)   95% C.I. Lower   95% C.I. Upper
BirthAttendendedBySpecialist      0.083   0.045   3.380   1    0.066   1.087    0.995            1.187
AntenatalCare                     0.104   0.048   4.680   1    0.031   1.110    1.010            1.220
Abortion(1)                      -1.779   1.233   2.080   1    0.149   0.169    0.015            1.893
Constant                        -12.629   4.652   7.368   1    0.007   0.000
a. Variable(s) entered on step 1: BirthAttendendedBySpecialist, AntenatalCare, Abortion.
• This table provides the values for the variables that contribute to our model.
• The test used here is the Wald test; the value in the Wald column is the test statistic for each predictor.
• The Sig. column gives the significance of each variable in the model; a value below 0.05 indicates a significant contribution.
• AntenatalCare (Sig. = 0.031) is significant and BirthAttendendedBySpecialist (Sig. = 0.066) is marginal; both are more significant than Abortion (Sig. = 0.149), which is the categorical variable in the model.
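The Exp(B) column and its confidence interval are transformations of B and S.E. A brief sketch, assuming the usual normal-approximation interval exp(B ± 1.96 × S.E.), reproduces the tabulated values to within rounding.

```python
import math

# predictor: (B, S.E.) from the Variables in the Equation table
coefficients = {
    "BirthAttendendedBySpecialist": (0.083, 0.045),
    "AntenatalCare": (0.104, 0.048),
    "Abortion(1)": (-1.779, 1.233),
}

def odds_ratio_ci(b, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for one coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

for name, (b, se) in coefficients.items():
    ratio, lower, upper = odds_ratio_ci(b, se)
    print(f"{name}: Exp(B) = {ratio:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

An interval that contains 1 (as for BirthAttendendedBySpecialist and Abortion(1)) corresponds to a Sig. value above 0.05.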
Results
Based on the logistic regression analysis, we have the following equation:
logit(p) = -12.629 + 0.083(BirthAttendendedBySpecialist) + 0.104(AntenatalCare) - 1.779(Abortion(1))
AbortionRate can be removed from this model because its significance value of 0.149 is greater than 0.05, which implies that this categorical variable does not contribute strongly to the model. AntenatalCare contributes significantly (p = 0.031), and BirthAttendendedBySpecialist contributes more weakly (p = 0.066).
Ideally AbortionRate should be inversely related to BirthRate, but the logistic regression does not provide significant evidence of an effect of abortion rate on the probability of a high birth rate.
Substituting the values of the independent variables into the logistic regression equation gives the logit, from which the probability is p = 1 / (1 + e^(-logit)). If the probability is greater than 0.5, BirthRate is predicted to be 1, i.e. greater than 93.6%; if it is less than 0.5, BirthRate is predicted to be 0, i.e. less than 93.6%.
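The prediction rule above can be sketched in a few lines. The coefficients are the reported ones; the function names and the two example inputs are hypothetical. Note that, according to the Categorical Variables Codings table, the indicator Abortion(1) equals 1 for the group coded Abortion = 0 and 0 for the group coded Abortion = 1.

```python
import math

def predicted_probability(birth_attended, antenatal_care, abortion_1):
    """Probability that Birth = 1 under the reported logistic model.

    abortion_1 is the SPSS indicator Abortion(1): 1 when Abortion = 0,
    0 when Abortion = 1 (see the Categorical Variables Codings table).
    """
    logit = (-12.629
             + 0.083 * birth_attended
             + 0.104 * antenatal_care
             - 1.779 * abortion_1)
    return 1.0 / (1.0 + math.exp(-logit))

def predicted_class(probability, cut=0.5):
    """Apply the .500 cut value used in the classification table."""
    return 1 if probability > cut else 0

# Hypothetical countries, for illustration only.
p_high = predicted_probability(100.0, 100.0, 0)
p_low = predicted_probability(50.0, 50.0, 1)
print(f"{p_high:.3f} -> class {predicted_class(p_high)}")
print(f"{p_low:.3f} -> class {predicted_class(p_low)}")
```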

More Related Content

What's hot

Multicollinearity1
Multicollinearity1Multicollinearity1
Multicollinearity1Muhammad Ali
 
Preprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive ModelingPreprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive Modelingijtsrd
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear RegressionIndus University
 
Multinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsMultinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsAnirudha si
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
HeteroscedasticityMuhammad Ali
 
Lab Based E-portfolio
Lab Based E-portfolioLab Based E-portfolio
Lab Based E-portfolioNor Khamsiah
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionsaba khan
 
7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squaresYugesh Dutt Panday
 
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...Nicha Tatsaneeyapan
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation AssessmentsDr Lendy Spires
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator Muhammad Ali
 
Wisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost PredictionWisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost PredictionPrasann Prem
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricinginventionjournals
 

What's hot (20)

Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Multicollinearity1
Multicollinearity1Multicollinearity1
Multicollinearity1
 
Preprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive ModelingPreprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive Modeling
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
 
Multinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsMultinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationships
 
Lab manual_statistik
Lab manual_statistikLab manual_statistik
Lab manual_statistik
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Lab Based E-portfolio
Lab Based E-portfolioLab Based E-portfolio
Lab Based E-portfolio
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares
 
Multicollinearity
MulticollinearityMulticollinearity
Multicollinearity
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
 
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
 
Heteroscedasticity | Eonomics
Heteroscedasticity | EonomicsHeteroscedasticity | Eonomics
Heteroscedasticity | Eonomics
 
Machine learning session1
Machine learning   session1Machine learning   session1
Machine learning session1
 
2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments2014 IIAG Imputation Assessments
2014 IIAG Imputation Assessments
 
Ali, Redescending M-estimator
Ali, Redescending M-estimator Ali, Redescending M-estimator
Ali, Redescending M-estimator
 
Wisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost PredictionWisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost Prediction
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricing
 

Similar to Regression project

Confidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docxConfidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docxmaxinesmith73660
 
Multiple Linear Regression Homework Help
Multiple Linear Regression Homework HelpMultiple Linear Regression Homework Help
Multiple Linear Regression Homework HelpExcel Homework Help
 
Statistics tests and Probablity
Statistics tests and ProbablityStatistics tests and Probablity
Statistics tests and ProbablityAbdul Wasay Baloch
 
RMH Concise Revision Guide - the Basics of EBM
RMH Concise Revision Guide -  the Basics of EBMRMH Concise Revision Guide -  the Basics of EBM
RMH Concise Revision Guide - the Basics of EBMAyselTuracli
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...cambridgeWD
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...cambridgeWD
 
Hcai 5220 lecture notes on campus sessions fall 11(2)
Hcai 5220 lecture notes on campus sessions fall 11(2)Hcai 5220 lecture notes on campus sessions fall 11(2)
Hcai 5220 lecture notes on campus sessions fall 11(2)Twene Peter
 
multiple Regression
multiple Regressionmultiple Regression
multiple RegressionAnniqah
 
2_5332511410507220042.ppt
2_5332511410507220042.ppt2_5332511410507220042.ppt
2_5332511410507220042.pptnedalalazzwy
 
20081206 Biostatistics
20081206 Biostatistics20081206 Biostatistics
20081206 BiostatisticsChung-Han Yang
 
Lecture5 Applied Econometrics and Economic Modeling
Lecture5 Applied Econometrics and Economic ModelingLecture5 Applied Econometrics and Economic Modeling
Lecture5 Applied Econometrics and Economic Modelingstone55
 
Predicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine LearningPredicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine LearningIdanGalShohet
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
14 + 8 Answers and calculations as basic statistics student would ex.docx
14 + 8 Answers and calculations as basic statistics student would ex.docx14 + 8 Answers and calculations as basic statistics student would ex.docx
14 + 8 Answers and calculations as basic statistics student would ex.docxjeanettehully
 

Similar to Regression project (20)

X18136931 statistics ca2_updated
X18136931 statistics ca2_updatedX18136931 statistics ca2_updated
X18136931 statistics ca2_updated
 
The Lachman Test
The Lachman TestThe Lachman Test
The Lachman Test
 
Multiple Linear Regression Homework Help
Multiple Linear Regression Homework HelpMultiple Linear Regression Homework Help
Multiple Linear Regression Homework Help
 
Confidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docxConfidence Intervals in the Life Sciences PresentationNamesS.docx
Confidence Intervals in the Life Sciences PresentationNamesS.docx
 
Multiple Linear Regression Homework Help
Multiple Linear Regression Homework HelpMultiple Linear Regression Homework Help
Multiple Linear Regression Homework Help
 
Statistics tests and Probablity
Statistics tests and ProbablityStatistics tests and Probablity
Statistics tests and Probablity
 
Chapter 8
Chapter 8Chapter 8
Chapter 8
 
RMH Concise Revision Guide - the Basics of EBM
RMH Concise Revision Guide -  the Basics of EBMRMH Concise Revision Guide -  the Basics of EBM
RMH Concise Revision Guide - the Basics of EBM
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
 
Hcai 5220 lecture notes on campus sessions fall 11(2)
Hcai 5220 lecture notes on campus sessions fall 11(2)Hcai 5220 lecture notes on campus sessions fall 11(2)
Hcai 5220 lecture notes on campus sessions fall 11(2)
 
multiple Regression
multiple Regressionmultiple Regression
multiple Regression
 
2_5332511410507220042.ppt
2_5332511410507220042.ppt2_5332511410507220042.ppt
2_5332511410507220042.ppt
 
20081206 Biostatistics
20081206 Biostatistics20081206 Biostatistics
20081206 Biostatistics
 
Statistics Homework Help
Statistics Homework HelpStatistics Homework Help
Statistics Homework Help
 
Lecture5 Applied Econometrics and Economic Modeling
Lecture5 Applied Econometrics and Economic ModelingLecture5 Applied Econometrics and Economic Modeling
Lecture5 Applied Econometrics and Economic Modeling
 
Predicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine LearningPredicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine Learning
 
Statistics Homework Help
Statistics Homework HelpStatistics Homework Help
Statistics Homework Help
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
14 + 8 Answers and calculations as basic statistics student would ex.docx
14 + 8 Answers and calculations as basic statistics student would ex.docx14 + 8 Answers and calculations as basic statistics student would ex.docx
14 + 8 Answers and calculations as basic statistics student would ex.docx
 

Regression project

  • 1. Statistics for Data Analytics CA2 – Regression Project: Multiple Regression and Logistic Regression. MSc Data Analytics, Group A. x18134599, Mansi Atul Chowkkar
  • 2. MULTIPLE LINEAR REGRESSION
    Dataset and Analysis:
    I have taken four datasets from the http://data.un.org site: birth rate, abortion rate, antenatal care and births attended by a skilled doctor.
    1. Total female/male birth rate for all countries from 1995 to 2005: http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a9
    2. Total abortion rate for all countries from 1995 to 2005: http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a12
    3. Number of births attended by a skilled doctor for all countries from 1995 to 2005: http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a47
    4. Antenatal care rate for at least one visit, which shows how many women receive care during pregnancy, from 1995 to 2005: http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a45
    I merged these four datasets into a single dataset for multiple regression using R. I consider only the data for the year 2003 in the regression.
    For the multiple linear regression, birth rate is the dependent variable; abortion rate, births attended by specialist and antenatal care are the independent variables.
    The objective of the project is to determine whether the female/male birth ratio (in percent) depends on the overall abortion rate, the births attended by a skilled doctor and the antenatal care rate.
    Understanding Data
    The data consist of 4 variables, of which 3 are independent and 1 is dependent:
    a) BirthAttendendedBySpecialist is an independent, continuous variable. Its maximum value is 100, which means all births are attended by a skilled doctor, and its minimum value is 43.4, which means only 43.4% of all births are attended by a skilled doctor.
    b) AntenatalCare is an independent, continuous variable. Its maximum value is 100, which means all women have received proper care before or during their pregnancy, and its minimum value is 33, which means only 33% of women have received proper care before or during pregnancy.
    c) AbortionRate is an independent, continuous variable. Its maximum value is 25.6, which means 25.6% of women have had an abortion, and its minimum value is 1.2, which means only 1.2% of women have had an abortion.
    d) BirtRate is the dependent, continuous variable. Its maximum value is 100, which means a female/male birth ratio of 100% in that country, and its minimum value is 76, which means a female/male birth ratio of 76%.
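The merge described above was done in R by the author. As a rough illustration only, the same idea (keep one year, then join the four indicators on country) can be sketched in Python. The row layout `(country, year, value)`, the country names and every number below are hypothetical, and keeping only countries present in all four tables is an assumption about how the merge was done.

```python
# Hypothetical extracts in the shape (country, year, value).
# Country names and values are invented for illustration only.
birth_rate = [("Country A", 2003, 95.0), ("Country B", 2003, 90.0), ("Country A", 2002, 94.0)]
abortion_rate = [("Country A", 2003, 8.0), ("Country B", 2003, 12.5)]
attended = [("Country A", 2003, 99.0), ("Country B", 2003, 70.0)]
antenatal = [("Country A", 2003, 97.0)]  # Country B has no 2003 value


def values_for_year(rows, year):
    """Return {country: value} for the rows that belong to the given year."""
    return {country: value for country, row_year, value in rows if row_year == year}


def merge_on_country(tables, year):
    """Inner-join several indicator tables on country for one year."""
    per_table = {name: values_for_year(rows, year) for name, rows in tables.items()}
    # Assumption: a country is kept only if every indicator has a value for it.
    common = set.intersection(*(set(values) for values in per_table.values()))
    return {c: {name: per_table[name][c] for name in per_table} for c in sorted(common)}


merged = merge_on_country(
    {
        "BirtRate": birth_rate,
        "AbortionRate": abortion_rate,
        "BirthAttendendedBySpecialist": attended,
        "AntenatalCare": antenatal,
    },
    2003,
)
print(merged)
```

In this toy input only "Country A" survives the join, because "Country B" lacks an antenatal care value for 2003.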
  • 3. Objectives and Assumptions
    Assumptions on which the data is analysed:
    Assumption 1: The dependent variable should be measured on a continuous scale
    BirtRate is a continuous variable (a percentage), and it contains no null or zero values.

    Descriptive Statistics
    Variable                        Mean                Std. Deviation      N
    BirtRate                        93.692              5.4980              53
    BirthAttendendedBySpecialist    86.747              14.6630             53
    AntenatalCare                   81.43               15.819              53
    AbortionRate                    9.687169811320754   6.283150966415133   53

    Assumption 2: Sample size
    - The first output box provides the descriptive statistics (Mean, Standard deviation, N) for each variable.
    - The mean of BirtRate is 93.692, which means the mean female/male birth ratio is 93.692%.
    - The mean abortion rate is 9.69, which means that on average 9.69% of women have an abortion; ideally this value should be low.
    - BirthAttendendedBySpecialist has a mean of 86.747 and AntenatalCare has a mean of 81.43; both are percentages, and both means are expected to be high.
    - The standard deviation is larger for BirthAttendendedBySpecialist and AntenatalCare, which means their values are more widely spread around the mean.
    - N is 53, which is above 30, so any violation of normality or equality of variance that may exist should not have much effect.
    Assumption 3: The data must not show multicollinearity
    - I treat BirthAttendendedBySpecialist, AntenatalCare and AbortionRate as the three independent variables, all of which are continuous.
    Assumption 4: Independence of observations, checked with the Durbin-Watson statistic
    - The Durbin-Watson value is 1.903, which lies between 1.5 and 2.5, so the residuals are not autocorrelated.
    - The observations for AntenatalCare, BirthAttendendedBySpecialist and AbortionRate are therefore treated as independent of one another.
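For readers who want to see what the Durbin-Watson statistic measures, here is a minimal sketch of the standard formula, the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The residual lists are made up for illustration; the report's value of 1.903 came from SPSS and cannot be recomputed without the original residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Illustrative residuals only, not taken from the report.
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative autocorrelation
constant = [2.0, 2.0, 2.0]             # strong positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```

The statistic ranges from 0 to 4, and values near 2 indicate little autocorrelation, which is why the report treats 1.903 as acceptable.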
  • 4. Assumption 5: No significant outliers, high leverage points or highly influential points
    - This can be checked with the Normal Probability Plot (P-P) of the regression standardised residuals and the scatterplot that were requested as part of the analysis.
    - Both plots are presented in the diagrams from the SPSS output.
    - In the P-P plot the points are expected to lie reasonably close to a straight line. In this case they deviate slightly from the line, which indicates a slight deviation from normality.
    - In the scatterplot of the standardised residuals (the second plot displayed), most of the points are expected to be scattered in the central area, with very few outliers.
    - In this case the points are shifted slightly to the right, that is, the majority are scattered in the right side of the rectangle. This means that a regression line drawn through the scattered points would have a negative slope.
  • 5. SPSS Output Explanation:

    Correlations (Pearson Correlation)
                                    BirtRate   BirthAttendendedBySpecialist   AntenatalCare   AbortionRate
    BirtRate                        1.000      0.677                          0.729           -0.498
    BirthAttendendedBySpecialist    0.677      1.000                          0.740           -0.628
    AntenatalCare                   0.729      0.740                          1.000           -0.584
    AbortionRate                    -0.498     -0.628                         -0.584          1.000
    Sig. (1-tailed): 0.000 for every pair of variables.
    N: 53 for every variable.

    - AbortionRate, BirthAttendendedBySpecialist and AntenatalCare correlate substantially with BirtRate (-0.498, 0.677 and 0.729 respectively).
    - The correlations among the independent variables are not too high. Two of the three pairs have a correlation below 0.7 in absolute value (-0.628 and -0.584), which means these variables are suitable for the model. AntenatalCare has values slightly above 0.7 (0.729 with BirtRate and 0.740 with BirthAttendendedBySpecialist), but I still retain it in my model.
  • 6. Coefficients (dependent variable: BirtRate)

    Variable                        B        Std. Error   Beta     t        Sig.    Zero-order   Partial   Part     Tolerance   VIF
    (Constant)                      70.213   4.846                 14.489   0.000
    BirthAttendendedBySpecialist    0.111    0.056        0.296    1.987    0.052   0.677        0.273     0.185    0.394       2.539
    AntenatalCare                   0.173    0.050        0.497    3.486    0.001   0.729        0.446     0.325    0.429       2.332
    AbortionRate                    -0.020   0.108        -0.023   -0.183   0.855   -0.498       -0.026    -0.017   0.574       1.742
    B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

    - The collinearity results are presented in the table labelled Coefficients. Two values are given: Tolerance and VIF.
    - Tolerance indicates how much of the variability of the specified independent variable is not explained by the other independent variables in the model. It is calculated as 1 - R squared for each variable.
    - This value is 0.574 for AbortionRate, 0.394 for BirthAttendendedBySpecialist and 0.429 for AntenatalCare. All three are well above 0.10, which indicates that no independent variable is too highly correlated with the other independent variables.
    - The VIF (variance inflation factor) is just the inverse of the Tolerance value (1 divided by Tolerance). VIF values should not exceed 10.
    - In this example the VIF of every independent variable is below 3, well under 10, so the multicollinearity assumption is not violated.
    - The Beta value for AntenatalCare is the highest at 0.497, which means this variable makes the strongest contribution to explaining BirtRate.
    - The equation for our regression line can be written as:
      y = 70.213 - 0.020(abortionRate) + 0.111(BirthAttendedBySpecialist) + 0.173(AntenatalRate)
      Each B value tells us how much y changes for a one-unit increase in that x variable, with the other variables held constant.
    - A coefficient is statistically significant when its p value (Sig.) is less than 0.05.
    - The Beta of 0.497 for AntenatalCare makes it the largest contributor to explaining our y variable (BirtRate), followed by BirthAttendendedBySpecialist (0.296) and AbortionRate (-0.023).
    - From the Sig. values, AntenatalCare (0.001) is significant and BirthAttendendedBySpecialist (0.052) is borderline, sitting just above 0.05. The Sig. value of AbortionRate is 0.855, which is not significant, so that variable could be removed from the model.
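Two of the relationships described for the Coefficients table can be checked by hand: VIF is 1 divided by Tolerance, and a standardized Beta equals B multiplied by the predictor's standard deviation and divided by the standard deviation of the dependent variable. The sketch below plugs in the values printed in the SPSS tables; the short dictionary keys are my own labels, and small differences from the SPSS figures are expected because the inputs are already rounded.

```python
# Values copied from the Coefficients and Descriptive Statistics tables.
tolerance = {"BirthAttended": 0.394, "AntenatalCare": 0.429, "AbortionRate": 0.574}
reported_vif = {"BirthAttended": 2.539, "AntenatalCare": 2.332, "AbortionRate": 1.742}

b = {"BirthAttended": 0.111, "AntenatalCare": 0.173, "AbortionRate": -0.020}
sd_x = {"BirthAttended": 14.6630, "AntenatalCare": 15.819, "AbortionRate": 6.283150966415133}
sd_y = 5.4980  # standard deviation of BirtRate
reported_beta = {"BirthAttended": 0.296, "AntenatalCare": 0.497, "AbortionRate": -0.023}

# VIF is the reciprocal of Tolerance.
vif = {name: 1.0 / t for name, t in tolerance.items()}

# Standardized coefficient: Beta = B * sd(x) / sd(y).
beta = {name: b[name] * sd_x[name] / sd_y for name in b}

for name in tolerance:
    print(name, round(vif[name], 3), round(beta[name], 3))
```

All recomputed VIF values stay far below the cut-off of 10 used in the report.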
  • 7. Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
    1       .757    .573       .547                3.6995                       1.903
    a. Predictors: (Constant), AbortionRate, AntenatalCare, BirthAttendendedBySpecialist
    b. Dependent Variable: BirtRate

    - The Model Summary shows how well the regression line fits the data.
    - R is the multiple correlation coefficient; it gives an idea of how well the model predicts the values of the dependent variable. The value of R, 0.757, indicates a good level of prediction.
    - In the Model Summary the value under the heading R Square is 0.573. This is the proportion of variance in the dependent variable (BirtRate) that is explained by the model (which includes AbortionRate, AntenatalCare and BirthAttendendedBySpecialist).
    - Expressed as a percentage, the model explains 57.3 per cent of the variance in the female/male birth ratio.
    - The adjusted R square corrects R square for the number of variables in the model, so it increases only when a newly introduced variable actually improves the model. Here the adjusted R square is 0.547, which is very close to R square.

    ANOVA (dependent variable: BirtRate)
    Model 1       Sum of Squares   df   Mean Square   F        Sig.
    Regression    901.262          3    300.421       21.951   .000
    Residual      670.615          49   13.686
    Total         1571.877         52

    - In the ANOVA table, the Sum of Squares column shows that about 901.26 of the total 1571.877 in the response variable is explained by the predictor variables, which also means that about 670.6 is left unexplained.
    - The model uses 3 of the 52 degrees of freedom.
    - The significance value is .000 (p < 0.0005). Because the result is significant, we can reject the null hypothesis that all slopes are zero.
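The Model Summary figures follow directly from the ANOVA table, which makes for a useful consistency check. The sketch below recomputes R square, adjusted R square, the F ratio and the standard error of the estimate from the sums of squares and degrees of freedom reported above; the variable names are my own.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 901.262
ss_residual = 670.615
ss_total = 1571.877
n = 53   # number of cases
k = 3    # number of predictors

r_square = ss_regression / ss_total
r = math.sqrt(r_square)

# Adjusted R square penalises the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# F is the ratio of the regression and residual mean squares.
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_ratio = ms_regression / ms_residual

# Standard error of the estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(round(r, 3), round(r_square, 3), round(adjusted_r_square, 3))
print(round(f_ratio, 3), round(std_error_estimate, 4))
```

Each recomputed value agrees with the SPSS output to the printed precision.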
  • 8. Results
    Based on the regression analysis results, the regression equation was obtained as shown below:
    BirthRate = 70.213 - 0.020(abortionRate) + 0.111(BirthAttendedBySpecialist) + 0.173(AntenatalRate)
    The coefficient of an independent variable in a multiple regression model is the amount by which the dependent variable changes when that independent variable increases by one unit and the others are held constant. Here we can see that if AbortionRate increases, BirthRate decreases, as expected, and if births attended by specialist or antenatal care increase, the birth rate increases.
    This multiple linear regression analyses whether the three independent variables in the model (AbortionRate, BirthAttendedBySpecialist, AntenatalRate) are significantly predictive of BirthRate, the dependent variable.
    First, the assumptions necessary for multiple linear regression were examined, and the multiple linear regression analysis was then performed on the data, which were judged to satisfy the assumptions.
    AntenatalCare makes the biggest contribution to the model, with the highest standardized coefficient Beta of .497.
    A variable with a significance value below .05 is said to be statistically significant. AntenatalCare is significant with a value of 0.001, and BirthAttendedBySpecialist is borderline with a value of 0.052, just above .05. AbortionRate is not significant: its coefficient is negative, as expected, but its significance value is 0.855.
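To make the regression equation concrete, the sketch below evaluates it at the sample means from the Descriptive Statistics table. An ordinary least squares line passes through the means of the variables, so the result should land close to the mean of BirtRate (93.692); the small gap comes from the coefficients being rounded to three decimals. The function name is hypothetical.

```python
def predict_birth_rate(abortion_rate, birth_attended, antenatal_care):
    """Evaluate the fitted regression equation reported in the Results."""
    return 70.213 - 0.020 * abortion_rate + 0.111 * birth_attended + 0.173 * antenatal_care


# Sample means copied from the Descriptive Statistics table.
mean_abortion = 9.687169811320754
mean_attended = 86.747
mean_antenatal = 81.43

prediction_at_means = predict_birth_rate(mean_abortion, mean_attended, mean_antenatal)
print(round(prediction_at_means, 2))
```

The same function can be called with any country's three indicator values to obtain its predicted female/male birth ratio.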
  • 9. LOGISTIC REGRESSION
    Problem Analysis
    The data used for the logistic regression are the same as those used in the multiple regression, and the predictor variables are the same as above. One independent variable, abortion rate, is converted into a dichotomous variable such that an abortion rate above 9.6% is 1 and a value below 9.6% is 0.
    The response variable is the same as above (BirthRate), but here it is converted into a dichotomous variable such that a birth rate above 93.6% is 1 and a value below 93.6% is 0.
    Objective: To evaluate whether a female/male BirthRate ratio above 93.6% depends on abortion rate, antenatal care and births attended by a skilled doctor.
    Understanding Data
    BirthRate is the dependent variable, and a threshold of 93.6% is set to convert it into dichotomous data. Of the three independent variables, I have converted one, abortion rate, into a dichotomous variable.
    Cleaning and conversion of the data into dichotomous form were done using R code and Excel.
    Assumptions:
    Sample Size:

    Case Processing Summary
    Unweighted Cases                             N    Percent
    Selected Cases    Included in Analysis       53   100.0
                      Missing Cases              0    .0
                      Total                      53   100.0
    Unselected Cases                             0    .0
    Total                                        53   100.0
    a. If weight is in effect, see classification table for the total number of cases.

    Here the sample size is 53, which is not too small.
    Multicollinearity: The binomial logistic regression procedure offers no test for multicollinearity, so I checked it using multiple regression.
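The conversion to dichotomous variables was done by the author in R and Excel. A minimal sketch of the same rule in Python is shown below. The report does not say how a value exactly equal to a threshold is coded, so treating it as 0 is an assumption, and the sample values are invented.

```python
ABORTION_THRESHOLD = 9.6   # percent, from the report
BIRTH_THRESHOLD = 93.6     # percent, from the report


def dichotomise(value, threshold):
    """Return 1 if value is above the threshold, otherwise 0.

    Assumption: a value exactly equal to the threshold is coded 0,
    because the report only describes 'above' and 'below'.
    """
    return 1 if value > threshold else 0


# Invented example values, not taken from the dataset.
abortion_rates = [1.2, 9.7, 25.6]
birth_rates = [76.0, 93.7, 100.0]

abortion_codes = [dichotomise(v, ABORTION_THRESHOLD) for v in abortion_rates]
birth_codes = [dichotomise(v, BIRTH_THRESHOLD) for v in birth_rates]
print(abortion_codes, birth_codes)
```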
  • 10. Correlations (Pearson Correlation)
                                    Birth    BirthAttendendedBySpecialist   AntenatalCare   Abortion
    Birth                           1.000    .585                           .613            -.184
    BirthAttendendedBySpecialist    .585     1.000                          .740            -.487
    AntenatalCare                   .613     .740                           1.000           -.498
    Abortion                        -.184    -.487                          -.498           1.000
    Sig. (1-tailed): .094 for Birth with Abortion; .000 for every other pair.
    N: 53 for every variable.

    From the values above, the three independent variables are not strongly related to each other: only the correlation between AntenatalCare and BirthAttendendedBySpecialist (.740) is slightly above 0.7.
    Abortion is a categorical variable, converted into dichotomous form.

    Dependent Variable Encoding
    Original Value   Internal Value
    0                0
    1                1

    Here a birth rate greater than 93.6, which is the average value of the birth rate, is coded as 1, and a value below 93.6 is coded as 0.
  • 11. Outliers:

    Casewise List
    Case   Selected Status   Observed Birth   Predicted   Predicted Group   Resid   ZResid   SResid
    1      S                 0**              .781        1                 -.781   -1.888   -1.818
    7      S                 0**              .779        1                 -.779   -1.875   -1.841
    12     S                 0**              .694        1                 -.694   -1.504   -1.765
    17     S                 1                .579        1                 .421    .852     1.142
    28     S                 1**              .072        0                 .928    3.578    2.474
    29     S                 1                .521        1                 .479    .958     1.289
    33     S                 1                .521        1                 .479    .959     1.256
    41     S                 0**              .684        1                 -.684   -1.470   -1.613
    42     S                 0                .328        0                 -.328   -.699    -1.023
    43     S                 0**              .580        1                 -.580   -1.174   -1.735
    49     S                 0**              .502        1                 -.502   -1.005   -1.279
    a. S = Selected, U = Unselected cases, and ** = Misclassified cases.
    b. Cases with studentized residuals greater than 1.000 are listed.
    Resid, ZResid and SResid are temporary variables.

    Here, cases with a studentized residual exceeding 1 are listed (classified as outliers). Only one case with Birth = 1, that is, a birth rate greater than 93.6, is misclassified (case 28).
    Dependent and Independent variables:
    Birth rate is the dichotomous dependent variable, and two continuous independent variables plus one dichotomous independent variable are considered for the logistic regression. The three independent variables do not show problematic correlation among themselves.
  • 12. SPSS Output Prediction:

    Categorical Variables Codings
    Abortion   Frequency   Parameter coding (1)
    0          26          1.000
    1          27          .000

    The two categories contain a similar number of cases (26 and 27); neither group is very small.

    Block 0: Beginning Block
    Classification Table (Step 0)
    Observed             Predicted Birth 0   Predicted Birth 1   Percentage Correct
    Birth 0              0                   10                  0.0
    Birth 1              0                   43                  100.0
    Overall Percentage                                           81.1
    a. Constant is included in the model.
    b. The cut value is .500

    This is the beginning block, which does not contain the independent variables. The overall percentage of correctly classified cases is 81.1. We expect this percentage to increase once all the independent variables are included in the model.

    Variables in the Equation (Step 0)
               B       S.E.   Wald     df   Sig.   Exp(B)
    Constant   1.459   .351   17.261   1    .000   4.300
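The Block 0 numbers can be reproduced from the class counts alone. With no predictors, the model assigns every case to the larger group (Birth = 1), so the accuracy is simply that group's share, and the constant is the natural log of the odds of Birth = 1. The sketch below uses the counts from the classification table; variable names are my own.

```python
import math

# Counts copied from the Block 0 classification table.
count_birth_0 = 10
count_birth_1 = 43
total = count_birth_0 + count_birth_1

# Predicting the majority class for everyone gives the baseline accuracy.
baseline_accuracy = max(count_birth_0, count_birth_1) / total

# Odds of Birth = 1, which SPSS reports as Exp(B) for the constant.
odds = count_birth_1 / count_birth_0

# The Block 0 constant B is the log of those odds.
constant_b = math.log(odds)

print(round(100 * baseline_accuracy, 1), round(odds, 3), round(constant_b, 3))
```

These match the 81.1% overall percentage, the Exp(B) of 4.300 and the constant of 1.459 in the SPSS output.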
  • 13. Variables not in the Equation (Step 0)
    Variable                        Score    df   Sig.
    BirthAttendendedBySpecialist    18.114   1    .000
    AntenatalCare                   19.948   1    .000
    Abortion(1)                     1.791    1    .181
    Overall Statistics              23.727   3    .000

    Omnibus Tests of Model Coefficients (Step 1)
            Chi-square   df   Sig.
    Step    23.209       3    .000
    Block   23.209       3    .000
    Model   23.209       3    .000

    - The Sig. value tells us whether the model with the independent variables fits significantly better than the Block 0 model without them.
    - In this case the value is .000 (p < .0005). Therefore, the model fit is acceptable.
    - The chi-square value, which we need to report in our results, is 23.209 with 3 degrees of freedom.

    Hosmer and Lemeshow Test
    Step   Chi-square   df   Sig.
    1      9.981        8    .266

    The Hosmer and Lemeshow chi-square value is 9.981 with 8 degrees of freedom, and the significance value is 0.266 (it should be greater than 0.05), which indicates support for the model.

    Model Summary
    Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
    1      28.127              .355                   .572
    a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

    - The Cox & Snell R square of 0.355 and the Nagelkerke R square of 0.572 are analogous to the R square measure in linear regression.
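The two pseudo R square values can be recovered from the -2 log likelihood and the omnibus chi-square, which helps confirm the Nagelkerke figure of .572. The sketch below applies the standard Cox & Snell and Nagelkerke formulas to the reported numbers. It relies on the fact that the model chi-square is the drop in -2 log likelihood from the null model; the variable names are my own.

```python
import math

# Values copied from the Model Summary and Omnibus Tests tables.
neg2ll_model = 28.127
model_chi_square = 23.209
n = 53

# The omnibus chi-square is the reduction in -2LL, so the null -2LL is their sum.
neg2ll_null = neg2ll_model + model_chi_square

# Cox & Snell: 1 - exp(-chi_square / n).
cox_snell = 1 - math.exp(-model_chi_square / n)

# Nagelkerke rescales Cox & Snell by its maximum attainable value.
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Both results agree with the Model Summary table to within rounding of the inputs.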
  • 14. - In this example, the two values suggest that between 35.5% and 57.2% of the variability is explained by this set of variables.

    Classification Table (Step 1)
    Observed             Predicted Birth 0   Predicted Birth 1   Percentage Correct
    Birth 0              4                   6                   40.0
    Birth 1              1                   42                  97.7
    Overall Percentage                                           86.8
    a. The cut value is .500

    - The percentage of correctly classified cases for the model including the independent variables is 86.8%, an improvement of 5.7 percentage points over the 81.1% of Block 0.
    - The model has a sensitivity of 97.7% and a specificity of 40%.
    - That is, 40% of the cases with Birth = 0 (birth rate below 93.6%) are correctly predicted, and 97.7% of the cases with Birth = 1 (birth rate above 93.6%) are correctly predicted.
    - Conversely, 60% of the cases with Birth = 0 and 2.3% of the cases with Birth = 1 are not correctly predicted.

    Variables in the Equation (Step 1)
    Variable                        B         S.E.    Wald    df   Sig.    Exp(B)   95% C.I. for Exp(B): Lower   Upper
    BirthAttendendedBySpecialist    0.083     0.045   3.380   1    0.066   1.087    0.995                        1.187
    AntenatalCare                   0.104     0.048   4.680   1    0.031   1.110    1.010                        1.220
    Abortion(1)                     -1.779    1.233   2.080   1    0.149   0.169    0.015                        1.893
    Constant                        -12.629   4.652   7.368   1    0.007   0.000
    a. Variable(s) entered on step 1: BirthAttendendedBySpecialist, AntenatalCare, Abortion.

    - This table provides the values for the variables that contribute to our model.
    - The test used here is the Wald test; the value in the Wald column is the test statistic for each predictor.
    - The Sig. value is the significance of each variable in the model; a variable is significant when this value is less than 0.05.
    - We can see that BirthAttendendedBySpecialist (Sig. = 0.066) and AntenatalCare (Sig. = 0.031) are more significant than Abortion (Sig. = 0.149), the categorical variable in the model; only AntenatalCare falls below 0.05.
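The percentages quoted for the Step 1 classification table can be derived from its four cell counts. The sketch below computes sensitivity, specificity and overall accuracy, and compares the accuracy with the Block 0 baseline in which every case is predicted as Birth = 1. The variable names are my own.

```python
# Cell counts copied from the Step 1 classification table.
true_negative = 4    # observed 0, predicted 0
false_positive = 6   # observed 0, predicted 1
false_negative = 1   # observed 1, predicted 0
true_positive = 42   # observed 1, predicted 1

total = true_negative + false_positive + false_negative + true_positive

sensitivity = true_positive / (true_positive + false_negative)
specificity = true_negative / (true_negative + false_positive)
accuracy = (true_positive + true_negative) / total

# Block 0 baseline: every case is predicted as Birth = 1.
baseline_accuracy = (true_positive + false_negative) / total
improvement_points = 100 * (accuracy - baseline_accuracy)

print(round(100 * sensitivity, 1), round(100 * specificity, 1), round(100 * accuracy, 1))
print(round(improvement_points, 1))
```

The low specificity shows that most of the gain over Block 0 comes from correctly identifying four of the ten Birth = 0 cases while losing only one Birth = 1 case.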
  • 15. Results
    Based on the logistic regression analysis we have the equation below:
    logit(p) = -12.629 + 0.083(BirthAttendendedBySpecialist) + 0.104(AntenatalCare) - 1.779(Abortion(1))
    We can remove AbortionRate from this model because its significance value is 0.149, which is greater than 0.05. This implies that AbortionRate, the categorical variable, is not contributing strongly to our model, whereas BirthAttendendedBySpecialist and AntenatalCare are contributing strongly.
    Ideally AbortionRate should be inversely related to BirthRate, and the logistic regression shows that it does not contribute to an increase in birth rate.
    We obtain a probability by substituting the respective independent variables into the logistic regression equation. If the probability is greater than 0.5, BirthRate is 1, which means it is greater than 93.6%; if the probability is less than 0.5, BirthRate is 0, which means it is less than 93.6%.
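The final step, turning the logit equation into a probability and then into a 0/1 prediction, can be sketched as follows. The function names and input values are hypothetical. Note that `abortion_param` is the SPSS parameter Abortion(1), which according to the Categorical Variables Codings table equals 1 for original category 0 and 0 for original category 1.

```python
import math


def logit_value(birth_attended, antenatal_care, abortion_param):
    """Linear predictor from the reported logistic regression equation.

    abortion_param is the SPSS parameter Abortion(1): per the Categorical
    Variables Codings table it is 1 for original category 0 and 0 for
    original category 1.
    """
    return -12.629 + 0.083 * birth_attended + 0.104 * antenatal_care - 1.779 * abortion_param


def predicted_probability(birth_attended, antenatal_care, abortion_param):
    """Convert the logit into a probability with the logistic function."""
    z = logit_value(birth_attended, antenatal_care, abortion_param)
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, cut_value=0.5):
    """Apply the .500 cut value used in the classification tables."""
    return 1 if probability > cut_value else 0


# Invented inputs for illustration only.
p_high = predicted_probability(100, 100, 0)
p_low = predicted_probability(50, 40, 1)
print(round(p_high, 4), classify(p_high))
print(round(p_low, 4), classify(p_low))
```

A country with full coverage on both care indicators is classified as Birth = 1, while one with low coverage is classified as Birth = 0.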
  • 16. References:
    1. Pallant, Julie. SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS. Maidenhead: Open University Press/McGraw-Hill, 2010. Print.
    2. https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
    3. https://www.sheffield.ac.uk/polopoly_fs/1.531431!/file/MASHRegression_Further_SPSS.pdf