Statistics Report on Multiple Regression and Logistic Regression
National College of Ireland
STATISTICS REPORT
ON
MULTIPLE REGRESSION AND
LOGISTIC REGRESSION
By
Alekhya Bhupati
(x18132634)
MSc in Data Analytics
(MSCDAD_A)
MULTIPLE REGRESSION ANALYSIS
Multiple regression is an extension of simple linear regression in which the value of a dependent
variable is predicted from two or more independent variables.
Assumptions:
• The dependent variable is continuous in nature.
• There are two or more independent variables, which can be either continuous or categorical.
• The dependent variable has a linear relationship with each of the independent variables.
Data source
This analysis uses the Gender Inequality Index (GII) data set. The data source is:
http://data.un.org/DocumentData.aspx?id=391
Objective
In this analysis we use multiple regression on our data source to
• Study the various factors affecting GII.
• Study the relationships between the factors affecting GII.
Data information
In this project I have considered “Gender Inequality Index (GII)” as the dependent variable and
the following as independent variables:
1) Maternal mortality ratio
2) Adolescent birth rate
3) Share of seats in parliament
4) Population with at least some secondary education
5) Labor force participation rate
Figure 1: Screenshot of data used for Multiple Regression
Software
R is a simple, effective, open-source language that is widely used for data manipulation,
data handling, data visualization, statistical analysis and graphics.
In RStudio, we use the ‘read.csv’ command to load the data as shown below:
mg <- read.csv("Gender_Inequality_Index.csv", header = TRUE, sep = ",")
Data cleaning
The raw data consists of 228 rows and 10 columns. To obtain high-quality data, rows with
insignificant or missing values were eliminated using the R code shown in Figure 2, making the
data suitable for multiple regression. After cleaning, the data set consists of 159 rows and
7 columns.
Figure 2: R code for data cleaning
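The cleaning code itself appears only as a screenshot (Figure 2). A minimal sketch of one way such cleaning could be done in base R follows. The toy data frame, its column names and the use of `na.omit` are illustrative assumptions, not the code from Figure 2.

```r
# Hypothetical raw data: a toy stand-in for the GII file (column names are assumptions).
raw <- data.frame(
  Country = c("A", "B", "C", "D"),
  Gender_Inequality_Index = c(0.12, NA, 0.45, 0.30),
  Maternal_Mortality_Ratio = c(5, 20, NA, 60),
  Notes = c("x", "y", "z", "w"),   # a column not needed for the model
  stringsAsFactors = FALSE
)

# Keep only the columns needed for the analysis.
keep <- c("Country", "Gender_Inequality_Index", "Maternal_Mortality_Ratio")
subset_data <- raw[, keep]

# Drop every row that contains at least one missing value.
clean <- na.omit(subset_data)

dim(clean)  # rows and columns remaining after cleaning
```

Dropping unused columns before calling `na.omit` avoids discarding rows only because an irrelevant column is empty.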
Output of multiple regression data summary
This table shows the summary of the data in terms of the maximum, minimum, mean, median,
1st quartile and 3rd quartile of each factor.
> summary(GII)
Figure 3: summary(GII)
Correlation matrix
Correlation measures the strength and direction of the relationship between two variables, showing
whether each independent variable is positively or negatively correlated with the dependent variable.
> cor(GII[2:7])
Figure 4: Correlation Matrix
From this correlation matrix, we can say that two variables, ‘Maternal Mortality Ratio’ and
‘Adolescent Birth Rate’, follow a positive trend with GII, and the remaining three variables, ‘Share of
seats by Women in Parliament’, ‘Population with at least some secondary education’ and ‘Labor
force Participation rate’, follow a negative trend.
Pairwise matrix of scatter plot
Using the command below, we can analyze the relationship between each pair of variables.
> pairs(GII[2:7])
Figure 5: Pairwise matrix of scatter plot
Figure 5 shows that as ‘Maternal Mortality rate’ increases, ‘Adolescent Birth rate’ also increases,
so the correlation between these two variables is positive. As ‘Maternal Mortality rate’ increases,
the ‘Share of seats by Women in Parliament’ percentage decreases, so the correlation between these
two variables is negative. The relationships between the other pairs of variables can be read in the
same way.
Linear model
> GII.final <- lm(Gender_Inequality_Index ~ Maternal_Mortality_Ratio + Adolescent_Birth_Rate +
    Share_of_Seats_by_Women_in_Parliment + Population_with_at_least_some_secondary_education +
    Labour_force_participation_rate, data = GII)
> GII.final
Figure 6: Coefficients
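Formula for the multiple regression model
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$
where Y is the predicted value of the dependent variable, b_0 is the intercept, and
b_1, b_2, ..., b_n are the estimated coefficients of the independent variables X_1, X_2, ..., X_n.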
In Figure 6 (Coefficients), we obtain the unstandardized coefficients (B values), one for each
independent variable. Substituting these coefficients gives the equation for predicting our
dependent variable (GII):
Gender_Inequality_Index = 0.1797501
  + 0.0003497 × Maternal_mortality_Ratio
  + 0.0024407 × Adolescent_Birth_Rate
  - 0.0030140 × Share_of_seats_by_women_in_Parliament
  - 0.0015625 × Population_with_at_least_some_secondary_education
  - 0.0033249 × Labor_force_Participation_rate
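As a worked illustration of the fitted equation, the sketch below plugs a set of predictor values into the coefficients reported in Figure 6. The helper `predict_gii` and the example country profile are hypothetical, invented only to show the arithmetic.

```r
# Unstandardized coefficients as reported in Figure 6.
coefs <- c(
  intercept = 0.1797501,
  Maternal_Mortality_Ratio = 0.0003497,
  Adolescent_Birth_Rate = 0.0024407,
  Share_of_Seats_by_Women_in_Parliment = -0.0030140,
  Population_with_at_least_some_secondary_education = -0.0015625,
  Labour_force_participation_rate = -0.0033249
)

# predict_gii() is a hypothetical helper: intercept plus the sum of coefficient * value.
predict_gii <- function(x) {
  unname(coefs["intercept"] + sum(coefs[names(x)] * x))
}

# Hypothetical country profile (values invented purely for illustration).
example <- c(
  Maternal_Mortality_Ratio = 100,
  Adolescent_Birth_Rate = 50,
  Share_of_Seats_by_Women_in_Parliment = 20,
  Population_with_at_least_some_secondary_education = 60,
  Labour_force_participation_rate = 50
)

pred <- predict_gii(example)
pred
```

With a fitted model object available, `predict(GII.final, newdata = ...)` would do the same calculation at full coefficient precision.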
> summary(GII.final)
Figure 7: summary(GII.final)
Figure 7 shows whether each independent variable in the model is statistically significant.
‘Maternal Mortality Ratio’, ‘Adolescent Birth Rate’, ‘Share of seats by Women in Parliament’ and
‘Labor force Participation rate’ are statistically significant predictors of GII. However,
‘Population with at least some secondary education’ falls marginally short of statistical
significance.
An R² value close to 1 indicates that the model explains a large portion of the variance in the
outcome variable. The coefficient of determination (R Square) of 0.8286 shows that our independent
variables explain 82.86% of the variability in our dependent variable.
The model has 5 and 153 degrees of freedom and an F value of 147.9, which can be written as
F(5, 153) = 147.9. The output also gives a p-value of < 2.2e-16, which is less than 0.05, so the
regression model as a whole is statistically significant.
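The reported F statistic can be cross-checked from the R Square value, the number of observations and the number of predictors, using F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below uses only the figures quoted in this report. Because R Square is rounded to four decimals, the result matches 147.9 only approximately.

```r
r2 <- 0.8286   # R Square reported in Figure 7
n  <- 159      # rows after cleaning
k  <- 5        # number of predictors

df1 <- k            # numerator degrees of freedom
df2 <- n - k - 1    # denominator degrees of freedom

# Overall F statistic derived from R Square.
f_stat <- (r2 / df1) / ((1 - r2) / df2)

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

c(df1 = df1, df2 = df2, F = f_stat)
```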
Analysis of variance for individual terms
> library("car")
> Anova(GII.final)
Figure 8: Anova
Simple plot of predicted values with 1-to-1 line
> GII.Predict <- GII
> GII.Predict$Predict_Value <- predict(GII.final)
> plot(Predict_Value ~ Gender_Inequality_Index, data = GII.Predict, main = "Predicted vs Actual",
    sub = "Dependent Variable: Gender Inequality Index", xlab = "Actual response value",
    ylab = "Predicted response value")
> abline(0,1, col="blue" ,lwd=2 )
Figure 9: Predicted Response vs Actual Response
Histogram
> hist(residuals(GII.final), col="darkgray")
Figure 10: Histogram
Residual plot
> plot(GII.final,which = 1 )
Figure 11: Residual Plot
Figures 9, 10 and 11 suggest that the residuals are approximately normally distributed and that
the independent variables have a linear relationship with the dependent variable (GII).
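The visual checks can be complemented with a formal test of residual normality. The sketch below is a supplementary illustration rather than part of the original analysis: it uses R's built-in `mtcars` data in place of the GII data, and the Shapiro-Wilk test is an added technique not used in this report.

```r
# Illustrative model on R's built-in mtcars data (not the GII data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Shapiro-Wilk test: a small p-value suggests departure from normality.
sw <- shapiro.test(res)
sw$p.value

# Visual checks (uncomment when running interactively):
# qqnorm(res); qqline(res)     # normal Q-Q plot of residuals
# plot(fit, which = 1)         # residuals vs fitted values
```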
Conclusion
In this multiple regression analysis, the model predicting the Gender Inequality Index is
statistically significant, F(5, 153) = 147.9, and explains 82.86% of the variance in GII. Four out
of the five variables used in our test are statistically significant.
LOGISTIC REGRESSION ANALYSIS
Logistic regression is a statistical method for modelling the probability that a dichotomous
dependent variable falls into one of its two categories, based on one or more independent variables
that can be either continuous or categorical.
Assumptions:
1. The dependent variable is dichotomous or binary in nature.
2. There must be one or more independent variables, or predictors, for a logistic regression.
3. There should be some relationship between the dependent variable and each of the independent
variables used for analysis.
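Logistic regression in mathematical terms:
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = b_0 + \sum_{i=1}^{n} b_i X_i$$
for i = 1 to n, where p is the probability of the event, b_0 is the intercept and b_i is the
coefficient of the independent variable X_i.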
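The logit transformation maps a probability to log-odds, and its inverse maps a linear predictor back to a probability. A minimal sketch in base R is shown below. The helper names `logit` and `inv_logit` are chosen for illustration; base R provides the same functions as `qlogis` and `plogis`.

```r
# Log-odds (logit) of a probability p; equivalent to base R's qlogis(p).
logit <- function(p) log(p / (1 - p))

# Inverse logit: turns a linear predictor back into a probability; equivalent to plogis(z).
inv_logit <- function(z) 1 / (1 + exp(-z))

p <- 0.75
z <- logit(p)     # odds are 0.75 / 0.25 = 3, so z = log(3)
inv_logit(z)      # recovers p up to floating-point error
```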
Data source
http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a101 – Employment-to-Population ratio
Objective
In this analysis, we use binary logistic regression to estimate the probability of a high
employability rate (the dependent variable) from gender, age, year and country.
Data information
Measurement levels of variables:
In our filtered data, we have one dependent variable and four independent variables. In this
statistical analysis, the variables were grouped into the following categories.
Nominal variables: Employability_Rate, Gender
Ordinal variables: Country
Interval variables: Year, Age
Ratio variables: None
Context of Data used:
Dependent variable:
Employability_Rate (percentage of employment-to-population ratio) coded as ‘1’ if the percentage is
higher than 55.13 else ‘0’
Independent variable:
Gender coded as ‘0’ for Female and ‘1’ for Male.
Age coded as ‘0’ for 15+ yr and ‘1’ for 15-24 yr
Country coded as '1' for Australia, '2' for Canada, '3' for China, '4' for Germany, '5' for India,
'6' for Ireland, '7' for New Zealand, '8' for South Africa, '9' for United Kingdom, '10' for United States of
America
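The coding scheme above could be applied in R along the following lines. This is a sketch under stated assumptions: the toy rows and the column names `Rate`, `Gender_code` and `Age_code` are hypothetical, and only the 55.13 threshold and the 0/1 codes come from the report.

```r
# Toy rows standing in for the cleaned UN data (column names and values are assumptions).
emp <- data.frame(
  Gender = c("Female", "Male", "Female", "Male"),
  Age    = c("15+", "15-24", "15-24", "15+"),
  Rate   = c(48.2, 60.1, 55.13, 71.5),
  stringsAsFactors = FALSE
)

# Dependent variable: 1 if the ratio is strictly above 55.13, otherwise 0.
emp$Employability_Rate <- ifelse(emp$Rate > 55.13, 1, 0)

# Predictors coded as described in the report.
emp$Gender_code <- ifelse(emp$Gender == "Male", 1, 0)    # 0 = Female, 1 = Male
emp$Age_code    <- ifelse(emp$Age == "15-24", 1, 0)      # 0 = 15+, 1 = 15-24

emp
```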
Software
SPSS is used to run the logistic regression and analyze its output.
RStudio is used to clean the data.
Data cleaning
The raw data consists of 11265 rows and 7 columns. I selected data for 10 countries over selected
years to keep the analysis clear and meaningful. To obtain high-quality data, rows with insignificant
or missing values were eliminated using R code, as shown in the figure below, to make the data
suitable for our analysis. After cleaning, the data set consists of 641 rows and 5 columns.
Figure 12: Screenshot of data used for Logistic Regression
Analysis Method:
To perform this statistical analysis, we run the binary logistic regression in SPSS by following
the steps below:
• Import the cleaned and transformed data into SPSS and set the proper measure for each variable.
• Click Analyze > Regression > Binary Logistic from the menu; a dialog box will appear.
• In the dialog box, set Employability_Rate as the dependent variable and Country, Year, Gender
and Age as the independent variables.
• In the Options menu, set the confidence interval to 95% and make sure the residual statistics,
goodness of fit and classification plots checkboxes are selected.
• Click Continue, verify the details and click OK to run the analysis in SPSS and interpret the
results from the data set.
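The report runs this model in SPSS. For readers working in R, a roughly equivalent model can be fitted with `glm` and `family = binomial`. The sketch below uses simulated data, not the UN data, and the simulated effect sizes are invented. Country is left out for brevity and could be added as `factor(Country)`.

```r
set.seed(1)  # reproducible simulated data; NOT the UN data used in the report
n <- 200
sim <- data.frame(
  Gender = rbinom(n, 1, 0.5),                      # 0 = Female, 1 = Male
  Age    = rbinom(n, 1, 0.5),                      # 0 = 15+, 1 = 15-24
  Year   = sample(2000:2010, n, replace = TRUE)
)

# Simulate the binary outcome from an assumed log-odds relationship.
eta <- -0.5 + 1.2 * sim$Gender - 1.0 * sim$Age
sim$Employability_Rate <- rbinom(n, 1, plogis(eta))

# Binary logistic regression: family = binomial uses the logit link by default.
fit <- glm(Employability_Rate ~ Gender + Age + Year, data = sim, family = binomial)

# Estimates, standard errors, Wald z statistics and p-values.
coef_table <- summary(fit)$coefficients
coef_table
```

Exponentiating the estimates with `exp(coef(fit))` gives odds ratios, which correspond to the Exp(B) column in SPSS output.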
Results:
Several tables are generated as a result of our logistic regression model. We go through each one
and interpret our findings.
Figure 13: Model Summary
In Figure 13 (Model Summary), we obtain two pseudo R Square values, 0.281 (Cox & Snell) and 0.377
(Nagelkerke), which are based on two different scales. For convenience, we ignore the Cox & Snell
R Square value and consider Nagelkerke’s value. The Nagelkerke R Square value of 0.377 suggests that
our independent variables account for about 37.7% of the dependent variable’s variability.
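Both pseudo R Square values can be derived from the deviances of a fitted logistic model. Nagelkerke's value is the Cox & Snell value divided by its maximum attainable value, which is why it is always the larger of the two. The sketch below uses simulated data for illustration only and does not reproduce the 0.281 and 0.377 reported above.

```r
set.seed(2)  # simulated data for illustration only
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 1.1 * x))
fit <- glm(y ~ x, family = binomial)

# For a 0/1 outcome, the deviance equals -2 * log-likelihood.
cox_snell  <- 1 - exp((fit$deviance - fit$null.deviance) / n)
max_cs     <- 1 - exp(-fit$null.deviance / n)   # upper bound of Cox & Snell
nagelkerke <- cox_snell / max_cs

c(cox_snell = cox_snell, nagelkerke = nagelkerke)
```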
Figure 14: Classification Table
The overall percentage of cases correctly classified by the binary logistic regression is 76.3
percent. This indicates that the model predicts reasonably well and can be used to predict the
employability rate from the independent variables. The cut value in the table is the probability
threshold used for classification. If the predicted probability is less than the cut value, the case
is assigned to the first group; otherwise, it is assigned to the second group.
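The classification rule and the overall percentage can be sketched as follows. The predicted probabilities and observed outcomes are invented for illustration, and the cut value of 0.5 is an assumption (it is the usual SPSS default; the value used in Figure 14 is not stated in the text).

```r
# Hypothetical predicted probabilities and observed outcomes (invented for illustration).
prob     <- c(0.10, 0.40, 0.55, 0.80, 0.65, 0.30)
observed <- c(0,    0,    1,    1,    0,    1)

cut_value <- 0.5   # assumed threshold
predicted <- ifelse(prob < cut_value, 0, 1)   # below the cut -> group 0, otherwise group 1

# Cross-tabulate observed against predicted and compute the overall percentage correct.
class_table <- table(Observed = observed, Predicted = predicted)
accuracy <- 100 * sum(diag(class_table)) / sum(class_table)

class_table
accuracy
```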
Figure 15: Variables Table
Figure 15 (Variables Table) shows how the odds of the event change when an independent variable
increases by 1 unit while the others are held constant. The significance of each predictor variable
is assessed with the Wald test. In this table we look for significance values of less than .05. The
test results show that all the variables are statistically significant.
Conclusion:
On running the binary logistic regression analysis to predict the employability rate for the selected
countries over the years, we observed that all four independent variables are statistically significant.
The model has a Nagelkerke R Square value of 0.377 (37.7%), and the regression achieves about 73.3%
accuracy.

More Related Content

What's hot

Research metholodogy report
Research metholodogy reportResearch metholodogy report
Research metholodogy report
Harjas Singh
 
Multicollinearity1
Multicollinearity1Multicollinearity1
Multicollinearity1Muhammad Ali
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
Geethu Rangan
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
Antoine De Henau
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
GunjanKhandelwal13
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
DrZahid Khan
 
7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares
Yugesh Dutt Panday
 
e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1
e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1
e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1
Ella Anwar
 
Multicollinearity
MulticollinearityMulticollinearity
Multicollinearity
Bernard Asia
 
Multinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsMultinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationships
Anirudha si
 
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Nicha Tatsaneeyapan
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
Nadzirah Hanis
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionsaba khan
 
Preprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive ModelingPreprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive Modeling
ijtsrd
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
Muhammad Ali
 
Wisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost PredictionWisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost Prediction
Prasann Prem
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
Julio Martinez Andrade
 
Multicolinearity
MulticolinearityMulticolinearity
Multicolinearity
Pawan Kawan
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricing
inventionjournals
 

What's hot (19)

Research metholodogy report
Research metholodogy reportResearch metholodogy report
Research metholodogy report
 
Multicollinearity1
Multicollinearity1Multicollinearity1
Multicollinearity1
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Multicollinearity PPT
Multicollinearity PPTMulticollinearity PPT
Multicollinearity PPT
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares7 classical assumptions of ordinary least squares
7 classical assumptions of ordinary least squares
 
e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1
e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1
e-Portfolio for Lab-Based Statistics (PSYC 3100) part 1
 
Multicollinearity
MulticollinearityMulticollinearity
Multicollinearity
 
Multinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationshipsMultinomial logisticregression basicrelationships
Multinomial logisticregression basicrelationships
 
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
Estimating ambiguity preferences and perceptions in multiple prior models: Ev...
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Preprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive ModelingPreprocessing of Low Response Data for Predictive Modeling
Preprocessing of Low Response Data for Predictive Modeling
 
Heteroscedasticity
HeteroscedasticityHeteroscedasticity
Heteroscedasticity
 
Wisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost PredictionWisconsin hospital - Healthcare Cost Prediction
Wisconsin hospital - Healthcare Cost Prediction
 
Modelo Generalizado
Modelo GeneralizadoModelo Generalizado
Modelo Generalizado
 
Multicolinearity
MulticolinearityMulticolinearity
Multicolinearity
 
Multiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate PricingMultiple Linear Regression Applications in Real Estate Pricing
Multiple Linear Regression Applications in Real Estate Pricing
 

Similar to Statistics_Regression_Project

Statistical analysis of Multiple and Logistic Regression
Statistical analysis of Multiple and Logistic RegressionStatistical analysis of Multiple and Logistic Regression
Statistical analysis of Multiple and Logistic Regression
SindhujanDhayalan
 
Add slides
Add slidesAdd slides
Add slidesRupa D
 
X18145922 statistics ca2 final
X18145922   statistics ca2 finalX18145922   statistics ca2 final
X18145922 statistics ca2 final
SRIVATSAV KATTUKOTTAI MANI
 
Introduction to Econometrics for under gruadute class.pptx
Introduction to Econometrics for under gruadute class.pptxIntroduction to Econometrics for under gruadute class.pptx
Introduction to Econometrics for under gruadute class.pptx
tadegebreyesus
 
Forecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear RegressionForecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear Regression
ijtsrd
 
Cost Prediction of Health Insurance
Cost Prediction of Health InsuranceCost Prediction of Health Insurance
Cost Prediction of Health Insurance
IRJET Journal
 
Predicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine LearningPredicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine Learning
IdanGalShohet
 
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
inventionjournals
 
Econometrics
EconometricsEconometrics
Econometrics
Stephanie King
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
VishalLabde
 
Statistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way AnovaStatistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way Anova
Nisheet Mahajan
 
Quantifying the Uncertainty of Long-Term Economic Projections
Quantifying the Uncertainty of Long-Term Economic ProjectionsQuantifying the Uncertainty of Long-Term Economic Projections
Quantifying the Uncertainty of Long-Term Economic Projections
Congressional Budget Office
 
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATEREGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATEChaoyi WU
 
Chapter 9 Regression
Chapter 9 RegressionChapter 9 Regression
Chapter 9 Regressionghalan
 

Similar to Statistics_Regression_Project (15)

Statistical analysis of Multiple and Logistic Regression
Statistical analysis of Multiple and Logistic RegressionStatistical analysis of Multiple and Logistic Regression
Statistical analysis of Multiple and Logistic Regression
 
Add slides
Add slidesAdd slides
Add slides
 
X18145922 statistics ca2 final
X18145922   statistics ca2 finalX18145922   statistics ca2 final
X18145922 statistics ca2 final
 
Introduction to Econometrics for under gruadute class.pptx
Introduction to Econometrics for under gruadute class.pptxIntroduction to Econometrics for under gruadute class.pptx
Introduction to Econometrics for under gruadute class.pptx
 
Final Project Statr 503
Final Project Statr 503Final Project Statr 503
Final Project Statr 503
 
Forecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear RegressionForecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear Regression
 
Cost Prediction of Health Insurance
Cost Prediction of Health InsuranceCost Prediction of Health Insurance
Cost Prediction of Health Insurance
 
Predicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine LearningPredicting deaths from COVID-19 using Machine Learning
Predicting deaths from COVID-19 using Machine Learning
 
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
 
Econometrics
EconometricsEconometrics
Econometrics
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Statistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way AnovaStatistics - Multiple Regression and Two Way Anova
Statistics - Multiple Regression and Two Way Anova
 
Quantifying the Uncertainty of Long-Term Economic Projections
Quantifying the Uncertainty of Long-Term Economic ProjectionsQuantifying the Uncertainty of Long-Term Economic Projections
Quantifying the Uncertainty of Long-Term Economic Projections
 
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATEREGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
 
Chapter 9 Regression
Chapter 9 RegressionChapter 9 Regression
Chapter 9 Regression
 

Recently uploaded

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 

Statistics_Regression_Project

  • 1. Statistics Report on Multiple Regression and Logistic Regression 1 National College of Ireland STATISTICS REPORT ON MULTIPLE REGRESSION AND LOGISTIC REGRESSION By Alekhya Bhupati (x18132634) MSc in Data Analytics (MSCDAD_A)
  • 2. Statistics Report on Multiple Regression and Logistic Regression 2 MULTIPLE REGRESSION ANALYSIS Multiple regression is extension of simple linear regression, where value is predicated based on two or more variables Assumptions: • The dependent variable is continuous in nature. • Two or more independent variables can be either continuous or categorical. • The dependent and each of the independent variables has some linear relationship. Data source This analysis has been done on Gender Inequality Index (GII). Data source link is as follows: http://data.un.org/DocumentData.aspx?id=391 Objective In this analysis we are using multiple regression on our data source to • Study the various factors effecting GII. • Study the relationship between all the factors effecting GII. Data information In this project I have considered “Gender Inequality Index (GII)” as independent variable and dependent variables as follows 1) Maternal mortality ratio 2) Adolescent birth rate 3) Share of seats in parliament 4) Population with at least some secondary education 5) Labor force participation rate Figure 1: Screenshot of data used for Multiple Regression
  • 3. Statistics Report on Multiple Regression and Logistic Regression 3 Software R is simple, effective and opensource language and which is highly used for analyzing data manipulation, data handling, data visualization, statistical result and graphics. In R studio, we use ‘read.csv’ command to load the data as shown below: mg<- read.csv("Gender_Inequality_Index.csv",TRUE,",") Data cleaning The raw data consists of 228 rows of information for 10 columns. To have high quality data, rows with insignificant and missing values were eliminated using R code as shown in the below diagram to make our data suitable for Multiple regression. After cleaning, the data set consist of 159 rows and 7 columns Figure 2: R code for data cleaning Output of multiple regression data summary This table shows the summary of the data in terms of maximum, minimum, mean, median, 1st quartile, 3rd quartile of each factor. > summary(GII)
  • 4. Figure 3: summary(GII)
    Correlation matrix
    Correlation measures the strength and direction of the linear relationship between two variables, showing whether each variable is positively or negatively correlated with the dependent variable.
    > cor(GII[2:7])
    Figure 4: Correlation Matrix
    From this correlation matrix, we can say that two variables, ‘Maternal Mortality Ratio’ and ‘Adolescent Birth Rate’, are positively correlated with GII, while the remaining three variables, ‘Share of seats by Women in Parliament’, ‘Population with at least some secondary education’ and ‘Labor force Participation rate’, are negatively correlated with it.
    Pairwise matrix of scatter plot
    Using the command below, we can easily examine the relationship between each pair of variables.
    > pairs(GII[2:7])
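As a small illustration of how the signs in a correlation matrix are read, the sketch below builds a synthetic data set (not the UN data) in which one variable rises with the index and another falls. The shortened column name `Secondary_Education` and all values are assumptions.

```r
set.seed(1)
n <- 50
gii <- runif(n, 0.05, 0.7)

# Synthetic data: one predictor rises with the index, the other falls.
toy <- data.frame(
  Gender_Inequality_Index = gii,
  Adolescent_Birth_Rate = 150 * gii + rnorm(n, sd = 10),
  Secondary_Education = 100 - 90 * gii + rnorm(n, sd = 8)
)

# Correlation matrix; the first row gives each variable's correlation with GII.
cm <- cor(toy)
print(round(cm, 2))

# Extract only the sign of each correlation with the index.
signs <- sign(cm["Gender_Inequality_Index", -1])
print(signs)
```

A positive sign corresponds to the "positive trend" described above and a negative sign to the "negative trend".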
  • 5. Figure 5: Pairwise matrix of scatter plot
    Figure 5 shows that as the ‘Maternal Mortality Ratio’ increases, the ‘Adolescent Birth Rate’ also increases, so the correlation between these two variables is positive. As the ‘Maternal Mortality Ratio’ increases, the ‘Share of seats by Women in Parliament’ percentage decreases, so the correlation between these two variables is negative. The relationships between all other pairs of variables can be read in the same way.
    Linear model
    > GII.final <- lm(Gender_Inequality_Index ~ Maternal_Mortality_Ratio +
        Adolescent_Birth_Rate + Share_of_Seats_by_Women_in_Parliment +
        Population_with_at_least_some_secondary_education +
        Labour_force_participation_rate, data = GII)
    > GII.final
    Figure 6: Coefficients
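To show what the `lm` call above returns, here is a minimal sketch on synthetic data with known coefficients. The variable names `x1`, `x2`, `y` and the chosen true coefficients are hypothetical; the point is only that `coef()` recovers values close to those used to generate the data.

```r
set.seed(42)
n <- 159  # same number of rows as the cleaned data set, for flavour only

# Two synthetic predictors and a response with known coefficients.
toy <- data.frame(
  x1 = runif(n, 0, 500),
  x2 = runif(n, 0, 100)
)
toy$y <- 0.2 + 0.0004 * toy$x1 - 0.003 * toy$x2 + rnorm(n, sd = 0.02)

# Fit the multiple regression and inspect the estimated coefficients.
fit <- lm(y ~ x1 + x2, data = toy)
print(coef(fit))
```

The estimates printed by `coef(fit)` play the same role as the unstandardized B values shown in Figure 6.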
  • 6. Formula for the multiple regression model
    Y = b0 + b1·X1 + b2·X2 + b3·X3 + ...
    where Y is the predicted value of the dependent variable, b0 is the intercept, and b1, b2, b3, ... are the estimated coefficients of the independent variables X1, X2, X3, ...
    In Figure 6 (Coefficients), we have obtained unstandardized coefficients in the form of B values. Each B value is assigned to one independent variable. Substituting the values of our independent variables, the equation to predict our dependent variable (GII) is:
    Gender_Inequality_Index = 0.1797501
        + 0.0003497 × Maternal_mortality_Ratio
        + 0.0024407 × Adolescent_Birth_Rate
        - 0.0030140 × Share_of_seats_by_women_in_Parliament
        - 0.0015625 × Population_with_at_least_some_secondary_education
        - 0.0033249 × Labor_force_Participation_rate
    > summary(GII.final)
    Figure 7: summary(GII.final)
    From Figure 7, we can determine whether the independent variables used in the model are statistically significant. The ‘Maternal Mortality Ratio’, ‘Adolescent Birth Rate’, ‘Share of seats by Women in Parliament’ and ‘Labor force Participation rate’ are statistically significant predictors of GII. However, the ‘Population with at least some secondary education’ narrowly misses statistical significance.
    An R² value close to 1 indicates that the model explains a large portion of the variance in the outcome variable, which suggests that the chosen predictors are relevant to GII. The coefficient of determination (R Square) of 0.8286 shows that our independent variables explain 82.86% of the variability in our dependent variable.
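The fitted equation above can be evaluated by hand for any set of predictor values. The sketch below uses the coefficients reported in the text; the country profile (200, 60, 20, 40, 50) is a hypothetical example and not taken from the data set.

```r
# Coefficients as reported in the fitted equation.
b <- c(
  intercept = 0.1797501,
  Maternal_mortality_Ratio = 0.0003497,
  Adolescent_Birth_Rate = 0.0024407,
  Share_of_seats_by_women_in_Parliament = -0.0030140,
  Population_with_at_least_some_secondary_education = -0.0015625,
  Labor_force_Participation_rate = -0.0033249
)

# Hypothetical country profile, in the same order as the coefficients above.
x <- c(200, 60, 20, 40, 50)

# Predicted GII = intercept + sum of (coefficient * predictor value).
predicted <- b[["intercept"]] + sum(b[-1] * x)
print(round(predicted, 4))  # 0.1071
```

For a fitted model object the same calculation is what `predict()` performs with a `newdata` argument.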
  • 7. The model has 5 and 153 degrees of freedom and an F value of 147.9, which can be reported as F(5, 153) = 147.9. The output also gives a p-value of < 2.2e-16, which is less than 0.05, so the regression model as a whole is statistically significant.
    Analysis of variance for individual terms
    > library("car")
    > Anova(GII.final)
    Figure 8: Anova
    Simple plot of predicted values with 1-to-1 line
    > GII.Predict <- GII
    > GII.Predict$Predict_Value <- predict(GII.final)
    > plot(Predict_Value ~ Gender_Inequality_Index, data = GII.Predict,
        main = "Predicted vs Actual",
        sub = "Dependent Variable: Gender Inequality Index",
        xlab = "Actual response value", ylab = "Predicted response value")
    > abline(0, 1, col = "blue", lwd = 2)
    Figure 9: Predicted Response vs Actual Response
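The F statistic quoted above can be cross-checked from the reported R Square and degrees of freedom using the standard identity F = (R²/k) / ((1 − R²)/df_resid). This is a worked check using only numbers from the text; the variable names are illustrative.

```r
r2 <- 0.8286      # reported R Square
k <- 5            # number of predictors (numerator degrees of freedom)
df_resid <- 153   # residual degrees of freedom

# Overall F statistic implied by R Square.
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
print(round(f_stat, 1))  # 147.9

# Upper-tail p-value of the F distribution with (5, 153) degrees of freedom.
p_value <- pf(f_stat, k, df_resid, lower.tail = FALSE)
print(p_value < 2.2e-16)
```

The result agrees with the reported F(5, 153) = 147.9, which confirms that the R Square and F values in the summary output are mutually consistent.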
  • 8. Histogram
    > hist(residuals(GII.final), col = "darkgray")
    Figure 10: Histogram
    Residual plot
    > plot(GII.final, which = 1)
    Figure 11: Residual Plot
    Figures 9, 10 and 11 indicate that the residuals are approximately normally distributed and that the independent variables have a linear relationship with the dependent variable (GII).
    Conclusion
    In this multiple regression analysis, the model for the Gender Inequality Index is statistically significant, F(5, 153) = 147.9, and explains 82.86% of the variability in GII. Four out of the five variables used in the model are statistically significant.
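A histogram gives a visual impression of residual normality; a formal test can complement it. The sketch below applies the Shapiro-Wilk test from base R to the residuals of a model fitted on synthetic data. The report itself does not use this test, and the toy data are an assumption.

```r
set.seed(7)

# Synthetic data with normally distributed errors.
toy <- data.frame(x = runif(100))
toy$y <- 1 + 2 * toy$x + rnorm(100, sd = 0.3)

fit <- lm(y ~ x, data = toy)
res <- residuals(fit)

# Shapiro-Wilk test: a large p-value gives no evidence against normality.
sw <- shapiro.test(res)
print(sw)
```

Applied to `residuals(GII.final)`, the same call would give a numerical counterpart to Figure 10.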
  • 9. LOGISTIC REGRESSION ANALYSIS
    Logistic regression is a statistical method for predicting which of the two categories of a dichotomous dependent variable an observation falls into, based on one or more independent variables that can be either continuous or categorical.
    Assumptions:
    1. The dependent variable is dichotomous (binary).
    2. There must be one or more independent variables, or predictors, for a logistic regression.
    3. There should be some relationship between the dependent variable and each of the independent variables used for analysis.
    Logistic regression in mathematical terms:
    Logit(p_i) = ln(p_i / (1 - p_i)) = b0 + b1·X1i + b2·X2i + ... + bk·Xki, for i = 1 to n
    Data source
    Employment-to-Population ratio: http://data.un.org/Data.aspx?d=GenderStat&f=inID%3a101
    Objective
    In this analysis, we use binary logistic regression to estimate the probability of a high employability rate (dependent variable) from gender, age, year and country.
    Data information
    Measurement levels of variables:
    In our filtered data, we have one dependent variable and four independent variables. In this statistical analysis, variables were grouped into the following categories.
    Nominal variables: Employability_Rate, Gender
    Ordinal variables: Country
    Interval variables: Year, Age
    Ratio variables: None
    Context of data used:
    Dependent variable:
    Employability_Rate (percentage of employment-to-population ratio), coded as ‘1’ if the percentage is higher than 55.13, else ‘0’
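The coding rule for the dependent variable and the logit transformation can be sketched in a few lines of R. The threshold 55.13 comes from the text; the sample rates and the helper names `logit` and `inv_logit` are hypothetical.

```r
# Hypothetical employment-to-population percentages.
rate <- c(42.0, 55.13, 55.14, 71.5)

# Code as 1 when the percentage is higher than 55.13, else 0.
Employability_Rate <- ifelse(rate > 55.13, 1, 0)
print(Employability_Rate)  # 0 0 1 1

# Logit and its inverse: map a probability to log-odds and back.
logit <- function(p) log(p / (1 - p))
inv_logit <- function(z) 1 / (1 + exp(-z))

print(logit(0.5))             # 0: even odds
print(inv_logit(logit(0.8)))  # recovers 0.8
```

Base R also provides `qlogis()` and `plogis()` for the same two transformations.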
  • 10. Independent variables:
    Gender, coded as ‘0’ for Female and ‘1’ for Male.
    Age, coded as ‘0’ for 15+ yr and ‘1’ for 15-24 yr.
    Country, coded as '1' for Australia, '2' for Canada, '3' for China, '4' for Germany, '5' for India, '6' for Ireland, '7' for New Zealand, '8' for South Africa, '9' for United Kingdom, '10' for United States of America.
    Software
    SPSS is used to run and interpret the logistic regression. RStudio is used to clean the data.
    Data cleaning
    The raw data consists of 11265 rows and 7 columns. I selected data for 10 countries over selected years to keep the analysis focused and clear. To obtain high-quality data suitable for our analysis, rows with uninformative or missing values were removed using R code. After cleaning, the data set consists of 641 rows and 5 columns.
    Figure 12: Screenshot of data used for Logistic Regression
    Analysis Method:
    To perform this statistical analysis, we run the binary logistic regression in SPSS by following the steps below:
    • Import the cleaned and transformed data into SPSS and set the proper measure for each variable.
    • Click Analyze – Regression – Binary Logistic in the menu; a dialog box will appear.
    • In the dialog box, move Employability_Rate to the dependent variable field and Country, Year, Gender and Age to the independent variables field.
    • In the Options menu, set the confidence interval to 95% and make sure the residual statistics, goodness of fit and classification plots checkboxes are selected.
    • Click Continue, verify the details, and click OK to run the analysis in SPSS and interpret the results from the data set.
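The SPSS steps above have a close analogue in R using `glm` with a binomial family. The sketch below is not the author's procedure (the report uses SPSS); it fits the same kind of model on synthetic data. The year range 2000 to 2010 and the simulated effects are assumptions, and Country is entered as a numeric code to mirror the coding described in the text.

```r
set.seed(123)
n <- 641  # same row count as the cleaned data set, for flavour only

# Synthetic predictors coded as described in the text.
toy <- data.frame(
  Gender = rbinom(n, 1, 0.5),                       # 0 = Female, 1 = Male
  Age = rbinom(n, 1, 0.5),                          # 0 = 15+ yr, 1 = 15-24 yr
  Year = sample(2000:2010, n, replace = TRUE),      # assumed year range
  Country = sample(1:10, n, replace = TRUE)         # country codes 1 to 10
)

# Simulate the binary outcome from assumed effects of Gender and Age.
eta <- -0.5 + 1.2 * toy$Gender - 1.5 * toy$Age
toy$Employability_Rate <- rbinom(n, 1, plogis(eta))

# Binary logistic regression with all four predictors.
fit <- glm(Employability_Rate ~ Country + Year + Gender + Age,
           data = toy, family = binomial)
print(summary(fit)$coefficients)
```

The coefficient table printed here corresponds to the B, standard error and significance columns of the SPSS variables table; `summary.glm` reports a z statistic, whose square is the Wald chi-square that SPSS shows.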
  • 11. Results:
    Several tables are generated as a result of our binary logistic regression model. We go through each one and interpret our findings.
    Figure 13: Model Summary
    In Figure 13 (Model Summary), we have obtained two R Square values, 0.281 and 0.377, computed on two different scales. For convenience, we set aside the Cox & Snell R Square value and use the Nagelkerke value. The Nagelkerke R Square value of 0.377 shows that our independent variables account for about 37.7% of the variability in the dependent variable.
    Figure 14: Classification Table
    The overall percentage of cases correctly classified by the binary logistic regression is 76.3 percent. This indicates that the model classifies most cases correctly and can be used to predict the employability rate from the independent variables. The cut value in the table is the probability threshold used for classification: if the predicted probability is less than the cut value, the case is assigned to the first group; otherwise, it falls in the second group.
    Figure 15: Variables Table
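The two pseudo R Square values and the classification table can be reproduced from a fitted `glm` object. The sketch below uses synthetic data and the standard Cox & Snell and Nagelkerke formulas; a cut value of 0.5 is assumed here, since the text does not state the value used.

```r
set.seed(99)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 1.1 * x))

fit <- glm(y ~ x, family = binomial)

# For ungrouped binary data the deviance equals -2 * log-likelihood.
# Cox & Snell: 1 - (L_null / L_model)^(2/n)
cox_snell <- 1 - exp((fit$deviance - fit$null.deviance) / n)
# Nagelkerke rescales Cox & Snell by its maximum attainable value.
nagelkerke <- cox_snell / (1 - exp(-fit$null.deviance / n))
print(c(cox_snell = cox_snell, nagelkerke = nagelkerke))

# Classification table with an assumed cut value of 0.5.
cut_value <- 0.5
predicted_class <- as.integer(fitted(fit) >= cut_value)
tab <- table(Observed = y, Predicted = predicted_class)
print(tab)

# Overall percentage correctly classified.
accuracy <- mean(predicted_class == y)
print(round(100 * accuracy, 1))
```

Because the Nagelkerke value divides the Cox & Snell value by a quantity below 1, it is always the larger of the two, as with the 0.377 and 0.281 reported in Figure 13.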
  • 12. From Figure 15 (Variables Table), we can estimate how the likelihood of the event changes when one independent variable changes by 1 unit while the others are held constant. The table also reports the Wald test, which assesses the significance of each predictor variable. In this table we look for significance values less than .05. The significance values show that all the variables are statistically significant.
    Conclusion:
    On running the binary logistic regression analysis to predict the employability rate for selected countries over the years, we observed that all four independent variables are statistically significant. Our model also produced a Nagelkerke R Square value of 0.377 (37.7%), and the regression classifies about 73.3% of cases correctly.
    References:
    [1] PALLANT, J. SPSS Survival Manual. 6th Edition. McGraw Hill, 2016.
    [2] IBM SPSS 25. https://www.ibm.com/analytics/spss-statistics-software
    [3] LANTZ, B. Machine Learning with R. Second Edition. 2013.