Regression and Classification Analysis
National College of Ireland
Project Submission Sheet – 2018/2019
Student Name: Yash Balaji Iyengar
Student ID: X18124739
Programme: MSc Data Analytics Cohort B
Year: 2019-2020
Module: Statistics in Data Analytics
Lecturer: Tony Delaney
Submission Due Date: 7th April 2019
Project Title: Statistics Continuous Assessment 2
Word Count: 1718
I hereby certify that the information contained in this (my submission) is information
pertaining to research I conducted for this project. All information other than my own
contribution will be fully referenced and listed in the relevant bibliography section at the
rear of the project.
ALL internet material must be referenced in the references section. Students are
encouraged to use the Harvard Referencing Standard supplied by the Library. To use
other author's written or electronic work is illegal (plagiarism) and may result in
disciplinary action. Students may be required to undergo a viva (oral examination) if
there is suspicion about the validity of their submitted work.
Signature:
Date: 7/04/2019
PLEASE READ THE FOLLOWING INSTRUCTIONS:
1. Please attach a completed copy of this sheet to each project (including multiple copies).
2. Projects should be submitted to your Programme Coordinator.
3. You must ensure that you retain a HARD COPY of ALL projects, both for your own reference
and in case a project is lost or mislaid. It is not sufficient to keep a copy on computer. Please
do not bind projects or place in covers unless specifically requested.
4. You must ensure that all projects are submitted to your Programme Coordinator on or before
the required submission date. Late submissions will incur penalties.
5. All projects must be submitted and passed in order to successfully complete the year. Any
project/assignment not submitted will be marked as a fail.
Office Use Only
Signature:
Date:
Penalty Applied (if applicable):
MULTIPLE LINEAR REGRESSION
Introduction: In statistical analysis, regression is a set of statistical processes used to
understand the relationship between variables. Multiple regression predicts the value of
one variable from two or more other variables. The variable we predict is called the
dependent variable, and the variables used to predict it are called the independent
variables.
Data Description:
Datasets are downloaded from the following links:
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHS2_3070_cancer
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000011
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aTOBACCO_0000000344
Three datasets have been merged to form a single dataset. The dataset consists of
• Number of Deaths due to Cancer
• Alcohol Consumption among adults (Liters)
• Current users of any tobacco products (rate of users)
Here my dependent variable is "Number of Deaths due to Cancer" and my independent
variables are "Alcohol Consumption among adults" and "Current users of any tobacco
products". The model tries to predict the number of deaths due to cancer from the two
independent variables. We have taken a total of 152 samples for this model.
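The merge step described above can be sketched as follows. This is a minimal illustration only: the source does not say which key or tool was used, so joining on a country name is an assumption, and the function name `merge_by_country` and all values are made up.

```python
def merge_by_country(*tables):
    """Inner-join several {country: value} mappings on the country key.

    Assumption: each source table has one value per country. Countries
    missing from any table are dropped, as in an inner join.
    """
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return {country: tuple(t[country] for t in tables) for country in sorted(common)}


# Made-up values for illustration; these are not the WHO figures.
cancer_deaths = {"A": 120.0, "B": 150.0, "C": 98.0}
alcohol_litres = {"A": 4.2, "B": 11.0, "D": 7.5}
tobacco_rate = {"A": 21.0, "B": 30.5, "C": 18.0}

merged = merge_by_country(cancer_deaths, alcohol_litres, tobacco_rate)
for country, (deaths, alcohol, tobacco) in merged.items():
    print(country, deaths, alcohol, tobacco)
```

Only countries present in all three tables survive the join, which is one possible reason the merged dataset has fewer rows than any single source table.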
Assumptions:
1) Our dependent variable "Number of Deaths due to Cancer" is treated as a continuous
variable. Strictly speaking it is a count of deaths, but it is treated here as being
measured on a continuous scale.
2) There are two independent variables, "Alcohol Consumption among adults" and
"Current users of any tobacco products"; both are continuous in nature.
3) Here we check for auto-correlation between the observations.
The Durbin-Watson test checks for auto-correlation in the residuals. The statistic ranges
from 0 to 4, and if the value is between 1.5 and 2.5 we can conclude that the observations
are not auto-correlated. Our value is 1.993, therefore we can conclude that our
observations are independent. The same output gives the R square value, which is the
proportion of variance in the dependent variable explained by the model. The R square
value is 12%, so the model explains only a small part of the variation in deaths.
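For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below shows that calculation on made-up residuals; the helper names are hypothetical, and the 1.5 to 2.5 band is the rule of thumb used in this report rather than a formal critical value.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def looks_independent(dw, low=1.5, high=2.5):
    """Rule of thumb from the report: values in [1.5, 2.5] suggest no auto-correlation."""
    return low <= dw <= high


# Made-up residuals: strictly alternating signs give a high statistic.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# The value reported by SPSS in this analysis.
print(looks_independent(1.993))
```

Values near 0 point to positive auto-correlation, values near 4 to negative auto-correlation, and values near 2 to none.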
4) We will now check for linearity between the dependent and independent variables.
From the above figure, we can see that the relationship between the variables is
approximately linear and there are no outliers.
5) Let's now check for homoscedasticity.
From the scatterplot we can see that the points follow no specific pattern, except for a
couple of outliers, so we can say that the variance of the data remains similar along the
best fit line. This means our data is homoscedastic in nature.
6) Let us now check for multi-collinearity between the independent variables.
We can see from the above table that the dependent variable Deaths is correlated with
Alcohol Consumption, as the Pearson value is 0.353, which is above 0.3, but the
correlation of Tobacco Consumption with Deaths is low, as the Pearson Correlation value
is 0.060, which is less than 0.3.
We can also see that multicollinearity does not exist between the two independent
variables, Tobacco Consumption and Alcohol Consumption, as the Pearson Correlation
value is 0.169, which is less than 0.70.
7) Let’s check our data for normality.
From the above histogram, we can observe that the data is approximately normally
distributed except for one outlier.
SPSS Output Interpretation:
From the above table, we can observe that there are 152 samples. Means and standard
deviations are calculated for all the variables.
• From the above table we can check the significance value of each independent
variable. A significance value below 0.05 indicates that the variable has a
statistically significant effect on the dependent variable.
• Alcohol consumption has a significant impact on the number of Deaths, but on
the other hand Tobacco consumption has almost no impact on the deaths.
• Also, the Unstandardized Coefficient column gives the intercept and slopes of the
best fit line. From these values we can write the regression equation.
• The Tolerance value measures collinearity and should be above 0.1 to avoid
multi-collinearity; our value is 0.971. The VIF value should be less than 10; our
value is 1.029.
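With exactly two independent variables, Tolerance and VIF follow directly from the Pearson correlation between them: Tolerance = 1 - r^2 and VIF = 1 / Tolerance. The sketch below applies this to the correlation of 0.169 reported earlier. The function name is hypothetical, and the shortcut holds only for the two-predictor case; with more predictors, r^2 is replaced by the R square from regressing each predictor on all the others.

```python
def tolerance_and_vif(r):
    """Tolerance and VIF for a model with exactly two predictors.

    r is the Pearson correlation between the two predictors.
    """
    tolerance = 1.0 - r ** 2
    vif = 1.0 / tolerance
    return tolerance, vif


# Correlation between Tobacco Consumption and Alcohol Consumption from the report.
tolerance, vif = tolerance_and_vif(0.169)
print(round(tolerance, 3), round(vif, 3))  # 0.971 1.029
```

These match the Tolerance and VIF values in the SPSS coefficients table, which confirms that the two collinearity checks in this report are measuring the same thing.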
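From the above table we can interpret the following information:
• The Sum of Squares column shows that the regression accounts for 27481.190 of the
total sum of squares of 220223.520, which is about 12% of the variation. The
significance value is reported as 0, which means it is below the displayed precision
rather than exactly zero, so the model as a whole is statistically significant.
• Also, the regression uses 2 of the 151 total degrees of freedom, one for each
independent variable.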
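The ANOVA figures above are enough to recompute R square and the F statistic by hand. The sketch below does so using only the two sums of squares and the degrees of freedom quoted in this report; the variable names are illustrative, and the residual values are derived by subtraction rather than read from the SPSS output.

```python
# Values quoted from the ANOVA table in this report.
ss_regression = 27481.190
ss_total = 220223.520
df_regression = 2
df_total = 151

# Derived quantities (not quoted in the report).
ss_residual = ss_total - ss_regression
df_residual = df_total - df_regression

r_squared = ss_regression / ss_total
mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_statistic = mean_square_regression / mean_square_residual

print(f"R square = {r_squared:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_statistic:.2f}")
```

The resulting R square of roughly 0.125 is in line with the 12% reported earlier, so the sum of squares describes explained variation, not a number of observations.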
Result:
As the analysis has been conducted, we have obtained the regression equation as follows:
Deaths = 130.646 + 0(Tobacco_Con) + 3.388(Alco_Con)
Since the coefficient for Tobacco_Con is 0, it does not contribute to predicting the
number of deaths. So, the equation becomes:
Deaths = 130.646 + 3.388(Alco_Con)
The coefficient of an independent variable is the amount of change in the dependent
variable for a one-unit increase in that independent variable, with the other variables
held constant. Here, each additional liter of alcohol consumption is associated with an
increase of 3.388 in the predicted value of Deaths. So, multiple regression analysis checks
what effect the independent variables have on predicting the dependent variable.
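As a worked example of the regression equation, the sketch below plugs in alcohol consumption values and shows that a one-liter increase changes the prediction by exactly the alcohol coefficient. The function name and the input values of 10 and 11 liters are chosen for illustration only; the coefficients are those reported above.

```python
def predict_deaths(alcohol_litres, tobacco_rate=0.0):
    """Predicted Deaths from the fitted equation reported in this analysis."""
    intercept = 130.646
    b_tobacco = 0.0    # coefficient reported as 0
    b_alcohol = 3.388
    return intercept + b_tobacco * tobacco_rate + b_alcohol * alcohol_litres


at_10 = predict_deaths(10.0)
at_11 = predict_deaths(11.0)
print(round(at_10, 3))          # 164.526
print(round(at_11 - at_10, 3))  # 3.388
```

Because the tobacco coefficient is 0, changing `tobacco_rate` leaves the prediction unchanged.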
BINARY LOGISTIC REGRESSION
Introduction:
Logistic Regression is used to predict a dichotomous dependent variable with the help of
one or more continuous or categorical independent variables.
Data Description:
Datasets are downloaded from the following links:
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHS2_3070_cancer
http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000011
The data consists of two columns
• Deaths (Factor of Yes/No)
• Alcohol Consumption (Liters)
The "Deaths" column is the dichotomous dependent variable, where I have coded Yes as 1
and No as 0, and "Alcohol Consumption" is the continuous independent variable. Here the
model tries to predict whether death occurs based on Alcohol consumption.
Assumptions:
1) The dependent variable should be dichotomous. “Deaths” is a dichotomous variable.
2) There should be one or more independent variables. Alcohol Consumption is
our independent and continuous variable.
3) The sample size should be large. We have a dataset of 152 samples.
4) Since we have only one independent variable, multi-collinearity cannot occur.
5) Let us check for outliers in the data.
A case-wise listing was not produced since there are no outliers in the data.
SPSS Output Interpretation:
From the case processing summary, we can observe that all the samples have been
processed; the total number of samples is 152.
Now there are two blocks of outputs. Block 0 is the case where SPSS runs the model
without any independent variables. Let us interpret its results as follows:
The block 0 classification table shows that the model predicts that deaths do not occur
for all the cases. This happens because no independent variable is provided to the model,
so it assigns every case to the more frequent category. Therefore, the model predicts only
54.6% of the values correctly.
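The block 0 result is simply the accuracy of always predicting the most frequent outcome. The sketch below shows this with a hypothetical split of 83 "No" and 69 "Yes" cases; the report does not give the class counts, so these numbers are an assumption chosen only because they reproduce 54.6% with 152 samples.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class (no predictors)."""
    majority_class, majority_count = Counter(labels).most_common(1)[0]
    return majority_class, majority_count / len(labels)


# Hypothetical class counts (assumption): 0 = No, 1 = Yes.
labels = [0] * 83 + [1] * 69
majority, accuracy = baseline_accuracy(labels)
print(majority, round(accuracy * 100, 1))  # 0 54.6
```

Any model with predictors should be judged against this baseline rather than against 50%.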
In block 1, we see the table Omnibus Tests of Model Coefficients. Here the model is tested
with the predictor variable included. This is the goodness of fit test where the results are
compared with block 0 to check whether the predictor variable improves the model. The
significance value should be less than 0.05. Our table shows 0.013, which means the
independent variable has an impact on the dependent variable.
The Cox and Snell R square and Nagelkerke R square values both estimate the amount of
variation in the dependent variable that the model explains. Here they indicate that
between 3.9% and 5.3% of the variation in the dependent variable is explained by the
model.
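Both pseudo R square measures are computed from the log-likelihoods of the null model (block 0) and the fitted model (block 1). The sketch below shows the standard formulas. The report does not quote the log-likelihoods, so the two values used here are hypothetical and serve only to illustrate the calculation; the function names are also illustrative.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R square: 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R square: Cox and Snell rescaled by its maximum possible value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Hypothetical log-likelihoods (assumption); n is the sample size from the report.
ll_null, ll_model, n = -104.7, -101.6, 152
cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(round(cs, 3), round(nk, 3))
```

Nagelkerke is always at least as large as Cox and Snell because it divides by a maximum that is below 1, which is why SPSS reports the two values as a range.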
This classification table belongs to block 1 and shows that 61.2% of cases are predicted
correctly, which is better than the block 0 figure of 54.6%. This happens because the
predictor variable has been included in the model.
The Hosmer Lemeshow test is also used to check for goodness of fit. The significance value
should be greater than 0.05. Our significance value is 0.06, which is only slightly above
this threshold, so the model's fit is acceptable.
This table tells us how much the independent variable contributes to model prediction.
The significance value should be less than 0.05. The significance value for Alcohol
consumption is 0.015. The B value is the coefficient of the variable; it gives the change
in the log odds of the dependent variable for a one-unit increase in the independent
variable.
Result:
Based on the Logistic regression analysis we get the following regression equation for the
log odds of death, where p is the probability that Deaths = 1:
ln(p / (1 – p)) = 0.103 – 0.653(Alco_Con)
Once we substitute the Alcohol Consumption value into the above equation, we get the
log odds z, which is converted to a probability with p = 1 / (1 + e^(–z)). If the
probability is higher than 0.5 then the model predicts that death occurs, and if the
probability is less than 0.5 then it predicts that death does not occur.
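In binary logistic regression the right-hand side of the fitted equation is a linear predictor on the log-odds scale, so it has to be passed through the logistic function before it can be compared with 0.5. The sketch below does this with the coefficients reported above, taken as written in this report; the function names and the example inputs are illustrative.

```python
import math


def predicted_probability(alcohol_litres):
    """Probability that Deaths = 1, using the coefficients reported above."""
    log_odds = 0.103 - 0.653 * alcohol_litres
    return 1.0 / (1.0 + math.exp(-log_odds))


def predicted_class(alcohol_litres, threshold=0.5):
    """Classify as 1 (Yes) when the probability exceeds the threshold."""
    return 1 if predicted_probability(alcohol_litres) > threshold else 0


for litres in (0.0, 1.0, 5.0):
    p = predicted_probability(litres)
    print(litres, round(p, 3), predicted_class(litres))
```

The probability equals 0.5 exactly where the log odds are zero, which for these coefficients is at an alcohol consumption of 0.103 / 0.653 liters.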
References
• SPSS Survival Manual by Julie Pallant, third edition.
• https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php
• https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php

Regression and Classification Analysis

  • 3. MULTIPLE LINEAR REGRESSION
    Introduction: In statistical analysis, regression is a set of statistical processes used to understand the relationship between variables. Multiple regression predicts the value of one variable from two or more other variables. The variable being predicted is called the dependent variable, and the variables used to predict it are called the independent variables.
    Data Description: The datasets were downloaded from the following links:
    http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHS2_3070_cancer
    http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000011
    http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aTOBACCO_0000000344
    The three datasets were merged into one single dataset, which consists of:
    • Number of Deaths due to Cancer
    • Alcohol Consumption among adults (in liters)
    • Current users of any tobacco products (rate of users)
    The dependent variable is "Number of Deaths due to Cancer" and the independent variables are "Alcohol Consumption among adults" and "Current users of any tobacco products". The model tries to predict the number of deaths due to cancer from the two independent variables. A total of 152 samples were used for this model.
    Assumptions:
    1) The dependent variable "Number of Deaths due to Cancer" is a count of deaths and is treated here as a continuous variable.
    2) There are two independent variables, "Alcohol Consumption among adults" and "Current users of any tobacco products", and both are continuous in nature.
    3) Here we check for auto-correlation between the observations.
  • 4. The Durbin-Watson test checks for auto-correlation between the observations. If the statistic lies between 1.5 and 2.5, we can conclude that the observations are not auto-correlated. Our value is 1.993, so we conclude that the observations are independent. The R square value gives the proportion of variance in the dependent variable that the model explains; here it is about 12%.
    4) We will now check for linearity between the dependent and independent variables. From the figure, we can see that the relationship between the variables is approximately linear and that there are no outliers.
    5) Let's now check for homoscedasticity.
  • 5. From the scatterplot we can see that the points follow no specific pattern, apart from a couple of outliers, so the variance of the data remains similar along the best fit line. This means our data is homoscedastic in nature.
    6) Let us now check for multi-collinearity between the independent variables. From the correlation table, the dependent variable Deaths is correlated with Alcohol Consumption: the Pearson correlation is 0.353, which is above 0.3. The correlation of Tobacco Consumption with Deaths is low: the Pearson correlation is 0.060, which is below 0.3.
  • 6. Multicollinearity does not exist between the two independent variables, Tobacco Consumption and Alcohol Consumption, as their Pearson correlation is 0.169, which is less than 0.70.
    7) Let's check our data for normality. From the histogram, we can observe that the data is approximately normally distributed except for one outlier.
    SPSS Output Interpretation:
  • 7. From the descriptive statistics table, we can observe that there are 152 samples. Means and standard deviations are calculated for all the variables.
    • From the coefficients table we can check the significance value of each independent variable. A significance value below 0.05 indicates that the variable is a statistically significant predictor of the dependent variable; it does not by itself measure the size of the effect.
    • Alcohol consumption is a significant predictor of the number of Deaths, whereas Tobacco consumption has almost no impact on the deaths.
    • The Unstandardized Coefficient column gives the slope for each independent variable. From these values we can write the regression line.
    • Tolerance measures collinearity and should be above 0.1 to avoid multi-collinearity; our value is 0.971. The VIF value should be less than 10; our value is 1.029.
    From the ANOVA table we can interpret the following information:
    • The Sum of Squares column shows that the regression accounts for 27481.190 of the total sum of squares of 220223.520, which is about 12.5% of the variation in Deaths. The significance value is reported as 0.000, that is, p < 0.001.
    • The regression uses 2 degrees of freedom (one per independent variable) out of a total of 151.
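The collinearity and ANOVA figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is not SPSS output: it recomputes tolerance and VIF from the reported predictor correlation of 0.169 (valid because there are exactly two predictors), and R square and the F statistic from the reported sums of squares. The variable names are illustrative.

```python
# Figures as reported in the SPSS tables discussed in the text.
r_predictors = 0.169          # Pearson correlation between the two predictors
ss_regression = 27481.190     # regression sum of squares
ss_total = 220223.520         # total sum of squares
n_samples = 152
n_predictors = 2

# With exactly two predictors, regressing one on the other gives R^2 = r^2,
# so tolerance = 1 - r^2 and VIF = 1 / tolerance.
tolerance = 1 - r_predictors ** 2
vif = 1 / tolerance

# Degrees of freedom: total = n - 1, regression = number of predictors.
df_total = n_samples - 1
df_regression = n_predictors
df_residual = df_total - df_regression

# R^2 is the share of the total sum of squares explained by the regression.
ss_residual = ss_total - ss_regression
r_squared = ss_regression / ss_total

# F statistic = mean square regression / mean square residual.
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"tolerance = {tolerance:.3f}, VIF = {vif:.3f}")   # 0.971 and 1.029
print(f"R^2 = {r_squared:.3f}, df = ({df_regression}, {df_residual})")
print(f"F = {f_statistic:.2f}")
```

The recomputed tolerance and VIF agree with the reported 0.971 and 1.029, and the R square of roughly 0.125 agrees with the 12% quoted earlier.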
  • 8. Result: From the analysis, we obtain the following regression equation:
    Deaths = 130.646 + 0(Tobacco_Con) + 3.388(Alco_Con)
    Since the coefficient for Tobacco_Con is 0, it does not contribute to predicting the number of deaths, so the equation becomes:
    Deaths = 130.646 + 3.388(Alco_Con)
    The coefficient of an independent variable is the amount of change in the dependent variable for a one-unit increase in that independent variable, with the other variables held constant. Multiple regression analysis therefore shows what effect each independent variable has on predicting the dependent variable.
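As an illustration of how the fitted equation and the Durbin-Watson check fit together, the sketch below applies the reported equation to a handful of observations and computes the Durbin-Watson statistic on the resulting residuals. The observations are hypothetical values invented for the example, not rows from the UN dataset, and the function names are illustrative.

```python
# Coefficients as reported in the Result section.
INTERCEPT = 130.646
ALCO_COEF = 3.388


def predict_deaths(alco_con):
    """Predicted Deaths for a given alcohol consumption (liters)."""
    return INTERCEPT + ALCO_COEF * alco_con


def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals.

    Values near 2 suggest no first-order auto-correlation.
    """
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical (alcohol consumption, observed deaths) pairs for illustration only.
observations = [(2.0, 140.0), (5.5, 146.0), (8.1, 162.0), (10.3, 161.0), (12.4, 178.0)]

# Residual = observed value minus the value predicted by the equation.
residuals = [deaths - predict_deaths(alco) for alco, deaths in observations]

print(predict_deaths(5.0))
print(durbin_watson(residuals))
```

The statistic ranges from 0 to 4; the 1.5 to 2.5 band used in the assumptions section is the usual rule of thumb around the ideal value of 2.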
  • 9. Binary Logistic Regression
    Introduction: Logistic regression is used to predict a dichotomous dependent variable with the help of one or more continuous or categorical variables.
    Data Description: The datasets were downloaded from the following links:
    http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHS2_3070_cancer
    http://data.un.org/Data.aspx?d=WHO&f=MEASURE_CODE%3aWHOSIS_000011
    The data consists of two columns:
    • Deaths (factor of Yes/No)
    • Alcohol Consumption (liters)
    The "Deaths" column is the dichotomous dependent variable, coded 1 for Yes and 0 for No, and "Alcohol Consumption" is the continuous independent variable. The model tries to predict whether death occurs based on alcohol consumption.
    Assumptions:
    1) The dependent variable should be dichotomous. "Deaths" is a dichotomous variable.
    2) There should be at least one independent variable. Alcohol Consumption is our independent, continuous variable.
    3) The sample size should be large. We have a dataset of 152 samples.
    4) Since we have only one independent variable, multi-collinearity cannot occur.
    5) Let us check for outliers in the data. A case-wise listing was not produced, since there are no outliers in the data.
  • 10. SPSS Output Interpretation: From the case processing summary, we can observe that all the samples have been processed; the total number of samples is 152.
    There are two blocks of output. Block 0 is the case where SPSS runs the model without any independent variables. Its results can be interpreted as follows:
    The block 0 classification table shows that the model predicts that death does not occur for every case. This happens because no independent variable is provided to the model, so it predicts only 54.6% of the cases correctly.
    In block 1, we see the table Omnibus Tests of Model Coefficients. Here the model is tested with the predictor variables. This is a goodness of fit test in which the results are compared with block 0 to check whether the predictor variables have had an impact on the dependent
  • 11. variable. The significance value should be less than 0.05. Our table shows 0.013, which means the independent variable has an impact on the dependent variable.
    The Cox and Snell R square and Nagelkerke R square values both estimate how much of the variation in the dependent variable is explained by the model. Here, 3.9% to 5.3% of the variation in the dependent variable is explained by the model.
    The block 1 classification table shows that 61.2% of the cases are predicted correctly, which is better than the 54.6% in block 0. This happens because the predictor variable has been included in the model.
    The Hosmer-Lemeshow test is also used to check for goodness of fit. The significance value should be greater than 0.05. Our significance value is 0.06, so the test gives no evidence of poor fit, although only by a narrow margin.
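For reference, the two pseudo R square measures mentioned above can be computed from the log-likelihoods of the null (block 0) model and the fitted (block 1) model. The sketch below uses the standard formulas. The class counts of 83 No and 69 Yes are an assumption derived from the reported 54.6% of 152 cases, and the log-likelihood improvement of 3.0 is a hypothetical value chosen for illustration, since the report does not list the -2 log-likelihood figures.

```python
import math


def null_log_likelihood(n_yes, n_no):
    """Log-likelihood of an intercept-only model that predicts the base rate."""
    n = n_yes + n_no
    p_yes = n_yes / n
    return n_yes * math.log(p_yes) + n_no * math.log(1 - p_yes)


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R^2 = 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R^2 rescaled by its maximum possible value."""
    max_r2 = 1 - math.exp(2 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2


# Assumed class counts (83 of 152 is about 54.6%); not taken from the SPSS tables.
n_no, n_yes = 83, 69
n = n_no + n_yes

ll_null = null_log_likelihood(n_yes, n_no)
ll_model = ll_null + 3.0  # hypothetical improvement in log-likelihood

cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(f"Cox and Snell R^2 = {cs:.3f}, Nagelkerke R^2 = {nk:.3f}")
```

Nagelkerke's value is always at least as large as Cox and Snell's, which is why the report quotes the two as a range.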
  • 12. The Variables in the Equation table tells us how much the independent variable contributes to model prediction. The significance value should be less than 0.05; the significance value for Alcohol consumption is 0.015. The B value is the coefficient: it gives the change in the log-odds of the dependent variable for a one-unit increase in the independent variable.
    Result: Based on the logistic regression analysis, we get the following regression equation, where p is the probability that Deaths = 1 (Yes):
    ln(p / (1 - p)) = 0.103 - 0.653(Alco_Con)
    Substituting an Alcohol Consumption value into the equation gives the log-odds, which is converted to a probability using p = 1 / (1 + e^(-log-odds)). If the probability is higher than 0.5, the model predicts that death might occur, and if the probability is less than 0.5, it predicts that death might not occur.
    References
    • Pallant, J. SPSS Survival Manual, 3rd edition.
    • https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php
    • https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
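As a supplement to the Result section of the logistic regression, the sketch below shows how the reported coefficients could be turned into a probability and a Yes/No prediction. It assumes the reported equation is read as a log-odds (logit) equation, which is the usual form of SPSS logistic regression output; the function names and the example alcohol values are illustrative.

```python
import math

# Coefficients as reported in the Result section.
INTERCEPT = 0.103
ALCO_COEF = -0.653


def log_odds(alco_con):
    """Linear predictor: the log-odds that Deaths = 1 (Yes)."""
    return INTERCEPT + ALCO_COEF * alco_con


def probability(alco_con):
    """Logistic function applied to the log-odds."""
    return 1 / (1 + math.exp(-log_odds(alco_con)))


def classify(alco_con, cutoff=0.5):
    """Predict 1 (Yes) when the probability is above the cutoff, else 0 (No)."""
    return 1 if probability(alco_con) > cutoff else 0


# Illustrative alcohol consumption values in liters.
for alco in (0.0, 1.0, 5.0):
    print(alco, round(probability(alco), 3), classify(alco))
```

The 0.5 cutoff corresponds to a log-odds of zero, so the predicted class flips exactly where the linear predictor changes sign.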