Factors influencing the Human Development Index (HDI) using Multiple Linear Regression
ADITYA PANUGANTI, 1202062944, Industrial Engineering
Year of data: 2008
Source: UN Development Programme Database
Objective and Dataset description To find which of the following variables have an effect on the Human Development Index (HDI)
Fitting the full model without interaction terms
The regression equation for the full model is
	y = 0.0596 + 0.00440 LIF + 0.000007 GDP - 0.000748 GRO + 0.0158 SCH + 0.0080 GEN + 0.0159 EXP - 0.000004 GNI + 0.000003 MAT - 0.000051 HOM - 0.000540 MOR + 0.000176 LIT - 0.0185 DEP + 0.0023 CON1 - 0.0117 CON2 - 0.0100 CON3 + 0.00431 CON4 - 0.0268 CON5
The coefficients of this equation are difficult to interpret because the regressors are measured on very different scales. The regressors were therefore standardized using unit normal scaling.
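The unit normal scaling step can be sketched as below. This is a minimal illustration, assuming each numeric regressor is centred on its sample mean and divided by its sample standard deviation; the function name `unit_normal_scale` and the data values are made up for the example and are not the HDI data.

```python
import numpy as np

def unit_normal_scale(X):
    """Centre each column on its mean and divide by its sample std (ddof=1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical values for two regressors on very different scales
X = np.array([[70.1, 12000.0],
              [55.3,  1500.0],
              [80.2, 41000.0],
              [62.7,  4300.0]])
Z = unit_normal_scale(X)
print(Z.mean(axis=0))          # close to 0 for every column
print(Z.std(axis=0, ddof=1))   # 1 for every column
```

Indicator variables would normally be left unscaled. That is consistent with the DEP and CON1 to CON5 coefficients being identical in the two fitted equations in this deck.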
Fitting the full model after Standardization
The regression equation is
	y = 0.684 + 0.0404 LIF + 0.100 GDP - 0.0117 GRO + 0.0408 SCH + 0.00136 GEN + 0.0443 EXP - 0.0627 GNI + 0.00089 MAT - 0.00068 HOM - 0.0196 MOR + 0.00259 LIT - 0.0185 DEP + 0.0023 CON1 - 0.0117 CON2 - 0.0100 CON3 + 0.00431 CON4 - 0.0268 CON5
Model Statistics: R-Sq = 98.5%   R-Sq(adj) = 98.2%
Analysis of Variance (ANOVA)
	Source           DF       SS       MS       F      P
	Regression       17  2.21784  0.13046  325.49  0.000
	Residual Error   84  0.03367  0.00040
	Total           101  2.25150
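The summary statistics follow from the ANOVA table by simple arithmetic. The sketch below recomputes them from the printed sums of squares and degrees of freedom. Small differences from the reported values are expected because the table entries are rounded.

```python
# Values copied from the ANOVA table on this slide
ss_reg, df_reg = 2.21784, 17
ss_res, df_res = 0.03367, 84
ss_tot, df_tot = 2.25150, 101

ms_reg = ss_reg / df_reg                    # mean square for regression
ms_res = ss_res / df_res                    # mean square error
f_stat = ms_reg / ms_res                    # overall F statistic
r_sq = 1 - ss_res / ss_tot                  # R-Sq
r_sq_adj = 1 - ms_res / (ss_tot / df_tot)   # R-Sq(adj)

print(f"MS(reg) = {ms_reg:.5f}, F = {f_stat:.2f}")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```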
Signs of Multicollinearity
Inference from Variance Inflation Factors (VIFs):
	VIF of GDP = 560.116 and VIF of GNI = 533.109 (indicating severe multicollinearity)
	VIF of EXP = 18.368 and VIF of GRO = 16.456 (above 10, indicating multicollinearity)
Inference from Correlation matrix:
	         LIF     GDP     GRO     SCH     GEN     EXP     GNI     MAT
	GDP    0.595
	GRO    0.719   0.630
	SCH    0.603   0.553   0.776
	GEN   -0.677  -0.705  -0.758  -0.743
	EXP    0.692   0.636   0.956   0.774  -0.798
	GNI    0.584   0.999   0.618   0.539  -0.688   0.620
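For standardized regressors, the VIFs are the diagonal elements of the inverse of the correlation matrix. The sketch below applies this to the three-variable sub-matrix for LIF, GDP and GNI taken from the printed correlations. Because only three of the regressors are used and the correlations are rounded, the values will not reproduce the full-model VIFs quoted above. The function name `vif_from_corr` is an assumption for the example.

```python
import numpy as np

def vif_from_corr(R):
    """VIFs are the diagonal elements of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.asarray(R, dtype=float)))

# Correlations among LIF, GDP and GNI as printed in the matrix on this slide
names = ["LIF", "GDP", "GNI"]
R = np.array([[1.000, 0.595, 0.584],
              [0.595, 1.000, 0.999],
              [0.584, 0.999, 1.000]])
vifs = vif_from_corr(R)
for name, v in zip(names, vifs):
    print(f"{name}: VIF = {v:.1f}")
```

With a correlation of 0.999 between GDP and GNI, both VIFs come out in the hundreds, while LIF stays small. This matches the pattern reported for the full model.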
No change in the R-sq and R-sq(adj) statistics before and after dropping the variable from the model: R-Sq = 98.5%   R-Sq(adj) = 98.2%
To confirm multicollinearity between EXP and GRO, a further analysis was done using Principal Component Analysis. The condition number was found to be λmax/λmin = 7.8001/0.0327 = 238.53 (>100, indicating moderate multicollinearity).
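The condition number is the ratio of the largest to the smallest eigenvalue of the regressors' correlation matrix. The sketch below reproduces the arithmetic from the reported eigenvalues and shows how the ratio could be computed from a correlation matrix. The cut-offs of 100 and 1000 in `describe` are commonly quoted rules of thumb and are an assumption here, as are the function names and the 2x2 example matrix.

```python
import numpy as np

def condition_number(R):
    """Ratio of largest to smallest eigenvalue of a symmetric correlation matrix."""
    eigvals = np.linalg.eigvalsh(np.asarray(R, dtype=float))  # ascending order
    return eigvals[-1] / eigvals[0]

def describe(kappa):
    """Rule-of-thumb labels; the thresholds 100 and 1000 are assumed."""
    if kappa < 100:
        return "no serious multicollinearity"
    if kappa < 1000:
        return "moderate to strong multicollinearity"
    return "severe multicollinearity"

# Eigenvalues reported on this slide
kappa = 7.8001 / 0.0327
print(f"{kappa:.3f}: {describe(kappa)}")

# Hypothetical 2x2 correlation matrix with r = 0.9 (eigenvalues 1.9 and 0.1)
print(condition_number([[1.0, 0.9], [0.9, 1.0]]))
```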
Indicator Interactions
Considered interaction terms of DEP with the other numerical variables. 24 variables in all, including all the interaction terms.
S = 0.0220704   R-Sq = 98.3%   R-Sq(adj) = 97.8%   R-Sq(pred) = 96.80%
Residual plots:
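An interaction term between an indicator and a numeric regressor is the product of the two columns. The sketch below builds such columns, assuming DEP is coded as a 0/1 indicator. The deck does not state the coding explicitly, and the function name and data values are hypothetical.

```python
import numpy as np

def add_interactions(X_num, indicator):
    """Append the indicator and indicator * x columns for every numeric column x."""
    X_num = np.asarray(X_num, dtype=float)
    d = np.asarray(indicator, dtype=float).reshape(-1, 1)
    return np.hstack([X_num, d, X_num * d])

# Hypothetical standardized values for two numeric regressors and a 0/1 DEP flag
X_num = np.array([[ 0.5, -1.2],
                  [-0.3,  0.8],
                  [ 1.1,  0.1]])
dep = np.array([1, 0, 1])
X_full = add_interactions(X_num, dep)
print(X_full.shape)   # (3, 5): 2 numeric + DEP + 2 interaction columns
```

With this coding, each interaction coefficient measures how the slope of a regressor differs between the DEP = 1 and DEP = 0 groups.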
Outliers and Influential points
Other outliers in graph
Refitted the model with each of the data points 45, 50 and 80 removed and checked for any changes in the summary statistics. These points are outliers, but they do not have high leverage and are not influential, and R-sq does not change much. Therefore we are leaving them in the model.
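Leverage and influence can be checked numerically with the hat-matrix diagonal and Cook's distance. The sketch below is a minimal illustration on synthetic data, not the HDI data. The cut-offs 2p/n for leverage and 1 for Cook's distance are common rules of thumb and are an assumption here, since the deck does not say which diagnostics or cut-offs were used.

```python
import numpy as np

def influence_measures(X, y):
    """Return leverages h_ii and Cook's distances for an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
    h = np.diag(H)
    resid = y - H @ y
    mse = resid @ resid / (n - p)
    cooks_d = resid**2 / (p * mse) * h / (1 - h) ** 2
    return h, cooks_d

# Synthetic data: the last point lies far from the rest in x
rng = np.random.default_rng(0)
x = np.append(rng.normal(size=20), 8.0)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=21)
h, d = influence_measures(x.reshape(-1, 1), y)
n, p = 21, 2
print("high leverage:", np.where(h > 2 * p / n)[0])
print("influential (D > 1):", np.where(d > 1)[0])
```

A point can be an outlier in the residuals without having high leverage or a large Cook's distance, which is the situation described for points 45, 50 and 80.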
Residual plots after removing the outliers and influential points
To confirm this, we used the Box-Cox procedure, which showed that a transformation of y is needed.
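One common form of the Box-Cox procedure for regression rescales the transformed response by the geometric mean of y, so that residual sums of squares are comparable across powers, and picks the power λ with the smallest residual sum of squares. The sketch below implements that grid search on synthetic data, not the HDI data. The grid, the function name and the data are assumptions, and the deck's own software may compute λ differently.

```python
import numpy as np

def box_cox_lambda(X, y, lambdas=np.linspace(-2, 2, 41)):
    """Grid search for the Box-Cox power that minimises the residual sum of squares."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)          # must be strictly positive
    gm = np.exp(np.mean(np.log(y)))         # geometric mean of y
    best_lam, best_sse = None, np.inf
    for lam in lambdas:
        if abs(lam) < 1e-8:
            y_t = gm * np.log(y)            # limiting case lambda = 0
        else:
            y_t = (y**lam - 1) / (lam * gm ** (lam - 1))
        beta, *_ = np.linalg.lstsq(X, y_t, rcond=None)
        sse = np.sum((y_t - X @ beta) ** 2)
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam

# Synthetic response whose square root is linear in x
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 60)
y = (2 + 0.5 * x + rng.normal(scale=0.02, size=60)) ** 2
lam = box_cox_lambda(x.reshape(-1, 1), y)
print(lam)
```

A selected λ close to 1 means the response can be left untransformed; values away from 1 point to a power or log transformation.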
Residual plots after transformation
Some outliers are visible in the normal probability plot.
Outliers and Influential points
Residual plots after removing the outliers and influential points
No further transformation is needed: Box-Cox suggests λ = 1
Variable selection and Model building
Fit the selected model
Regression equation:
	y2 = 0.476 - 0.0164 GEN + 0.0403 GRO + 0.0422 LIF + 0.0557 GDP + 0.0449 SCH - 0.0181 CON2 - 0.0388 MOR + 0.0523 GDP_D + 0.0289 CON5 + 0.0412 MOR_D - 0.0476 HOM_D
Detected multicollinearity using Principal Component Analysis: condition number = 134.837 (>100, moderate multicollinearity)
Linear dependency equation: 0.107 GRO + 0.337 LIF + 0.798 MOR - 0.467 MOR_D (dependency between the variables in the equation)
Using the correlation matrix, found that the variable MOR has large correlations with LIF and MOR_D. Dropping MOR removed the multicollinearity from the model: condition number = 39.04617 (<100, no multicollinearity)
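A linear dependency of this kind can be read off the eigenvector that belongs to the smallest eigenvalue of the regressors' correlation matrix: regressors with large loadings in that eigenvector are the ones involved. The sketch below shows the idea on synthetic regressors, not the HDI data. The function name and the data are assumptions.

```python
import numpy as np

def near_dependency(X):
    """Condition number and the eigenvector for the smallest eigenvalue."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    return eigvals[-1] / eigvals[0], eigvecs[:, 0]

# Synthetic regressors: x3 is almost x1 + x2, x4 is unrelated
rng = np.random.default_rng(2)
x1, x2, x4 = rng.normal(size=(3, 200))
x3 = x1 + x2 + rng.normal(scale=0.01, size=200)
kappa, v = near_dependency(np.column_stack([x1, x2, x3, x4]))
print(round(kappa, 1))
print(np.round(v, 3))   # large loadings on x1, x2, x3; near zero on x4
```

The sign of an eigenvector is arbitrary, so only the relative signs and sizes of the loadings carry information.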
Residual plots after dropping MOR
No need for any transformation, Box-Cox suggests λ = 1
Model validation
Considered 118 countries for modelling: 102 as estimation data and 16 as prediction data
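A data-splitting check of this kind can be sketched as follows: fit on the estimation set, predict the held-out set, and report the R-squared of those predictions. The deck does not say how the 16 prediction countries were chosen, so the random split, the function name and the synthetic data below are all assumptions.

```python
import numpy as np

def prediction_r2(X_est, y_est, X_new, y_new):
    """Fit OLS on the estimation data and report R-squared on the held-out data."""
    A_est = np.column_stack([np.ones(len(y_est)), X_est])
    A_new = np.column_stack([np.ones(len(y_new)), X_new])
    beta, *_ = np.linalg.lstsq(A_est, y_est, rcond=None)
    resid = y_new - A_new @ beta
    return 1 - resid @ resid / np.sum((y_new - y_new.mean()) ** 2)

# Synthetic stand-in for the 118 countries (not the HDI data)
rng = np.random.default_rng(3)
X = rng.normal(size=(118, 3))
y = 0.7 + X @ np.array([0.05, 0.04, -0.02]) + rng.normal(scale=0.01, size=118)

idx = rng.permutation(118)
est, new = idx[:102], idx[102:]          # 102 estimation, 16 prediction
r2_new = prediction_r2(X[est], y[est], X[new], y[new])
print(len(est), len(new), round(r2_new, 3))
```

A prediction R-squared close to the estimation R-squared suggests the model is not overfitted to the estimation countries.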
Conclusion
The reduced model has a better R-sq than the full model, and most of the variables in the model are significant (low p-value).
The following variables were found to be significant:
	Gender inequality index
	Combined gross enrolment
	Life expectancy at birth
	GDP
	Mean schooling years
	Countries in continent 2
	GDP & intensity of deprivation
	Under 5 mortality rate & intensity of deprivation
	Homicide rate & intensity of deprivation
Possible improvements
	More data points
	Ridge regression to reduce the effects of multicollinearity
	Robust regression: to down-weight outlying data points so that they can be retained in the model
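As an illustration of the ridge regression suggestion, the sketch below computes the closed-form ridge estimate for centred and scaled regressors. It was not part of the original analysis. The data are synthetic, and the function name and the values of the biasing parameter k are assumptions.

```python
import numpy as np

def ridge_coefficients(Z, y, k):
    """Ridge estimate (Z'Z + kI)^-1 Z'y for centred and scaled regressors Z."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ (y - y.mean()))

# Synthetic example: two almost identical regressors
rng = np.random.default_rng(4)
z1 = rng.normal(size=100)
z2 = z1 + rng.normal(scale=0.01, size=100)
Z = np.column_stack([z1, z2])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
y = 0.7 + 0.05 * z1 + rng.normal(scale=0.02, size=100)

for k in (0.0, 0.1, 1.0):      # k = 0 reproduces ordinary least squares
    print(k, np.round(ridge_coefficients(Z, y, k), 4))
```

Increasing k shrinks the coefficient vector and stabilises the estimates of near-collinear regressors, at the cost of some bias.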
