This document summarizes the results of statistical spatial modeling of unemployment rate data from 499 New Mexico census tracts. Several spatial regression models are fit to the data, including simultaneous autoregressive (SAR), conditional autoregressive (CAR), Kelejian-Prucha, and Bayesian hierarchical models. Diagnostic tests show significant positive spatial autocorrelation. The Bayesian hierarchical model incorporates both a tract-level random effect and spatial clustering effect. Model coefficients and fit are examined and residuals show good fit. The document demonstrates spatial modeling and diagnostics in R for unemployment data.
2. Example Data:
All 499 NM Census Tracts
American Community Survey
5-year file 2008-2012
3. Example Data: Variables
Table  Variable    Covariate  Name
DP02   HC01_VC138  PFB        Percent Foreign-born population
DP02   HC03_VC94   PBH        Percent with bachelor's degree or higher
DP03   HC03_VC128  PHI        Percent with health insurance coverage
DP03   HC03_VC166  PBP        Percent all families and people whose income in the last 12 months is below the poverty level
DP03   HC01_VC85   MHI        Median household income (dollars)
DP03   HC03_VC13   PUN        Percent Unemployed
Source: 2008-2012 American Community Survey 5-Year Estimates
4. Diagnostics for Spatial Autocorrelation:
Moran's I (Global) & Geary's C
• Queen's case contiguity weights used throughout
• Moran's I value of 0.22 is relatively large and positive, suggesting positive spatial autocorrelation (similar areas are near similar areas).
• The Geary's C value of 0.719 is less than one, again suggesting positive spatial autocorrelation.
Statistic  Value        Expectation   Variance
Moran I    0.224313204  -0.002008032  0.000727293
Geary C    0.719028491  1             0.001482037
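The two statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-area toy data and its neighbor lists are hypothetical, the weights are binary rather than whatever weighting style the original R run used, and the helper names (`morans_i`, `gearys_c`, `z_score`) are made up here. The last lines reuse the statistic, expectation, and variance reported on the slide to form approximate standard deviates.

```python
import math


def morans_i(values, neighbors):
    """Global Moran's I with binary weights; neighbors[i] lists areas adjacent to i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(adj) for adj in neighbors)  # sum of all binary weights
    cross = sum(dev[i] * dev[j] for i, adj in enumerate(neighbors) for j in adj)
    return (n / s0) * cross / sum(d * d for d in dev)


def gearys_c(values, neighbors):
    """Global Geary's C with binary weights."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(len(adj) for adj in neighbors)
    sq_diff = sum((values[i] - values[j]) ** 2
                  for i, adj in enumerate(neighbors) for j in adj)
    return ((n - 1) / (2 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)


def z_score(statistic, expectation, variance):
    """Standard deviate of a statistic given its expectation and variance."""
    return (statistic - expectation) / math.sqrt(variance)


# Hypothetical toy example: four areas in a row with steadily increasing values.
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_neighbors = [[1], [0, 2], [1, 3], [2]]
toy_i = morans_i(toy_values, toy_neighbors)  # positive: similar values are adjacent
toy_c = gearys_c(toy_values, toy_neighbors)  # below 1 for the same reason

# Values reported on the slide for the 499 tracts.
expected_moran = -1.0 / (499 - 1)            # expectation of Moran's I under no autocorrelation
moran_z = z_score(0.224313204, -0.002008032, 0.000727293)
geary_z = z_score(0.719028491, 1.0, 0.001482037)  # negative because C is below its expectation

print(toy_i, toy_c)
print(expected_moran, moran_z, geary_z)
```

Both deviates are several standard errors away from zero, which is consistent with the slide's reading of positive spatial autocorrelation.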
5. Moran's I Plot
The positive slope in the line suggests that high unemployment values are surrounded by high unemployment values, and that low unemployment values are surrounded by low unemployment values.
6. Simultaneous Auto-Regressive Model (SAR)
• Also known as a Spatial Error Model (SEM)
• The residuals for an area might be affected by residuals in neighboring areas.
• $Y = X\beta + U$, $U = \lambda W U + \varepsilon$
• Here, $\mathrm{PUN} = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + U$
8. Simultaneous Auto-Regressive Model (SAR)
• Results:
Summaries                                    Values
Lambda                                       0.053237653
LR test value                                21.8437643
p-value                                      2.95776E-06
Numerical Hessian standard error of lambda   0.010688213
Log likelihood of spatial regression fit     -1450.219342
Log likelihood of OLS fit y                  -1461.141224
ML residual variance (sigma squared)         19.21910134
AIC                                          2916.438683
Significant likelihood ratio test value
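The likelihood ratio test value in this table can be reproduced from the two log likelihoods. The sketch below assumes the usual construction, twice the difference in log likelihood compared against a chi-square distribution with one degree of freedom (one extra spatial parameter); the function name `lr_test` is made up for illustration.

```python
import math


def lr_test(loglik_spatial, loglik_ols):
    """Likelihood ratio statistic and its chi-square (1 df) p-value."""
    lr = 2.0 * (loglik_spatial - loglik_ols)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Log likelihoods as reported for the SAR (spatial error) fit and the OLS fit.
lr, p_value = lr_test(-1450.219342, -1461.141224)
print(lr, p_value)
```

Under these assumptions the result agrees with the reported LR test value and p-value to the precision shown in the table.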
11. Conditional Auto-Regressive Model (CAR)
• Also known as a Spatial Lag Model (SLM)
• Also known as a Spatial Auto-Regressive Model (SAR)
• The response values for an area might be affected by response values in neighboring areas.
• $Y = X\beta + \rho W Y + \varepsilon$
• Here, $\mathrm{PUN} = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + \rho\,W\,\mathrm{PUN} + \varepsilon$
13. Conditional Auto-Regressive Model (CAR)
• Results:
Summaries                                    Values
Lambda                                       0.096754472
LR test value                                22.65191588
p-value                                      1.94167E-06
Numerical Hessian standard error of lambda   0.016037543
Log likelihood of spatial regression fit     -1449.815266
Log likelihood of OLS fit y                  -1461.141224
ML residual variance (sigma squared)         18.85880446
AIC                                          2915.630532
Significant likelihood ratio test value
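The AIC values reported for the SAR and CAR fits can be compared directly. The sketch below assumes the standard definition AIC = 2k - 2 logLik and a parameter count of k = 8 (intercept, five slopes, one spatial coefficient, and the residual variance); that count is an inference from the reported numbers, not something stated on the slides.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log likelihood."""
    return 2.0 * n_params - 2.0 * loglik


# Assumed parameter count: intercept + 5 covariates + spatial coefficient + sigma^2.
K = 8

aic_sar = aic(-1450.219342, K)  # log likelihood reported for the SAR fit
aic_car = aic(-1449.815266, K)  # log likelihood reported for the CAR fit

print(aic_sar, aic_car, aic_sar - aic_car)
```

With that assumption both reported AIC values are recovered, and the CAR fit comes out lower by a little under one AIC unit, so the two models fit about equally well by this criterion.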
16. Kelejian-Prucha Model
• Combination of the Conditional Auto-Regressive Model and Simultaneous Auto-Regressive Model.
• The response values for an area might be affected by response values in neighboring areas. In addition, the residual values for an area might be affected by the residuals from neighboring areas.
• $Y = X\beta + \rho W Y + U$, $U = \lambda W U + \varepsilon$
• Here, $\mathrm{PUN} = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + \rho\,W\,\mathrm{PUN} + U$, $U = \lambda W U + \varepsilon$
18. Kelejian-Prucha Model
• Results:
Summaries                                    Values
Lambda                                       0.096754472
LR test value                                22.65191588
p-value                                      1.94167E-06
Numerical Hessian standard error of lambda   0.016037543
Log likelihood of spatial regression fit     -1449.815266
Log likelihood of OLS fit y                  -1461.141224
ML residual variance (sigma squared)         18.85880446
AIC                                          2915.630532
Significant likelihood ratio test value
23. Bayesian Hierarchical Model
• Ran 2,010,000 iterations
• The autocorrelations looked bad. Most looked like this:
[Figure: autocorrelation plot for alpha, lag 0 to 50, autocorrelation axis from -1.0 to 1.0]
• So I set thin to 1,000
• A burn-in period did not appear to be needed
24. Bayesian Hierarchical Model
• After thinning, the autocorrelations look good. This was the worst:
[Figure: autocorrelation plot for alpha after thinning, lag 0 to 50, autocorrelation axis from -1.0 to 1.0]
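The effect of thinning on autocorrelation can be illustrated with a simulated chain. This is a sketch only: the AR(1) chain, its coefficient of 0.99, and the helper names are hypothetical stand-ins for the sampler output, which is not available here. The last line checks the arithmetic for the run described on the slides, where keeping every 1,000th of 2,010,000 iterations leaves 2,010 draws.

```python
import random


def autocorr(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


def simulate_ar1(n, phi, seed=0):
    """Hypothetical strongly autocorrelated chain: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


chain = simulate_ar1(n=201_000, phi=0.99)
thinned = chain[::1000]  # keep every 1,000th draw

print(len(thinned))
print(autocorr(chain, 1))    # close to phi for the raw chain
print(autocorr(thinned, 1))  # near zero once draws are 1,000 iterations apart

retained = 2_010_000 // 1000  # draws kept in the run described on the slides
```

Thinning discards most of the draws, so it trades sample size for weaker dependence between the draws that remain.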
25. Bayesian Hierarchical Model
The estimates of the coefficients resemble those from the other models. The percentiles are all close to zero with the exception of % below poverty, which resembles output from the K-P model.
WinBUGS Stats
node         mean      sd        MC_error  val2.5pc   median    val97.5pc  start  sample
(intercept)  16.6      2.416     0.1302    11.98      16.57     21.5       1      2010
PFB          -2.924    1.008     0.03448   -4.933     -2.943    -0.9007    1      2010
PBH          -0.05336  0.02364   7.66E-04  -0.1004    -0.05351  -0.00592   1      2010
PHI          -0.1174   0.02802   0.001481  -0.1733    -0.1171   -0.05995   1      2010
PBP          0.1858    0.02868   0.00126   0.1325     0.1853    0.2441     1      2010
MHI          1.35E-05  1.96E-05  8.16E-07  -2.47E-05  1.35E-05  5.16E-05   1      2010
sd.c         3.177     0.2925    0.01048   2.571      3.193     3.692      1      2010
sd.h         1.945     1.341     0.09144   0.03881    2.415     3.62       1      2010
tau          58.25     226.3     8.05      0.07692    0.2208    680.7      1      2010
tau.c        0.03751   0.01082   3.88E-04  0.02289    0.03528   0.06281    1      2010
tau.h        56.71     230       8.661     0.07441    0.1695    671.9      1      2010
alpha        0.6693    0.1985    0.01349   0.4283     0.5742    0.9876     1      2010
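One common way to read a table like this is to check which 95% posterior intervals exclude zero. The sketch below copies the 2.5 and 97.5 percentiles for the five covariates from the WinBUGS table; treating "interval excludes zero" as the criterion is an assumption made for illustration, not a rule stated in the slides.

```python
# 2.5 and 97.5 posterior percentiles copied from the WinBUGS table.
intervals = {
    "PFB": (-4.933, -0.9007),
    "PBH": (-0.1004, -0.00592),
    "PHI": (-0.1733, -0.05995),
    "PBP": (0.1325, 0.2441),
    "MHI": (-2.47e-05, 5.16e-05),
}

# An interval excludes zero when both ends share the same sign.
excludes_zero = {name: (low > 0 or high < 0) for name, (low, high) in intervals.items()}

for name, flag in excludes_zero.items():
    print(name, "excludes zero" if flag else "includes zero")
```

By this reading, median household income (MHI) is the only covariate whose interval spans zero, and percent below poverty (PBP) is the only one with an interval entirely above zero.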
26. A Little Bit Extra
• I use ArcGIS on a regular basis, so it would be nice to have R libraries accessible.
• ArcGIS supports Python scripting with the arcpy module.
• Python has a module (rpy2) to execute R code.
• I used Python to run all of my code, using both R and ArcGIS in one place.
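A minimal sketch of the Python-to-R pattern described above follows. It is not the author's code: the shapefile name `nm_tracts.shp`, the assumption that the file carries a `PUN` column, and the helper names are all hypothetical, and the R calls (`st_read` from sf; `poly2nb`, `nb2listw`, and `moran.test` from spdep) are one plausible way to run the Moran test, not necessarily the functions used for the slides. The rpy2 import is deferred so the script text can be built and inspected on a machine without R installed.

```python
def build_r_script(shapefile_path, response="PUN"):
    """Assemble R code for a global Moran test (hypothetical file and column names)."""
    lines = [
        "library(sf)",
        "library(spdep)",
        f'tracts <- st_read("{shapefile_path}")',
        "nb <- poly2nb(tracts, queen = TRUE)",     # queen's case contiguity
        "lw <- nb2listw(nb, zero.policy = TRUE)",  # spatial weights list
        f"moran.test(tracts${response}, lw, zero.policy = TRUE)",
    ]
    return "\n".join(lines)


def run_r(script):
    """Evaluate R code through rpy2; requires R, rpy2, sf, and spdep to be installed."""
    import rpy2.robjects as ro  # imported lazily so the rest works without R
    return ro.r(script)


script = build_r_script("nm_tracts.shp")
print(script)
# result = run_r(script)  # uncomment on a machine with R and rpy2 available
```

Inside an ArcGIS Python session, the same `run_r` call could sit next to arcpy code, which is the "one place" arrangement the slide describes.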