Modeling Unemployment Spatial Data in R

Unemployment Areal Data Modeling
Nate Crouse
STAT 579

Example Data:
All 499 NM Census Tracts
American Community Survey
5-year file 2008-2012

Example Data: Variables

Table  Variable    Covariate  Name
DP02   HC01_VC138  PFB        Percent foreign-born population
DP02   HC03_VC94   PBH        Percent with bachelor's degree or higher
DP03   HC03_VC128  PHI        Percent with health insurance coverage
DP03   HC03_VC166  PBP        Percent of all families and people whose income in the last 12 months is below the poverty level
DP03   HC01_VC85   MHI        Median household income (dollars)
DP03   HC03_VC13   PUN        Percent unemployed

Source: 2008-2012 American Community Survey 5-Year Estimates

Diagnostics for Spatial Autocorrelation:
Moran’s I (Global) & Geary’s C
• Queen’s case contiguity weights used throughout
• Moran’s I of 0.22 is positive and well above its expectation of about -0.002 under no spatial autocorrelation, suggesting positive spatial autocorrelation (similar areas are near similar areas).
• Geary’s C of 0.719 is less than its expectation of 1, again suggesting positive spatial autocorrelation.

Statistic   Value        Expectation   Variance
Moran’s I   0.224313204  -0.002008032  0.000727293
Geary’s C   0.719028491  1             0.001482037
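
As a rough check on the table above, each statistic can be standardized against its expectation and variance. The sketch below assumes a normal approximation and a one-sided test for positive autocorrelation; the helper names are illustrative and are not taken from the original analysis.

```python
import math

def z_score(statistic, expectation, variance):
    """Standardize a statistic against its null expectation and variance."""
    return (statistic - expectation) / math.sqrt(variance)

def upper_tail_p(z):
    """One-sided p-value under a standard normal approximation (an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n_tracts = 499
# Moran's I has null expectation -1/(n - 1), which matches the reported -0.002008032.
moran_expectation = -1.0 / (n_tracts - 1)

moran_z = z_score(0.224313204, moran_expectation, 0.000727293)
# Geary's C below its expectation of 1 signals positive autocorrelation,
# so the sign is flipped to make "more clustering" a larger z.
geary_z = -z_score(0.719028491, 1.0, 0.001482037)

print(f"Moran's I: z = {moran_z:.2f}, one-sided p = {upper_tail_p(moran_z):.2e}")
print(f"Geary's C: z = {geary_z:.2f}, one-sided p = {upper_tail_p(geary_z):.2e}")
```

Both standardized values are far from zero, which agrees with the reading of the two bullets above.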

Moran’s I Plot
The positive slope of the line suggests that high unemployment values are surrounded by high unemployment values, and that low unemployment values are surrounded by low unemployment values.
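
The link between the plot's slope and Moran's I can be illustrated on toy data. The sketch below uses a hypothetical 4 x 4 grid with queen's case contiguity and assumes row-standardized weights; under that assumption the least-squares slope of the spatially lagged values on the centered values equals Moran's I. The grid values and function names are made up for illustration and are not the New Mexico tract data.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each cell of a regular grid to the cells sharing an edge or a corner."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[r * n_cols + c] = [
                rr * n_cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors

def moran_i_and_slope(values, neighbors):
    """Return Moran's I (row-standardized weights) and the Moran plot slope."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Spatial lag: average of the centered values over each cell's neighbors.
    lag = [sum(z[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(n)]
    sum_sq = sum(v * v for v in z)
    moran_i = sum(zi * li for zi, li in zip(z, lag)) / sum_sq
    # Ordinary least-squares slope of the lag on the centered values.
    lag_mean = sum(lag) / n
    slope = sum(zi * (li - lag_mean) for zi, li in zip(z, lag)) / sum_sq
    return moran_i, slope

# Hypothetical smooth gradient: neighbors resemble each other.
grid_values = [r + c for r in range(4) for c in range(4)]
moran_i, slope = moran_i_and_slope(grid_values, queen_neighbors(4, 4))
print(f"Moran's I = {moran_i:.4f}, Moran plot slope = {slope:.4f}")
```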

Simultaneous Auto-Regressive Model (SAR)
• Also known as a Spatial Error Model (SEM)
• The residuals for an area might be affected by residuals in
neighboring areas.
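• $Y = X\beta + U,\quad U = \lambda W U + \epsilon$
• Here, $\mathrm{PUN} = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + U$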

• Coefficient Results (Simultaneous Auto-Regressive Coefficients):
Coef         Estimate               Std. Error            z value              Pr(>|z|)
(Intercept)  15.9635698716377       2.29041910369421      6.96971564980841     3.17590398424272e-12
PFB          -2.02732578978909      0.885500277621368     -2.28946940054598    0.0220520938320465
PBH          -0.0337436866790786    0.0222016332845491    -1.51987406721838    0.12854262926175
PHI          -0.107084711089037     0.0259939510806068    -4.11960116247697    3.79528769129944e-05
PBP          0.179038274578327      0.0279026253098457    6.41653868014891     1.39407596577712e-10
MHI          -9.20557385226511e-07  1.90081538690724e-05  -0.0484296050824967  0.961373865536281
Significant values (p < 0.05): (Intercept), PFB, PHI, PBP
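
The z values and p-values in the table follow from the estimates and standard errors. The sketch below recomputes them for three rows, assuming a two-sided test against a standard normal distribution; the function name is illustrative.

```python
import math

def wald_z_and_p(estimate, std_error):
    """Return the z value and the two-sided normal p-value for one coefficient."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# (estimate, standard error) copied from the coefficient table above.
reported = {
    "PFB": (-2.02732578978909, 0.885500277621368),
    "PHI": (-0.107084711089037, 0.0259939510806068),
    "MHI": (-9.20557385226511e-07, 1.90081538690724e-05),
}

results = {name: wald_z_and_p(est, se) for name, (est, se) in reported.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.4f}, p = {p:.4g}")
```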

• Results:
Summaries                                     Values
Lambda                                        0.053237653
LR test value                                 21.8437643
p-value                                       2.95776E-06
Numerical Hessian standard error of lambda    0.010688213
Log likelihood of spatial regression fit      -1450.219342
Log likelihood of OLS fit y                   -1461.141224
ML residual variance (sigma squared)          19.21910134
AIC                                           2916.438683
Significant likelihood ratio test value
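
The likelihood ratio test value and the AIC can be reproduced from the two log likelihoods in the summary. The sketch below assumes the test has one degree of freedom (the single parameter lambda) and that the AIC counts eight parameters (six regression coefficients, lambda, and sigma squared); the parameter count is an inference that happens to reproduce the reported AIC, not something stated on the slide.

```python
import math

log_lik_spatial = -1450.219342  # log likelihood of spatial regression fit
log_lik_ols = -1461.141224      # log likelihood of OLS fit

# Likelihood ratio statistic comparing the spatial fit with the OLS fit.
lr_statistic = 2.0 * (log_lik_spatial - log_lik_ols)

# Chi-square upper tail with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
lr_p_value = math.erfc(math.sqrt(lr_statistic / 2.0))

n_parameters = 8  # assumption: 6 betas + lambda + sigma squared
aic = 2.0 * n_parameters - 2.0 * log_lik_spatial

print(f"LR = {lr_statistic:.4f}, p = {lr_p_value:.3e}, AIC = {aic:.3f}")
```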

Simultaneous Auto-Regressive Model (SAR)
• Fitted Values

Simultaneous Auto-Regressive Model (SAR)
• Residuals

Conditional Auto-Regressive Model (CAR)
• Also known as a Spatial Lag Model (SLM)
• Also known as a Spatial Auto-Regressive Model (SAR)
• The response values for an area might be affected by response values
in neighboring areas.
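• $Y = X\beta + \lambda W Y + \epsilon$
• Here, $\mathrm{PUN} = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + \lambda W\,\mathrm{PUN}$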

• Coefficient Results (Conditional Auto-Regressive Coefficients):
Coef         Estimate               Std. Error            z value              Pr(>|z|)
(Intercept)  16.0348462652796       2.28821371923029      7.00758243450853     2.42472708578134e-12
PFB          -2.02106503254527      0.88861850958791      -2.27438997808241    0.022942549481165
PBH          -0.0346381487009861    0.0225380455563367    -1.53687455349239    0.124324032016593
PHI          -0.106859132302071     0.0260480825623346    -4.10237997543009    4.08921993686473e-05
PBP          0.17801730487754       0.0279010503819254    6.38030835544676     1.76731740353375e-10
MHI          -1.70454040261034e-06  1.91435701286846e-05  -0.0890398390243973  0.92905024887799
Significant values (p < 0.05): (Intercept), PFB, PHI, PBP

• Results:
Summaries  Values
Lambda     0.096754472
p-value    1.94167E-06
AIC        2915.630532
Significant likelihood ratio test value
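
The AIC reported here can be set beside the AIC of 2916.438683 reported for the Simultaneous Auto-Regressive Model. The sketch below computes the difference and the corresponding Akaike relative likelihood, exp(-delta/2); treating the two AIC values as directly comparable assumes both models were fit to the same response with the same likelihood convention.

```python
import math

aic_sar = 2916.438683  # Simultaneous Auto-Regressive Model summary
aic_car = 2915.630532  # Conditional Auto-Regressive Model summary

delta_aic = aic_sar - aic_car
# Relative likelihood of the higher-AIC model compared with the lower-AIC model.
relative_likelihood = math.exp(-delta_aic / 2.0)

print(f"delta AIC = {delta_aic:.6f}, relative likelihood = {relative_likelihood:.3f}")
```

A difference of less than one AIC unit is small, so on this measure the two fits are close.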

Conditional Auto-Regressive Model (CAR)
• Fitted Values

Conditional Auto-Regressive Model (CAR)
• Residuals

Kelejian-Prucha Model
• A combination of the Conditional Auto-Regressive Model and the Simultaneous Auto-Regressive Model.
• The response values for an area might be affected by response values in neighboring areas. In addition, the residuals for an area might be affected by the residuals from neighboring areas.
• $Y = X\beta + \lambda W Y + U,\quad U = \rho W U + \epsilon$
• Here, $\mathrm{PUN} = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + \lambda W\,\mathrm{PUN} + U,\quad U = \rho W U + \epsilon$
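
To make the two-part structure concrete, the sketch below builds U and Y for a hypothetical four-area ring with row-standardized weights and then checks that both equations hold. Every number in it (the weights, the linear predictor, the errors, and the two spatial coefficients) is invented for illustration; the fixed-point solver is a simple stand-in for a matrix solve and relies on the spatial coefficient being smaller than one in absolute value.

```python
def spatial_lag(weights, v):
    """Compute W v for a dense weights matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def solve_autoregressive(weights, coef, forcing, iterations=200):
    """Solve v = coef * W v + forcing by fixed-point iteration.

    Converges for row-standardized W when |coef| < 1.
    """
    v = list(forcing)
    for _ in range(iterations):
        wv = spatial_lag(weights, v)
        v = [coef * a + f for a, f in zip(wv, forcing)]
    return v

# Hypothetical ring of four areas; each area has two neighbors (weights 0.5 each).
W = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
x_beta = [5.0, 7.0, 6.0, 9.0]    # hypothetical values of X beta
epsilon = [0.3, -0.2, 0.1, -0.4]  # hypothetical independent errors
lam, rho = 0.3, 0.5               # hypothetical lag and error coefficients

U = solve_autoregressive(W, rho, epsilon)                              # U = rho W U + epsilon
Y = solve_autoregressive(W, lam, [a + b for a, b in zip(x_beta, U)])  # Y = X beta + lam W Y + U

# Residuals of both defining equations; they should be numerically zero.
error_gap = max(abs(u - rho * wu - e) for u, wu, e in zip(U, spatial_lag(W, U), epsilon))
response_gap = max(
    abs(y - xb - lam * wy - u) for y, xb, wy, u in zip(Y, x_beta, spatial_lag(W, Y), U)
)
print(error_gap, response_gap)
```

Setting `lam` to zero leaves only the error-dependence equation, and setting `rho` to zero leaves only the response-dependence equation.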

• Coefficient Results (Kelejian-Prucha Model Coefficients):
• $Y = X\beta + \lambda W Y + U,\quad U = \rho W U + \epsilon$
Coef         Estimate               Std. Error            t-value              Pr(>|t|)
(Intercept)  15.8122543384093       6.99601002284485      2.26018177315009     0.0238099716099654
PFB          -1.91413326885597      1.09399953435159      -1.74966552430068    0.0801760459542833
PBH          -0.0360085981199082    0.0247849208594767    -1.45284297351872    0.146267357298311
PHI          -0.0991376121332399    0.0691860134985431    -1.43291406919012    0.15188239607163
PBP          0.182615679894103      0.0581323064593756    3.14138025852664     0.0016815355282694
MHI          -1.97936356793019e-06  1.98205252941203e-05  -0.0998643344996188  0.920452031708173
lambda       -0.011382527071069     0.00995577364684958   -1.14330914651429    0.252910258786229
rho          0.0713149246212653     0.0131636728143204    5.41755523911866     6.04194857914954e-08
Significant values (p < 0.05): (Intercept), PBP, rho

• Results:
Summaries  Values
Lambda     0.096754472
p-value    1.94167E-06
AIC        2915.630532
Significant likelihood ratio test value

Kelejian-Prucha Model
• Fitted Values

Kelejian-Prucha Model
• Residuals

Bayesian Hierarchical Model
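$\mathrm{PUN}_i \mid \mu_i \sim N(\mu_i, \tau)$
$\mu_i = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + \theta_i + \phi_i$
$\tau \sim \Gamma(0.001, 0.001)$
$\theta_i \sim N(0, \tau_h)$
$\tau_h \sim \Gamma(0.001, 0.001)$
$\phi_i \sim \mathrm{CAR}(\tau_c)$
$\tau_c \sim \Gamma(0.001, 0.001)$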
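
A short sketch may help in reading the scale parameters. It assumes the WinBUGS conventions, in which the second argument of a normal distribution is a precision (one over the variance) and the Gamma distribution is parameterized by shape and rate; the function names are illustrative.

```python
import math

def precision_to_sd(tau):
    """Convert a precision (1 / variance) to a standard deviation."""
    return 1.0 / math.sqrt(tau)

def gamma_mean_and_variance(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    return shape / rate, shape / rate ** 2

# The Gamma(0.001, 0.001) priors are centered at 1 with a very large variance.
prior_mean, prior_variance = gamma_mean_and_variance(0.001, 0.001)

# Posterior median of tau taken from the WinBUGS output table (0.2208).
sd_from_median_tau = precision_to_sd(0.2208)

print(f"prior mean = {prior_mean:.1f}, prior variance = {prior_variance:.1f}")
print(f"sd implied by median tau = {sd_from_median_tau:.3f}")
```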

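$\mathrm{PUN}_i \mid \mu_i \sim N(\mu_i, \tau)$
$\mu_i = \beta_0 + \beta_1\,\mathrm{PFB} + \beta_2\,\mathrm{PBH} + \beta_3\,\mathrm{PHI} + \beta_4\,\mathrm{PBP} + \beta_5\,\mathrm{MHI} + \theta_i + \phi_i$
$\theta_i$ are the hierarchical Census tract-level effects
$\phi_i$ are the spatial clustering effects
Added an $\alpha = \dfrac{\mathrm{sd}(\tau_c)}{\mathrm{sd}(\tau_c) + \mathrm{sd}(\tau_h)}$ term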
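
The alpha term is a share: the spatial clustering scale divided by the sum of the clustering and tract-level scales. The sketch below evaluates it at the posterior medians of sd.c and sd.h from the WinBUGS output table, on the assumption that those two nodes are the quantities in the formula. Because alpha is monitored draw by draw, the plug-in value need not equal the reported posterior median of alpha (0.5742).

```python
def spatial_fraction(sd_c, sd_h):
    """Share of the combined scale attributed to the spatial clustering effect."""
    return sd_c / (sd_c + sd_h)

# Posterior medians of sd.c and sd.h from the WinBUGS output table.
alpha_at_medians = spatial_fraction(3.193, 2.415)
print(f"alpha at posterior medians = {alpha_at_medians:.4f}")
```

Values near 1 indicate that the spatial clustering effect dominates; values near 0 indicate that the unstructured tract-level effect dominates.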

• Ran 2,010,000 iterations.
• The autocorrelations looked bad. Most looked like this:
[Figure: autocorrelation plot for alpha; x-axis: lag (0 to 50), y-axis: autocorrelation (-1.0 to 1.0)]
• So I set thin to 1,000.
• A burn-in period did not appear to be needed.
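
The effect of thinning can be illustrated with a simulated chain. The sketch below generates a strongly autocorrelated AR(1) series as a stand-in for an MCMC trace, keeps every 100th draw, and compares the lag-1 autocorrelation before and after. The series, its coefficient, the seed, and the thinning interval are all hypothetical and smaller than the run described above; the last line only repeats the arithmetic for the reported settings.

```python
import random

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    denominator = sum((v - mean) ** 2 for v in x)
    return numerator / denominator

random.seed(579)  # fixed seed so the toy example is repeatable
n_iterations, thin, phi = 201_000, 100, 0.95

# Simulated AR(1) chain: each draw depends strongly on the previous one.
chain = []
value = 0.0
for _ in range(n_iterations):
    value = phi * value + random.gauss(0.0, 1.0)
    chain.append(value)

thinned = chain[thin - 1::thin]  # keep every 100th draw

raw_acf = lag1_autocorrelation(chain)
thinned_acf = lag1_autocorrelation(thinned)
print(f"kept {len(thinned)} draws; lag-1 autocorrelation {raw_acf:.3f} -> {thinned_acf:.3f}")

# Same bookkeeping for the settings reported on the slide.
retained_in_deck = 2_010_000 // 1_000
print(retained_in_deck)
```

With 2,010,000 iterations and thin set to 1,000, the retained sample size is 2,010, which matches the sample column of the WinBUGS output.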

• After thinning, the autocorrelations look good. This was the worst:
[Figure: autocorrelation plot for alpha after thinning; x-axis: lag (0 to 50), y-axis: autocorrelation (-1.0 to 1.0)]

The estimates of the coefficients resemble those from the other models. The percentiles are all close to zero with the exception of percent below poverty (PBP), which resembles output from the K-P model.
WinBUGS Stats
node         mean      sd        MC_error  val2.5pc   median    val97.5pc  start  sample
(intercept)  16.6      2.416     0.1302    11.98      16.57     21.5       1      2010
PFB          -2.924    1.008     0.03448   -4.933     -2.943    -0.9007    1      2010
PBH          -0.05336  0.02364   7.66E-04  -0.1004    -0.05351  -0.00592   1      2010
PHI          -0.1174   0.02802   0.001481  -0.1733    -0.1171   -0.05995   1      2010
PBP          0.1858    0.02868   0.00126   0.1325     0.1853    0.2441     1      2010
MHI          1.35E-05  1.96E-05  8.16E-07  -2.47E-05  1.35E-05  5.16E-05   1      2010
sd.c         3.177     0.2925    0.01048   2.571      3.193     3.692      1      2010
sd.h         1.945     1.341     0.09144   0.03881    2.415     3.62       1      2010
tau          58.25     226.3     8.05      0.07692    0.2208    680.7      1      2010
tau.c        0.03751   0.01082   3.88E-04  0.02289    0.03528   0.06281    1      2010
tau.h        56.71     230       8.661     0.07441    0.1695    671.9      1      2010
alpha        0.6693    0.1985    0.01349   0.4283     0.5742    0.9876     1      2010
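
One way to read the output is to check which 95% intervals (val2.5pc to val97.5pc) exclude zero. The sketch below does this for the five covariates using the values in the table; the helper name is illustrative.

```python
# (val2.5pc, val97.5pc) for each covariate, copied from the WinBUGS output.
intervals = {
    "PFB": (-4.933, -0.9007),
    "PBH": (-0.1004, -0.00592),
    "PHI": (-0.1733, -0.05995),
    "PBP": (0.1325, 0.2441),
    "MHI": (-2.47e-05, 5.16e-05),
}

def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0.0 or upper < 0.0

clearly_nonzero = [
    name for name, (lower, upper) in intervals.items() if excludes_zero(lower, upper)
]
print(clearly_nonzero)
```

Only the interval for MHI spans zero.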

A Little Bit Extra
• I use ArcGIS on a regular basis, so it would be nice to have R libraries accessible from it.
• ArcGIS supports Python scripting with the arcpy module.
• Python has a module (rpy2) to execute R code.
• I used Python to run all of my code, implementing both R and ArcGIS
in one place.

Example: Simultaneous Autoregressive Model (SAR)

Example Data: R output added to a shapefile
