2. Example Data:
All 499 NM Census Tracts
American Community Survey
5-year file 2008-2012
3. Example Data: Variables
Table  Variable    Covariate  Name
DP02   HC01_VC138  PFB        Percent foreign-born population
DP02   HC03_VC94   PBH        Percent with bachelor's degree or higher
DP03   HC03_VC128  PHI        Percent with health insurance coverage
DP03   HC03_VC166  PBP        Percent all families and people whose income in the last 12 months is below the poverty level
DP03   HC01_VC85   MHI        Median household income (dollars)
DP03   HC03_VC13   PUN        Percent unemployed
Source: 2008-2012 American Community Survey 5-Year Estimates
4. Diagnostics for Spatial Autocorrelation:
Moran’s I (Global) & Geary’s C
• Queen’s case contiguity weights used throughout
• Moran’s I value of 0.22 is positive and well above its expectation of -0.002, suggesting
positive spatial autocorrelation (similar areas are near similar areas).
• The Geary’s C value of 0.719 is less than one, again suggesting
positive spatial autocorrelation.
Statistic  Value        Expectation   Variance
Moran’s I  0.224313204  -0.002008032  0.000727293
Geary’s C  0.719028491  1             0.001482037
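As a rough check on the two diagnostics, the reported statistic, expectation, and variance can be turned into a standard deviate. This is a minimal sketch that assumes a normal approximation; the slides do not say which variance assumption produced the figures above, and the function name is illustrative only.

```python
import math

def standard_deviate(statistic, expectation, variance, larger_is_positive=True):
    """Return (statistic - expectation) / sd, signed so that a positive
    value always points toward positive spatial autocorrelation."""
    diff = statistic - expectation
    if not larger_is_positive:
        diff = -diff
    return diff / math.sqrt(variance)

n_tracts = 499
# Under no spatial autocorrelation, E[I] = -1 / (n - 1).
expected_moran = -1.0 / (n_tracts - 1)

z_moran = standard_deviate(0.224313204, -0.002008032, 0.000727293)
# For Geary's C, values below 1 indicate positive autocorrelation,
# so the sign of the difference is flipped.
z_geary = standard_deviate(0.719028491, 1.0, 0.001482037, larger_is_positive=False)

print(f"E[I] = {expected_moran:.9f}")
print(f"Moran z = {z_moran:.2f}, Geary z = {z_geary:.2f}")
```

Both deviates come out far above 2, which agrees with the reading that unemployment is positively autocorrelated across tracts. The expectation -1/(n - 1) with n = 499 also reproduces the -0.002008032 in the table.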
5. Moran’s I Plot
The positive slope in the line suggests that high unemployment values are
surrounded by high unemployment values, and that low unemployment values are
surrounded by low unemployment values.
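The link between the plot and the statistic can be sketched in a few lines: with row-standardized weights, the least-squares slope of the spatial lag on the centered variable equals Moran’s I. The six areas, their neighbor lists, and the rates below are made-up illustration data, not the New Mexico tracts.

```python
def row_standardize(neighbors):
    """Turn neighbor lists into row-standardized weights {i: {j: w_ij}}."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items()}

def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def spatial_lag(weights, z):
    """Weighted average of each area's neighbors."""
    return [sum(w * z[j] for j, w in weights[i].items()) for i in range(len(z))]

def moran_scatter_slope(weights, values):
    """OLS slope of the spatial lag regressed on the centered values."""
    z = centered(values)
    lag = spatial_lag(weights, z)
    lag_mean = sum(lag) / len(lag)
    cov = sum(zi * (li - lag_mean) for zi, li in zip(z, lag))
    return cov / sum(zi * zi for zi in z)

def morans_i(weights, values):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = centered(values)
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    return (len(z) / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical areas arranged in a line; low rates on one end, high on the other.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rates = [2.0, 3.0, 4.0, 9.0, 10.0, 11.0]

weights = row_standardize(neighbors)
slope = moran_scatter_slope(weights, rates)
moran = morans_i(weights, rates)
print(f"slope = {slope:.4f}, Moran's I = {moran:.4f}")
```

Because similar values sit next to each other in the toy data, the slope is positive, which is the same pattern the plot shows for unemployment.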
6. Simultaneous Auto-Regressive Model (SAR)
• Also known as a Spatial Error Model (SEM)
• The residuals for an area might be affected by residuals in
neighboring areas.
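• $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}$, $\mathbf{U} = \lambda \mathbf{W}\mathbf{U} + \boldsymbol{\epsilon}$
• Here, $PUN = \beta_0 + \beta_1 PFB + \beta_2 PBH + \beta_3 PHI + \beta_4 PBP + \beta_5 MHI + U$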
8. Simultaneous Auto-Regressive Model (SAR)
• Results:
Summaries                                    Values
Lambda                                       0.053237653
LR test value                                21.8437643
p-value                                      2.95776E-06
Numerical Hessian standard error of lambda   0.010688213
Log likelihood of spatial regression fit     -1450.219342
Log likelihood of OLS fit y                  -1461.141224
ML residual variance (sigma squared)         19.21910134
AIC                                          2916.438683
Note: the likelihood ratio test value is significant.
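The likelihood ratio test in the table can be reproduced from the two log likelihoods. The sketch below assumes the test compares the spatial fit with OLS on one extra parameter (lambda), so the statistic is referred to a chi-square distribution with 1 degree of freedom; the function name is illustrative.

```python
import math

def likelihood_ratio_test(loglik_spatial, loglik_ols):
    """LR statistic and p-value for one added spatial parameter."""
    lr = 2.0 * (loglik_spatial - loglik_ols)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

lr_sem, p_sem = likelihood_ratio_test(-1450.219342, -1461.141224)
print(f"LR = {lr_sem:.4f}, p = {p_sem:.3e}")
```

The statistic and p-value agree with the reported 21.84 and 2.96E-06, so the spatial term improves on OLS by a wide margin.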
11. Conditional Auto-Regressive Model (CAR)
• Also known as a Spatial Lag Model (SLM)
• Also known as a Spatial Auto-Regressive Model (SAR)
• The response values for an area might be affected by response values
in neighboring areas.
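• $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \lambda \mathbf{W}\mathbf{Y} + \boldsymbol{\epsilon}$
• Here, $PUN = \beta_0 + \beta_1 PFB + \beta_2 PBH + \beta_3 PHI + \beta_4 PBP + \beta_5 MHI + \lambda W\,PUN + \epsilon$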
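Because the response appears on both sides of the lag equation, a change in one area feeds through to its neighbors and back again. A minimal sketch of that feedback, ignoring the error term and solving $y = X\beta + \lambda W y$ by fixed-point iteration, is given below. The four areas, the value of lambda, and the linear predictor values are hypothetical.

```python
def solve_spatial_lag(xb, weights, lam, tol=1e-12, max_iter=10_000):
    """Solve y = xb + lam * W y by repeated substitution.

    Converges when |lam| < 1 and W is row-standardized.
    """
    y = list(xb)
    for _ in range(max_iter):
        new_y = [
            xb[i] + lam * sum(w * y[j] for j, w in weights[i].items())
            for i in range(len(xb))
        ]
        if max(abs(a - b) for a, b in zip(new_y, y)) < tol:
            return new_y
        y = new_y
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical row-standardized weights for four areas in a line.
weights = {
    0: {1: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 1.0},
}
lam = 0.5  # illustrative value, not the fitted estimate

baseline = solve_spatial_lag([5.0, 5.0, 5.0, 5.0], weights, lam)
shocked = solve_spatial_lag([6.0, 5.0, 5.0, 5.0], weights, lam)  # +1 in area 0 only
spillover = [s - b for s, b in zip(shocked, baseline)]
print([round(v, 4) for v in spillover])
```

The one-unit change in area 0 raises the response in every area, with the effect shrinking as distance from area 0 grows. This is the sense in which response values are affected by response values in neighboring areas.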
13. Conditional Auto-Regressive Model (CAR)
• Results:
Summaries                                    Values
Lambda                                       0.096754472
LR test value                                22.65191588
p-value                                      1.94167E-06
Numerical Hessian standard error of lambda   0.016037543
Log likelihood of spatial regression fit     -1449.815266
Log likelihood of OLS fit y                  -1461.141224
ML residual variance (sigma squared)         18.85880446
AIC                                          2915.630532
Note: the likelihood ratio test value is significant.
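The AIC values can be checked against the log likelihoods using AIC = -2 logLik + 2k. The parameter count k = 8 (intercept, five slopes, residual variance, and one spatial parameter) is an assumption; it is used here because it reproduces the reported figures, not because the slides state it.

```python
def aic(loglik, n_params):
    """Akaike information criterion: -2 * log likelihood + 2 * parameters."""
    return -2.0 * loglik + 2.0 * n_params

K_SPATIAL = 8  # assumed: 6 regression coefficients + sigma squared + spatial parameter

aic_lag = aic(-1449.815266, K_SPATIAL)    # model on this slide
aic_error = aic(-1450.219342, K_SPATIAL)  # model on slide 8

print(f"AIC (this slide) = {aic_lag:.6f}")
print(f"AIC (slide 8)    = {aic_error:.6f}")
print(f"difference       = {aic_error - aic_lag:.4f}")
```

The two criteria differ by less than one unit, so on AIC alone the two specifications fit these data about equally well.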
16. Kelejian-Prucha Model
• Combination of the Conditional Auto-Regressive Model and
Simultaneous Auto-Regressive Model.
• The response values for an area might be affected by response values
in neighboring areas. In addition, the residual values for an area might
be affected by the residuals from neighboring areas.
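• $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \lambda \mathbf{W}\mathbf{Y} + \mathbf{U}$, $\mathbf{U} = \rho \mathbf{W}\mathbf{U} + \boldsymbol{\epsilon}$
• Here, $PUN = \beta_0 + \beta_1 PFB + \beta_2 PBH + \beta_3 PHI + \beta_4 PBP + \beta_5 MHI + \lambda W\,PUN + U$, $U = \rho W U + \epsilon$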
18. Kelejian-Prucha Model
• Results:
Summaries                                    Values
Lambda                                       0.096754472
LR test value                                22.65191588
p-value                                      1.94167E-06
Numerical Hessian standard error of lambda   0.016037543
Log likelihood of spatial regression fit     -1449.815266
Log likelihood of OLS fit y                  -1461.141224
ML residual variance (sigma squared)         18.85880446
AIC                                          2915.630532
Note: the likelihood ratio test value is significant.
22. Bayesian Hierarchical Model
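$PUN_i \mid \mu_i \sim N(\mu_i, \tau)$
$\mu_i = \beta_0 + \beta_1 PFB + \beta_2 PBH + \beta_3 PHI + \beta_4 PBP + \beta_5 MHI + \theta_i + \phi_i$
$\theta_i$ are the hierarchical Census tract-level effects
$\phi_i$ are the spatial clustering effects
Added an $\alpha = \dfrac{sd_{\tau_c}}{sd_{\tau_c} + sd_{\tau_h}}$ term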
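The alpha term is a ratio, so its posterior mean generally differs from the ratio of the posterior means of its parts. The sketch below assumes alpha is computed within each MCMC draw from the clustering and tract-level standard deviations (reported as sd.c and sd.h in the WinBUGS output); the three draws are made up for illustration.

```python
def clustering_share(sd_c, sd_h):
    """Share of the random-effect spread attributed to spatial clustering."""
    return sd_c / (sd_c + sd_h)

# Plugging the reported posterior means of sd.c and sd.h into the formula.
plug_in = clustering_share(3.177, 1.945)

# Hypothetical (sd_c, sd_h) pairs standing in for individual MCMC draws.
draws = [(3.0, 0.1), (3.2, 2.5), (3.4, 3.2)]
alpha_draws = [clustering_share(c, h) for c, h in draws]
mean_of_ratios = sum(alpha_draws) / len(alpha_draws)

mean_c = sum(c for c, _ in draws) / len(draws)
mean_h = sum(h for _, h in draws) / len(draws)
ratio_of_means = clustering_share(mean_c, mean_h)

print(f"plug-in from reported means: {plug_in:.4f}")
print(f"mean of per-draw ratios: {mean_of_ratios:.4f}")
print(f"ratio of draw means:     {ratio_of_means:.4f}")
```

The plug-in value from the reported means is about 0.62, while the reported posterior mean of alpha is 0.6693. A gap of this kind is expected when sd.h varies widely between draws, as its wide percentile range suggests.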
23. Bayesian Hierarchical Model
• Ran 2,010,000 iterations
• The autocorrelations looked bad. Most looked like this:
• So I set thin to 1,000
• A burn-in period did not appear to be necessary
[Figure: autocorrelation plot for alpha before thinning; x-axis: lag (0 to 50), y-axis: autocorrelation (-1.0 to 1.0)]
24. Bayesian Hierarchical Model
• After thinning auto-correlations look good. This was the worst:
[Figure: autocorrelation plot for alpha after thinning; x-axis: lag (0 to 50), y-axis: autocorrelation (-1.0 to 1.0)]
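The effect of thinning on autocorrelation can be sketched with a simulated chain. The AR(1) process below is a stand-in for a slowly mixing sampler, not the WinBUGS output; the chain length and the coefficient 0.99 are arbitrary choices for illustration.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var

def simulate_ar1(n, phi, seed=0):
    """Simulate x_t = phi * x_{t-1} + noise as a toy, highly correlated chain."""
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

raw = simulate_ar1(200_000, phi=0.99)
thinned = raw[::1000]  # keep every 1,000th draw

raw_acf1 = autocorrelation(raw, 1)
thinned_acf1 = autocorrelation(thinned, 1)

# With 2,010,000 iterations and thin = 1,000, the retained sample size is:
kept = 2_010_000 // 1_000

print(f"lag-1 autocorrelation, raw:     {raw_acf1:.3f}")
print(f"lag-1 autocorrelation, thinned: {thinned_acf1:.3f}")
print(f"draws kept after thinning:      {kept}")
```

The retained count of 2,010 matches the sample column in the WinBUGS output. Thinning trades a much smaller sample for draws that are close to independent.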
25. Bayesian Hierarchical Model
The estimates of the coefficients resemble those from the other models. The
percentiles are all close to zero with the exception of % below poverty, which
resembles output from the K-P model.
WinBUGS Stats
node         mean      sd        MC_error  val2.5pc   median    val97.5pc  start  sample
(intercept)  16.6      2.416     0.1302    11.98      16.57     21.5       1      2010
PFB          -2.924    1.008     0.03448   -4.933     -2.943    -0.9007    1      2010
PBH          -0.05336  0.02364   7.66E-04  -0.1004    -0.05351  -0.00592   1      2010
PHI          -0.1174   0.02802   0.001481  -0.1733    -0.1171   -0.05995   1      2010
PBP          0.1858    0.02868   0.00126   0.1325     0.1853    0.2441     1      2010
MHI          1.35E-05  1.96E-05  8.16E-07  -2.47E-05  1.35E-05  5.16E-05   1      2010
sd.c         3.177     0.2925    0.01048   2.571      3.193     3.692      1      2010
sd.h         1.945     1.341     0.09144   0.03881    2.415     3.62       1      2010
tau          58.25     226.3     8.05      0.07692    0.2208    680.7      1      2010
tau.c        0.03751   0.01082   3.88E-04  0.02289    0.03528   0.06281    1      2010
tau.h        56.71     230       8.661     0.07441    0.1695    671.9      1      2010
alpha        0.6693    0.1985    0.01349   0.4283     0.5742    0.9876     1      2010
26. A Little Bit Extra
• I use ArcGIS on a regular basis, so it would be nice to have R libraries
accessible from it.
• ArcGIS supports Python scripting with the arcpy module.
• Python has a module (rpy2) to execute R code.
• I used Python to run all of my code, implementing both R and ArcGIS
in one place.