5. Census 2.0
Methodology
[Flowchart. Legend: Operation, Data. Elements: Surnames; Texas Zip Codes; People Search Engine; Web Feeds; Database consolidation and verification; Census 2.0 Database; Address Matching; Geocoded Points; Spatial Join; Census County / Tracts; Census 2.0 County / Tracts; Population Difference; Census 2.0 Web Application]
Chow, T. E., Y. Lin, and W. D. Chan, 2011. The Development of a Web-based Demographic
Data Extraction Tool for Population Monitoring, Transactions in GIS, 15(4): 479-494.
6. Census 2.0
Record linkage (i.e. duplicate removal)
Name: first, M.I., last
Address, DOB, Phone…
Tobler’s law?
Zipf’s law?
d ↓ similarity ↑
f ↑ rank ↓ (i.e. higher ranking)
[Figure labels: 1st person? 2nd person? 3rd person? 4th person? (x 2)]
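The relation "f ↑ rank ↓" can be illustrated with a minimal sketch that ranks values by how often they occur, so the most frequent value gets rank 1. The function name and the sample surnames are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def rank_by_frequency(values):
    """Return {value: rank}; rank 1 is the most frequent value.

    Ties are broken alphabetically so that every value gets a distinct rank.
    """
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {value: rank for rank, (value, _count) in enumerate(ordered, start=1)}

# Hypothetical surnames, invented for illustration only.
surnames = ["Nguyen", "Tran", "Nguyen", "Le", "Nguyen", "Tran"]
ranks = rank_by_frequency(surnames)
print(ranks)
```

The same ranking could be applied to addresses instead of surnames when asking which address of a name occurs most often.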
7. Research Questions
Given records with identical names,
Are near records more likely to be the
same person than distant records?
Are records with a frequent address more likely to be the most
recent update than records with an infrequent address?
Is local migration more frequent than
distant migration?
9. Methodology
Data collection
2009*
2010
2012
VA: 210,913 (Census 2010)

         WhitePages        Addresses         Zabasearch         Total
Raw      74,733 (29.06%)   82,214 (31.97%)   100,187 (38.96%)   257,134
Valid    53,313 (27.46%)   61,259 (31.44%)   80,089 (41.10%)    194,861
* Chow, T. E., Y. Lin, N.T. Huynh, and J. Davis, 2012. Using Web Demographics to Model Population
Change of Vietnamese-Americans in Texas between 2000-2009, GeoJournal. 77(1): 119-134.
10. Methodology
Labeling migration
Record linkage
First name, last name + middle initial
Same vs different addresses
Auxiliary data (e.g. last update)
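The labeling step above can be sketched as follows: records from two collection years are linked on first name, middle initial, and last name, and each linked name is labeled by whether its address changed. This is only a sketch under stated assumptions. The field names, the "stayed"/"moved" labels, and the rule of skipping names with more than one record per year are hypothetical, and auxiliary data such as the last update date is not used here.

```python
from collections import defaultdict

def name_key(first, middle_initial, last):
    """Normalize first name, middle initial, and last name into one linkage key."""
    return (first.strip().lower(),
            middle_initial.strip().lower()[:1],
            last.strip().lower())

def index_by_name(records):
    """Group normalized addresses by linkage key."""
    by_name = defaultdict(list)
    for r in records:
        key = name_key(r["first"], r["mi"], r["last"])
        by_name[key].append(r["address"].strip().lower())
    return by_name

def label_migration(old_records, new_records):
    """Label each unambiguously linked name as 'stayed' or 'moved'.

    Names that are missing in either year, or that have more than one
    record in a year, are left unlabeled (an assumption of this sketch).
    """
    old_idx, new_idx = index_by_name(old_records), index_by_name(new_records)
    labels = {}
    for key, old_addrs in old_idx.items():
        new_addrs = new_idx.get(key, [])
        if len(old_addrs) == 1 and len(new_addrs) == 1:
            labels[key] = "stayed" if old_addrs[0] == new_addrs[0] else "moved"
    return labels

# Hypothetical records, invented for illustration only.
records_old = [
    {"first": "An", "mi": "T", "last": "Nguyen", "address": "12 Oak St, Austin"},
    {"first": "Binh", "mi": "", "last": "Tran", "address": "5 Elm Ave, Houston"},
]
records_new = [
    {"first": "An", "mi": "T.", "last": "Nguyen", "address": "40 Pine Rd, Dallas"},
    {"first": "Binh", "mi": "", "last": "Tran", "address": "5 Elm Ave, Houston"},
]
print(label_migration(records_old, records_new))
```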
13. Validation
Methodology
Name and/or address matching
Frequency
From Address To Address
Yes
No
Address Matching
Name Matching*
* Footnote:
Yes = perfect match
Yes? = Same last name but with slight deviation
No? = Different but Vietnamese last name
To/From Address?
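The name-matching categories in the footnote can be sketched in code. This is a minimal illustration and not the validation procedure used in the study: the similarity threshold, the use of `difflib.SequenceMatcher` to detect a "slight deviation", and the short surname list are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical, incomplete surname list used only for this illustration.
VIETNAMESE_SURNAMES = {"nguyen", "tran", "le", "pham", "huynh", "vo"}

def classify_last_name(reference, candidate, threshold=0.8):
    """Classify a candidate last name against a reference last name.

    "Yes"  = perfect match
    "Yes?" = nearly the same last name (similarity ratio >= threshold)
    "No?"  = different, but the candidate is a Vietnamese last name
    "No"   = different and not in the surname list
    """
    a, b = reference.strip().lower(), candidate.strip().lower()
    if a == b:
        return "Yes"
    if SequenceMatcher(None, a, b).ratio() >= threshold:
        return "Yes?"
    if b in VIETNAMESE_SURNAMES:
        return "No?"
    return "No"

for pair in [("Nguyen", "Nguyen"), ("Nguyen", "Nguyn"),
             ("Nguyen", "Tran"), ("Nguyen", "Smith")]:
    print(pair, classify_last_name(*pair))
```

A lower threshold accepts more spelling variants as "Yes?" at the cost of more false links, so the value would need tuning against manually checked records.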
15. Results
Are near records more likely to be the
same person than distant records?
[Figure: Distance of Old/New Addresses of the Same Person. Frequency (axis 0-25) by Distance (km): 0-10, 11-20, 21-30, 31-40, 41-50]
[Figure: Distance of Addresses between Different Person(s) with the Same Name. Frequency (axis 0-25) by Distance (km): 0-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350]
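Comparing old and new addresses by distance requires a distance in kilometers between two geocoded points. The slides do not say how distance was computed; the sketch below uses the haversine great-circle formula as one common choice, with a mean Earth radius of 6371.0088 km and approximate city coordinates chosen purely for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Approximate coordinates for Austin and Houston (illustrative values only).
austin = (30.2672, -97.7431)
houston = (29.7604, -95.3698)
print(round(haversine_km(*austin, *houston), 1))
```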
16. Results
Are near records more likely to be the
same person than distant records?
H1: D(same person) = D(different person)
p < 0.01
[Figure: Paired Distance of Addresses with the Same Name. Distance (km), axis 0-350, for Same Person vs Different Person]
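The slides report p < 0.01 but do not name the statistical test. As a stand-in, the sketch below runs a two-sided permutation test on the difference in mean distance between two groups. The function name, the number of permutations, and the sample distances are hypothetical and are not the study's data.

```python
import random

def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Returns an estimated p-value with the usual +1 correction.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    pooled = list(sample_a) + list(sample_b)

    def abs_mean_diff(values):
        left, right = values[:n_a], values[n_a:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    observed = abs_mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs_mean_diff(pooled) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical distances in km, invented for illustration only.
same_person = [2, 5, 8, 3, 12, 7, 4, 9]
different_person = [40, 150, 85, 220, 60, 310, 95, 130]
print(permutation_test(same_person, different_person))
```

A permutation test makes no normality assumption, which suits skewed distance data. A test designed for paired observations would be a closer match if the distances are paired by name.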
17. Results
Are records with a frequent address more likely to be
more up-to-date than records with an infrequent address?
[Figure: Frequency vs Updatedness of Addresses with the Same Name. Frequency (axis 0-80) by Number of Addresses (1-6) for three series: 2010, Before 2010, Different Person. Data labels: 63, 5, 7, 6, 0, 0; 76, 3, 2, 0, 0, 0; 69, 8, 2, 0, 1, 1]
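Address frequency for one name can be counted as in the minimal sketch below, which normalizes the address strings and returns the most frequent one with its count. The normalization (trimming and lowercasing) and the sample addresses are assumptions for illustration; real address matching would need standardization beyond this.

```python
from collections import Counter

def most_frequent_address(addresses):
    """Return (address, count) for the most frequent normalized address.

    When several addresses share the top count, the one seen first is returned.
    """
    counts = Counter(a.strip().lower() for a in addresses)
    return counts.most_common(1)[0]

# Hypothetical addresses collected for one name from several sources.
addresses = ["12 Oak St", "12 oak st ", "98 Elm Ave"]
print(most_frequent_address(addresses))
```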
18. Results
Are records with a frequent address more likely to be
more up-to-date than records with an infrequent address?
H2: F(2010) = F(Before 2010) = F(Different)
p < 0.01
[Figure: Paired Frequency of Addresses with the Same Name. Frequency (axis 0-7) for 2010, Before 2010, and Different Person]
19. Results
Is local migration more frequent than
distant migration?
Intra-city: 6833 (86.5%)
Inter-city: 1061 (13.5%)
VA Migration in 2010

Distance (km)   Number of Individuals
0-10            4745
11-50           2088
51-100          199
101-500         713
501-1000        145
1000+           4
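Assigning a migration distance to the distance classes on the chart can be sketched with a sorted list of upper edges. Treating the upper edges as inclusive is an assumption. The counts are the chart's data labels read in bin order, which is also an assumption about how the labels map to the bins.

```python
from bisect import bisect_left

BIN_UPPER_KM = [10, 50, 100, 500, 1000]  # inclusive upper edges (assumed)
BIN_LABELS = ["0-10", "11-50", "51-100", "101-500", "501-1000", "1000+"]

def distance_bin(distance_km):
    """Return the label of the distance class that contains distance_km."""
    return BIN_LABELS[bisect_left(BIN_UPPER_KM, distance_km)]

# Data labels from the chart, assumed to follow the bin order.
chart_counts = dict(zip(BIN_LABELS, [4745, 2088, 199, 713, 145, 4]))
within_50_km = chart_counts["0-10"] + chart_counts["11-50"]
beyond_50_km = sum(chart_counts.values()) - within_50_km
print(distance_bin(7.5), distance_bin(235.3))
print(within_50_km, beyond_50_km)  # 6833 1061
```

Under that reading, the two shortest classes sum to 6833 and the remaining classes to 1061, which coincides with the intra-city and inter-city counts stated on the slide.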
21. Conclusion
What are the spatial and frequency
characteristics of web demographics for
record linkage?
Are near records more likely to be the
same person than distant records?
Are records with a frequent address more likely to be
more up-to-date than records with an infrequent address?
2010 & Before 2010
2010 & Different
Is local migration more frequent than
distant migration?
Distance decay
22. Implications
Remarks
Tobler’s law
Zipf’s law in surname ranking
Yes? vs No?
Spatio-demographic patterns
Migration hubs/corridors?
Who moved?
30-40 & 65+
25-54: ↓d ↑A
55-65: ↑d ↑A
Spatiotemporal migration
Temporal record linkage
Longitudinal tracking
Geovisualization
23. Research Agenda
Census 2.0
Record linkage
Monitor population change
Census coverage estimation
Demographic analysis
Age structure
Ethnic enclaves
Migration
…
24. Junfang Chen (a), Christian Richardson (a),
Yan Lin (a), Khila Dahal (a), Nathaniel Dede-Bamfo (a),
Kumudan Grubh (a), John Davis (b), Niem Huynh (c)
(a) Department of Geography
(b) Department of Psychology
(c) American Association of Geographers
chow@txstate.edu