My name is Adrienne Allegretti and I’m the GIS Project Manager at Blue Raster. We contract with NCES to build and maintain web map applications and tools such as the SDDS map viewer. I’m here today to talk about geocoding and NCES’s initiatives in this area.
Geocoding is the process of finding associated geographic coordinates (latitude and longitude) from other geographic data, such as street addresses or zip codes (postal codes). Addresses are great for the postal service but not for doing statistical analysis.
This method makes use of data from a street GIS where the street network is already mapped within the geographic coordinate space. Each street segment is attributed with address ranges (e.g., the house numbers from one end of the segment to the other). Geocoding takes an address and matches it to a street and a specific segment (such as a block). It then interpolates the position of the address within the range along the segment.
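The interpolation step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the segment is treated as a straight line, the names StreetSegment and interpolate_address are hypothetical, and the coordinates are made up. Real geocoders also handle odd and even sides of the street and curved segments.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StreetSegment:
    """One street segment with its address range (hypothetical structure)."""
    name: str
    from_number: int              # house number at the start of the segment
    to_number: int                # house number at the end of the segment
    start: Tuple[float, float]    # (longitude, latitude) of the start point
    end: Tuple[float, float]      # (longitude, latitude) of the end point


def interpolate_address(house_number: int, segment: StreetSegment) -> Tuple[float, float]:
    """Place a house number proportionally along a straight segment."""
    lo, hi = segment.from_number, segment.to_number
    if not min(lo, hi) <= house_number <= max(lo, hi):
        raise ValueError("house number is outside the segment's address range")
    # Fraction of the way along the segment; use the midpoint if the range is a single number.
    fraction = 0.5 if hi == lo else (house_number - lo) / (hi - lo)
    lon = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    lat = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return (lon, lat)


# Made-up block running from house number 1400 to 1500.
block = StreetSegment("LINCOLN WAY WEST", 1400, 1500, (0.0, 0.0), (10.0, 20.0))
print(interpolate_address(1450, block))
```

An address halfway through the range lands halfway along the segment, which is why an interpolated point is an estimate and not the actual building location.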
Other techniques include locating a point at the centroid of a land parcel, using a GPS to map a location, or using a street intersection or the midpoint along a street centerline.
Then: geocoding millions of addresses could take days or weeks.
Now it just takes hours. In addition, the technology for parsing addresses has greatly improved, and systems have become much smarter about separating the street address from the city and state.
To support spatial analysis and provide geographic context to a school’s location, in particular relative to Census geographies. Geocoded schools can be overlaid with other geographic layers such as school districts, locales, counties, congressional districts, and more.
Where schools are matters, so that we can better our children’s education and so that there is adequate response in the event of an emergency. For instance, NCES collaborates with Homeland Security in determining where daytime populations are in order to aid disaster relief and response. They also work with EPA and FEMA for similar reasons. The Department of Education is looked at as the authoritative source within the federal government for knowing where schools are. So knowing where the administrative office of a school is may be fine, but at the end of the day we need to know where the students really are.
The accuracy of geocoded data has a large bearing on the quality of research that can be done using that data. A school geocoded to the wrong location could have implications for that school, such as a loss of funding opportunities. It could also have an impact on the planning of appropriate emergency response.
In addition to investing in your data quality, you want to make sure you format the data in a style the geocoder can handle, and be sure that you are using locational addresses and not the PO Box or the administrative address of a school.
Here is an example of a bad address. When it is entered into ArcGIS Online, we only get the centroid of the zip code because the street address can’t be found.
If we remove the place name and add in the vowels to the road address, we get an address that the geocoder can find.
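For the bad and good addresses shown on the slides, that cleanup could be sketched roughly as follows. The ABBREVIATIONS table and the clean_street function are hypothetical and cover only this one example; a real abbreviation list would need review, since a blanket rule such as expanding every "W" to "WEST" can be wrong for other addresses.

```python
import re

# Hypothetical lookup table, just enough for the slide example.
ABBREVIATIONS = {"LNCLNWY": "LINCOLN WAY", "W": "WEST"}


def clean_street(raw: str) -> str:
    """Drop any place name before the house number and expand known abbreviations."""
    text = raw.strip().upper()
    first_digit = re.search(r"\d", text)
    if first_digit:
        # Assume everything before the first digit is a place name, not part of the address.
        text = text[first_digit.start():]
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


print(clean_street("FMLY CHLD CTR 1411 LNCLNWY W"))
```

Addresses that the rules do not change pass through untouched, so a step like this can run over a whole table before geocoding.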
So now you’re ready to geocode. How do you go about choosing an appropriate geocoding solution? Well, you need to take a few things into consideration:
1) Do you have a small budget, or a large budget with the ability to purchase your own reference data?
2) Do you have a single record, hundreds or thousands of records, or millions?
3) Are you just bringing an Excel file to a web browser, or do you have an IT department with enterprise servers and cloud deployment? Or do you have ArcGIS for Desktop, or developers who can create your own web service with your data?
4) Are you the only one who needs to access the geocoder, or is there a whole team of individuals?
5) And lastly, how publicly accessible is your data? You might need to run the geocoding behind a firewall if, for example, it’s related to financial loan information.
No matter your situation, there is an approach for you.
So, if your budget allows you to make the initial investment in the data, you expect to be geocoding millions of addresses, and your address lists are private and need to be secure behind a firewall, you should consider setting up an in-house solution. For millions of addresses, you will want to geocode on the desktop (ArcMap) rather than over the web, because it will be much faster, but there are some prerequisites you need for this.
ArcGIS Desktop software is the best option and offers geocoding capabilities without custom development. Geocoding on the desktop also assumes you have prepared your address data, but there are tools to help with that, including a “Standardize Addresses” tool. Reference data is the crucial part. This is the actual road network and address information used to determine the latitudes and longitudes of addresses. We are using StreetMap Premium data, which gives you pre-built networks optimized for geocoding and routing. If you cannot access this data, you can build your own networks from Census TIGER/Line data, but this can be a significant effort, especially for national-level geocoding. With reference data in hand, you must then build your own address locator.
One of the most powerful built-in features of ArcGIS Desktop is the use of composite locators. A composite locator is made from multiple locators used in a cascading fashion. While you may not get the best possible match for every address, it does ensure that some match will occur, provided the address has the information needed by the last locator (zip code). This can also help in identifying problem areas in the data.
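The cascading idea can be sketched in plain Python. This is not the ArcGIS implementation, just an illustration of the fallback logic: the reference tables, coordinates, and function names below are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)
Record = Dict[str, str]
Locator = Callable[[Record], Optional[Coordinate]]

# Hypothetical reference tables with made-up coordinates, for illustration only.
ADDRESS_POINTS = {("100 MAIN ST", "00001"): (-77.0010, 38.9010)}
STREET_CENTROIDS = {("MAIN ST", "00001"): (-77.0000, 38.9000)}
ZIP_CENTROIDS = {("00001",): (-77.0500, 38.9500)}


def table_locator(table: Dict[Tuple[str, ...], Coordinate], fields: List[str]) -> Locator:
    """Build a locator that looks up a record's fields in one reference table."""
    def locate(record: Record) -> Optional[Coordinate]:
        key = tuple(record.get(field, "").strip().upper() for field in fields)
        return table.get(key)
    return locate


# Most precise locator first, zip code centroid last.
COMPOSITE = [
    ("address", table_locator(ADDRESS_POINTS, ["address", "zip"])),
    ("street", table_locator(STREET_CENTROIDS, ["street", "zip"])),
    ("zip", table_locator(ZIP_CENTROIDS, ["zip"])),
]


def composite_geocode(record: Record) -> Optional[Tuple[str, Coordinate]]:
    """Return (match level, coordinate) from the first locator that matches."""
    for level, locate in COMPOSITE:
        point = locate(record)
        if point is not None:
            return level, point
    return None  # not even the zip code could be matched


print(composite_geocode({"address": "999 Main St", "street": "Main St", "zip": "00001"}))
```

Returning the match level along with the coordinate is what makes it possible to see afterwards which records only matched at the street or zip code level.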
Here we see the ArcGIS Desktop user interface with the Geocoding toolbar enabled. I have selected a composite locator, and at the bottom of the screen we see the address table I want to geocode.
Here we see the configuration screen for the “Geocode Addresses” tool. We select the names of the fields in our table that correspond to the street, city, state, and zip of the addresses. We have additional options to tweak spelling sensitivity, the minimum score of an address against reference data to signify a “match”, and which data fields will be added to the output table. We generally just go with the default options. Tweaking these would be more for the advanced user.
Here is what the results look like: they provide the percentage matched and the ability to make edits to your data and retry if needed. When you do this with millions of records, you can’t possibly sit here and match interactively. So you have to determine your threshold and what data you’re trying to match against. Over time, investing in our data quality will make the number of unmatched addresses much smaller.
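For a large batch, the threshold check can be automated. The sketch below assumes each record ends up with a numeric match score, or None when nothing was found; the function name and the default threshold of 85 are illustrative assumptions, not ArcGIS settings.

```python
from typing import Dict, List, Optional


def summarize_matches(scores: List[Optional[float]], minimum_score: float = 85.0) -> Dict[str, float]:
    """Count how many records meet the minimum match score and report the percentage matched."""
    total = len(scores)
    matched = sum(1 for score in scores if score is not None and score >= minimum_score)
    percent = 100.0 * matched / total if total else 0.0
    return {
        "total": total,
        "matched": matched,
        "unmatched": total - matched,
        "percent_matched": percent,
    }


# Two records meet the threshold, one scores too low, and one found no candidate at all.
print(summarize_matches([100.0, 92.5, 60.0, None]))
```

The unmatched records are the ones to send back for address cleanup before the next run.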
If you’re doing this often, it may be worthwhile to get your own reference data, but you will need to maintain that data. Support is another consideration: it is good to be able to call someplace and get help. A further benefit is not having to do custom scripting and development, which makes this accessible to non-programmers. But not everyone is geocoding millions of records and has the budget for this type of arrangement, so what are their options?
Millions: fastest to geocode on your desktop, or on your own server infrastructure or web service built to meet your own needs. Thousands: web services. For millions of addresses, it is faster to geocode via the desktop (ArcMap) than over the web. Are your address lists private, and do they need to be secure behind a firewall? Then ArcGIS Desktop, along with StreetMap Premium, ArcGIS Data Appliance, or Address Coder, is the best option. Want to deploy a public-facing web application or manage small- to medium-sized databases in which address security is not a main concern? Use ArcGIS Online geocoding services, along with the World Street Map service, or other geocoding APIs like Google, Bing, Yahoo, or MapQuest.
To do small batches and single addresses, and if you don’t have the ability to do it on the desktop, here are a few web services that are available.
Using ArcGIS Online as an example of how it works. The limitations are 250 records at a time for the free service, and you are really only given visualization tools for seeing your points on the map, not a returned dataset with the addresses’ lats and longs. But here is a demo regardless, so you can see how single-address and multiple-address geocoding works in action in a public web service.
Maximum of 250 records at a time without a subscription.
Want to deploy a public-facing web application or manage small- to medium-sized databases in which address security is not a main concern? Use ArcGIS Online geocoding services, along with the World Street Map service, or other geocoding APIs like Google, Bing, Yahoo, or MapQuest.
ArcGIS Online, Yahoo, Bing, Google, MapQuest, Geonames. Be sure to read and follow the terms of service, know the limitations, and be aware of the licensing agreements. For example, when using Google, you must display your results on a Google map, and Google limits requests to 2,500 addresses. Also, most don’t allow you to persist the lat/long either.
The goal is to always get the best match which is the Premise or Roof Top of the address.
Using a composite locator as an example, we ran some tests to show how the levels of accuracy can work. In the first example, the geocoder apparently could not find the address, the street address, the street names, or the zips. There are many reasons why the composite can fail. In both situations, the zip code is wrong. This is your extra homework for investing in data quality.
If you’re just aggregating data at the state level, perhaps this is an OK match for you, but if you’re trying to determine emergency shelters, this isn’t going to do it for you.
This is a little better than the zip centroid, but probably only about as useful.
Interpolated – conventional geocoder as we discussed in the beginning
Address points are available from NAVTEQ and give the actual physical location. This is a somewhat recent advancement and is mostly available in urban areas, where businesses have invested more heavily in the data.
Placefinders provide the location of places within a certain vicinity, such as a search for Starbucks when zoomed in to downtown DC in Google.
Reverse Geocoding finds the address of a lat/long
We’re working on a prototype tool to help increase the ability of NCES to geocode schools and assign locale codes, which I’d like to demo now.
Locale codes are zones developed by the Census Bureau for NCES that describe how urban or rural an area is. It matters where schools are because the locale codes can drive funding.
The urban-centric locale code system classifies territory into four major types: city, suburban, town, and rural. Each type has three subcategories. For city and suburb, these are gradations of size – large, midsize, and small. Towns and rural areas are further distinguished by their distance from an urbanized area. They can be characterized as fringe, distant, or remote.
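The four types and three subcategories can be laid out as a small lookup. The sketch below assumes the two-digit numbering commonly used for the urban-centric codes (11 through 43, first digit for the type and second for the subcategory); the numbering is not stated in this presentation, so treat it as an assumption, and the function name is hypothetical.

```python
# Assumed numbering: first digit is the major type, second digit is the subcategory.
MAJOR_TYPES = {1: "City", 2: "Suburb", 3: "Town", 4: "Rural"}
SIZES = {1: "Large", 2: "Midsize", 3: "Small"}        # used for city and suburb
DISTANCES = {1: "Fringe", 2: "Distant", 3: "Remote"}  # used for town and rural


def describe_locale(code: int) -> str:
    """Turn a two-digit locale code into a readable label."""
    major, minor = divmod(code, 10)
    if major not in MAJOR_TYPES or minor not in (1, 2, 3):
        raise ValueError(f"not a recognized locale code: {code}")
    subcategories = SIZES if major in (1, 2) else DISTANCES
    return f"{MAJOR_TYPES[major]}, {subcategories[minor]}"


print(describe_locale(11))
print(describe_locale(43))
```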
The School Attendance Boundary Project seeks to make school boundaries readily available and allow demographic data to be linked to the populations living within those boundaries. So far we have collected the 350 largest districts, with 1000+ to be collected by the end of the year. That’s over 50% of students.
1. NCES STATS-DC 2012 – July 13, 2012 Presented by Tai Phan & Adrienne Allegretti NCES, Blue Raster GEOCODING OUR NATION’S SCHOOLS
2. PRESENTATION OUTLINE Introduction Overview of Geocoding Data Preparation Selecting a Geocode Solution Live Demonstration Overview of SDDS Updates Summary/Q & A
3. WHAT IS GEOCODING? The process of finding associated geographic coordinates (latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes).
4. HOW DOES GEOCODING WORK? Address Interpolation:
5. HOW DOES GEOCODING WORK? Other Techniques:
6. ADVANCEMENTS IN GEOCODING THEN
7. ADVANCEMENTS IN GEOCODING NOW
8. WHY DOES NCES GEOCODE SCHOOL ADDRESSES? To support spatial analysis and provide geographic context to a school’s location Geocoded schools can be overlaid with other geographic layers such as: School Districts Locale Codes Counties Congressional Districts And more…
9. LOCATION MATTERS
10. HOW TO PREPARE DATA Invest in your address data quality and check for errors before you begin: Spelling Numeric address ranges Missing information
11. IMPLICATIONS OF GEOCODING ERRORS Accuracy of geocoding dictates quality of research Schools geocoded incorrectly can mean lost funding Unable to respond to an emergency appropriately
12. HOW TO PREPARE DATA Format it in a style that the geocoder can handle Use the locational address, not the PO Box or Administrative address of the school
13. HOW TO PREPARE DATA Bad: FMLY CHLD CTR 1411 LNCLNWY W, MISHAWAKA, IN, 46544
14. HOW TO PREPARE DATA Good: 1411 LINCOLN WAY WEST, MISHAWAKA, IN, 46544
15. SELECTING A GEOCODE SOLUTION Things to consider: Budget Number of Records Infrastructure Number of Individuals Needing Access Frequency of Geocoding Sensitivity of Data
16. 1ST APPROACH: LOCAL/INTERNAL In House: Datacenter and/or Cloud Services; Economy of Scale; Millions of addresses; Sensitive Data
17. GEOCODING ON THE DESKTOP Prerequisites ArcGIS Desktop Basic Prepared Address Table Reference Data (NAVTEQ, TOM TOM, TIGER) Address Locator (Composite)
18. ON THE DESKTOP – COMPOSITE LOCATOR Multiple locators used in a cascading fashion: Address Match • Best Match; Street Centroid Match • Decent Match; Zip Code Match • Acceptable Match
19. ON THE DESKTOP – USER INTERFACE
20. ON THE DESKTOP - OPTIONS
21. ON THE DESKTOP - OUTPUT
22. ON THE DESKTOP SUMMARY Great way to go if software and reference data is already in-house Can geocode more than 350,000 addresses per hour Data maintenance and support No custom development required
23. 2ND APPROACH: GEOCODING WEBSITES Geocoding Websites: Excel and Web Browser; Points on a Map; Hundreds of addresses at a time; Publicly available data
29. 3RD APPROACH: WEB APIS Web APIs: Publicly Accessible Data; Hundreds to Thousands of addresses at a time; Custom Development Required
30. GEOCODING APIS AVAILABLE FOR CUSTOM DEVELOPMENT ArcGIS Online Yahoo Bing Google MapQuest Geonames Be sure to read the terms of service and know the limitations. For example, when using Google, you must display your results on a Google map. Google limits requests to 2,500 addresses, etc.
31. GEOCODE RESULTS The Hit Rate is the number of addresses that are geocodable The Match Scores tell you what level of accuracy a particular address was geocoded. - Country - Region (state, province, prefecture, etc.) - Sub-region (county, municipality, etc.) - Town (city, village) - Post code (zip code) - Street - Intersection - Address - Premise or Roof Top
32. HIT RATES AND MATCH SCORES No Matches Found
33. HIT RATES AND MATCH SCORES Zip Code Centroid
34. HIT RATES AND MATCH SCORES Street Name Found
35. HIT RATES AND MATCH SCORES Street Address Found
36. HIT RATES AND MATCH SCORES Address Point Found (Highest Accuracy!)
37. OTHER OPTIONS - PLACEFINDERS
38. OTHER OPTIONS – REVERSE GEOCODING
39. DEMO – SDDS GEOCODING PROTOTYPE
40. WHAT ARE LOCALE CODES?
41. WHAT ARE LOCALE CODES? Read more: http://nces.ed.gov/ccd/rural_locales.asp
42. SCHOOL BOUNDARY PROJECT Project Background 350 largest districts already collected 1000+ collected by the end of the year That’s over 50% of students
43. SCHOOL BOUNDARY PROJECT Where to find the data Can be downloaded from SDDS Standard Map Viewer.
44. WHAT’S NEW IN SDDS MAPVIEWER Map Viewer Standard: Census 2010; ACS 5yr Estimate 2006-2010; Public School Boundaries (Elementary, Middle, High); Promise Neighborhood Schools. Map Viewer Mobile. Migration to GEOCLOUD. http://nces.ed.gov/surveys/sdds/index.aspx
46. FOR MORE INFORMATION: Tai Phan, Tai.Phan@ed.gov, 202-502-7431. Adrienne Allegretti, email@example.com, 703-842-0171, www.blueraster.com, blog.blueraster.com