John Fagan - The Black Art of Geocoding



Mapping/LBS applications require three core engines: mapping, routing and geocoding. Geocoding is often overlooked, yet it is the fundamental component of every mapping and LBS application. If you don't have a lat/lon, how do you find a map, how do you get from A to B, and how do you plot your data?
This paper gives a whistle-stop tour of the basics of mapping and routing engines and then takes a deep dive into geocoding. It argues that we have solved routing and mapping, but still have a lot of work to do on geocoding.

This presentation was originally uploaded on the Geocommunity conference site.


  1. The Black Art of Geocoding: Finding that elusive lat/lon
     John Fagan, Microsoft
  2. The Black Art of Geocoding: Finding that elusive lat/lon
     John Fagan
     Program Manager
     Microsoft Corporation
     @johnbfagan
  3. We've been making maps for thousands of years
  4. Well-known and established standards/principles
  5. Lots of experience in building software to create bitmaps from vector and raster data
  6. Data availability and a simple data model
  7. Mapping is easy to scale
  8. ...and so is routing
     - Thousands of years of experience in wayfinding
     - Over 50 years of experience in routing algorithms
     - Dijkstra's shortest path algorithm (1959)
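The slide cites Dijkstra's algorithm as the classic routing building block. As an illustration only, here is a minimal Python sketch of Dijkstra's shortest-path search using the standard library's `heapq` priority queue. The `roads` network and its travel-time weights are made up, and production routing engines layer many optimisations on top of this basic idea.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a list of (neighbour, edge_weight) pairs;
    weights must be non-negative, as Dijkstra's algorithm requires.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical toy road network; edge weights are travel minutes.
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 3)],
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 6}
```

The detour A, C, B (1 + 2 minutes) beats the direct A to B edge (4 minutes), which is exactly the kind of decision the algorithm makes by always expanding the closest unsettled node first.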
  9. Data availability and a simple data model
  10. Routing is easy to scale
  11. Geocoding is not so easy
     - 20 years of experience
     - 10 years of global geocoding
     - 5 years of exposing geocoding to the mass consumer
     - No standard algorithms
     - Very few purpose-built databases (maybe GNAF, Australia's Geocoded National Address File)
     - Very hard to scale
  12. Geocoding is fundamental
     - Can't get a map without a geocode
     - Can't get a route without a geocode
     - Can't view your data without a geocode
     - 80% of all information contains a geographic element
  13. It used to be easier
  14. Now it's hard
  15. User expectations change with unstructured input:
     - 67 hill veiw road, s61 2bn in the 1850's
     - 1.5 hours from Nice
     - exact directions from Bangkok Patana School to Suvanapumi Airport in Bangkok.
     - 10 mile radius from se20 7ua
     - how long would it take me to walk around cancun
     - how to get to m13 gb from g83 9le by car
     - do bearded dragons bite?
  16. But... geocoding is NOT about search
  17. 52.19157,-1.70415
  18. "The reason it's called 'I'm Feeling Lucky' is, of course, that's a pretty damn ambitious goal. I mean, to get the exact right one thing without even giving you a list of choices, and so you have to feel a little bit lucky if you're going to try that with one go," Sergey Brin tried to explain.
  19. Why is it hard? (Two reasons)
  20. Parsing: it is hard to understand unstructured input
  21. Finding Stratford-upon-Avon
     - stratford
     - stratford upon avon
     - Stratford upon haven
     - StratfordUponAvon
     - Stratford-Upon-Avon
     - stratford on avon
     - stratford-on-avon
     - stratford 0n avon
     - stratford - upon-avon
     - stratford on avaon
     - stratfordaponavon
     - stratford upon aavon
     - stratfordupponavon
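One common way to cope with variants like these is to normalise the input and then accept the closest gazetteer entry within a small edit distance. The Python sketch below assumes exactly that approach; the `GAZETTEER` table, the digit-zero-read-as-letter-o rule and the `max_distance=3` threshold are hypothetical choices for illustration, not how Multimap or Bing Maps actually resolved these queries.

```python
import re

# Hypothetical gazetteer: alias -> canonical place name.
GAZETTEER = {
    "Stratford-upon-Avon": "Stratford-upon-Avon",
    "Stratford-on-Avon": "Stratford-upon-Avon",
    "Stratford": "Stratford (London)",
}


def normalise(text):
    """Lower-case, read a typed digit zero as the letter o, and drop non-letters."""
    text = text.lower().replace("0", "o")
    return re.sub(r"[^a-z]", "", text)


def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def resolve(query, max_distance=3):
    """Return the canonical name of the closest alias, or None if none is close enough."""
    key = normalise(query)
    best_name, best_distance = None, max_distance + 1
    for alias, canonical in GAZETTEER.items():
        distance = levenshtein(key, normalise(alias))
        if distance < best_distance:
            best_name, best_distance = canonical, distance
    return best_name


print(resolve("stratford 0n avon"))  # Stratford-upon-Avon
```

Under these assumptions every variant listed above except the bare `stratford` resolves to Stratford-upon-Avon. `stratford` matches the separate `Stratford` entry exactly, which shows that string similarity alone cannot settle a genuinely ambiguous place name.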
  22. Finding Stratford-upon-Avon
  23. Parsing
     In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
  24. The old way of parsing: rules-based
     - A rules-based approach (mainly done with regular expressions)
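A rules-based parser of this kind might look like the Python sketch below. The single `ADDRESS_RULE` pattern is hypothetical: it assumes the input is ordered number, street, one-word city, postcode; that the street ends in one of a few listed street types; and it uses a simplified UK postcode shape rather than the full official format.

```python
import re

# Hypothetical rule: "<number> <street> <city> <postcode>".
ADDRESS_RULE = re.compile(
    r"^(?P<number>\d+[a-z]?)\s+"
    r"(?P<street>.+?\s(?:street|road|lane|avenue))\s+"  # street must end in a known type
    r"(?P<city>[a-z]+)\s+"                              # assumes a one-word city
    r"(?P<postcode>[a-z]{1,2}\d[a-z\d]?\s*\d[a-z]{2})$",  # simplified UK postcode shape
    re.IGNORECASE,
)


def parse_with_rules(text):
    """Return the named fields if the rule matches the whole input, else None."""
    match = ADDRESS_RULE.match(text.strip())
    return match.groupdict() if match else None


print(parse_with_rules("165 fleet street london EC4A 2DY"))
# {'number': '165', 'street': 'fleet street', 'city': 'london', 'postcode': 'EC4A 2DY'}
```

The rule handles `165 fleet street london EC4A 2DY`, but a single comma or a missing city (as in `67 hill veiw road, s61 2bn` or `165, EC4A 2DY`) makes it return `None`. Every such case needs yet another rule, which is why this approach becomes brittle.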
  25. Probabilistic approach
     - Machine learned
     - Requires you to "train" the engine
     - Requires truth sets of training data
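Training a probabilistic parser largely means estimating probabilities from a truth set. A minimal sketch, assuming a Hidden Markov Model and plain maximum-likelihood counting with no smoothing: the hypothetical `TRUTH_SET` below labels each token with its address field, and `train` turns the counts into start, transition and emission probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical truth set: every token is labelled with its address field.
TRUTH_SET = [
    [("165", "number"), ("fleet", "street"), ("street", "street"),
     ("london", "city"), ("EC4A", "postcode"), ("2DY", "postcode")],
    [("10", "number"), ("downing", "street"), ("street", "street"),
     ("london", "city"), ("SW1A", "postcode"), ("2AA", "postcode")],
    [("12", "number"), ("high", "street"), ("road", "street"), ("leeds", "city")],
]


def normalise_counts(counter):
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def train(truth_set):
    """Maximum-likelihood HMM estimates: start, transition and emission probabilities."""
    start = Counter()
    transitions = defaultdict(Counter)
    emissions = defaultdict(Counter)
    for example in truth_set:
        labels = [label for _, label in example]
        start[labels[0]] += 1
        for prev_label, next_label in zip(labels, labels[1:]):
            transitions[prev_label][next_label] += 1
        for token, label in example:
            emissions[label][token.lower()] += 1
    return (
        normalise_counts(start),
        {label: normalise_counts(c) for label, c in transitions.items()},
        {label: normalise_counts(c) for label, c in emissions.items()},
    )


start_p, trans_p, emit_p = train(TRUTH_SET)
print(trans_p["street"])  # {'street': 0.5, 'city': 0.5}
```

Because nothing is smoothed, any token or field sequence missing from the truth set gets probability zero, which is one reason such engines need large truth sets.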
  26. Probabilistic approach: Hidden Markov Model
     input --> 165 fleet street london EC4A 2DY
     output -->
       address {
         street number : 165
         street : fleet street
         city : london
         postcode : EC4A 2DY
       }
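To show how an HMM can produce output like this, here is a small Viterbi decoder in Python. Everything in it is an assumption for illustration: the four hidden states, the coarse observation classes produced by `observe`, and the hand-set probabilities in `START`, `TRANS` and `EMIT` (a real engine would learn these from a truth set). The decoder picks the single most likely sequence of fields for the tokens.

```python
import math
import re

STATES = ("number", "street", "city", "postcode")
STREET_TYPES = {"street", "road", "lane", "avenue"}

# Hand-set, hypothetical probabilities; every row sums to 1.
START = {"number": 0.7, "street": 0.2, "city": 0.05, "postcode": 0.05}
TRANS = {
    "number":   {"number": 0.01, "street": 0.9,  "city": 0.04, "postcode": 0.05},
    "street":   {"number": 0.01, "street": 0.6,  "city": 0.3,  "postcode": 0.09},
    "city":     {"number": 0.01, "street": 0.01, "city": 0.3,  "postcode": 0.68},
    "postcode": {"number": 0.01, "street": 0.01, "city": 0.08, "postcode": 0.9},
}
EMIT = {
    "number":   {"NUM": 0.9,  "WORD": 0.05, "STREET_TYPE": 0.01, "POSTCODE_PART": 0.04},
    "street":   {"NUM": 0.01, "WORD": 0.6,  "STREET_TYPE": 0.38, "POSTCODE_PART": 0.01},
    "city":     {"NUM": 0.01, "WORD": 0.97, "STREET_TYPE": 0.01, "POSTCODE_PART": 0.01},
    "postcode": {"NUM": 0.05, "WORD": 0.05, "STREET_TYPE": 0.01, "POSTCODE_PART": 0.89},
}


def observe(token):
    """Map a raw token to a coarse observation class (hypothetical feature set)."""
    t = token.lower()
    if t.isdigit():
        return "NUM"
    if t in STREET_TYPES:
        return "STREET_TYPE"
    if re.fullmatch(r"[a-z]{1,2}\d[a-z\d]?|\d[a-z]{2}", t):
        return "POSTCODE_PART"
    return "WORD"


def viterbi(tokens):
    """Most likely state sequence, computed with log-probabilities."""
    obs = [observe(t) for t in tokens]
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (math.log(START[s]) + math.log(EMIT[s][obs[0]]), [s]) for s in STATES}
    for o in obs[1:]:
        step = {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p][0] + math.log(TRANS[p][s]))
            score = best[prev][0] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            step[s] = (score, best[prev][1] + [s])
        best = step
    return max(best.values(), key=lambda pair: pair[0])[1]


def parse_with_hmm(text):
    """Group tokens by the field Viterbi assigned to them."""
    tokens = text.split()
    fields = {}
    for token, state in zip(tokens, viterbi(tokens)):
        fields.setdefault(state, []).append(token)
    return {state: " ".join(words) for state, words in fields.items()}


print(parse_with_hmm("165 fleet street london EC4A 2DY"))
# {'number': '165', 'street': 'fleet street', 'city': 'london', 'postcode': 'EC4A 2DY'}
```

Storing whole paths instead of back-pointers wastes memory on long inputs but keeps the sketch short; with only a handful of address tokens the difference is negligible.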
  27. Multimap stats
  28. Parsing has its limitations. Parsing failures:
     - Multimap/Bing Maps (standrewsscotland)
     - Google (uk near Boston, MA, USA)
     - All fail: house number plus postcode (165, EC4A 2DY)
  29. Parsing using a Spatial Engine
  30. Why is it hard? (Data)
  31. It is hard to match input against the reference database
  32. [OSM-talk] Baghdad maps
     I am informed that any road may have up to 4 names (which may be the same or different):
     - The pre-Saddam name
     - The Saddam-era name
     - The "public" name: what the people who live there call it
     - The "official" name: what the new Government calls it
     This situation is further complicated by language and social issues.
     Language:
     - The roads are named in Arabic.
     - There is no fixed translation between the Arabic and Latin alphabets.
     Social issues:
     - Sunnis tend to use the Saddam-era names.
     - Shia tend to rename streets and won't acknowledge Saddam-era names.
     - Ethnic cleansing is changing the neighbourhoods and hence the names.
     - Names (such as 14th July Bridge) will change later.
     My translator's opinion is that street names are going to take at least 2-3 years to settle down.
  33. Don't throw away your data
     - Multimap have always kept old postcodes
     - 10% of Multimap's postcode database consists of "dead" postcodes
     - This might not work for routing and mapping, but it is very valuable for geocoding
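Keeping retired postcodes is mostly a data-modelling decision. A minimal Python sketch, assuming a hypothetical in-memory table in which each record carries a `live` flag: geocoding also finds retired postcodes and reports their status, while a caller such as a routing service can opt out with `include_retired=False`. The postcodes and coordinates are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PostcodeRecord:
    lat: float
    lon: float
    live: bool  # False for a "dead" (retired) postcode kept for geocoding


# Made-up postcodes and coordinates, for illustration only.
POSTCODES = {
    "AA1 1AA": PostcodeRecord(51.50, -0.10, live=True),
    "AA1 9ZZ": PostcodeRecord(51.51, -0.11, live=False),
}


def normalise_postcode(text):
    """Upper-case, drop spaces, then put one space before the three-character inward code."""
    compact = "".join(text.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact


def geocode_postcode(text, include_retired=True) -> Optional[Tuple[float, float, str]]:
    """Return (lat, lon, status) for a postcode, or None if unknown or filtered out."""
    record = POSTCODES.get(normalise_postcode(text))
    if record is None or (not record.live and not include_retired):
        return None
    return (record.lat, record.lon, "live" if record.live else "retired")


print(geocode_postcode("aa1 9zz"))  # (51.51, -0.11, 'retired')
```

Returning the status alongside the coordinates lets each consumer decide how much to trust an old postcode instead of silently losing it.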
  34. EC4A 1HE: a postcode of 2002 vintage
  35. Lash data and enrich
     Stratford-upon-Avon
  36. (no text)
  37. The future: real-time geocoding?
  38. Summary
     - Mapping and routing: FIXED
     - Geocoding: must try harder
       - Parsing
       - Data
  39. Thanks, John