John Fagan: The Black Art of Geocoding


John Fagan of Microsoft talks about the challenges of Geocoding in consumer online mapping services

  1. The Black Art of Geocoding: Finding that elusive lat/lon
     John Fagan, Microsoft
  2. The Black Art of Geocoding: Finding that elusive lat/lon
     John Fagan
     Program Manager
     Microsoft Corporation
     @johnbfagan
  3. We have been making maps for thousands of years
  4. Well-known and established standards/principles
  5. Lots of experience in building software to create bitmaps from vector and raster data
  6. Data availability & simple data model
  7. Mapping is easy to scale
  8. ...and so is routing
     Thousands of years of experience in wayfinding
     Over 50 years of experience in routing algorithms
     Dijkstra's shortest path algorithm (1959)
  9. Data availability & simple data model
  10. Routing is easy to scale
  11. Geocoding not so easy
      20 years experience
      10 years of global geocoding
      5 years exposing geocoding to the mass consumer
      No standard algorithms
      Very few purpose-built databases (maybe GNAF)
      Very hard to scale
  12. Geocoding is fundamental
      Can't get a map without a geocode
      Can't get a route without a geocode
      Can't view your data without a geocode
      80% of all information contains a geographic element.
  13. It used to be easier
  14. Now it's hard
  15. User expectations change with unstructured input
      67 hill veiw road, s61 2bn in the 1850's
      1.5 hours from Nice
      exact directions from Bangkok Patana School to Suvanapumi Airport in Bangkok.
      10 mile radius from se20 7ua
      how long would it take me to walk around cancun
      how to get to m13 gb from g83 9le by car
      do bearded dragons bite?
  16. But... Geocoding is NOT about Search
  17. 52.19157,-1.70415
  18. "The reason it's called 'I'm Feeling Lucky' is of course that's a pretty damn ambitious goal. I mean to get the exact right one thing without even giving you a list of choices, and so you have to feel a little bit lucky if you're going to try that with one go," tried to explain Sergey Brin.
  19. Why is it hard (2 reasons)
  20. Parsing: Hard to understand unstructured input
  21. Finding Stratford-upon-Avon
      stratford
      stratford upon avon
      Stratford upon haven
      StratfordUponAvon
      Stratford-Upon-Avon
      stratford on avon
      stratford-on-avon
      stratford 0n avon
      stratford - upon-avon
      stratford on avaon
      stratfordaponavon
      stratford upon aavon
      stratfordupponavon
  22. Parsing
      In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
  23. Old way of Parsing – Rules-based
      A rules-based approach (mainly done with regular expressions)
  24. Probabilistic approach
      Machine learned
      Requires you to "train" the engine
      Requires truth sets of training data
  25. Probabilistic approach: Hidden Markov Model
      input --> 165 fleet street london EC4A 2DY
      output -->
      address {
        street number : 165
        street : fleet street
        city : london
        postcode : EC4A 2DY
      }
  26. Multimap stats
  27. Parsing has its limitations
      Parsing failures
      Multimap/Bing Maps (standrewsscotland)
      Google (uk near Boston, MA, USA)
      All fail - House number plus postcode (165, EC4A 2DY)
  28. Parsing using a Spatial Engine
  29. Why is it hard (Data)
  30. Hard to match input with reference database
  31. [OSM-talk] Baghdad maps
      I am informed that any road may have up to 4 names (which may be the same or different):
      The pre-Saddam name
      The Saddam-era name.
      The "public" name - What the people who live there call it.
      The "Official" name - What the new Government calls it.
      This situation is further complicated by language and social issues:
      Language:
      The roads are named in Arabic.
      There is no fixed translation between the Arabic and Latin alphabets.
      Social Issues:
      Sunnis tend to use the Saddam-era names
      Shia tend to rename streets and won't acknowledge Saddam-era names.
      Ethnic cleansing is changing the neighbourhoods and hence the names.
      Names (such as 14th July Bridge) will change later.
      My translator's opinion is that street names are going to take at least 2-3 years to settle down.
  32. Don't throw away your data
      Multimap have always kept old postcodes
      10% of Multimap's postcode database is of "dead" postcodes
      This might not work for routing and mapping, but very valuable for Geocoding
  33. EC4A 1HE – Postcode of vintage 2002
  34. Lash data and enrich
      Stratford-upon-Avon
  35. Future = Real time Geocoding?
  36. Summary
      Mapping and Routing – FIXED
      Geocoding – Must Try Harder
      Parsing
      Data
  37. Thanks
      John
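
Slide 23 describes the older, rules-based way of parsing, done mainly with regular expressions. As a rough illustration only, the sketch below applies one such rule to the example input from slide 25. The pattern, the list of street-type words, the simplified postcode expression and the name `parse_address` are all assumptions made for this example; they are not taken from the talk or from any Multimap or Bing Maps code.

```python
import re

# Simplified UK postcode pattern (an assumption for this sketch); real
# postcode rules have more special cases than this covers.
POSTCODE = r"(?P<postcode>[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2})"

# Hypothetical rule: "<number> <street> <city> <postcode>", where the street
# ends in a known street-type word and the city is a single word.
ADDRESS_RULE = re.compile(
    r"^\s*(?P<street_number>\d+[A-Z]?)\s+"
    r"(?P<street>.+?\b(?:street|road|lane|avenue|st|rd))\s+"
    r"(?P<city>[A-Za-z]+)\s+"
    + POSTCODE
    + r"\s*$",
    re.IGNORECASE,
)


def parse_address(text):
    """Return a dict of address fields, or None if the rule does not match."""
    match = ADDRESS_RULE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["postcode"] = fields["postcode"].upper()
    return fields


if __name__ == "__main__":
    print(parse_address("165 fleet street london EC4A 2DY"))
    print(parse_address("165, EC4A 2DY"))
```

The single rule handles the well-formed input from slide 25 but returns `None` for the "house number plus postcode" input from slide 27 and for free-text queries like those on slide 15, which hints at why a rule set has to keep growing to cope with unstructured input.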
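
Slides 21 and 30 point at the matching problem: users type many variants of one place name, and each has to be matched against a reference database. One possible way to tolerate such variants, sketched here under stated assumptions, is to normalise the input and compare it with reference names by string similarity using Python's standard `difflib` module. The tiny `GAZETTEER` list, the `0.8` cutoff and the function names are hypothetical choices for this example; the talk does not say which matching technique Multimap or Bing Maps used.

```python
import difflib
import re

# Hypothetical reference gazetteer; a real one would hold far more names.
GAZETTEER = ["Stratford-upon-Avon", "Stratford", "Stafford", "Bradford-on-Avon"]


def normalise(name):
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def best_match(query, cutoff=0.8):
    """Return the gazetteer name most similar to the query, or None."""
    keys = {normalise(name): name for name in GAZETTEER}
    hits = difflib.get_close_matches(
        normalise(query), list(keys), n=1, cutoff=cutoff
    )
    return keys[hits[0]] if hits else None


if __name__ == "__main__":
    for query in ["Stratford upon haven", "stratford 0n avon", "stratfordaponavon"]:
        print(query, "->", best_match(query))
```

String similarity alone does not settle ambiguity: with this gazetteer the bare input `stratford` matches the entry "Stratford" exactly, so choosing between that and Stratford-upon-Avon would need extra context that this sketch does not model.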