John Fagan: The Black Art of Geocoding
John Fagan of Microsoft talks about the challenges of Geocoding in consumer online mapping services

© All Rights Reserved

John Fagan: The Black Art of Geocoding Presentation Transcript

  • 1. The Black Art of Geocoding: Finding that elusive lat/lon
    John Fagan, Microsoft
  • 2. The Black Art of Geocoding: Finding that elusive lat/lon
    John Fagan
    Program Manager
    Microsoft Corporation
    @johnbfagan
  • 3. We've been making maps for thousands of years
  • 4. Well known and established standards/principles
  • 5. Lots of experience in building software to create bitmaps from vector and raster data
  • 6. Data availability & Simple data model
  • 7. Mapping easy to scale
  • 8. ...and so is routing
    Thousands of years of experience in wayfinding
    Over 50 years of experience in routing algorithms
    Dijkstra's shortest path algorithm (1959)
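Dijkstra's algorithm is what makes the "routing is easy" claim concrete: given a road graph with non-negative edge costs, it finds the cheapest route with a priority queue. The following is a minimal textbook sketch, not any production router; the graph format (a dict of node to `(neighbour, cost)` pairs) and the toy nodes `A` to `D` with their costs are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical graph format: node -> list of (neighbour, non-negative cost).
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Return (cost, path) of the cheapest route from source to target,
    or (inf, []) if the target cannot be reached."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry: a cheaper path to node was already settled
        visited.add(node)
        if node == target:
            break  # the first time the target is popped, its cost is final
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Toy road network (hypothetical nodes and costs).
roads: Graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A", "D"))  # (4.0, ['A', 'C', 'B', 'D'])
```

The detour A, C, B, D (cost 4) beats both A, B, D (cost 5) and A, C, D (cost 6), which is exactly the kind of decision the algorithm settles greedily in cost order.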
  • 9. Data availability & Simple data model
  • 10. Routing, easy to scale
  • 11. Geocoding: not so easy
    20 years of experience
    10 years of global geocoding
    5 years of exposing geocoding to the mass consumer
    No standard algorithms
    Very few purpose-built databases (maybe GNAF)
    Very hard to scale
  • 12. Geocoding is fundamental
    Can't get a map without a geocode
    Can't get a route without a geocode
    Can't view your data without a geocode
    80% of all information contains a geographic element.
  • 13. It used to be easier
  • 14. Now it's hard
  • 15. User expectations change with unstructured input
    67 hill veiw road, s61 2bn in the 1850's
    1.5 hours from Nice
    exact directions from Bangkok Patana School to Suvanapumi Airport in Bangkok.
    10 mile radius from se20 7ua
    how long would it take me to walk around cancun
    how to get to m13 gb from g83 9le by car
    do bearded dragons bite?
  • 16. But... geocoding is NOT about search
  • 17. 52.19157,-1.70415
  • 18. "The reason it's called 'I'm Feeling Lucky' is, of course, that's a pretty damn ambitious goal. I mean, to get the exact right one thing without even giving you a list of choices, and so you have to feel a little bit lucky if you're going to try that with one go," Sergey Brin explained.
  • 19. Why is it hard (2 reasons)
  • 20. Parsing: Hard to understand unstructured input
  • 21. Finding Stratford-upon-Avon
    stratford
    stratford upon avon
    Stratford upon haven
    StratfordUponAvon
    Stratford-Upon-Avon
    stratford on avon
    stratford-on-avon
    stratford 0n avon
    stratford - upon-avon
    stratford on avaon
    stratfordaponavon
    stratford upon aavon
    stratfordupponavon
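Most of the variants in that list differ from a known name only by case, punctuation, spacing or one or two typed characters. A minimal sketch, assuming a hypothetical gazetteer that already stores "Stratford-on-Avon" as an alias: normalise the input, then accept the alias with the smallest Levenshtein edit distance if it is within a threshold. The names `normalise`, `levenshtein`, `match`, `GAZETTEER` and the threshold of 2 are illustrative choices, not a description of how Multimap or Bing Maps did it.

```python
import re
from typing import Dict, List, Optional


def normalise(text: str) -> str:
    """Lower-case and keep only letters and digits, so 'Stratford - upon-Avon',
    'StratfordUponAvon' and 'stratford upon avon' all become 'stratforduponavon'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


# Hypothetical gazetteer: canonical place name -> known aliases.
GAZETTEER: Dict[str, List[str]] = {
    "Stratford-upon-Avon": ["Stratford-upon-Avon", "Stratford-on-Avon"],
    "Stratford": ["Stratford"],
}


def match(query: str, max_distance: int = 2) -> Optional[str]:
    """Return the canonical name whose closest alias is within max_distance
    edits of the query, or None. On a tie the first alias examined wins."""
    q = normalise(query)
    best, best_d = None, max_distance + 1
    for canonical, aliases in GAZETTEER.items():
        for alias in aliases:
            d = levenshtein(q, normalise(alias))
            if d < best_d:
                best, best_d = canonical, d
    return best


print(match("stratford 0n avon"))         # Stratford-upon-Avon (1 edit from stratfordonavon)
print(match("stratfordupponavon"))        # Stratford-upon-Avon
print(match("stratford"))                 # Stratford (exact, though the query is ambiguous)
print(match("do bearded dragons bite?"))  # None
```

Note that a bare "stratford" resolves to a different entry: edit distance handles typos, but it cannot resolve genuine ambiguity, which still needs context or a list of candidates.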
  • 22. Parsing
    In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
    http://en.wikipedia.org/wiki/Parsing
  • 23. Old way of parsing: rules-based
    A rules-based approach (mainly done with regular expressions)
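A rules-based parser is essentially one or more hand-written regular expressions per address layout. The sketch below is a single hypothetical rule for "number, street, city, optional postcode"; the street-type keywords and the simplified UK postcode pattern are assumptions (real postcode validation has more rules). It parses the well-formed example from the next slide, but rejects "165, EC4A 2DY" outright, which illustrates why rule sets grow brittle as input becomes less structured.

```python
import re
from typing import Dict, Optional

# Simplified UK postcode: outward code (e.g. EC4A) + inward code (e.g. 2DY).
POSTCODE = r"[A-Z]{1,2}[0-9][A-Z0-9]?\s*[0-9][A-Z]{2}"

# One hypothetical rule: "<number> <street ... type> <city> [postcode]".
ADDRESS_RULE = re.compile(
    rf"^(?P<number>\d+[a-z]?)\s+"
    rf"(?P<street>.+?\b(?:street|st|road|rd|avenue|ave|lane|ln)\b)\s+"
    rf"(?P<city>[a-z][a-z\s-]*?)\s*"
    rf"(?P<postcode>{POSTCODE})?$",
    re.IGNORECASE,
)


def parse_rules(text: str) -> Optional[Dict[str, str]]:
    """Return the matched address fields, or None if the rule does not apply."""
    m = ADDRESS_RULE.match(text.strip())
    if not m:
        return None
    # Drop optional groups that did not match (e.g. a missing postcode).
    return {k: v for k, v in m.groupdict().items() if v}


print(parse_rules("165 fleet street london EC4A 2DY"))
# {'number': '165', 'street': 'fleet street', 'city': 'london', 'postcode': 'EC4A 2DY'}
print(parse_rules("165, EC4A 2DY"))  # None: no rule covers this layout
```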
  • 24. Probabilistic approach
    Machine learned
    Requires you to “train” the engine
    Requires truth sets of training data
    http://en.wikipedia.org/wiki/Hidden_Markov_model
  • 25. Probabilistic approach: Hidden Markov Model
    input --> 165 fleet street london EC4A 2DY
    output -->
    address {
    street number : 165
    street : fleet street
    city : london
    postcode : EC4A 2DY
    }
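To show the mechanics behind that output, here is a minimal Viterbi decoder for a toy hidden Markov model whose hidden states are address fields and whose observations are coarse token features. All probabilities below are hand-set, hypothetical values chosen for illustration; as the previous slide notes, a real engine would learn them from truth sets of training data, and would use far richer features than the four assumed here.

```python
import math
from typing import Dict, List

STATES = ["number", "street", "city", "postcode"]
# Hypothetical, hand-set probabilities (a trained model would estimate these).
START = {"number": 0.7, "street": 0.2, "city": 0.05, "postcode": 0.05}
TRANS = {
    "number":   {"number": 0.05, "street": 0.9,  "city": 0.03, "postcode": 0.02},
    "street":   {"number": 0.01, "street": 0.6,  "city": 0.3,  "postcode": 0.09},
    "city":     {"number": 0.01, "street": 0.04, "city": 0.5,  "postcode": 0.45},
    "postcode": {"number": 0.03, "street": 0.03, "city": 0.04, "postcode": 0.9},
}
EMIT = {
    "number":   {"digits": 0.9,  "alnum": 0.08, "street_word": 0.01, "word": 0.01},
    "street":   {"digits": 0.01, "alnum": 0.01, "street_word": 0.4,  "word": 0.58},
    "city":     {"digits": 0.01, "alnum": 0.01, "street_word": 0.02, "word": 0.96},
    "postcode": {"digits": 0.05, "alnum": 0.9,  "street_word": 0.01, "word": 0.04},
}
STREET_WORDS = {"street", "road", "lane", "avenue"}


def feature(token: str) -> str:
    """Map a token to one of four coarse observation symbols."""
    t = token.lower()
    if t.isdigit():
        return "digits"
    if t in STREET_WORDS:
        return "street_word"
    if t.isalpha():
        return "word"
    return "alnum"  # mixed letters and digits, e.g. EC4A, 2DY


def viterbi(tokens: List[str]) -> List[str]:
    """Most likely state sequence for the tokens (computed in log space)."""
    if not tokens:
        return []
    obs = [feature(t) for t in tokens]
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for o in obs[1:]:
        step, new_best = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda q: best[q] + math.log(TRANS[q][s]))
            step[s] = p
            new_best[s] = best[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][o])
        back.append(step)
        best = new_best
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for step in reversed(back):  # follow back-pointers to the first token
        state = step[state]
        path.append(state)
    return path[::-1]


def parse_hmm(text: str) -> Dict[str, str]:
    """Join consecutive tokens that share a label into one field."""
    tokens = text.split()
    fields: Dict[str, str] = {}
    for tok, label in zip(tokens, viterbi(tokens)):
        fields[label] = f"{fields[label]} {tok}" if label in fields else tok
    return fields


print(parse_hmm("165 fleet street london EC4A 2DY"))
# {'number': '165', 'street': 'fleet street', 'city': 'london', 'postcode': 'EC4A 2DY'}
```

Unlike a single regular expression, the decoder always returns its most probable labelling, so it degrades gracefully on unfamiliar layouts instead of failing outright; whether that labelling is right depends entirely on the quality of the trained probabilities.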
  • 26. Multimap stats
  • 27. Parsing has its limitations
    Parsing failures
    Multimap/Bing Maps (standrewsscotland)
    Google (uk near Boston, MA, USA)
    All fail: house number plus postcode (165, EC4A 2DY)
  • 28. Parsing using a Spatial Engine
    http://research.microsoft.com/en-us/people/josephj/acm_gis_2007_robust_location_search.pdf
  • 29. Why is it hard (Data)
  • 30. Hard to match input with reference database
  • 31. [OSM-talk] Baghdad maps
    I am informed that any road may have up to 4 names (which may be the same or different):
    The pre-Saddam name
    The Saddam-era name.
    The "public" name - What the people who live there call it.
    The "Official" name - What the new Government calls it.
    This situation is further complicated by language and social issues.
    Language:
    The roads are named in Arabic.
    There is no fixed translation between the Arabic and Latin alphabets.
    Social issues:
    Sunnis tend to use the Saddam-era names.
    Shia tend to rename streets and won't acknowledge Saddam-era names.
    Ethnic cleansing is changing the neighbourhoods and hence the names.
    Names (such as 14th July Bridge) will change later.
    My translator's opinion is that street names are going to take at least 2-3 years to settle down.
    http://lists.openstreetmap.org/pipermail/talk/2007-February/011273.html
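One way a reference database can cope with a road that has up to four names is to store every name against the same road record and index all of them, rather than picking one "correct" name. The sketch below assumes a hypothetical `Road` record keyed by name type and uses invented placeholder road names; it only does case-insensitive exact lookup and does not attempt Arabic transliteration, which the post identifies as a separate problem.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Road:
    road_id: str
    # Name type (e.g. "pre_saddam", "public") -> name; values may repeat.
    names: Dict[str, str] = field(default_factory=dict)


class NameIndex:
    """Hypothetical index mapping every known name to the roads carrying it."""

    def __init__(self) -> None:
        self._by_name: Dict[str, List[Road]] = {}

    @staticmethod
    def _key(name: str) -> str:
        return name.casefold().strip()

    def add(self, road: Road) -> None:
        for name in road.names.values():
            bucket = self._by_name.setdefault(self._key(name), [])
            if road not in bucket:  # same name under two name types: index once
                bucket.append(road)

    def lookup(self, name: str) -> List[Road]:
        return self._by_name.get(self._key(name), [])


# Placeholder names, invented for illustration.
road = Road("r1", {
    "pre_saddam": "Old Market Road",
    "saddam_era": "Victory Road",
    "public": "Old Market Road",
    "official": "Unity Road",
})
index = NameIndex()
index.add(road)
print([r.road_id for r in index.lookup("victory road")])  # ['r1']
```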
  • 32. Don't throw away your data
    Multimap have always kept old postcodes
    10% of Multimap’s postcode database is of “dead” postcodes
    This might not work for routing and mapping, but it is very valuable for geocoding
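The point of keeping dead postcodes is that a geocoder and a router can apply different policies to the same table. The slide does not say how Multimap stored them; this is a minimal sketch assuming a hypothetical in-memory table where each postcode carries a `live` flag, so geocoding can still resolve a retired postcode while routing or display code can opt out. The postcodes `ZZ1 1AA` and `ZZ9 9ZZ` and their coordinates are invented placeholders.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class PostcodeRecord(NamedTuple):
    lat: float
    lon: float
    live: bool  # False for a retired ("dead") postcode that is kept anyway


def normalise_postcode(pc: str) -> str:
    """Upper-case and remove all whitespace, so 'ec4a 1he' == 'EC4A1HE'."""
    return "".join(pc.split()).upper()


class PostcodeGeocoder:
    def __init__(self, records: Dict[str, PostcodeRecord]) -> None:
        self._records = {normalise_postcode(k): v for k, v in records.items()}

    def geocode(self, postcode: str, allow_dead: bool = True) -> Optional[Tuple[float, float]]:
        """Return (lat, lon); dead postcodes resolve unless allow_dead is False."""
        rec = self._records.get(normalise_postcode(postcode))
        if rec is None or (not rec.live and not allow_dead):
            return None
        return rec.lat, rec.lon


# Placeholder postcodes and coordinates, for illustration only.
gc = PostcodeGeocoder({
    "ZZ1 1AA": PostcodeRecord(51.5, -0.1, live=True),
    "zz9 9zz": PostcodeRecord(52.0, -1.0, live=False),
})
print(gc.geocode("zz1 1aa"))                    # (51.5, -0.1)
print(gc.geocode("ZZ9 9ZZ"))                    # (52.0, -1.0): dead but still useful
print(gc.geocode("ZZ9 9ZZ", allow_dead=False))  # None, e.g. for routing
```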
  • 33. EC4A 1HE – Postcode of vintage 2002
  • 34. Lash data and enrich
    Stratford-upon-Avon
  • 35. Future = Real time Geocoding?
  • 36. Summary
    Mapping and Routing – FIXED
    Geocoding – Must Try Harder
    Parsing
    Data
  • 37. Thanks
    John Fagan
    ubergeo.com
    @johnbfagan