The Black Art of Geocoding: Finding that elusive lat/lon. John Fagan, Program Manager, Microsoft Corporation, @johnbfagan
We have been making maps for thousands of years
Well known and established standards/principles
Lots of experience in building software to create bitmaps from vector and raster data
Data availability & Simple data model
Mapping easy to scale
...and so is routing: thousands of years of experience in wayfinding; over 50 years of experience in routing algorithms; Dijkstra's shortest path algorithm (1959)
Data availability & Simple data model
Routing, easy to scale
Geocoding is not so easy: 20 years of experience; 10 years of global geocoding; 5 years of exposing geocoding to the mass consumer; no standard algorithms; very few purpose-built databases (maybe GNAF); very hard to scale
Geocoding is fundamental: you can't get a map without a geocode; you can't get a route without a geocode; you can't view your data without a geocode. 80% of all information contains a geographic element.
It used to be easier
Now it's hard
User expectations change with unstructured input: "67 hill veiw road, s61 2bn in the 1850's"; "1.5 hours from Nice"; "exact directions from Bangkok Patana School to Suvanapumi Airport in Bangkok."; "10 mile radius from se20 7ua"; "how long would it take me to walk around cancun"; "how to get to m13 gb from g83 9le by car"; "do bearded dragons bite?"
But... geocoding is NOT about search
52.19157,-1.70415
"The reason it's called 'I'm Feeling Lucky,' is of course that's a pretty damn ambitious goal. I mean to get the exact right one thing without even giving you a list of choices, and so you have to feel a little bit lucky if you're going to try that with one go," tried to explain Sergey Brin.
Why is it hard (2 reasons)
Parsing: Hard to understand unstructured input
Finding Stratford-upon-Avon: "stratford"; "stratford upon avon"; "Stratford upon haven"; "StratfordUponAvon"; "Stratford-Upon-Avon"; "stratford on avon"; "stratford-on-avon"; "stratford 0n avon"; "stratford - upon-avon"; "stratford on avaon"; "stratfordaponavon"; "stratford upon aavon"; "stratfordupponavon"
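One way to fold spelling variants like these onto a single place is to normalise the text and then fuzzy-match it against a gazetteer. The following is a minimal sketch, assuming a tiny hypothetical gazetteer and Python's standard difflib module; the names GAZETTEER, normalise and match_place and the 0.8 cutoff are illustrative choices, not taken from the talk or from any production geocoder.

```python
import difflib
import re

# Hypothetical mini-gazetteer; a real one would hold many thousands of places.
GAZETTEER = ["Stratford-upon-Avon", "Stratford", "St Andrews"]


def normalise(text):
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Map each normalised key back to its canonical spelling.
KEYS = {normalise(name): name for name in GAZETTEER}


def match_place(query, cutoff=0.8):
    """Return the closest canonical name, or None if nothing is similar enough."""
    hits = difflib.get_close_matches(normalise(query), list(KEYS), n=1, cutoff=cutoff)
    return KEYS[hits[0]] if hits else None


for variant in ["StratfordUponAvon", "stratford 0n avon", "stratford upon aavon"]:
    print(variant, "->", match_place(variant))
```

Stripping spaces and hyphens before comparing handles the run-together and hyphenated variants, while the similarity cutoff absorbs typos; a query that resembles nothing in the gazetteer returns None rather than a forced guess.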
Finding Stratford-upon-Avon
Parsing: In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. http://en.wikipedia.org/wiki/Parsing
Old way of parsing – rules based: a rules-based approach (mainly done with regular expressions)
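A rules-based parser of this kind can be sketched with two regular expressions. This is only an illustration under stated assumptions: the postcode pattern is a simplified UK shape, and the names POSTCODE, HOUSE_NUMBER and parse_rules are hypothetical, not the rules any particular geocoder uses.

```python
import re

# Simplified UK-style postcode at the end of the input, e.g. "EC4A 2DY".
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\s*$", re.IGNORECASE)
# Leading house number, e.g. "165" or "165a".
HOUSE_NUMBER = re.compile(r"^\s*(\d+[A-Z]?)\b", re.IGNORECASE)


def parse_rules(text):
    """Peel off the postcode and house number; everything else stays unlabelled."""
    parsed = {}
    match = POSTCODE.search(text)
    if match:
        parsed["postcode"] = f"{match.group(1)} {match.group(2)}".upper()
        text = text[:match.start()]
    match = HOUSE_NUMBER.match(text)
    if match:
        parsed["street_number"] = match.group(1)
        text = text[match.end():]
    parsed["remainder"] = text.strip(" ,")
    return parsed


print(parse_rules("165 fleet street london EC4A 2DY"))
```

The rules recover the number and the postcode, but the remainder ("fleet street london") is left as one lump: fixed patterns alone cannot say where the street ends and the city begins.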
Probabilistic approach: machine learned; requires you to “train” the engine; requires truth sets of training data. http://en.wikipedia.org/wiki/Hidden_Markov_model
Probabilistic approach: Hidden Markov Model
input --> 165 fleet street london EC4A 2DY
output --> address {
    street number : 165
    street : fleet street
    city : london
    postcode : EC4A 2DY
}
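The labelling above can be reproduced with a toy Hidden Markov Model decoded by the Viterbi algorithm. This is a sketch under loud assumptions: the START, TRANS and EMIT probabilities are hand-set for illustration rather than learned from a truth set, and the token features and function names are hypothetical. A real engine would estimate these tables from labelled training data.

```python
import math
import re

STATES = ["street_number", "street", "city", "postcode"]
FLOOR = 1e-6  # stand-in probability for any event not listed in the tables

# Hand-set (not trained) probabilities, chosen only to illustrate the idea.
START = {"street_number": 0.6, "street": 0.3, "city": 0.05, "postcode": 0.05}
TRANS = {
    "street_number": {"street": 0.9, "city": 0.05, "postcode": 0.05},
    "street": {"street": 0.5, "city": 0.4, "postcode": 0.1},
    "city": {"city": 0.3, "postcode": 0.7},
    "postcode": {"postcode": 0.9, "city": 0.1},
}
EMIT = {
    "street_number": {"number": 0.95, "word": 0.05},
    "street": {"word": 0.6, "street_type": 0.4},
    "city": {"word": 0.95, "street_type": 0.05},
    "postcode": {"outward": 0.5, "inward": 0.5},
}


def feature(token):
    """Reduce a token to the coarse observation symbol the model emits."""
    if token.isdigit():
        return "number"
    if re.fullmatch(r"[A-Za-z]{1,2}\d[A-Za-z\d]?", token):
        return "outward"  # first half of a UK postcode, e.g. EC4A
    if re.fullmatch(r"\d[A-Za-z]{2}", token):
        return "inward"  # second half of a UK postcode, e.g. 2DY
    if token.lower() in {"street", "road", "lane", "avenue"}:
        return "street_type"
    return "word"


def viterbi(tokens):
    """Return the most probable state sequence for a non-empty token list."""
    obs = [feature(t) for t in tokens]
    # best[s] = (log-probability, path) of the best labelling ending in state s
    best = {
        s: (math.log(START.get(s, FLOOR)) + math.log(EMIT[s].get(obs[0], FLOOR)), [s])
        for s in STATES
    }
    for o in obs[1:]:
        step = {}
        for s in STATES:
            score, path = max(
                (best[p][0] + math.log(TRANS[p].get(s, FLOOR)), best[p][1])
                for p in STATES
            )
            step[s] = (score + math.log(EMIT[s].get(o, FLOOR)), path + [s])
        best = step
    return max(best.values())[1]


def parse_address(text):
    """Group tokens by their decoded label."""
    tokens = text.split()
    fields = {}
    for token, label in zip(tokens, viterbi(tokens)):
        fields[label] = (fields.get(label, "") + " " + token).strip()
    return fields


print(parse_address("165 fleet street london EC4A 2DY"))
```

Unlike fixed rules, the model decides where the street ends and the city begins by weighing transition probabilities against what each token looks like, which is why the quality of the training data matters so much.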
Multimap stats
Parsing has its limitations. Parsing failures: Multimap/Bing Maps (standrewsscotland); Google (uk near Boston, MA, USA). All fail: house number plus postcode (165, EC4A 2DY)
Parsing using a Spatial Engine http://research.microsoft.com/en-us/people/josephj/acm_gis_2007_robust_location_search.pdf
Why is it hard (Data)
Hard to match input with reference database
[OSM-talk] Baghdad maps: I am informed that any road may have up to 4 names (which may be the same or different): the pre-Saddam name; the Saddam-era name; the "public" name (what the people who live there call it); the "official" name (what the new Government calls it). This situation is further complicated by language and social issues. Language: the roads are named in Arabic. There is no fixed translation between the Arabic and Latin alphabets. Social issues: Sunnis tend to use the Saddam-era names. Shia tend to rename streets and won't acknowledge Saddam-era names. Ethnic cleansing is changing the neighbourhoods and hence the names. Names (such as 14th July Bridge) will change later. My translator's opinion is that street names are going to take at least 2-3 years to settle down. http://lists.openstreetmap.org/pipermail/talk/2007-February/011273.html
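When one road can carry several names, the reference database can index every name against the same feature, so that whichever name the user types resolves to the same road. A minimal sketch, assuming hypothetical records: every name and ID below is a placeholder, not a real Baghdad street, and build_name_index and find_roads are illustrative names.

```python
from collections import defaultdict

# Hypothetical records; all names are placeholders, not real streets.
ROADS = [
    {
        "id": "road-1",
        "names": {
            "pre_saddam": "Old Name Road",
            "saddam_era": "Era Name Road",
            "public": "Local Name Road",
            "official": "New Name Road",
        },
    },
    {"id": "road-2", "names": {"public": "Market Street", "official": "Market Street"}},
]


def build_name_index(roads):
    """Map every known name (case-folded) to the set of road IDs that carry it."""
    index = defaultdict(set)
    for road in roads:
        for name in road["names"].values():
            index[name.casefold()].add(road["id"])
    return index


INDEX = build_name_index(ROADS)


def find_roads(name):
    """Return the IDs of all roads known by this name, under any naming scheme."""
    return sorted(INDEX.get(name.casefold(), set()))


print(find_roads("era name road"))
```

The sketch covers only exact names; transliteration between Arabic and Latin script, which the post says has no fixed mapping, would need its own matching step on top.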
Don't throw away your data: Multimap has always kept old postcodes; 10% of Multimap’s postcode database consists of “dead” postcodes. This might not work for routing and mapping, but it is very valuable for geocoding.
EC4A 1HE – Postcode of vintage 2002
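Keeping dead postcodes can be as simple as flagging rows instead of deleting them. The sketch below assumes a hypothetical record layout: the live flag, the function names, and above all the coordinates are placeholders for illustration, not real locations or Multimap's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PostcodeRecord:
    postcode: str
    lat: float
    lon: float
    live: bool = True  # False marks a "dead" postcode that is kept for geocoding


# Hypothetical rows: the coordinates are rough placeholders, not real data.
RECORDS = [
    PostcodeRecord("EC4A 2DY", 51.5, -0.1),
    PostcodeRecord("EC4A 1HE", 51.5, -0.1, live=False),
]


def _key(postcode):
    """Ignore spacing and case so 'ec4a1he' and 'EC4A 1HE' compare equal."""
    return "".join(postcode.split()).upper()


INDEX = {_key(rec.postcode): rec for rec in RECORDS}


def geocode_postcode(postcode) -> Optional[PostcodeRecord]:
    """Look up a postcode, returning dead ones too instead of failing."""
    return INDEX.get(_key(postcode))


hit = geocode_postcode("ec4a1he")
print(hit.postcode, "live" if hit.live else "dead, but still geocodable")
```

The caller can still tell from the flag that the postcode is no longer in use, which matters for routing or mail, while a user typing an old address still gets a location.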
Lash data and enrich Stratford-upon-Avon
Future = Real time Geocoding?
Summary: Mapping and routing – FIXED. Geocoding – must try harder: parsing; data.
Thanks. John Fagan, ubergeo.com, @johnbfagan
