Geocoding Overview

Presented by Marc Tobias Metten of the OpenCage Geocoder team at FOSSGIS2015 in Münster, Germany on 11th March 2015

Published in: Technology

  1. 1. OpenCage FOSSGIS 2015 http://worldwideberlin.com/
  2. 2. OpenCage FOSSGIS 2015 Overview I. place name disambiguation (homonyms) – with & without spellcheck II. Nominatim III. other (open data) geocoders – 2015 trends – opportunities to share data, config, tests IV. shared ranking/scoring data
  3. 3. OpenCage FOSSGIS 2015 OpenCage Geocoder
  4. 4. OpenCage FOSSGIS 2015 Which Münster do you mean?
  5. 5. OpenCage FOSSGIS 2015 Nominatim geocoder
  6. 6. OpenCage FOSSGIS 2015
  7. 7. OpenCage FOSSGIS 2015 Mühlheim vs Mülheim
  8. 8. OpenCage FOSSGIS 2015 “eifelturm”
  9. 9. OpenCage FOSSGIS 2015 “eiffel turm”
  10. 10. OpenCage FOSSGIS 2015 “eiffeltower” => no result
  11. 11. OpenCage FOSSGIS 2015 “eifel tower” => fair ground, Varna Bulgaria (fixed last week)
  12. 12. OpenCage FOSSGIS 2015 “eiffel tower” => one in Paris => replicas around the world => restaurants around the world
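The misspelled queries on slides 8 to 12 suggest what a spellcheck step could do. The following is a minimal sketch, not how Nominatim or OpenCage handle it: it uses Python's standard `difflib` to map a query to the closest known name. The `KNOWN_NAMES` list, the `suggest` function, and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical list of known place names; a real geocoder would draw on its index.
KNOWN_NAMES = ["eiffel tower", "eiffelturm", "tour eiffel", "tower bridge"]


def suggest(query, names=KNOWN_NAMES, cutoff=0.8):
    """Return known names similar to the query, best match first."""
    # Lowercase and collapse whitespace so "Eifel  Tower" and "eifel tower" compare equal.
    normalized = " ".join(query.lower().split())
    return difflib.get_close_matches(normalized, names, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for q in ["eifel tower", "eiffeltower"]:
        print(q, "->", suggest(q))
```

A similarity cutoff is a trade-off: set too low, it turns "eifel tower" into unrelated places; set too high, it misses plausible typos.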
  13. 13. OpenCage FOSSGIS 2015
  14. 14. OpenCage FOSSGIS 2015 http://www.openstreetmap.org/#map=17/39.80885/116.28163
  15. 15. OpenCage FOSSGIS 2015
  16. 16. OpenCage FOSSGIS 2015
  17. 17. OpenCage FOSSGIS 2015 Nominatim ● OSM data, minutely updates ● + UK postal codes, TIGER ● 1TB PostGIS ● import in C, setup scripts in PHP, Postgres stored procedures, PHP frontend, Python&PHP test suite ● autocomplete if you add Photon geocoder ● no spellcheck
  18. 18. OpenCage FOSSGIS 2015 regression/blackbox tests
  19. 19. OpenCage FOSSGIS 2015 other geocoders Closed source Open source, high resources Open source, low resources Google Maps Mapzen “Pelias” OpenStreetMap “Nominatim” Bing/Yahoo Mapbox “Carmen” OpenCage (multiple) Mapquest Mapquest open (Nominatim) geonames ESRI/ArcGIS Online Foursquare “Quattroshapes” geocod.io (Tiger data) Baidu Scout Photon (Nominatim) Yandex Cloudmade geo.io (Nominatim) TomTom DSTK (Tiger, geonames) Amazon (Android only) SmartyStreets Telenav ... Nokia/Ovi/Here Apple (iOS only) ...
  20. 20. OpenCage FOSSGIS 2015 trends ● SSD ● Add commercial sources ● Full builds, downloadable index ● Highly parallel (map/reduce, nodejs), cloud scaling, NoSQL ● Community building, guidelines ● Test suites
  21. 21. OpenCage FOSSGIS 2015 typical features to improve ● horizontal scaling ● autocomplete ● spellcheck ● improve text parsing (App 3, 111-113b) ● crossings (Main & 2nd N, New Orleans) ● “4km north of $cityname on the N6” ● tests for non-latin alphabets ● postal code boundaries ● localsearch/POIs
  22. 22. OpenCage FOSSGIS 2015 what should be shared ● aka. don't reinvent everything ● standard test suite to compare geocoders ● hierarchy data ● address parsing ● address formatting ● language configuration ● data parsing, e.g. OSM tags
  23. 23. OpenCage FOSSGIS 2015
  24. 24. OpenCage FOSSGIS 2015
  25. 25. OpenCage FOSSGIS 2015 openaddresses.io ● 110m addresses ● 10GB of text files 1174 SMITH CREEK WAY, BRASSFIELD, WAKE FOREST, NC 27587 732 STEWARTS ROAD, LANEXA, VA 23124
  26. 26. OpenCage FOSSGIS 2015 address formatting https://github.com/lokku/address-formatting/ – configuration – test cases for 33 countries – reference implementation in Perl { country_code: 'dk', village: 'Ærøskøbing', county: 'Ærø Municipality', house_number: '17A', neighbourhood: 'Paradiset', postcode: '5970', road: 'Baggårde', state: 'Region of Southern Denmark' } Baggårde 17A, 5970 Ærøskøbing, Denmark Adama Asnyka 1, 59-700 Bolesławiec, Poland CAI, Cerrito 1250, Retiro, C1010AAZ Buenos Aires, Argentina
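As an illustration of the idea on slide 26, the sketch below turns the Danish component set from the slide into the formatted line shown there. It is not the address-formatting project's implementation or configuration format: the `TEMPLATES` and `COUNTRY_NAMES` tables and the `format_address` function are hypothetical and cover only this one example.

```python
# Hypothetical per-country templates; the real project keeps its configuration
# in its own files and format, which this sketch does not reproduce.
TEMPLATES = {
    "dk": ["{road} {house_number}", "{postcode} {village}", "{country}"],
}

# Assumed lookup from country code to display name.
COUNTRY_NAMES = {"dk": "Denmark"}


def format_address(components):
    """Join the template parts for the component set's country with commas."""
    code = components["country_code"]
    fields = dict(components, country=COUNTRY_NAMES[code])
    # Unused components (county, neighbourhood, state) are simply ignored.
    return ", ".join(part.format(**fields) for part in TEMPLATES[code])


# Component set copied from the slide.
address = {
    "country_code": "dk",
    "village": "Ærøskøbing",
    "county": "Ærø Municipality",
    "house_number": "17A",
    "neighbourhood": "Paradiset",
    "postcode": "5970",
    "road": "Baggårde",
    "state": "Region of Southern Denmark",
}

if __name__ == "__main__":
    print(format_address(address))
```

Keeping the templates as data rather than code is what would let several geocoders share them, which is the point the slide makes about configuration and test cases.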
  27. 27. OpenCage FOSSGIS 2015 wikipedia data
  28. 28. OpenCage FOSSGIS 2015 core geocoding logic 1. tokenize 2. filter • fixed bounding box, browser window, country • OSM tags/POI search • min-max admin 3. search 4. rank • country bias • language bias (client, explicit) • location boost (client, explicit, history) • maybe: spellcheck • maybe: retry/failover/remove phrases • importance boost
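The four steps on slide 28 can be sketched over a toy in-memory index. This is only an illustration under stated assumptions: the `Place` records, their importance values, the 0.5 country-bias boost, and the function names are invented here, and the "maybe" steps (spellcheck, retry) are left out.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    country: str       # ISO country code
    importance: float  # 0..1, higher means more prominent (made-up values)


# Toy index; a real geocoder searches a database, not a list.
PLACES = [
    Place("Münster", "de", 0.6),
    Place("Munster", "fr", 0.3),
    Place("Munster", "us", 0.2),
]


def tokenize(query):
    """Split a query into lowercase tokens, treating commas as separators."""
    return query.lower().replace(",", " ").split()


def normalize(text):
    """Fold case and the umlaut so 'Münster' and 'munster' compare equal."""
    return text.lower().replace("ü", "u")


def geocode(query, country_filter=None, country_bias=None):
    # 1. tokenize
    tokens = [normalize(t) for t in tokenize(query)]
    # 2. filter: a hard restriction, e.g. an explicit country
    candidates = [p for p in PLACES if country_filter in (None, p.country)]
    # 3. search: exact match of the normalized name against any token
    matches = [p for p in candidates if normalize(p.name) in tokens]

    # 4. rank: importance plus an assumed fixed boost for the biased country
    def score(place):
        boost = 0.5 if place.country == country_bias else 0.0
        return place.importance + boost

    return sorted(matches, key=score, reverse=True)


if __name__ == "__main__":
    print([p.country for p in geocode("Münster")])
    print([p.country for p in geocode("munster", country_bias="us")])
```

The sketch keeps the distinction the slide draws between a filter, which removes candidates outright, and a bias, which only reorders them.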
  29. 29. OpenCage FOSSGIS 2015 http://blog.mayflower.de/755-Schnelle-Volltextsuche-mit-Solr.html
  30. 30. OpenCage FOSSGIS 2015 map to hierarchy (ranks) http://wiki.openstreetmap.org/wiki/Nominatim/Development_overview
  31. 31. OpenCage FOSSGIS 2015 names, names, names
  32. 32. OpenCage FOSSGIS 2015 name is one of many factors ranking examples: ● Altona – type: suburb vs train station vs towns in US/Canada ● Germany – admin_level=2 (country) vs island ● Mt Everest – importance: viewpoint vs peak vs island ● Oktoberfest – actually an alt_name of Theresienwiese ● Königsberg – 10x a peak, 1x old_name of Kaliningrad ● Hitlerberg – old_name:1934-1945 of Heigelkopf
  33. 33. OpenCage FOSSGIS 2015 status on wikipedia_articles.bin ● version 1: wikipedia pageview logs – https://en.wikipedia.org/wiki/Wikipedia:Notability ● version 2 (current): parsing wikipedia articles and counting links – last updated 2013 – 80m wikipedia entries + 15m redirects – 0.6m places in OSM have wikipedia tag set (2013: 0.4m) ● version 3 (TBD): parsing wikipedia geo exports – http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en – 3.4m entries, more languages, regular dumps, new documentation ● version 4 (?) - used wikidata exports - used by multiple geocoders
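The "parsing wikipedia articles and count links" approach of version 2 can be illustrated with a small sketch. It is not the code behind wikipedia_articles.bin: the sample articles, the regular expression, and in particular the log-scaled `importance` formula are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Matches the target of a wiki link: [[Target]] or [[Target|label]].
LINK = re.compile(r"\[\[([^\]|#]+)")

# Made-up article texts in wiki markup.
ARTICLES = {
    "Paris": "Capital of [[France]]. Home of the [[Eiffel Tower]].",
    "Gustave Eiffel": "Engineer of the [[Eiffel Tower|tower]] in [[Paris]].",
    "France": "Its capital is [[Paris]].",
}


def count_links(articles):
    """Count how often each title is linked to across all article texts."""
    counts = Counter()
    for text in articles.values():
        for target in LINK.findall(text):
            counts[target.strip()] += 1
    return counts


def importance(counts, title):
    """Assumed scoring: log-scaled link count, normalized to the range 0..1."""
    top = max(counts.values(), default=0)
    if top == 0:
        return 0.0
    return math.log1p(counts[title]) / math.log1p(top)


if __name__ == "__main__":
    counts = count_links(ARTICLES)
    for title in ["Paris", "Eiffel Tower", "France", "Varna"]:
        print(title, counts[title], round(importance(counts, title), 2))
```

Such a score could then feed the importance boost in the ranking step, so that the Eiffel Tower in Paris outranks replicas and restaurants of the same name.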
  34. 34. OpenCage FOSSGIS 2015 what can mappers do? ● add wikipedia tags ● fix administrative levels ● don't add wrong names (typos) ● file bugs (github) http://nominatim.openstreetmap.org/
  35. 35. OpenCage FOSSGIS 2015 … and if all fails: rename city
  36. 36. OpenCage FOSSGIS 2015 Questions ? mtm@opencagedata.com
