Geocoding Overview

OpenCage FOSSGIS 2015
http://worldwideberlin.com/

Overview
I. place name disambiguation (homonyms)
– with & without spellcheck
II. Nominatim
III. other (open data) geocoders
– 2015 trends
– opportunities to share data, config, tests
IV. shared ranking/scoring data

OpenCage Geocoder

Which Münster do you mean?

Nominatim geocoder

Mühlheim vs Mülheim

“eifelturm”

“eiffel turm”

“eiffeltower” => no result

“eifel tower”
=> fairground, Varna, Bulgaria (fixed last week)

“eiffel tower”
=> one in Paris
=> replicas around the world
=> restaurants around the world
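
The query variants above differ only by small typos or spacing. A minimal sketch of how a spellcheck step might map them to a known name before searching, using Python's standard `difflib`. The `KNOWN_NAMES` vocabulary and the `suggest` helper are hypothetical and not part of Nominatim or OpenCage:

```python
import difflib

# Hypothetical vocabulary of known place names (lowercased).
KNOWN_NAMES = ["eiffel tower", "eiffelturm", "eifel", "münster", "mülheim", "mühlheim"]

def suggest(query, cutoff=0.8):
    """Return the closest known name for a query, or None if nothing is close enough."""
    # Lowercase and collapse repeated whitespace before comparing.
    normalized = " ".join(query.lower().split())
    matches = difflib.get_close_matches(normalized, KNOWN_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("eiffeltower"))  # closest entry in KNOWN_NAMES, if any
print(suggest("eifelturm"))
```

A similarity cutoff is a trade-off: too low and "Mühlheim" silently becomes "Mülheim", which are different places.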

http://www.openstreetmap.org/#map=17/39.80885/116.28163

Nominatim
● OSM data, minutely updates
● + UK postal codes, TIGER
● 1TB PostGIS
● import in C, setup scripts in PHP, Postgres stored procedures, PHP frontend, Python & PHP test suite
● autocomplete if you add Photon geocoder
● no spellcheck

regression/blackbox tests
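
A blackbox test treats the geocoder as an opaque function: send a query, then check that the returned coordinate lies within a tolerance of the expected one. The sketch below assumes a `geocode(query)` callable that returns `(lat, lon)` or `None`; `fake_geocode`, the case list, and the approximate coordinates are illustrative stand-ins, not an existing test suite:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def run_cases(geocode, cases):
    """Run (query, lat, lon, tolerance_km) cases and return the queries that failed."""
    failures = []
    for query, lat, lon, tol_km in cases:
        result = geocode(query)  # expected to return (lat, lon) or None
        if result is None or haversine_km(lat, lon, *result) > tol_km:
            failures.append(query)
    return failures

# Hypothetical stand-in for a real geocoder call; coordinates are approximate.
def fake_geocode(query):
    table = {"eiffel tower": (48.8584, 2.2945)}
    return table.get(query.lower())

CASES = [
    ("Eiffel Tower", 48.858, 2.294, 1.0),
    ("eiffeltower", 48.858, 2.294, 1.0),  # typo variant, no result in the stand-in
]

print(run_cases(fake_geocode, CASES))
```

Because the cases only depend on query strings and coordinates, the same list could be run against several geocoders to compare them.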

other geocoders
Closed source Open source, high resources Open source, low resources
Google Maps Mapzen “Pelias” OpenStreetMap “Nominatim”
Bing/Yahoo Mapbox “Carmen” OpenCage (multiple)
Mapquest Mapquest open (Nominatim) geonames
ESRI/ArcGIS Online Foursquare “Quattroshapes” geocod.io (Tiger data)
Baidu Scout Photon (Nominatim)
Yandex Cloudmade geo.io (Nominatim)
TomTom DSTK (Tiger, geonames)
Amazon (Android only) SmartyStreets
Telenav ...
Nokia/Ovi/Here
Apple (iOS only)
...

trends
● SSD
● Add commercial sources
● Full builds, downloadable index
● High parallel (map/reduce, nodejs), cloud scaling, noSQL
● Community building, guidelines
● Test suites

typical features to improve
● horizontal scaling
● autocomplete
● spellcheck
● improve text parsing (App 3, 111-113b)
● crossings (Main & 2nd N, New Orleans)
● “4km north of $cityname on the N6”
● tests for non-latin alphabets
● postal code boundaries
● localsearch/POIs

what should be shared
● aka. don't reinvent everything
● standard test suite to compare geocoders
● hierarchy data
● address parsing
● address formatting
● language configuration
● data parsing, e.g. OSM tags

openaddresses.io
● 110m addresses
● 10GB of text files
1174 SMITH CREEK WAY, BRASSFIELD, WAKE FOREST, NC 27587
732 STEWARTS ROAD, LANEXA, VA 23124

address formatting
https://github.com/lokku/address-formatting/
– configuration
– test cases for 33 countries
– reference implementation in Perl
{ country_code: 'dk', village: 'Ærøskøbing', county: 'Ærø Municipality',
house_number: '17A', neighbourhood: 'Paradiset', postcode: '5970',
road: 'Baggårde', state: 'Region of Southern Denmark' }
Baggårde 17A, 5970 Ærøskøbing, Denmark
Adama Asnyka 1, 59-700 Bolesławiec, Poland
CAI, Cerrito 1250, Retiro, C1010AAZ Buenos Aires, Argentina
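
To illustrate the idea of per-country formatting configuration, here is a simplified sketch. The template structure, `TEMPLATES`, `COUNTRY_NAMES`, and `format_address` are assumptions made for this example; the address-formatting repository defines its own template format, which is not reproduced here:

```python
# Hypothetical per-country templates: each output part is a list of fields
# joined by spaces; a tuple means "first of these keys that is present".
TEMPLATES = {
    "dk": [["road", "house_number"], ["postcode", ("city", "town", "village")], ["country"]],
}
COUNTRY_NAMES = {"dk": "Denmark"}

def pick(components, key):
    """Look up one field, or the first available field of a tuple of alternatives."""
    if isinstance(key, tuple):
        return next((components[k] for k in key if components.get(k)), "")
    return components.get(key, "")

def format_address(components):
    """Format address components using the template for their country_code."""
    comps = dict(components)
    code = comps["country_code"]
    comps.setdefault("country", COUNTRY_NAMES.get(code, code.upper()))
    parts = []
    for fields in TEMPLATES[code]:
        text = " ".join(v for v in (pick(comps, f) for f in fields) if v)
        if text:  # skip parts where every field is missing
            parts.append(text)
    return ", ".join(parts)

example = {
    "country_code": "dk", "village": "Ærøskøbing", "county": "Ærø Municipality",
    "house_number": "17A", "neighbourhood": "Paradiset", "postcode": "5970",
    "road": "Baggårde", "state": "Region of Southern Denmark",
}
print(format_address(example))  # Baggårde 17A, 5970 Ærøskøbing, Denmark
```

Components the template does not mention (county, neighbourhood, state) are simply dropped, which is why the same input can yield a short postal-style line.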

wikipedia data

core geocoding logic
1. tokenize
2. filter
• fixed bounding box, browser window, country
• OSM tags/POI search
• min-max admin
3. search
4. rank
• country bias
• language bias (client, explicit)
• location boost (client, explicit, history)
• maybe: spellcheck
• maybe: retry/failover/remove phrases
• importance boost
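
The four steps above can be sketched end to end on a toy in-memory index. Everything here is illustrative: the `Place` class, the `INDEX` contents, the importance values, and the 0.5 country boost are invented for the example and do not reflect how any particular geocoder implements these steps:

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    country: str
    importance: float  # assumed to be in 0..1

# Hypothetical toy index; importance values are made up.
INDEX = [
    Place("Münster", "de", 0.6),
    Place("Munster", "fr", 0.4),
    Place("Munster", "us", 0.3),
]

def normalize(text):
    """Lowercase and strip diacritics, so 'Münster' and 'munster' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def tokenize(query):
    return normalize(query).replace(",", " ").split()

def geocode(query, country_filter=None, country_bias=None):
    tokens = set(tokenize(query))                             # 1. tokenize
    candidates = [p for p in INDEX
                  if country_filter in (None, p.country)]     # 2. filter
    hits = [p for p in candidates
            if set(tokenize(p.name)) <= tokens]               # 3. search

    def score(p):                                             # 4. rank
        boost = 0.5 if p.country == country_bias else 0.0
        return p.importance + boost

    return sorted(hits, key=score, reverse=True)

print([p.country for p in geocode("Münster")])
print([p.country for p in geocode("Münster", country_bias="us")])
```

The sketch keeps the distinction between the two country options: a filter removes candidates outright, while a bias only reorders them.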

http://blog.mayflower.de/755-Schnelle-Volltextsuche-mit-Solr.html

map to hierarchy (ranks)
http://wiki.openstreetmap.org/wiki/Nominatim/Development_overview

names, names, names

name is one of many factors
ranking examples:
● Altona
– type: suburb vs train station vs town in US/Canada
● Germany
– admin_level=2 (country) vs island
● Mt Everest
– importance: viewpoint vs peak vs island
● Oktoberfest
– actually an alt_name of Theresienwiese
● Königsberg
– 10x a peak, 1x old_name of Kaliningrad
● Hitlerberg
– old_name:1934-1945 of Heigelkopf
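
One way to read the examples above is that the kind of name that matched (name, alt_name, old_name) is weighed against how important the place is. A minimal sketch under that assumption follows. The weights, the multiplication, and the importance values are all invented for illustration and are not Nominatim's actual scoring:

```python
# Hypothetical weights for how strongly each OSM name tag counts as a match.
NAME_TAG_WEIGHT = {"name": 1.0, "alt_name": 0.8, "old_name": 0.6}

def score(matched_tag, importance):
    """Combine the matched name tag with an importance value in 0..1."""
    base_tag = matched_tag.split(":")[0]  # "old_name:1934-1945" -> "old_name"
    return NAME_TAG_WEIGHT.get(base_tag, 0.5) * importance

def rank(candidates):
    """Sort (label, matched_tag, importance) tuples, best match first."""
    return sorted(candidates, key=lambda c: score(c[1], c[2]), reverse=True)

# Importance values below are invented for illustration only.
KOENIGSBERG = [
    ("Königsberg (peak)", "name", 0.2),
    ("Kaliningrad", "old_name", 0.7),
]

print(rank(KOENIGSBERG)[0][0])  # Kaliningrad
```

With these numbers the large city outranks the exact-name peak even though it only matches through an old_name, which is the behaviour the Königsberg example hints at.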

status on wikipedia_articles.bin
● version 1: Wikipedia pageview logs
– https://en.wikipedia.org/wiki/Wikipedia:Notability
● version 2 (current): parsing Wikipedia articles and counting links
– last updated 2013
– 80m Wikipedia entries + 15m redirects
– 0.6m places in OSM have wikipedia tag set (2013: 0.4m)
● version 3 (TBD): parsing Wikipedia geo exports
– http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en
– 3.4m entries, more languages, regular dumps, new documentation
● version 4 (?)
– use Wikidata exports
– used by multiple geocoders
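
Version 2 counts links between Wikipedia articles. One plausible way to turn such a count into a 0..1 importance value is a log scale, since link counts are heavily skewed towards a few very popular articles. The formula below is an assumption for illustration, not the one used to build wikipedia_articles.bin:

```python
import math

def importance(link_count, max_link_count):
    """Map an inbound-link count to 0..1 on a log scale (assumed formula)."""
    if link_count <= 0 or max_link_count <= 0:
        return 0.0
    return min(1.0, math.log(1 + link_count) / math.log(1 + max_link_count))

# Each tenfold increase in links adds roughly the same amount of importance.
for count in (10, 100, 1000):
    print(count, round(importance(count, 1000), 2))
```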

what can mappers do?
● add wikipedia tags
● fix administrative levels
● don't add wrong names (typos)
● file bugs (github)
http://nominatim.openstreetmap.org/

… and if all else fails: rename the city

Questions?
mtm@opencagedata.com
