OpenCage FOSSGIS 2015
http://worldwideberlin.com/
Overview
I. place name disambiguation (homonyms)
– with & without spellcheck
II. Nominatim
III. other (open data) geocoders
– 2015 trends
– opportunities to share data, config, tests
IV. shared ranking/scoring data
OpenCage Geocoder
Which Münster do you mean?
Nominatim geocoder
Mühlheim vs Mülheim
“eifelturm”
“eiffel turm”
“eiffeltower” => no result
“eifel tower”
=> fairground, Varna, Bulgaria (fixed last week)
“eiffel tower”
=> one in Paris
=> replicas around the world
=> restaurants around the world
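The misspelled queries above ("eifelturm", "eifel tower", "eiffeltower") hint at what a spellcheck step could do before the actual search. A minimal sketch, assuming a small in-memory list of known names (`KNOWN_NAMES` is invented for illustration) and using Python's standard `difflib`; it is not how any geocoder named in this talk works:

```python
import difflib

# Hypothetical list of indexed names; a real geocoder would draw these
# from its own search index.
KNOWN_NAMES = ["eiffel tower", "eifel", "eiffelturm", "tower bridge"]


def normalise(query: str) -> str:
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def suggest(query: str, names=KNOWN_NAMES, cutoff: float = 0.8):
    """Return up to three known names that resemble the query, best first."""
    return difflib.get_close_matches(normalise(query), names, n=3, cutoff=cutoff)
```

Calling `suggest("eifel tower")` would offer "eiffel tower" as a correction. The cutoff of 0.8 is an arbitrary choice: set too low, it starts "correcting" valid names of other places.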
http://www.openstreetmap.org/#map=17/39.80885/116.28163
Nominatim
● OSM data, minutely updates
● + UK postal codes, TIGER
● 1TB PostGIS
● import in C, setup scripts in PHP, Postgres stored procedures, PHP frontend, Python & PHP test suite
● autocomplete if you add Photon geocoder
● no spellcheck
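For readers who have not used Nominatim, a query is a plain HTTP request. The sketch below only builds the request URL for the public instance and sends nothing. The parameters `q`, `format`, `limit`, `countrycodes` and `accept-language` belong to Nominatim's search API; the helper name `build_search_url` is an assumption made for this example:

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query, country_codes=None, language=None, limit=5):
    """Build a Nominatim search URL without sending any request."""
    params = {"q": query, "format": "json", "limit": limit}
    if country_codes:
        # Restrict results to these ISO 3166-1 alpha-2 country codes.
        params["countrycodes"] = ",".join(country_codes)
    if language:
        # Preferred language for the names in the response.
        params["accept-language"] = language
    return NOMINATIM_SEARCH + "?" + urlencode(params)
```

Restricting "Münster" to `country_codes=["de"]` is one way a client can narrow down which of the homonyms it means.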
regression/blackbox tests
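A blackbox test treats the geocoder as an opaque function: a query goes in, and the returned coordinate has to land close enough to a known location. A minimal sketch, assuming a `geocode(query)` callable that returns `(lat, lon)` or `None`; that callable, the case list and the tolerance are illustrative, and the expected coordinate is approximate:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Each case: query, expected (lat, lon), tolerance in km.
CASES = [
    ("eiffel tower", (48.858, 2.294), 1.0),
]


def run_cases(geocode, cases=CASES):
    """Return the queries whose result is missing or outside the tolerance."""
    failures = []
    for query, (lat, lon), tolerance_km in cases:
        result = geocode(query)
        if result is None or haversine_km(lat, lon, *result) > tolerance_km:
            failures.append(query)
    return failures
```

Because the cases are plain data, the same list could be run against several geocoders to compare them.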
other geocoders
Closed source         | Open source, high resources | Open source, low resources
Google Maps           | Mapzen “Pelias”             | OpenStreetMap “Nominatim”
Bing/Yahoo            | Mapbox “Carmen”             | OpenCage (multiple)
Mapquest              | Mapquest open (Nominatim)   | geonames
ESRI/ArcGIS Online    | Foursquare “Quattroshapes”  | geocod.io (Tiger data)
Baidu                 | Scout                       | Photon (Nominatim)
Yandex                | Cloudmade                   | geo.io (Nominatim)
TomTom                |                             | DSTK (Tiger, geonames)
Amazon (Android only) |                             | SmartyStreets
Telenav               |                             | ...
Nokia/Ovi/Here        |                             |
Apple (iOS only)      |                             |
...                   |                             |
trends
● SSD
● Add commercial sources
● Full builds, downloadable index
● High parallel (map/reduce, nodejs), cloud scaling, noSQL
● Community building, guidelines
● Test suites
typical features to improve
● horizontal scaling
● autocomplete
● spellcheck
● improve text parsing (App 3, 111-113b)
● crossings (Main & 2nd N, New Orleans)
● “4km north of $cityname on the N6”
● tests for non-latin alphabets
● postal code boundaries
● localsearch/POIs
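To illustrate why a query like “4km north of $cityname on the N6” is hard, here is a sketch of only the easy half: pulling out distance, direction, place and road with a regular expression, then shifting a coordinate along a bearing on a sphere. The pattern, the function names and the placeholder town are assumptions for this example; snapping the result to the named road is left out entirely:

```python
import math
import re

BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}

# Hypothetical pattern: "<n>km <direction> of <place> [on the <road>]".
PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*km\s+(north|east|south|west)\s+of\s+(.+?)"
    r"(?:\s+on\s+the\s+(\S+))?\s*$",
    re.IGNORECASE,
)


def parse_offset(query):
    """Return (distance_km, bearing_deg, place, road or None), or None if no match."""
    match = PATTERN.match(query)
    if not match:
        return None
    distance, direction, place, road = match.groups()
    return float(distance), BEARINGS[direction.lower()], place, road


def offset_point(lat, lon, distance_km, bearing_deg):
    """Move a point along a bearing on a sphere of radius 6371 km."""
    d = distance_km / 6371.0
    b = math.radians(bearing_deg)
    p1, l1 = math.radians(lat), math.radians(lon)
    p2 = math.asin(math.sin(p1) * math.cos(d) + math.cos(p1) * math.sin(d) * math.cos(b))
    l2 = l1 + math.atan2(
        math.sin(b) * math.sin(d) * math.cos(p1),
        math.cos(d) - math.sin(p1) * math.sin(p2),
    )
    return math.degrees(p2), math.degrees(l2)
```

A geocoder would first resolve the place name to a coordinate, then call `offset_point` with the parsed distance and bearing.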
what should be shared
● aka. don't reinvent everything
● standard test suite to compare geocoders
● hierarchy data
● address parsing
● address formatting
● language configuration
● data parsing, e.g. OSM tags
openaddresses.io
● 110m addresses
● 10GB of text files
1174 SMITH CREEK WAY, BRASSFIELD, WAKE FOREST, NC 27587
732 STEWARTS ROAD, LANEXA, VA 23124
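The two sample lines already show why address parsing is worth sharing: they have a different number of comma-separated parts. A rough sketch that handles just these two shapes, assuming the first part is "house number, then street" and the last part is "state, then ZIP code"; the field names and the catch-all `localities` list are choices made for this example, not the openaddresses schema:

```python
def split_us_address(line):
    """Split a 'NUMBER STREET, ..., STATE ZIP' line into labelled parts."""
    parts = [part.strip() for part in line.split(",")]
    house_number, _, road = parts[0].partition(" ")
    state, _, postcode = parts[-1].partition(" ")
    return {
        "house_number": house_number,
        "road": road,
        # Everything between street and state; the meaning of each part
        # (neighbourhood, city, ...) is left undecided here.
        "localities": parts[1:-1],
        "state": state,
        "postcode": postcode,
    }
```

Even this small case leaves open which of the middle parts is the city, which is exactly the kind of decision a shared parser would have to make once.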
address formatting
https://github.com/lokku/address-formatting/
– configuration
– test cases for 33 countries
– reference implementation in Perl
{ country_code: 'dk', village: 'Ærøskøbing', county: 'Ærø Municipality',
  house_number: '17A', neighbourhood: 'Paradiset', postcode: '5970',
  road: 'Baggårde', state: 'Region of Southern Denmark' }
Baggårde 17A, 5970 Ærøskøbing, Denmark
Adama Asnyka 1, 59-700 Bolesławiec, Poland
CAI, Cerrito 1250, Retiro, C1010AAZ Buenos Aires, Argentina
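The idea behind the project is that the order of address components is per-country configuration rather than code. A simplified sketch of that idea, assuming Python format strings as the template syntax and a tiny country-name lookup; both are stand-ins for this example and not the project's actual configuration format:

```python
import re

# Hypothetical per-country templates; "default" is used for unknown countries.
TEMPLATES = {
    "dk": "{road} {house_number}, {postcode} {village}, {country}",
    "default": "{house_number} {road}, {village} {postcode}, {country}",
}

# Assumed lookup, since the sample record has only a country code.
COUNTRY_NAMES = {"dk": "Denmark"}


class _Blank(dict):
    """Dictionary that renders missing components as empty strings."""

    def __missing__(self, key):
        return ""


def format_address(components):
    """Render address components with the template for their country."""
    code = components.get("country_code", "")
    template = TEMPLATES.get(code, TEMPLATES["default"])
    values = _Blank(components)
    values.setdefault("country", COUNTRY_NAMES.get(code, ""))
    text = template.format_map(values)
    text = re.sub(r"\s+", " ", text)                  # collapse whitespace
    text = re.sub(r"\s*,(\s*,)*\s*", ", ", text)      # tidy empty slots
    return text.strip(" ,")
```

Components that the template does not mention (county, neighbourhood, state in the Danish record) are simply dropped, which matches the shape of the sample output above.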
wikipedia data
core geocoding logic
1. tokenize
2. filter
   • fixed bounding box, browser window, country
   • OSM tags/POI search
   • min-max admin
3. search
4. rank
   • country bias
   • language bias (client, explicit)
   • location boost (client, explicit, history)
   • maybe: spellcheck
   • maybe: retry/failover/remove phrases
   • importance boost
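The four steps can be sketched end to end on a toy in-memory index. Everything here is an assumption for illustration: the `Place` record, the importance numbers, the size of the country boost, and exact token matching in place of a real search index:

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    country: str       # ISO country code
    kind: str
    importance: float  # invented value between 0 and 1


# Toy index; the importance values are made up.
PLACES = [
    Place("Altona", "de", "suburb", 0.5),
    Place("Altona", "ca", "town", 0.3),
    Place("Altona", "us", "town", 0.3),
    Place("Münster", "de", "city", 0.6),
]


def tokenize(text):
    """Step 1: lowercase and split on whitespace and commas."""
    return text.casefold().replace(",", " ").split()


def geocode(query, country_filter=None, country_bias=None):
    tokens = set(tokenize(query))
    if not tokens:
        return []
    # Step 2: hard filter, drops candidates outright.
    candidates = [p for p in PLACES
                  if country_filter is None or p.country == country_filter]
    # Step 3: search, every query token must occur in the name.
    hits = [p for p in candidates if tokens <= set(tokenize(p.name))]

    # Step 4: rank, importance plus a soft boost for the biased country.
    def score(place):
        boost = 0.5 if place.country == country_bias else 0.0
        return place.importance + boost

    return sorted(hits, key=score, reverse=True)
```

The sketch keeps filter and bias apart on purpose: a filter removes results, while a bias only reorders them, so `geocode("Altona", country_bias="ca")` still returns all three places with the Canadian one first.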
http://blog.mayflower.de/755-Schnelle-Volltextsuche-mit-Solr.html
map to hierarchy (ranks)
http://wiki.openstreetmap.org/wiki/Nominatim/Development_overview
names, names, names
name is one of many factors
ranking examples:
● Altona
– type: suburb vs train station vs town in US/Canada
● Germany
– admin_level=2 (country) vs island
● Mt Everest
– importance: viewpoint vs peak vs island
● Oktoberfest
– actually an alt_name of Theresienwiese
● Königsberg
– 10x a peak, 1x old_name of Kaliningrad
● Hitlerberg
– old_name:1934-1945 of Heigelkopf
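The Königsberg case shows two factors pulling in opposite directions: the peaks match on `name`, the city only on `old_name`, yet the city is far more important. One way to combine them is to multiply a weight for the kind of name tag by an importance value. The weights and importance numbers below are invented for illustration and are not taken from any geocoder:

```python
# Assumed weights: a match on the primary name counts more than one on
# an alternative or historic name.
NAME_WEIGHTS = {"name": 1.0, "alt_name": 0.8, "old_name": 0.6}

# Toy candidates; importance values are made up.
PLACES = [
    {"tags": {"name": "Königsberg"}, "type": "peak", "importance": 0.2},
    {"tags": {"name": "Kaliningrad", "old_name": "Königsberg"},
     "type": "city", "importance": 0.7},
]


def best_name_weight(query, tags):
    """Weight of the strongest name tag that equals the query, else 0."""
    wanted = query.casefold()
    weights = [NAME_WEIGHTS[key] for key, value in tags.items()
               if key in NAME_WEIGHTS and value.casefold() == wanted]
    return max(weights, default=0.0)


def rank(query, places=PLACES):
    """Return matching places, highest score first."""
    scored = [(best_name_weight(query, p["tags"]) * p["importance"], p)
              for p in places]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]
```

With these numbers the city wins for "Königsberg" (0.6 × 0.7 against 1.0 × 0.2). Whether that is the desired answer is a product decision, which is why the weights are better kept as shared, inspectable data than buried in code.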
status on wikipedia_articles.bin
● version 1: wikipedia pageview logs
– https://en.wikipedia.org/wiki/Wikipedia:Notability
● version 2 (current): parsing wikipedia articles and counting links
– last updated 2013
– 80m wikipedia entries + 15m redirects
– 0.6m places in OSM have wikipedia tag set (2013: 0.4m)
● version 3 (TBD): parsing wikipedia geo exports
– http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en
– 3.4m entries, more languages, regular dumps, new documentation
● version 4 (?)
– use wikidata exports
– used by multiple geocoders
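To make "count links" concrete: one plausible way to turn a raw link count into a ranking signal is to scale it logarithmically against the most-linked article, so that a few very popular articles do not swamp everything else. This is a sketch of that idea under stated assumptions, not the formula behind wikipedia_articles.bin; the counts are invented and the lookup key follows the `lang:Title` form of the OSM `wikipedia` tag:

```python
import math


def importance(link_count, max_link_count):
    """Map a link count to the range 0..1 on a logarithmic scale."""
    if link_count <= 0 or max_link_count <= 0:
        return 0.0
    return min(1.0, math.log(1 + link_count) / math.log(1 + max_link_count))


def importance_for(tags, link_counts):
    """Look up a place's importance through its OSM 'wikipedia' tag."""
    article = tags.get("wikipedia")  # e.g. "en:Eiffel Tower"
    if article is None or not link_counts:
        return 0.0
    return importance(link_counts.get(article, 0), max(link_counts.values()))
```

Places without a `wikipedia` tag score 0.0 here, which is one reason adding the tag in OSM improves how a place ranks.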
what can mappers do?
● add wikipedia tags
● fix administrative levels
● don't add wrong names (typos)
● file bugs (github)
http://nominatim.openstreetmap.org/
… and if all else fails: rename the city
Questions?
mtm@opencagedata.com

Geocoding Overview