OpenStreetMap Geocoder Based on Solr
 

Presented by Ishan Chattopadhyaya, LucidWorks

This talk covers the technical aspects of a new OpenStreetMap geocoder based on Apache Solr and Lucene. Recent releases of Apache Lucene and Apache Solr (4.0 onwards) have markedly improved spatial search capabilities. Improved support for distributed storage and search, via SolrCloud mode, also makes Solr-based applications easier to scale. OpenStreetMap's current geocoder, Nominatim, is based on PostgreSQL/PostGIS. Compared with a database system like Postgres, Solr offers several benefits for building a geocoder: robust partial text search, text analysis in various languages (stemming, tokenization, stop words, etc.), spell checking, faceting, and highlighting. Through this presentation, the author hopes to build an appreciation for a Solr-based geocoder.

    Presentation Transcript

    • Ishan Chattopadhyaya, LucidWorks, OpenStreetMap Foundation. Twitter: @ichattopadhyaya, OSM: chatman
    • What is OpenStreetMap?
      ● Wikipedia of GeoData
      ● OpenStreetMap is a project aimed squarely at creating and providing free geographic data such as street maps to anyone who wants them.
    • State of OSM
      ● Commercial competitors
        – Google Maps
        – Bing Maps
      ● http://tools.geofabrik.de/mc/
    • The OpenStreetMap Software Stack
    • What is a Geocoder?
      ● Input: raw query
      ● Output: geocoordinates
    • Nominatim ● http://nominatim.openstreetmap.org/
    • Goals for the new Geocoder
      ● Search for:
        – Cities and towns
        – Streets
        – Address points
        – Places of Interest, Businesses, Amenities, Attractions etc.
      ● Reverse geocoding
      ● Support for fuzzy queries
    • Good changes in Lucene/Solr 4.x
      ● Support for indexing polygons
        – RecursivePrefixTree indexing
      ● Special spatial search predicates
        – Contains
        – IsWithin
        – Intersects
        – Etc.
      ● Reference: David Smiley's LuceneRevolution presentation
      ● SolrCloud mode for distributed indexing/searching
    • Architecture (diagram components): www.geocoder.in, Indexer, Planet dumps, Solr, API Layer
    • Indexing: OSM Data format
      ● Node
        – “A node defines a single geospatial point using a latitude and longitude.”
      ● Way
        – “A way is an ordered list of between 2 and 2,000 nodes. Ways are used to represent linear features (vectors), such as rivers or roads.”
      ● Relation
        – “A Relation is an all-purpose data structure that documents a relationship between two or more other objects.”
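The three element types quoted above could be modelled roughly as follows. This is only an illustrative sketch: the class and field names (`Node`, `Way`, `Relation`, `Member`, `tags`, `role`) are assumptions made for the example, not the indexer's actual data structures. The only constraint enforced is the 2 to 2,000 node limit quoted for ways.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A single geospatial point (hypothetical field names)."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node references, e.g. a road or river."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The slide quotes a limit of between 2 and 2,000 nodes per way.
        if not 2 <= len(self.node_ids) <= 2000:
            raise ValueError("a way needs between 2 and 2,000 nodes")


@dataclass
class Member:
    """One participant in a relation: a node, way, or another relation."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """Documents a relationship between other objects."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


# Example: a two-node way tagged as a street.
road = Way(id=1, node_ids=[10, 11], tags={"name": "Lansdowne Road"})
```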
    • Indexing: Facts and figures
      ● Number of OSM Nodes in the database = 2,071,039,612
      ● Number of OSM Ways in the database = 202,570,637
      ● Number of OSM Relations in the database = 2,217,240
    • Indexing: Schema
      Fields: name, geo, level, popularity, admin2, admin3, admin4, admin5, admin6, admin7, street, st_type
      Values shown: Ireland, Lansdowne Street, s, <shape>, Dublin, County Dublin, Ballsbridge, Lansdowne, Street
    • Indexing: Schema
      Fields: name, geo, level, popularity, admin2, admin3, admin4, admin5, admin6, admin7, street, st_type
      Values shown: Ireland, Dublin, 6, <shape>, 1, Dublin, County Dublin
    • Indexing: Schema (POIs)
      Fields: name, geo, category, admin2, admin3, admin4, admin5, admin6, admin7, street, st_type
      Values shown: Ireland, Ballsbridge Hotel, hotel, <shape>, Dublin, County Dublin, Ballsbridge
    • Searching
      Raw query → Classifier → Classifications → Validator → Valid classifications → Geocoder (lookup) → Structured location + geocodes
    • Searching: Classification
      Pipeline: Query → Tokenizer → Shingles → Bloom Filters → Classifications
      ● Query = “hotels near lansdowne rd dublin”
      ● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, lansdowne rd, rd dublin, .., hotels near lansdowne rd dublin
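The shingle list on this slide is every contiguous run of query tokens, from single words up to the full query. A minimal sketch of that step, assuming simple lowercasing and whitespace tokenization (the talk does not say which tokenizer is used, and the function name `shingles` is made up for the example):

```python
from typing import List


def shingles(query: str) -> List[str]:
    """Return all contiguous word n-grams of the query, shortest first."""
    tokens = query.lower().split()
    result = []
    # n is the shingle length in words: 1, 2, ..., len(tokens).
    for n in range(1, len(tokens) + 1):
        # i is the index of the first token of the shingle.
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result


all_shingles = shingles("hotels near lansdowne rd dublin")
```

For a query of k tokens this yields k(k+1)/2 shingles, so the five-word example query produces 15 candidates to look up.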
    • Searching: Classification
      Pipeline: Query → Tokenizer → Shingles → Bloom Filters → Classifications
      ● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..
      ● Bloom filter lookups, one shingle at a time:
        Shingle     Cat     A2   A4   A5      Streets
        hotels      Match
        dublin                        Match   Match
        lansdowne                     Match   Match
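The lookup shown on these slides tests each shingle against one Bloom filter per field (category, admin levels, streets). The sketch below is a toy version of that idea, not the geocoder's implementation: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and the tiny vocabularies are all assumptions chosen for illustration. A Bloom filter can report false positives but never false negatives, which is presumably why the pipeline has a validator stage after classification.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple


class BloomFilter:
    """Toy Bloom filter: a fixed bit array probed by several hash functions."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filters(vocab: Dict[str, Iterable[str]]) -> Dict[str, BloomFilter]:
    """Build one filter per field from that field's known values."""
    filters = {}
    for field_name, values in vocab.items():
        bloom = BloomFilter()
        for value in values:
            bloom.add(value)
        filters[field_name] = bloom
    return filters


def classify(shingle_list: Iterable[str],
             filters: Dict[str, BloomFilter]) -> List[Tuple[str, str]]:
    """Return (shingle, field) pairs for every filter the shingle may be in."""
    return [(s, name) for s in shingle_list
            for name, bloom in filters.items() if s in bloom]


# Hypothetical vocabularies mirroring the slide's example matches.
filters = build_filters({
    "category": ["hotels"],
    "admin5": ["lansdowne", "dublin"],
    "street": ["lansdowne", "dublin"],
})
matches = classify(["hotels", "near", "lansdowne", "rd", "dublin"], filters)
```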
    • Searching: Classifications
      ● Query = “hotels near lansdowne rd dublin”
      ● Classifications:
        hotels = category
        lansdowne = admin5
        lansdowne = street
        dublin = admin5
        dublin = street
      ● Possible permutations: C.5.5, C.S.5, C.5.S, C...5, C.5.., etc.
    • Searching: Solr Query
      ● Query = “hotels near lansdowne rd dublin”
      ● Possible permutations:
        C.5.5: +level:5 +admin5:lansdowne +admin5:dublin
        C.S.5: +level:s +street:lansdowne +admin5:dublin
        C.5.S: +level:s +street:dublin +admin5:lansdowne
        C...5: +level:5 +admin5:dublin
        C.5..: +level:5 +admin5:lansdowne
        etc.
      ● Resolved location: "POINT (-6.232063,53.333833)"
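In the permutation codes on these slides, each character labels one query token: `C` for category, a digit for an admin level, `S` for street, and `.` for an unclassified token. The sketch below enumerates such patterns and turns one into a location query string. The function names are hypothetical, and the rule used for `level` (street if any token is a street, otherwise the deepest admin level present) is inferred from the five examples shown, not stated in the talk.

```python
from itertools import product
from typing import Dict, List, Optional


def label_patterns(tokens: List[str],
                   classes: Dict[str, List[str]]) -> List[str]:
    """Enumerate one label per token; '.' means the token is left unclassified."""
    options = [classes.get(token, []) + ["."] for token in tokens]
    return ["".join(combo) for combo in product(*options)]


def location_query(tokens: List[str], pattern: str) -> Optional[str]:
    """Build the Solr clause string for one pattern, or None if it has no location."""
    streets = [t for t, label in zip(tokens, pattern) if label == "S"]
    admins = [(label, t) for t, label in zip(tokens, pattern) if label.isdigit()]
    if not streets and not admins:
        return None
    # Assumed rule: search at street level if a street is present,
    # otherwise at the deepest admin level named in the pattern.
    level = "s" if streets else max(admins, key=lambda a: int(a[0]))[0]
    clauses = [f"+level:{level}"]
    clauses += [f"+street:{t}" for t in streets]
    clauses += [f"+admin{label}:{t}" for label, t in admins]
    return " ".join(clauses)


tokens = "hotels near lansdowne rd dublin".split()
classes = {"hotels": ["C"], "lansdowne": ["5", "S"], "dublin": ["5", "S"]}
patterns = label_patterns(tokens, classes)
queries = {p: location_query(tokens, p) for p in patterns}
```

The enumeration also produces patterns the slides do not list, such as `C.S.S`; pruning those would be the job of the validation step.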
    • Searching: Searching for POIs
      ● Query = “hotels near lansdowne rd dublin”
      ● Query = “hotels near” near "POINT (-6.232063,53.333833)"
      ● Solr query:
        q={!geofilt score=distance filter=false sfield=geo pt=53.333833,-6.232063 d=10}
        fq=+category:hotel
        fl=*,score
        sort=score asc
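The POI request above can be assembled from the resolved point and the category. The sketch below builds the same four parameters and URL-encodes them. The helper names are hypothetical, and the parser assumes the point string uses longitude-then-latitude order as on the slide, which has to be swapped because `pt` takes `lat,lon`.

```python
import re
from typing import Dict, Tuple
from urllib.parse import urlencode


def point_to_latlon(point: str) -> Tuple[float, float]:
    """Parse 'POINT (lon,lat)' or 'POINT (lon lat)' into (lat, lon)."""
    number = r"(-?\d+(?:\.\d+)?)"
    match = re.fullmatch(
        rf"POINT\s*\(\s*{number}\s*[,\s]\s*{number}\s*\)", point.strip())
    if match is None:
        raise ValueError(f"not a point: {point!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def poi_params(lat: float, lon: float, category: str,
               radius_km: float = 10) -> Dict[str, str]:
    """Solr parameters for POIs of one category, nearest first."""
    return {
        # score=distance makes the score the distance from pt, so an
        # ascending sort on score returns the closest matches first.
        "q": (f"{{!geofilt score=distance filter=false sfield=geo "
              f"pt={lat},{lon} d={radius_km:g}}}"),
        "fq": f"+category:{category}",
        "fl": "*,score",
        "sort": "score asc",
    }


lat, lon = point_to_latlon("POINT (-6.232063,53.333833)")
params = poi_params(lat, lon, "hotel")
query_string = urlencode(params)  # append to the collection's /select URL
```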
    • Searching: Searching for POIs
    • Challenges: Indexing
      ● Street Associativity
      ● Incomplete polygons
    • Challenges
      ● Handling Updates
      ● Data validation
    • Distributed Search
      ● Need for distributed search?
      ● Geographical partitioning
    • Conclusion
      ● http://www.geocoder.in/
      ● Twitter: @ichattopadhyaya