Ishan Chattopadhyaya

LucidWorks
OpenStreetMap Foundation
Twitter: @ichattopadhyaya, OSM: chatman
OpenStreetMap Geocoder Based on Solr

Presented by Ishan Chattopadhyaya, LucidWorks

This talk covers the technical aspects of a new OpenStreetMap geocoder based on Apache Solr and Lucene. Recent releases of Apache Lucene and Apache Solr (4.0 onwards) have markedly improved their spatial search capabilities. Improved support for distributed storage and search, via SolrCloud mode, also makes it easier to scale applications built on Solr. OpenStreetMap's current geocoder, Nominatim, is based on PostgreSQL/PostGIS. Compared with a database system like Postgres, Solr offers robust partial text search, analysis in various languages (stemming, tokenization, stop words, etc.), spell check, faceting, and highlighting, all of which are useful when building a geocoder. Through this presentation, the author intends to build an appreciation for a Solr-based geocoder.

Published in: Technology, Travel
OpenStreetMap Geocoder Based on Solr

  1. Ishan Chattopadhyaya, LucidWorks, OpenStreetMap Foundation. Twitter: @ichattopadhyaya, OSM: chatman
  2. What is OpenStreetMap?
     ● Wikipedia of GeoData
     ● OpenStreetMap is a project aimed squarely at creating and providing free geographic data such as street maps to anyone who wants them.
  3. State of OSM
     ● Commercial competitors
       – Google Maps
       – Bing Maps
     ● http://tools.geofabrik.de/mc/
  4. The OpenStreetMap Software Stack
  5. What is a Geocoder?
     ● Input: raw query
     ● Output: geocoordinates
  6. Nominatim
     ● http://nominatim.openstreetmap.org/
  7. Goals for the new Geocoder
     ● Search for:
       – Cities and towns
       – Streets
       – Address points
       – Places of Interest, Businesses, Amenities, Attractions etc.
     ● Reverse geocoding
     ● Support for fuzzy queries
  8. Good changes in Lucene/Solr 4.x
     ● Support for indexing polygons
       – RecursivePrefixTree indexing
     ● Special spatial search predicates
       – Contains
       – IsWithin
       – Intersects
       – Etc.
     ● Reference: David Smiley's LuceneRevolution presentation
     ● SolrCloud mode for distributed indexing/searching
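
To make the spatial predicates on slide 8 concrete, here is a minimal sketch of how a Solr 4 filter on a RecursivePrefixTree field is usually written: the predicate name wraps a WKT shape inside a quoted field query. The field name `geo` comes from the schema slides; the helper `spatial_filter` and the polygon coordinates are hypothetical and only for illustration. Polygon shapes in Solr 4 also need the JTS library on the classpath.

```python
# Predicates named on the slide; Solr supports a few others as well.
PREDICATES = {"Contains", "IsWithin", "Intersects"}


def spatial_filter(field: str, predicate: str, wkt: str) -> str:
    """Build a Solr filter string of the form field:"Predicate(WKT)"."""
    if predicate not in PREDICATES:
        raise ValueError(f"unsupported predicate: {predicate}")
    return f'{field}:"{predicate}({wkt})"'


# Hypothetical rectangle, written as a closed WKT polygon (lon lat pairs).
polygon = "POLYGON((-6.40 53.25, -6.05 53.25, -6.05 53.45, -6.40 53.45, -6.40 53.25))"
fq = spatial_filter("geo", "IsWithin", polygon)
print(fq)
```

The resulting string would be passed as an `fq` parameter, so it restricts results without affecting scoring.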
  9. Architecture
     Planet dumps → Indexer → Solr → API Layer → www.geocoder.in
  10. Indexing: OSM Data format
     ● Node
       – “A node defines a single geospatial point using a latitude and longitude.”
     ● Way
       – “A way is an ordered list of between 2 and 2,000 nodes. Ways are used to represent linear features (vectors), such as rivers or roads.”
     ● Relation
       – “A Relation is an all-purpose data structure that documents a relationship between two or more other objects.”
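
The three OSM element types on slide 10 can be sketched as plain data classes. This is only an illustration of the definitions quoted above, not the indexer's actual code: the class and field names, the `way_coordinates` helper, and the sample IDs and coordinates are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A single geospatial point."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of between 2 and 2,000 node references."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 2 <= len(self.node_ids) <= 2000:
            raise ValueError("a way must reference between 2 and 2,000 nodes")


@dataclass
class Member:
    """One participant in a relation: a node, way, or other relation."""
    type: str
    ref: int
    role: str = ""


@dataclass
class Relation:
    """Documents a relationship between other objects."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: Dict[int, Node]) -> List[Tuple[float, float]]:
    """Resolve a way's node references into (lat, lon) pairs, in order."""
    return [(nodes[i].lat, nodes[i].lon) for i in way.node_ids]


# Made-up sample data: two points joined into a short road.
nodes = {1: Node(1, 53.3338, -6.2321), 2: Node(2, 53.3345, -6.2300)}
road = Way(10, [1, 2], {"highway": "residential", "name": "Lansdowne Road"})
coords = way_coordinates(road, nodes)
```

A way stores only node IDs, so an indexer has to resolve them against the node table before it can build a shape for the `geo` field.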
  11. Indexing: Facts and figures
     ● Number of OSM Nodes in the database = 2,071,039,612
     ● Number of OSM Ways in the database = 202,570,637
     ● Number of OSM Relations in the database = 2,217,240
  12. Indexing: Schema
     name = Lansdowne Street, geo = <shape>, level = s, admin2 = Ireland, admin5 = Dublin, admin6 = County Dublin, admin7 = Ballsbridge, street = Lansdowne, st_type = Street (no value shown for popularity, admin3, admin4)
  13. Indexing: Schema
     name = Dublin, geo = <shape>, popularity = 1, level = 6, admin2 = Ireland, admin5 = Dublin, admin6 = County Dublin (no value shown for admin3, admin4, admin7, street, st_type)
  14. Indexing: Schema (POIs)
     name = Ballsbridge Hotel, geo = <shape>, category = hotel, admin2 = Ireland, admin5 = Dublin, admin6 = County Dublin, admin7 = Ballsbridge (no value shown for admin3, admin4, street, st_type)
  15. Searching
     Raw query → Classifier → Classifications → Validator → Valid classifications → Geocoder (lookup) → Structured location + geocodes
  16. Searching: Classification
     Query → Tokenizer → Shingles → Bloom Filters → Classifications
  17. Searching: Classification
     Query → Tokenizer → Shingles → Bloom Filters → Classifications
     ● Query = “hotels near lansdowne rd dublin”
     ● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, lansdowne rd, rd dublin, .., hotels near lansdowne rd dublin
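
The shingle list on slide 17 is every contiguous run of tokens, from single words up to the whole query. A minimal sketch, assuming simple lowercase whitespace tokenization (the talk does not say which tokenizer is used, and the function name `shingles` is hypothetical):

```python
from typing import List


def shingles(query: str) -> List[str]:
    """Return all contiguous word n-grams of the query, shortest first."""
    tokens = query.lower().split()
    result = []
    for size in range(1, len(tokens) + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


all_shingles = shingles("hotels near lansdowne rd dublin")
for s in all_shingles:
    print(s)
```

A query of n tokens yields n(n+1)/2 shingles, so the five-word example produces 15 candidates to classify.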
  18. Searching: Classification
     Query → Tokenizer → Shingles → Bloom Filters → Classifications
     ● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..
     ● “hotels” tested against the Bloom filters Cat, A2, A4, A5, Streets: match in Cat
  19. Searching: Classification
     Query → Tokenizer → Shingles → Bloom Filters → Classifications
     ● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..
     ● “dublin” tested against the Bloom filters Cat, A2, A4, A5, Streets: match in A5 and Streets
  20. Searching: Classification
     Query → Tokenizer → Shingles → Bloom Filters → Classifications
     ● Shingles: hotels, near, lansdowne, rd, dublin, hotels near, near lansdowne, ..
     ● “lansdowne” tested against the Bloom filters Cat, A2, A4, A5, Streets: match in A5 and Streets
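
The classification step on slides 18 to 20 tests each shingle against one Bloom filter per field. The sketch below is an assumption about how that could look, not the talk's implementation: the `BloomFilter` class, its size and hash scheme, and the sample filter contents are all hypothetical. A Bloom filter never misses a value that was added, but it can report false positives, which is presumably why the pipeline on slide 15 follows classification with a validator.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filter(values: Iterable[str]) -> BloomFilter:
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    return bloom


def classify(shingles: List[str], filters: Dict[str, BloomFilter]) -> List[Tuple[str, str]]:
    """Return (shingle, field) for every filter that might contain the shingle."""
    return [(s, name) for s in shingles for name, f in filters.items() if s in f]


# Hypothetical filter contents; a real index would hold every known value per field.
filters = {
    "category": build_filter(["hotels", "restaurants"]),
    "admin5": build_filter(["dublin", "lansdowne"]),
    "street": build_filter(["dublin", "lansdowne"]),
}
matches = classify(["hotels", "near", "lansdowne", "rd", "dublin"], filters)
```

Because membership tests touch only a few bits, every shingle can be checked against every field in memory before any Solr query is issued.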
  21. Searching: Classifications
     ● Query = “hotels near lansdowne rd dublin”
     ● Classifications:
       hotels = category
       lansdowne = admin5
       lansdowne = street
       dublin = admin5
       dublin = street
  22. Searching: Classifications
     ● Query = “hotels near lansdowne rd dublin”
     ● Classifications:
       hotels = category
       lansdowne = admin5
       lansdowne = street
       dublin = admin5
       dublin = street
     ● Possible permutations: C.5.5, C.S.5, C.5.S, C...5, C.5.., etc.
  23. Searching: Solr Query
     ● Query = “hotels near lansdowne rd dublin”
     ● Possible permutations:
       C.5.5: +level:5 +admin5:lansdowne +admin5:dublin
       C.S.5: +level:s +street:lansdowne +admin5:dublin
       C.5.S: +level:s +street:dublin +admin5:lansdowne
       C...5: +level:5 +admin5:dublin
       C.5..: +level:5 +admin5:lansdowne
       etc.
  24. Searching: Solr Query
     ● Query = “hotels near lansdowne rd dublin”
     ● Possible permutations:
       C.5.5: +level:5 +admin5:lansdowne +admin5:dublin
       C.S.5: +level:s +street:lansdowne +admin5:dublin
       C.5.S: +level:s +street:dublin +admin5:lansdowne
       C...5: +level:5 +admin5:dublin
       C.5..: +level:5 +admin5:lansdowne
       etc.
  25. Searching: Solr Query
     ● Query = “hotels near lansdowne rd dublin”
     ● Possible permutations:
       C.5.5: +level:5 +admin5:lansdowne +admin5:dublin
       C.S.5: +level:s +street:lansdowne +admin5:dublin
       C.5.S: +level:s +street:dublin +admin5:lansdowne
       C...5: +level:5 +admin5:dublin
       C.5..: +level:5 +admin5:lansdowne
       etc.
     ● Result: "POINT (-6.232063,53.333833)"
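
The permutations on slides 22 to 25 can be read as follows: each location token takes one of its classifications or is skipped (the "." in the pattern), and the `level` clause is set to the finest level used. The sketch below reproduces the five listed queries under that reading. The function names and the rule for choosing `level` are inferred from the examples, not stated in the talk, and the category token is left out because it is handled separately in the POI search.

```python
from itertools import product
from typing import Dict, List


def level_for(fields: List[str]) -> str:
    """Pick the finest level among the assigned fields ("s" for streets)."""
    if "street" in fields:
        return "s"
    return str(max(int(f[len("admin"):]) for f in fields))


def candidate_queries(options: Dict[str, List[str]]) -> List[str]:
    """Enumerate one Solr query per assignment of tokens to fields."""
    tokens = list(options)
    choices = [list(options[t]) + [None] for t in tokens]  # None = token skipped
    queries = []
    for assignment in product(*choices):
        used = [(f, t) for t, f in zip(tokens, assignment) if f is not None]
        if not used:
            continue  # every token skipped: nothing to search for
        used.sort(key=lambda pair: pair[0] != "street")  # street clause first, as on the slide
        level = level_for([f for f, _ in used])
        clauses = [f"+level:{level}"] + [f"+{f}:{t}" for f, t in used]
        queries.append(" ".join(clauses))
    return queries


classifications = {
    "lansdowne": ["admin5", "street"],
    "dublin": ["admin5", "street"],
}
queries = candidate_queries(classifications)
for q in queries:
    print(q)
```

This sketch performs no validation, so it also emits combinations such as two street clauses that a validator would presumably discard.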
  26. Searching: Searching for POIs
     ● Query = “hotels near lansdowne rd dublin”
     ● Query = “hotels near” near "POINT (-6.232063,53.333833)"
     ● Solr query:
       fl=*,score
       sort=score asc
       q={!geofilt score=distance filter=false sfield=geo pt=53.333833,-6.232063 d=10}
       fq=+category:hotel
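
The parameters on slide 26 can be assembled into a request URL as sketched below. Only the four parameters come from the slide; the helper name `poi_query_params`, the host, and the core name `collection1` are assumptions. Note that `pt` takes latitude first, whereas the POINT on the slide lists longitude first.

```python
from typing import Dict
from urllib.parse import urlencode


def poi_query_params(lat: float, lon: float, category: str, radius_km: int = 10) -> Dict[str, str]:
    """Build Solr parameters that rank POIs of one category by distance from a point."""
    return {
        # score=distance makes the score the distance; filter=false keeps
        # geofilt from restricting results to the radius d.
        "q": f"{{!geofilt score=distance filter=false sfield=geo pt={lat},{lon} d={radius_km}}}",
        "fq": f"+category:{category}",
        "sort": "score asc",  # nearest first
        "fl": "*,score",
    }


params = poi_query_params(53.333833, -6.232063, "hotel")
# Hypothetical Solr endpoint; adjust host and core to the actual deployment.
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

Sorting by ascending score works here because the score is a distance, so smaller values are better matches.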
  27. Searching: Searching for POIs
  28. Challenges: Indexing
     ● Street Associativity
     ● Incomplete polygons
  29. Challenges
     ● Handling Updates
     ● Data validation
  30. Distributed Search
     ● Need for distributed search?
     ● Geographical partitioning
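
Slide 30 names geographical partitioning without describing a scheme. As one possible illustration only, the sketch below routes a document to a shard by equal-width longitude bands. The function name, the band layout, and the shard count are all assumptions and say nothing about how the geocoder actually partitions its index.

```python
def shard_for_longitude(lon: float, num_shards: int = 4) -> int:
    """Map a longitude in [-180, 180] to one of num_shards equal-width bands."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    band_width = 360.0 / num_shards
    # min() keeps lon == 180 in the last band instead of overflowing.
    return min(int((lon + 180.0) / band_width), num_shards - 1)


dublin_shard = shard_for_longitude(-6.232063)
print(dublin_shard)
```

Equal-width bands are simple but would be unevenly loaded, since OSM data density varies widely across regions; a practical scheme would likely balance by document count.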
  31. Conclusion
     ● http://www.geocoder.in/
     ● Twitter: @ichattopadhyaya
