Presented by Ishan Chattopadhyaya, LucidWorks
This talk covers the technical aspects of a new OpenStreetMap geocoder based on Apache Solr and Lucene. Recent releases of Apache Lucene and Apache Solr (4.0 onwards) have markedly improved their spatial search capabilities. Solr's improved support for distributed storage and search, via SolrCloud mode, also makes it easier to scale applications built on it. OpenStreetMap's current geocoder, Nominatim, is based on PostgreSQL/PostGIS. Compared with a database system like Postgres, Solr offers several benefits for building a geocoder: robust partial text search, analysis in various languages (stemming, tokenization, stop words etc.), spell check, faceting, highlighting and more. Through this presentation, the author intends to build an appreciation for a Solr-based geocoder.
3. What is OpenStreetMap?
● Wikipedia of GeoData
● OpenStreetMap is a project aimed squarely at creating and providing free geographic data such as street maps to anyone who wants them.
8. Goals for the new Geocoder
● Search for:
– Cities and towns
– Streets
– Address points
– Places of interest, businesses, amenities, attractions etc.
● Reverse geocoding
● Support for fuzzy queries
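
As a rough illustration of the fuzzy-query goal, the sketch below turns free text into a Lucene fuzzy query string using the `term~N` syntax. The field name `name` and the helper `fuzzy_query` are assumptions made for this example and are not taken from the talk; lowercasing the input assumes the indexed field is lowercased as well. Lucene 4.x limits fuzzy matching to at most 2 edits.

```python
import re


def fuzzy_query(text, field="name", max_edits=1):
    """Build a Lucene query string with a fuzzy clause per input word.

    `field` is a hypothetical field name; adapt it to the actual schema.
    """
    # Lucene 4.x supports an edit distance of 0, 1 or 2 for fuzzy terms.
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1 or 2")
    # Keep only word characters so query-syntax symbols never reach Solr.
    terms = re.findall(r"\w+", text.lower())
    if not terms:
        raise ValueError("no searchable terms in input")
    clause = " AND ".join(f"{term}~{max_edits}" for term in terms)
    return f"{field}:({clause})"


# A misspelled place name still yields a query that can match the right term.
query = fuzzy_query("Bangalor Palace")
print(query)
```

Requiring every word to match (`AND`) is one possible design choice; a geocoder might equally prefer `OR` with relevance ranking.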
9. Good changes in Lucene/Solr 4.x
● Support for indexing polygons
– RecursivePrefixTree indexing
● Special spatial search predicates
– Contains
– IsWithin
– Intersects
– Etc.
● Reference: David Smiley's LuceneRevolution presentation
● SolrCloud mode for distributed indexing/searching
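
The spatial predicates listed above are used in Solr 4 filter queries of the form `field:"Predicate(shape)"`. The sketch below builds such a filter and the corresponding request parameters. The field name `geo`, the sample polygon, and the helper names are assumptions for illustration only; the field is assumed to use `solr.SpatialRecursivePrefixTreeFieldType`, and polygon shapes additionally assume that JTS is available to Solr.

```python
from urllib.parse import urlencode

# Predicate names from the slide; Solr 4 supports a few others as well.
PREDICATES = {"Contains", "IsWithin", "Intersects"}


def spatial_filter(field, predicate, wkt_shape):
    """Return a Solr filter query such as geo:"Intersects(POLYGON((...)))"."""
    if predicate not in PREDICATES:
        raise ValueError(f"unsupported predicate: {predicate}")
    return f'{field}:"{predicate}({wkt_shape})"'


def select_params(text, field, predicate, wkt_shape):
    """Return URL-encoded q and fq parameters for a Solr select request."""
    return urlencode({"q": text, "fq": spatial_filter(field, predicate, wkt_shape)})


# WKT coordinates are "x y", i.e. longitude first, then latitude.
# The polygon is an arbitrary rectangle chosen for this example.
area = "POLYGON((77.4 12.8, 77.8 12.8, 77.8 13.1, 77.4 13.1, 77.4 12.8))"
fq = spatial_filter("geo", "Intersects", area)
print(fq)
print(select_params("name:park", "geo", "Intersects", area))
```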
11. Indexing: OSM Data format
● Node
– “A node defines a single geospatial point using a latitude and longitude.”
● Way
– “A way is an ordered list of between 2 and 2,000 nodes. Ways are used to represent linear features (vectors), such as rivers or roads.”
● Relation
– “A Relation is an all-purpose data structure that documents a relationship between two or more other objects.”
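
The three OSM element types can be modelled with a few small data classes, sketched below. The class and field names (`Member`, `tags`, `role`, `way_coordinates`) are choices made for this example rather than anything shown in the talk; the 2 to 2,000 node limit on a way comes from the definition quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A single geospatial point."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids, e.g. a road or a river."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the size limit from the quoted definition of a way.
        if not 2 <= len(self.node_ids) <= 2000:
            raise ValueError("a way must reference between 2 and 2,000 nodes")


@dataclass
class Member:
    """One participant in a relation."""
    type: str  # "node", "way" or "relation"
    ref: int   # id of the referenced element
    role: str = ""


@dataclass
class Relation:
    """A relationship between other elements."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: Dict[int, Node]) -> List[Tuple[float, float]]:
    """Resolve a way's node ids into (lat, lon) pairs, preserving order."""
    return [(nodes[i].lat, nodes[i].lon) for i in way.node_ids]


# Made-up sample data: two nodes joined by one residential street.
nodes = {1: Node(1, 12.97, 77.59), 2: Node(2, 12.98, 77.60)}
street = Way(10, [1, 2], {"highway": "residential"})
print(way_coordinates(street, nodes))
```

Resolving a way into coordinates like this is the kind of step an indexer would need before it could hand a line or polygon to a spatial field.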
12. Indexing: Facts and figures
● Number of OSM Nodes in the database = 2,071,039,612
● Number of OSM Ways in the database = 202,570,637
● Number of OSM Relations in the database = 2,217,240