An overview of how to handle geo data in DBMSs from a NoSQL point of view
Hibernate Search spatial module

Published in: Technology

  • (1 table for points, 1 table for lines, 1 table for polylines, …) (no standard indexing) (range queries) (several hundred GB for the whole world's roads)
  • …, location-centric calculations
  • (new keywords, new request syntax) (more than 200 functions to implement)
  • Location Based Systems: MapQuest, Mappy, Google Maps
  • No(Geo)SQL

    1. No(Geo)SQL: geographic search in (No)SQL
    2. [Me] 8 years as platform architect and deputy CTO; founding partner of NovaCodex since 2008; @NHelleringer
    3. [Why] Geo in databases: what's the point?
    4. [Why] Geo in databases: challenges
        • Data is complex to store in SQL
        • Data is two-dimensional
        • Data is dense
        • Data is huge
    5. [Origin (challenge)]
        • Multiple dimensions, but B-trees sort on only one
        • Query-dependent calculation of the index sort order
        • New data structures and algorithms to handle dimensions
        • A two-phase search: select, then filter
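One common way around "B-trees sort on only one dimension" is to interleave the bits of both coordinates into a single sortable key, known as a Z-order or Morton code. The sketch below illustrates that general technique; it is not taken from the talk, and the 16-bit quantization and the class and method names are assumptions made for the example.

```java
/**
 * Illustrative Z-order (Morton) key: interleaves the bits of quantized longitude
 * and latitude so that one sorted (B-tree) column can index two dimensions.
 * The 16-bit grid and all names are assumptions for this sketch.
 */
final class MortonKey {
    private MortonKey() {}

    /** Maps a coordinate in [min, max] onto a 16-bit grid index in 0..65535. */
    static int quantize(double value, double min, double max) {
        int q = (int) Math.floor((value - min) / (max - min) * 65536.0);
        return Math.min(Math.max(q, 0), 65535); // clamp, so value == max stays in range
    }

    /** Spreads the 16 low bits of v apart so that bit i moves to bit 2 * i. */
    static long spread(int v) {
        long x = v & 0xFFFFL;
        x = (x | (x << 8)) & 0x00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0FL;
        x = (x | (x << 2)) & 0x33333333L;
        x = (x | (x << 1)) & 0x55555555L;
        return x;
    }

    /** Longitude bits land on even bit positions, latitude bits on odd ones. */
    static long key(double lat, double lon) {
        long x = spread(quantize(lon, -180.0, 180.0));
        long y = spread(quantize(lat, -90.0, 90.0));
        return x | (y << 1);
    }
}
```

Rows sorted on such a key let a plain B-tree act as a coarse spatial index, but a key range covers a Z-shaped region rather than a circle or a box, so the candidates still go through the filter phase.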
    6. [Origin (needs)] Geographic Information Systems: handling of geometric objects
        Geography entered information systems because administrations needed to handle real-world data:
        • Geology / geography
        • Roads and administrative areas for cadastral surveys
        • Census data
        • Infrastructure elements (water, electrical and communication networks)
        Other needs, using the same tools, appeared once the data became available:
        • Geo-marketing (market areas)
    7. [How] All you ever hated about SQL … and more!
        • Complex SQL additions
        • Full-size, complex, normalized API
        • Vendor-dependent implementations
        • Not scalable
    8. [Current implementations (traditional DBMS)] The Open Geospatial Consortium publishes a standard: OpenGIS
        • Oracle: quad trees / R-trees; side development from Oracle 4 (1984), integrated in Oracle 7 (1992)
        • SQL Server: 4-level grid index; since the 2008 version (2007)
        • PostgreSQL: R-tree-over-GiST; since PostGIS 1.0 for PostgreSQL 8.0 (Apr 2005)
        • SpatiaLite: R-trees; since 3.6.0 (Mar 2008)
        • MySQL since Feb 2005, DB2 Spatial Extender since July 2006; Ingres added support very recently
        Hibernate Spatial provides generic access to OpenGIS implementations.
        GIS software such as ESRI, MapInfo, GeoConcept and QuantumGIS uses this standard to access data.
    9. [Puzzled?] Do we need all this? Is geo only for geo-centric companies?
    10. [How] LBS changed everything!
        • Maps, geocoding & route planning available
        • Platforms handle millions of hits/day
        • Available through multiple APIs
        • Often for free
    11. [How]
        • Maps: data is huge, complex objects; indexing is geo; processing capabilities required; provided
        • Geocoding: data is huge; not a geo problem; expertise extremely valued; provided
        • Route planning: data is huge; not a geo problem; not shardable; provided
        • POI search: data is less huge (your business size); indexing is geo; may shard; less relevant
    12. [Origin (needs)] Location-aware data: handling of data associated with a latitude/longitude tuple
        Location became a search criterion:
        • Geo search: the map/the geography is the center of the search process
        • Proximity search: the location is one of many criteria used to refine a search
    13. [New solutions?] Does NoSQL help?
    14. [Geo as a NoSQL technology] Why does geo fit a NoSQL approach?
        Geo does not fit in a traditional 'pure' DBMS:
        • First normal form (1NF): many dimensions in one column break the rules, and there is no natural order between (48,23) and (47,25)
        • Geo objects are hard to define strictly with SQL types: they are fickle
        • Tim Anglade, ‘No SQL for fun and profit’: geo/hierarchical is one of the seven forms of NoSQL to date
    15. [Geo as a NoSQL technology] Extensions to SQL or NoSQL data stores
        • Quad-trees
        • R-trees
    16. Quad-tree
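A quad-tree recursively splits a region into four quadrants, so a search only descends into quadrants that overlap the query window. The following minimal point quad-tree is an illustrative sketch, not code from any of the products cited in this talk; the node capacity, the depth cap and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal point quad-tree over (x = longitude, y = latitude); capacity, depth cap and names are illustrative. */
final class QuadTree {
    private static final int CAPACITY = 4;   // points kept in a leaf before it splits
    private static final int MAX_DEPTH = 20; // stops endless splitting on duplicate points

    private final double minX, minY, maxX, maxY;
    private final int depth;
    private final List<double[]> points = new ArrayList<>();
    private QuadTree[] children;              // null while this node is a leaf

    QuadTree(double minX, double minY, double maxX, double maxY) { this(minX, minY, maxX, maxY, 0); }

    private QuadTree(double minX, double minY, double maxX, double maxY, int depth) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY; this.depth = depth;
    }

    private boolean contains(double x, double y) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }

    /** Returns false when the point lies outside this node's region. */
    boolean insert(double x, double y) {
        if (!contains(x, y)) return false;
        if (children == null) {
            if (points.size() < CAPACITY || depth == MAX_DEPTH) {
                points.add(new double[] {x, y});
                return true;
            }
            split();
        }
        for (QuadTree child : children) {
            if (child.insert(x, y)) return true; // first matching quadrant wins on shared borders
        }
        return false;
    }

    /** Creates the four quadrants and pushes the stored points down into them. */
    private void split() {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        children = new QuadTree[] {
            new QuadTree(minX, minY, midX, midY, depth + 1), new QuadTree(midX, minY, maxX, midY, depth + 1),
            new QuadTree(minX, midY, midX, maxY, depth + 1), new QuadTree(midX, midY, maxX, maxY, depth + 1)
        };
        for (double[] p : points) {
            for (QuadTree child : children) {
                if (child.insert(p[0], p[1])) break;
            }
        }
        points.clear();
    }

    /** Collects the points inside the query rectangle, skipping quadrants that cannot overlap it. */
    List<double[]> query(double qMinX, double qMinY, double qMaxX, double qMaxY) {
        List<double[]> out = new ArrayList<>();
        collect(qMinX, qMinY, qMaxX, qMaxY, out);
        return out;
    }

    private void collect(double qMinX, double qMinY, double qMaxX, double qMaxY, List<double[]> out) {
        if (qMinX > maxX || qMaxX < minX || qMinY > maxY || qMaxY < minY) return; // no overlap: prune
        for (double[] p : points) {
            if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) out.add(p);
        }
        if (children != null) {
            for (QuadTree child : children) child.collect(qMinX, qMinY, qMaxX, qMaxY, out);
        }
    }
}
```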
    17. [How does it work?] Search steps
        1) Select
           • Compute level
           • Compute box ids
           • Fetch boxes
        2) Filter
           • Compute distance
           • Select result set
        Limits
           • High levels
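The select and filter steps can be sketched on a simple multi-level grid. This is a hedged illustration, assuming each point is labelled with its box at every level (in the spirit of the quad-tree labels listed for Lucene); the "level/row/col" box ids, MAX_LEVEL, the level-selection rule and the haversine distance on a 6371 km sphere are choices made for the example, not details from the talk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative two-phase proximity search over a multi-level grid; box ids and MAX_LEVEL are assumptions. */
final class GridSearch {
    static final double EARTH_RADIUS_KM = 6371.0;
    static final int MAX_LEVEL = 16;

    private final Map<String, List<double[]>> boxes = new HashMap<>(); // box id -> {lat, lon} points

    /** Index of the cell containing v when [min, min + span] is cut into n equal cells. */
    private static int cell(double v, double min, double span, int n) {
        int i = (int) Math.floor((v - min) / span * n);
        return Math.max(0, Math.min(i, n - 1));
    }

    private static String boxId(int level, int row, int col) {
        return level + "/" + row + "/" + col;
    }

    /** Labels the point with its box at every level, from the whole world (level 0) down to MAX_LEVEL. */
    void add(double lat, double lon) {
        for (int level = 0; level <= MAX_LEVEL; level++) {
            int n = 1 << level;
            String id = boxId(level, cell(lat, -90, 180, n), cell(lon, -180, 360, n));
            boxes.computeIfAbsent(id, k -> new ArrayList<>()).add(new double[] {lat, lon});
        }
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    List<double[]> within(double lat, double lon, double radiusKm) {
        // 1) Select. Compute level: the deepest one whose box height still covers the radius.
        double radiusDeg = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        int level = (int) Math.floor(Math.log(180.0 / radiusDeg) / Math.log(2));
        level = Math.max(0, Math.min(level, MAX_LEVEL));
        int n = 1 << level;
        // Compute box ids: every box overlapping the circle's bounding box (no pole or antimeridian wrap).
        double lonDelta = radiusDeg / Math.max(Math.cos(Math.toRadians(lat)), 1e-9);
        int rowMin = cell(lat - radiusDeg, -90, 180, n), rowMax = cell(lat + radiusDeg, -90, 180, n);
        int colMin = cell(lon - lonDelta, -180, 360, n), colMax = cell(lon + lonDelta, -180, 360, n);
        List<double[]> result = new ArrayList<>();
        for (int row = rowMin; row <= rowMax; row++) {
            for (int col = colMin; col <= colMax; col++) {
                // Fetch the box, then 2) Filter: compute the exact distance and keep the result set.
                for (double[] p : boxes.getOrDefault(boxId(level, row, col), List.of())) {
                    if (haversineKm(lat, lon, p[0], p[1]) <= radiusKm) result.add(p);
                }
            }
        }
        return result;
    }
}
```

Near the poles the longitude extent of the bounding box grows quickly, so many boxes are fetched at high levels; the sketch simply accepts that cost.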
    18. R-tree
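In an R-tree, each node stores the minimum bounding rectangle (MBR) of everything below it, and a search descends only into nodes whose MBR intersects the query window. The sketch below shows only that search step; to keep it short, the tree is assembled by hand, and insertion with node splitting (the hard part of a real R-tree) is left out. All names are hypothetical.

```java
import java.util.List;

/** Illustrative R-tree search: every node carries the MBR of its children. Names are hypothetical. */
final class RTreeSketch {
    record Rect(double minX, double minY, double maxX, double maxY) {
        boolean intersects(Rect o) {
            return minX <= o.maxX && maxX >= o.minX && minY <= o.maxY && maxY >= o.minY;
        }

        Rect union(Rect o) {
            return new Rect(Math.min(minX, o.minX), Math.min(minY, o.minY),
                            Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
        }
    }

    /** Either a leaf entry holding one named object, or an inner node holding children. */
    static final class Node {
        final Rect mbr;
        final String name;          // non-null for leaf entries only
        final List<Node> children;  // empty for leaf entries

        private Node(Rect mbr, String name, List<Node> children) {
            this.mbr = mbr; this.name = name; this.children = children;
        }

        static Node leaf(String name, Rect r) { return new Node(r, name, List.of()); }

        static Node inner(Node... kids) {
            Rect mbr = kids[0].mbr;
            for (Node k : kids) mbr = mbr.union(k.mbr); // the parent's MBR covers all children
            return new Node(mbr, null, List.of(kids));
        }
    }

    /** Descends only into subtrees whose MBR intersects the query window; leaf hits are candidates. */
    static void search(Node node, Rect query, List<String> out) {
        if (!node.mbr.intersects(query)) return;
        if (node.name != null) { out.add(node.name); return; }
        for (Node child : node.children) search(child, query, out);
    }
}
```

With real geometries, an exact intersection test would follow on the leaf hits: the same select-then-filter pattern as for quad-trees.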
    19. [Current implementations (NoSQL databases)]
        • Spatial Lucene/Solr, ElasticSearch: quad-tree labels in Lucene tokens; tile indices or GeoHash labels
        • GeoCouch: R-tree in Erlang
        • Neo4j Spatial: R-tree & quad-tree; objects can be stored as graph elements
    20. [Current implementations (NoSQL databases)]
        • MongoDB: geohashes in MongoDB B-trees; shard support incoming; spherical model since 1.7
        • Pincaster: in-memory quad tree
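GeoHash labels come from a public encoding: the world is halved alternately on longitude and latitude, each halving yields one bit, and every 5 bits become one base-32 character. A minimal encoder, written here for illustration rather than taken from any of the products above:

```java
/** Standard GeoHash encoding: bits alternate longitude/latitude halvings, 5 bits per base-32 character. */
final class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeoHash() {}

    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(precision);
        boolean evenBit = true; // even bits refine longitude, odd bits refine latitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) { ch |= 1; range[0] = mid; } else { range[1] = mid; } // keep the half holding value
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

For example, `GeoHash.encode(42.6, -5.6, 5)` gives `"ezs42"`. Nearby points usually share a prefix, so labels can be indexed as plain tokens or strings; points on either side of a cell border can still get different prefixes, which is why searches commonly also look at neighbouring cells.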
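Storing geohashes in an ordinary B-tree works because all points in a cell share a prefix, so one cell becomes one contiguous key range. In the sketch below, a `java.util.TreeMap` stands in for the B-tree; the keys are assumed to be geohash strings computed elsewhere, the values are hypothetical record ids, and none of this is MongoDB's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative geohash prefix search on a sorted map standing in for a B-tree index. */
final class GeoHashIndex {
    private final TreeMap<String, List<String>> index = new TreeMap<>(); // geohash -> record ids

    void put(String geohash, String id) {
        index.computeIfAbsent(geohash, k -> new ArrayList<>()).add(id);
    }

    /** All ids whose geohash starts with the prefix: a single range scan over the sorted keys. */
    List<String> withPrefix(String prefix) {
        NavigableMap<String, List<String>> range =
                index.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : range.entrySet()) ids.addAll(e.getValue());
        return ids;
    }
}
```

A shorter prefix means a larger cell, so trimming characters trades precision for coverage; the distance filter still runs on the returned candidates.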
    21. [How] How do I build POI search?
    22. [POI search] Do it in pure SQL!
        Use a clustered (long, lat) index:
        o Select is done by the cluster on longitude (which is more selective than latitude!)
        o Bounding-box requests are handled at the index level, as latitude is included
        o Filtering with a distance calculation can be done by a stored procedure on the database side or in application code
    23. [POI search] Lucene via Hibernate Search
        o Available in 4.2 beta 1
        o Annotation based
        o Easy to get started
        o Refine by usage
        o DSL supported
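The pure-SQL recipe can be mimicked in memory to show why the column order of the index matters. The sketch below assumes a sorted set ordered by (longitude, latitude) as a stand-in for the clustered index; the class, record and method names, the 6371 km spherical Earth and the haversine filter are illustrative choices, not the talk's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Illustrative in-memory stand-in for a clustered (longitude, latitude) index; names are hypothetical. */
final class LonLatIndex {
    record Poi(String name, double lat, double lon) {}

    static final double EARTH_RADIUS_KM = 6371.0;

    // Sorted on longitude first, then latitude; the name only breaks ties between identical positions.
    private final TreeSet<Poi> index = new TreeSet<>(Comparator.comparingDouble(Poi::lon)
            .thenComparingDouble(Poi::lat).thenComparing(Poi::name));

    void add(Poi poi) { index.add(poi); }

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a)); // haversine distance
    }

    List<Poi> near(double lat, double lon, double radiusKm) {
        // Bounding box of the search circle in degrees (poles and the antimeridian are ignored).
        double dLat = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        double dLon = dLat / Math.cos(Math.toRadians(lat));
        // Select: one contiguous range scan on the leading longitude column.
        Poi from = new Poi("", -Double.MAX_VALUE, lon - dLon);
        Poi to = new Poi("", Double.MAX_VALUE, lon + dLon);
        List<Poi> result = new ArrayList<>();
        for (Poi p : index.subSet(from, true, to, true)) {
            // Bounding-box check on latitude, which the index also holds.
            if (Math.abs(p.lat() - lat) > dLat) continue;
            // Filter: exact distance (a stored procedure or application code in the SQL version).
            if (distanceKm(lat, lon, p.lat(), p.lon()) <= radiusKm) result.add(p);
        }
        return result;
    }
}
```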
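    24. [Sample indexation code]
        import org.hibernate.search.annotations.Indexed;
        import org.hibernate.search.annotations.Latitude;
        import org.hibernate.search.annotations.Longitude;
        import org.hibernate.search.annotations.Spatial;

        @Indexed
        @Spatial // index this entity's coordinates for spatial queries
        public class Hotel {
            @Latitude
            Double latitude;

            @Longitude
            Double longitude;

            // [...]
        }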
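    25. [Sample search code]
        import org.apache.lucene.search.Query;
        import org.hibernate.search.query.dsl.QueryBuilder;
        import org.hibernate.search.query.dsl.Unit;

        // fullTextSession: an org.hibernate.search.FullTextSession, e.g. from Search.getFullTextSession( session )
        QueryBuilder builder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity( PoI.class ).get();
        double centerLatitude = 24;
        double centerLongitude = 31.5;
        // Everything within 50 km of (centerLatitude, centerLongitude)
        Query luceneQuery = builder.spatial()
            .onCoordinates( PoI.class.getName() )
            .within( 50, Unit.KM )
                .ofLatitude( centerLatitude )
                .andLongitude( centerLongitude )
            .createQuery();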
    26. [End!] Thank you for listening!