No(Geo)SQL
  Geographic search in (No)SQL
8 years at mappy.com as platform architect
     and deputy CTO



     Founding partner of NovaCodex since 2008

     @NHelleringer




Me
Geo in databases
      What is the point ?




Why
Geo in databases challenges

       Data is complex to store in SQL

       Data is two-dimensional

       Data is dense

       Data is huge




Why
Multiple dimensions, but B-trees sort on only one
      Query-dependent index sort calculation




      New data structures and algorithms to handle dimensions
      A two-phase search : select, then filter
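
      As a rough illustration of select-then-filter over an index that sorts on
      a single dimension, the sketch below uses a TreeMap as a stand-in for a
      B-tree keyed on longitude. The class, field and method names are
      hypothetical and not taken from any of the products discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap plays the role of a B-tree sorted on longitude only.
class TwoPhaseSearch {
    static final class Point {
        final String name;
        final double lon;
        final double lat;

        Point(String name, double lon, double lat) {
            this.name = name;
            this.lon = lon;
            this.lat = lat;
        }
    }

    // One-dimensional index: entries are ordered by longitude, latitude is not indexed.
    private final TreeMap<Double, List<Point>> byLongitude = new TreeMap<>();

    void add(Point p) {
        byLongitude.computeIfAbsent(p.lon, k -> new ArrayList<>()).add(p);
    }

    List<Point> inBox(double minLon, double maxLon, double minLat, double maxLat) {
        List<Point> result = new ArrayList<>();
        // Phase 1 (select): range scan on the sorted dimension.
        for (List<Point> bucket : byLongitude.subMap(minLon, true, maxLon, true).values()) {
            // Phase 2 (filter): drop candidates whose latitude falls outside the box.
            for (Point p : bucket) {
                if (p.lat >= minLat && p.lat <= maxLat) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```

      The select phase can return many candidates that the filter then throws
      away, which is the cost that dedicated spatial structures try to reduce.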




Origin (challenge)
Geographic Information Systems
                                handling of geometric objects

     Geography entered information systems because administrations
     needed to handle real-world data :
           Geology / Geography
           Roads, administrative areas for cadastral surveys
           Census data
           Infrastructure elements (water delivery network, electrical delivery
            network, communication network)

     Other needs emerged once the data became available, and they use the
     same tools :
         Geo marketing (market areas)




Origin (needs)
All you ever hated about SQL … and more !

       Complex SQL additions

       Large, complex standardized API

       Vendor dependent implementations

       Not scalable




How
The Open Geospatial Consortium publishes a standard : OpenGIS

    Database      Spatial index          Availability
    Oracle        Quad-trees / R-trees   Oracle 4 side development (1984),
                                         integrated in Oracle 7 (1992)
    SQL Server    4-level grid index     Since the 2008 version (2007)
    PostgreSQL    R-tree-over-GiST       Since PostGIS 1.0 for 8.0 (Apr 2005)
    Spatialite    R-trees                Since 3.6.0 (Mar 2008)

    MySQL since Feb 2005, DB2 Spatial Extender since July 2006, Ingres added
    support very recently

    Hibernate Spatial provides generic access to OpenGIS implementations

    GIS software such as ESRI, MapInfo, GeoConcept and QuantumGIS uses this
    standard to access data




Current Implementations (traditional RDBMS)
Do we need all this ?

            Is Geo only for
            geo-centric companies ?



Puzzled ?
LBS changed everything !

       Maps, geocoding & route planning available

       Platforms handle millions of hits/day

       Available through multiple APIs

       Often for free




How
MAPS
  Data is huge, with complex objects
  Indexing is geo
  Processing capabilities required
  Provided

GEOCODING
  Data is huge
  Not a geo problem
  Expertise extremely valued
  Provided

ROUTE PLANNING
  Data is huge
  Not a geo problem
  Not shardable
  Provided

POI SEARCH
  Data is less huge (your business size)
  Indexing is geo
  May shard
  Less relevant

How
Location aware data
                handling of data associated with a latitude/longitude tuple


     Location became a search criterion :
         Geo search
          The map/the geography is the center of the search process
         Proximity search
           The location is one of many criteria used to refine a search




Origin (needs)
Does NoSQL
      help ?


New Solutions ?
Why does Geo fit a NoSQL approach ?

   Geo does not fit a traditional ‘pure’ DBMS : storing many dimensions
   in one column breaks first normal form (1NF)
                        (48,23) <?> (47,25)

   Geo objects are hard to define strictly with SQL types : they are
   fickle


   Tim Anglade ‘No SQL for fun and profit’ : Geo/hierarchical is
   one of seven forms of NoSQL to date




Geo as a NoSQL Technology
Extensions to SQL or NoSQL data stores
      Quad-trees
      R-trees




Geo as a NoSQL Technology
quad-tree
Search steps
     1) Select
           Compute level
           Compute box ids
           Fetch boxes

     2) Filter
           Compute distance
           Select result set


   Limits
      High levels
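
   The select steps above can be sketched as follows, assuming a quad-tree laid
   over the whole latitude/longitude plane in which each level splits a cell
   into four. The class name, the digit-string cell ids and the level rule are
   illustrative assumptions, not the scheme of any specific product.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the quad-tree "select" phase: compute level, then box ids.
class QuadCells {
    // Box id at a given level: one digit (0-3) per subdivision, root first.
    static String cellId(double lat, double lon, int level) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder id = new StringBuilder();
        for (int i = 0; i < level; i++) {
            double midLat = (minLat + maxLat) / 2;
            double midLon = (minLon + maxLon) / 2;
            int quadrant = 0;
            if (lon >= midLon) { quadrant |= 1; minLon = midLon; } else { maxLon = midLon; }
            if (lat >= midLat) { quadrant |= 2; minLat = midLat; } else { maxLat = midLat; }
            id.append(quadrant);
        }
        return id.toString();
    }

    // Compute level: the deepest level whose cells are still at least as tall
    // (and therefore at least as wide) as the search span, in degrees.
    static int levelFor(double spanDegrees, int maxLevel) {
        int level = 0;
        double cellHeight = 180.0;
        while (level < maxLevel && cellHeight / 2 >= spanDegrees) {
            cellHeight /= 2;
            level++;
        }
        return level;
    }

    // Compute box ids: with the level chosen above, the cells holding the four
    // corners of the search box cover the whole box (at most four distinct ids).
    static Set<String> coveringCells(double minLat, double minLon,
                                     double maxLat, double maxLon, int maxLevel) {
        int level = levelFor(Math.max(maxLat - minLat, maxLon - minLon), maxLevel);
        Set<String> ids = new LinkedHashSet<>();
        ids.add(cellId(minLat, minLon, level));
        ids.add(cellId(minLat, maxLon, level));
        ids.add(cellId(maxLat, minLon, level));
        ids.add(cellId(maxLat, maxLon, level));
        return ids;
    }
}
```

   Fetching the boxes is then a lookup of these ids in whatever store holds the
   points, and the filter phase computes the real distance for each candidate.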




How does it work ?
r-tree
Spatial Lucene/Solr, Elastic Search
       Quad tree labels in Lucene tokens
       Tile indices or GeoHash labels




    GeoCouch
       R-tree in Erlang


    Neo4J Spatial
       R-tree & quad-tree
       Objects can be stored as graph elements




Current Implementations (NoSQL databases)
MongoDB
       Geohashes stored in MongoDB B-trees
       Sharding support upcoming
       Spherical model since 1.7




    Pincaster
       In-memory quad-tree
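
    The geohash labels mentioned for MongoDB (and for Lucene tokens earlier)
    can be illustrated with the public geohash encoding: longitude and latitude
    bits are interleaved and written in base 32, so nearby points tend to share
    a prefix and an ordinary B-tree can range-scan on it. This is a sketch of
    the general encoding only; it does not claim to reproduce the internal
    format used by any of these products.

```java
// Sketch of standard geohash encoding; class and method names are illustrative.
class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            lonTurn = !lonTurn;
            // Every 5 bits become one base-32 character.
            if (++bits == 5) {
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

    A shorter hash is always a prefix of a longer hash of the same point, so
    the hash length plays the role of the quad-tree level.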




Current Implementations (NoSQL databases)
How do I build PoI search ?




How
Do it in pure SQL !!

   Use a clustered (longitude, latitude) index :
     o Select is done by the cluster on longitude
        (which is more selective than latitude !)
     o Bounding box requests are handled at the
        index level, as latitude is included
     o Filtering by distance calculation can be
        done by a stored procedure on the
        database side or in application code
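
   A minimal sketch of the application-side part of this approach: turn a
   centre and radius into a bounding box for the index, then filter candidates
   by great-circle distance. The table and column names in the SQL string are
   hypothetical, and the code assumes a spherical Earth and a search area away
   from the poles and the 180° meridian.

```java
// Hypothetical sketch: bounding box for the SQL select, haversine for the filter.
class PoiSqlSearch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Assumed schema: poi(id, name, longitude, latitude) with a clustered
    // index on (longitude, latitude). Parameters: minLon, maxLon, minLat, maxLat.
    static final String SELECT_SQL =
        "SELECT id, name, longitude, latitude FROM poi "
        + "WHERE longitude BETWEEN ? AND ? AND latitude BETWEEN ? AND ?";

    // Returns {minLon, maxLon, minLat, maxLat} enclosing the search circle.
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        double dLon = Math.toDegrees(
            Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] { lon - dLon, lon + dLon, lat - dLat, lat + dLat };
    }

    // Filter step: great-circle distance between two points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}
```

   The four values from boundingBox would be bound to the query parameters in
   order, and rows whose haversineKm distance to the centre exceeds the radius
   are dropped afterwards.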




POI Search
Lucene via Hibernate Search

     o   Available in 4.2 beta 1
     o   Annotation based
     o   Easy to get started
     o   Refine by usage
     o   DSL supported




POI Search
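import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Latitude;
import org.hibernate.search.annotations.Longitude;
import org.hibernate.search.annotations.Spatial;

@Indexed
@Spatial   // class-level spatial index built from the @Latitude / @Longitude fields
public class Hotel {
    @Latitude
    Double latitude;
    @Longitude
    Double longitude;
    [...]
}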




Sample indexing code
QueryBuilder builder =
   fullTextSession.getSearchFactory()
      .buildQueryBuilder().forEntity( PoI.class ).get();

   double centerLatitude= 24;
   double centerLongitude= 31.5;

   Query luceneQuery = builder.spatial()
      .onCoordinates( PoI.class.getName() )
      .within( 50, Unit.KM )
      .ofLatitude( centerLatitude )
      .andLongitude( centerLongitude )
      .createQuery();




Sample search code
Thank you for listening !




End !
http://www.slideshare.net/timanglade/nosql-for-fun-profit
      http://en.wikipedia.org/wiki/First_normal_form
      http://en.wikipedia.org/wiki/Quadtree
      http://technet.microsoft.com/en-us/library/bb964712.aspx
      http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html
      http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb:2008-10-26:en,CouchDB,Python,geo
      http://wiki.neo4j.org/content/Neo4j_Spatial
      http://www.osgeo.org/
      http://relation.to/Bloggers/SpatialQueriesFirstBetaForHibernateSearch42IsAvailable

      http://www.novacodex.net/




Ref


Editor's Notes

  • #2 (1 table points, 1 table lines, 1 table polylines, …)
  • #5 (1 table points, 1 table lines, 1 table polylines, …) (no standard indexing) (range queries) (several hundred GB for the whole world's roads)
  • #6 , location centric calculations
  • #8 (new keywords, new request syntax)(more than 200 functions to implement)
  • #12 Location Based System: MapQuest, Mappy, Google Maps