Multi-Thematic Spatial Databases
        Experience designing and implementing




                          Dr. Conor Mc Elhinney
                                  Dr. Paul Lewis
                               Postdoctoral Researcher
                                Mobile Mapping Group
What we do (with voluminous geospatial data)
     Store

     Access

     Process

     Visualise
Mobile Mapping Systems Group
     1 Senior Researcher

     2 Post-docs

     2 PhDs

     Computer Science, GIS, Surveying
i2maps Group
     1 Senior Researcher

     1 Post-doc

     2 PhDs (+1 PhD, GeoCrowd)

     Computer Science, Maths, GIS
     Dr. Alexei Pozdnoukhov, Dr. Christian Kaisler
     Fergal Walsh
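Learning from data Streams
                       i2maps What we do at NCG:
                         Scalable methods of spatial analytics
                         Machine learning and data mining
                         Distributed approaches to spatial statistics

      Diagram: Stream Handler → {x, y} → Analysis and Modeling,
      supported by a Dictionary of models and MapReduce

      $$ f(\cdot) = \sum_{i=1}^{M} \alpha_i \, K(\cdot, x_i), \qquad \{\alpha_i\}, \quad \Phi(\cdot), \Phi(x_i) \in \mathcal{H} $$

      (further diagram labels: "new", $x_{\text{OLD}}$)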
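The expansion above, with at most M terms fed by a stream of {x, y} pairs, can be illustrated with a budgeted online kernel learner. This is a minimal sketch, not the i2maps implementation: it assumes a Gaussian kernel, a NORMA-style stochastic-gradient update for squared loss, and a policy of discarding the oldest dictionary point once the budget M is exceeded (our reading of $x_{\text{OLD}}$). The class `BudgetedKernelRegressor` and the parameters `eta`, `lam` and `gamma` are hypothetical.

```python
import math
from collections import deque


def gaussian_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2) for equal-length coordinate tuples."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)


class BudgetedKernelRegressor:
    """Online estimate of f(x) = sum_i alpha_i K(x, x_i) with at most M centres.

    Hypothetical sketch: NORMA-style SGD on squared loss; when the dictionary
    is full, the oldest centre (x_OLD) is discarded.
    """

    def __init__(self, budget, eta=0.5, lam=0.01, gamma=1.0):
        self.budget = budget      # M, maximum dictionary size
        self.eta = eta            # learning rate
        self.lam = lam            # regularisation, shrinks older coefficients
        self.gamma = gamma        # Gaussian kernel width parameter
        self.centres = deque()    # x_i, oldest first
        self.alphas = deque()     # alpha_i, aligned with centres

    def predict(self, x):
        return sum(a * gaussian_kernel(x, c, self.gamma)
                   for a, c in zip(self.alphas, self.centres))

    def update(self, x, y):
        """Process one stream element {x, y}; returns the error before the update."""
        error = y - self.predict(x)
        # Gradient step on the regulariser: shrink all existing coefficients.
        decay = 1.0 - self.eta * self.lam
        self.alphas = deque(a * decay for a in self.alphas)
        # Gradient step on the loss: add the new point with coefficient eta * error.
        self.centres.append(x)
        self.alphas.append(self.eta * error)
        # Enforce the budget M by dropping the oldest centre.
        if len(self.centres) > self.budget:
            self.centres.popleft()
            self.alphas.popleft()
        return error


if __name__ == "__main__":
    model = BudgetedKernelRegressor(budget=20, gamma=10.0)
    for t in range(500):
        x = ((t * 7 % 20) / 20.0,)  # synthetic 1-D locations in [0, 1)
        model.update(x, math.sin(3 * x[0]))
    print(model.predict((0.5,)), math.sin(1.5))
```

Dropping the oldest point keeps memory and prediction cost bounded by M, which is what makes such a model usable on an unbounded stream; other budget policies (for example removing the smallest |α_i|) are equally plausible.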
What we have experience with
     Handling and processing TBs of:

          Multi-thematic data
          Temporal data
          Multi-sensor data

     using spatial information
Data Handling
LiDAR
        A laser scanning technology (laser radar)

        > 20 GB an hour (before geocoding)

        > 30 GB an hour after geocoding

        > 6 attributes output from the scanner

        Between 1 pt/m² and 2000 pt/m²
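The density range above translates directly into storage volume. A back-of-the-envelope sketch, assuming exactly 6 attributes per point, each stored as an 8-byte value, and a hypothetical 1 km long, 20 m wide road corridor; real formats such as LAS pack points more tightly, so the figures are illustrative only. The function names are hypothetical.

```python
def lidar_volume_bytes(area_m2, density_pts_per_m2, n_attributes=6, bytes_per_attribute=8):
    """Rough raw size of a point cloud: points x attributes x bytes per attribute.

    Assumes every attribute is stored as an 8-byte value; real formats (e.g. LAS)
    pack attributes more tightly, so treat this as a rough estimate.
    """
    n_points = area_m2 * density_pts_per_m2
    return n_points * n_attributes * bytes_per_attribute


def human(nbytes):
    """Format a byte count with decimal units (1 KB = 1000 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if nbytes < 1000:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1000
    return f"{nbytes:.1f} PB"


if __name__ == "__main__":
    corridor_m2 = 1000 * 20  # hypothetical: 1 km of a 20 m wide road corridor
    for density in (1, 2000):  # the density range quoted on the slide
        print(density, "pt/m2:", human(lidar_volume_bytes(corridor_m2, density)))
```

Under these assumptions the same 1 km corridor ranges from under 1 MB at 1 pt/m² to roughly 2 GB at 2000 pt/m², so density matters as much as survey time when sizing storage.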
Imagery
     Developing trend to store imagery as videos / BLOBs

     Store the metadata in the SDB

     Using HTML5 video or queries, records can link
     to individual frames
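Keeping imagery as video with metadata in the SDB means a spatial query returns a capture time, which then has to be mapped to a position in the video. A minimal sketch, assuming a constant frame rate and a hypothetical metadata record (`VideoRecord` with `url`, `start_time`, `fps`); the `#t=` suffix is the temporal fragment syntax of the W3C Media Fragments URI 1.0 specification, which HTML5 video players can use to seek. The URL below is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VideoRecord:
    """Hypothetical metadata row kept in the SDB for one stored video."""
    url: str
    start_time: datetime  # capture time of frame 0
    fps: float            # assumed constant frame rate


def frame_index(record, capture_time):
    """Index of the frame captured at (or just before) capture_time."""
    offset = (capture_time - record.start_time).total_seconds()
    if offset < 0:
        raise ValueError("capture_time precedes the start of the video")
    return int(offset * record.fps)


def frame_link(record, capture_time):
    """URL that seeks an HTML5 player to that frame via a '#t=' media fragment."""
    seconds = frame_index(record, capture_time) / record.fps
    return f"{record.url}#t={seconds:.3f}"


if __name__ == "__main__":
    rec = VideoRecord("http://example.org/run1.mp4", datetime(2011, 5, 1, 10, 0, 0), 25.0)
    print(frame_link(rec, datetime(2011, 5, 1, 10, 0, 12, 500000)))
```

Snapping to the start of the frame (rather than the exact capture time) makes every link for the same frame identical, which keeps them cacheable.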
User Generated
     Comments
     Video / Imagery
     Opinion
Twitter

         25%
      contain links
Facebook
     500 Million Active Users

     250 Million Active Mobile Users

     10 Million pieces of content per day
Hard Drive Capacity

  Capacity is increasing
         linearly
CPU vs HD speed

  We can process more
   than we can store

             CS111 UCLA 2006
CPU vs HD speed
     [Figure: IDC, The Diverse and Exploding Digital Universe (CS111 UCLA 2006)]
What next
     Process the data in real time

     Extract or compress the relevant data
     into a model

     Store this model for future
     processing

     This raises the question: what do we
     model and store?
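One simple way to "extract or compress to find a model" is to keep running statistics per spatial cell instead of the raw observations. A minimal sketch, assuming a regular grid with a hypothetical cell size and Welford's online algorithm for the mean and variance of a streamed value (for example an air-quality reading); only the per-cell summaries (count, mean, M2) are stored, never the stream itself. `GridSummary` is a hypothetical name.

```python
import math
from collections import defaultdict


class GridSummary:
    """Per-cell running count, mean and variance of a streamed value (Welford)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # cell (i, j) -> [count, mean, M2 (sum of squared deviations)]
        self.cells = defaultdict(lambda: [0, 0.0, 0.0])

    def cell_of(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def add(self, x, y, value):
        """Fold one observation into its cell; the observation is then discarded."""
        stats = self.cells[self.cell_of(x, y)]
        stats[0] += 1
        delta = value - stats[1]
        stats[1] += delta / stats[0]
        stats[2] += delta * (value - stats[1])

    def summary(self, x, y):
        """(count, mean, sample variance) for the cell containing (x, y), or None."""
        stats = self.cells.get(self.cell_of(x, y))  # .get avoids creating empty cells
        if stats is None:
            return None
        count, mean, m2 = stats
        variance = m2 / (count - 1) if count > 1 else 0.0
        return count, mean, variance
```

Memory now grows with the number of occupied cells rather than with the length of the stream; choosing the cell size is exactly the "what do we model and store?" trade-off.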
Enabling GeoSpatial Data
Processing data streams
     Human activity on geo-referenced
     communication networks
     At least two categories we need to
     understand:

            Dynamics of links

            Activity level at nodes
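Both categories can be tracked directly from an event stream. A minimal sketch, assuming each event is a hypothetical `(timestamp, source, target)` tuple (for example a call or message between two geo-referenced nodes), timestamps in seconds arriving in order, and a fixed sliding window: link dynamics are the per-pair event counts in the window, node activity is the number of events touching each node. `WindowedNetworkActivity` is a hypothetical name.

```python
from collections import Counter, deque


class WindowedNetworkActivity:
    """Sliding-window summary of a stream of (timestamp, source, target) events.

    links: events per directed (source, target) pair -> dynamics of links
    nodes: events touching each node                   -> activity level at nodes
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # events currently in the window, oldest first
        self.links = Counter()
        self.nodes = Counter()

    @staticmethod
    def _decrement(counter, key):
        counter[key] -= 1
        if counter[key] == 0:  # keep only keys active in the window
            del counter[key]

    def _expire(self, now):
        # The window is (now - window, now]; older events are removed.
        while self.events and self.events[0][0] <= now - self.window:
            _, source, target = self.events.popleft()
            self._decrement(self.links, (source, target))
            self._decrement(self.nodes, source)
            self._decrement(self.nodes, target)

    def add(self, timestamp, source, target):
        """Add one event; timestamps are assumed to arrive in non-decreasing order."""
        self._expire(timestamp)
        self.events.append((timestamp, source, target))
        self.links[(source, target)] += 1
        self.nodes[source] += 1
        self.nodes[target] += 1

    def busiest_nodes(self, n=5):
        return self.nodes.most_common(n)
```

Each event is added and expired exactly once, so the cost per event is constant on average regardless of stream length.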
Enabling data to speak for themselves
     Data sources, formats and ingestion:
          Air Quality Sensor      → SMS      → Push    → Data Receiver
          Weather Measurements    → Web Page → Polling → Data Crawler
          VGI Feed (e.g. Twitter) → XML      → Stream  → Stream Handler
          Surveillance Camera     → Video    → Stream  → Stream Handler

     Static data and incoming spatio-temporal data → Spatial Database
          with a Dictionary of models, linked to Analysis and Modeling

     i2maps Web Service → KML/CSV/etc, GeoJSON, Spatio-Temporal Queries
          → Interactive Spatio-Temporal Information Visualiser
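The web service layer hands spatio-temporal query results to the visualiser as GeoJSON, among other formats. A minimal sketch of that conversion, assuming query rows arrive as hypothetical dictionaries with `lon`, `lat`, an optional `time` and arbitrary attribute fields; it follows RFC 7946, which orders point coordinates as [longitude, latitude]. Because GeoJSON defines no time type, timestamps are written as ISO 8601 strings, a common convention rather than part of the standard. `rows_to_geojson` is a hypothetical name.

```python
import json
from datetime import datetime


def rows_to_geojson(rows):
    """Convert query rows (dicts with lon, lat and other attributes) to a
    GeoJSON FeatureCollection string (RFC 7946: coordinates are [lon, lat])."""
    features = []
    for row in rows:
        # Every non-coordinate field becomes a feature property.
        properties = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        if isinstance(properties.get("time"), datetime):
            properties["time"] = properties["time"].isoformat()
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": properties,
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Doing this conversion in the service, rather than in the client, lets the same query endpoint feed the browser visualiser and any other GeoJSON-aware tool.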
Storage
What exists
     Files / DBs / spatial DBs (SDBs)

     Files still extremely common

     SDBs are what is needed

     for multi-source, multi-sensor, multi-type data
Our Aims
     Unified approach to storing multi-thematic data
     Efficient data upload / access / storage
     Searchable in time / space / by attributes
     Incorporating visualisations into all solutions
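Searching by time, space and attributes amounts to combining a bounding-box test, a time interval and an attribute predicate in one query. A minimal sketch using Python's built-in `sqlite3` as a stand-in: a production SDB such as PostGIS would use geometry types and a spatial index instead of plain coordinate columns. The table `observations`, its columns and both function names are hypothetical.

```python
import sqlite3


def create_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations ("
        " id INTEGER PRIMARY KEY, theme TEXT, lon REAL, lat REAL,"
        " t TEXT, value REAL)"  # t: ISO 8601 timestamp, so string order = time order
    )
    # Index on time; a real SDB would add a spatial (e.g. R-tree) index as well.
    conn.execute("CREATE INDEX obs_t ON observations (t)")
    return conn


def search(conn, bbox, t_start, t_end, theme=None, min_value=None):
    """Rows inside bbox=(min_lon, min_lat, max_lon, max_lat) with t_start <= t < t_end,
    optionally filtered by theme and a minimum value."""
    min_lon, min_lat, max_lon, max_lat = bbox
    sql = ("SELECT id, theme, lon, lat, t, value FROM observations "
           "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? AND t >= ? AND t < ?")
    params = [min_lon, max_lon, min_lat, max_lat, t_start, t_end]
    if theme is not None:      # attribute filter: thematic layer
        sql += " AND theme = ?"
        params.append(theme)
    if min_value is not None:  # attribute filter: measured value
        sql += " AND value >= ?"
        params.append(min_value)
    return conn.execute(sql + " ORDER BY t", params).fetchall()
```

Keeping every theme in one table with a `theme` column is one possible reading of a "unified approach"; separate per-theme tables behind a common query interface would work equally well.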
Our hardware
          3 Processing Servers

          8 Intel Xeons, 2.1-2.8 GHz

          72 GB RAM

          1 Storage Server

          7 TB of RAID drives
Our Developed Systems
     LiDAR / image-based SDB

     GeoComputation Platform
Database storage experience
     Optimising the upload of large (multi-GB) spatial files to the SDB

     Optimising the database to suit the system architecture

     Storing multiple data types / sources
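A common first step when optimising the upload of multi-GB files is to stream the file and insert rows in large batches, each committed as one transaction, instead of inserting and committing one row at a time. A minimal sketch, using `sqlite3` as a stand-in for the SDB and assuming a hypothetical whitespace-separated text format of `x y z intensity` per point; the batch size is an assumed tuning parameter. A text-mode file object can be passed as `lines`, since iterating it yields lines. For PostgreSQL/PostGIS the `COPY` command is the usual bulk-load route, and the same batching idea applies.

```python
import sqlite3
from itertools import islice


def parse_points(lines):
    """Yield (x, y, z, intensity) tuples from 'x y z intensity' lines, skipping blanks."""
    for line in lines:
        parts = line.split()
        if parts:
            x, y, z, intensity = parts[:4]
            yield float(x), float(y), float(z), int(intensity)


def bulk_load(conn, lines, batch_size=50_000):
    """Insert points in batches; each batch is committed as one transaction.

    The input is consumed lazily, so memory use is bounded by batch_size
    regardless of file size.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS points (x REAL, y REAL, z REAL, intensity INTEGER)"
    )
    rows = parse_points(lines)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?)", batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = (f"{i} {i} 0.0 {i % 256}" for i in range(120_000))  # synthetic points
    print(bulk_load(conn, sample), "points loaded")
```

Larger batches mean fewer commits but longer rollbacks on failure; the right size depends on the system architecture, which is why it is left as a parameter.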
Watch out for
     Spatial index size vs. RAM

     Expected number of concurrent users

     Disk capacity vs. daily data throughput
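The first and last points can be checked with simple arithmetic before a system is built. A back-of-the-envelope sketch using figures quoted earlier (7 TB of RAID drives, treated here as usable capacity; > 30 GB per hour of geocoded LiDAR; 72 GB RAM as listed for the processing servers). The 8 survey hours per day, 40 bytes per index entry, blocks of 1,000 points and the 50% RAM headroom are assumptions to replace with measured values; both function names are hypothetical.

```python
def days_until_full(capacity_tb, gb_per_hour, hours_per_day):
    """Days of capture before storage fills (decimal units: 1 TB = 1000 GB)."""
    return capacity_tb * 1000 / (gb_per_hour * hours_per_day)


def index_fits_in_ram(n_entries, bytes_per_entry, ram_gb, headroom=0.5):
    """True if the index needs no more than headroom x RAM.

    The remaining RAM is left for the database cache, queries and concurrent users.
    """
    index_gb = n_entries * bytes_per_entry / 1e9
    return index_gb <= ram_gb * headroom


if __name__ == "__main__":
    # 7 TB and 30 GB/hour from the slides; 8 survey hours per day is assumed.
    print(f"{days_until_full(7, 30, 8):.1f} days until the storage server is full")
    # Assumed 40 bytes per index entry.
    print("1e9 points indexed individually fit:", index_fits_in_ram(1_000_000_000, 40, 72))
    print("1e6 blocks of 1000 points fit:", index_fits_in_ram(1_000_000, 40, 72))
```

With these numbers the storage fills in about 29 days, and a per-point index over a billion points would not fit in half of 72 GB, while an index over blocks of 1,000 points easily would; this is one reason point-cloud stores commonly index blocks of points rather than individual points.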
