The Geographic Information System (GIS) industry is booming, driven by the continued reliance on online maps and the rise of location-aware mobile devices. GIS technology can be a key player in the mobile internet, big data, and the Internet of Things, and is an essential tool for the next generation of the global IT industry.
Yet the GIS community is not prepared. Despite all the data available, GIS experts lack off-the-shelf solutions for managing the growing volume of spatial data. Relational spatial databases (RSDBs) led this field for decades, but they have failed to innovate to handle massive volumes of data arriving at high velocity.
Fortunately, MongoDB is a useful tool for this challenge, but it needs some tooling to connect it to the GIS ecosystem. To bridge the gap, we built a pipeline that complies with the architecture of the Geospatial Data Abstraction Library (GDAL), so that MongoDB can work with most popular GIS tools, such as OpenLayers, MapServer, GeoServer, QGIS, ArcGIS and others, with ease. In this talk, I'll go through this pipeline tool and showcase some examples of how you can use it in your next application.
Giving MongoDB a Way to Play with the GIS Community
1. Giving MongoDB a way to play with the GIS community
Making GIScience directly supported by NoSQL technology, and so prepared for the big data era
Jiangsu Key Laboratory of Geographical Information Technology, Nanjing University.
Cyber-Infrastructure and Geospatial Information Laboratory (CIGI),
Department of Geography, School of Earth, Society and Environment,
National Center for Supercomputing Applications (NCSA),
University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Jun. 25, 2014
Hanson Shuai Zhang
shuai@illinois.edu
2. Spatial Pyramid – View the world with multiple spatiotemporal scales
1
Real world example - Spatial Pyramid
Challenges with PostGIS
Handling it with a MongoDB cluster
5. Spatial Pyramid | Generator Architecture
[Diagram: Spatial Pyramid Generator Architecture. Labels: Data Server, Spatial Pyramid Generator, PostGIS, HPC Cluster, Pyramid Model, Python, OGR, MPI, PostgreSQL.]
What is ArcSDE 8?
2.3 hours !!
6. Spatial Pyramid | MongoDB Approach
[Diagram labels: SpatialPyramid, Requests, Load Balance, MongoS (two), Shard (two, each with P, S, S), Config (C, C, C), GDAL/OGR.]
15 minutes !!
7. Spatial Pyramid | GDAL Library
Open Source
– GDAL is released under an X/MIT style Open Source license
– Supported by the Open Source Geospatial Foundation
A library for geospatial data formats
– Abstract data model conforming to OGC standards
– 133 raster data formats, 79 vector data formats
Widely used by the GIS community
– 88 software packages listed on gdal.org use GDAL
Basic library for HPGC
– We use GDAL as the basic tool for building high-performance computing algorithms
9. GDAL Driver for MongoDB
– Giving MongoDB a way to play with the GIS community
2
View MongoDB as a spatial database
Design a GDAL driver for MongoDB
Cooperate with other GIS tools
10. GDAL | spatial database structure
Spatial Relational Table
| FID   | Geometry            | Name    | States   | Time Zone |
+-------+---------------------+---------+----------+-----------+
| 10001 | POINT(40.77, 73.98) | NYC     | New York | UTC-05:00 |
| 10002 | POINT(41.90, 87.65) | Chicago | Illinois | UTC-06:00 |
Feature – a spatial object
Geometries: Point, Line, Polygon
Attributes: Non-Spatial Data
11. GDAL | spatial database structure
https://lib.stanford.edu/gis
Tables – Layers
Rows – Features
Where is the RDBMS?
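13. GDAL | Terminology
| RDBMS     | GeoDatabase | MongoDB             |
+-----------+-------------+---------------------+
| Database  | Datasource  | Database            |
| Table     | Layer       | Collection          |
| Row(s)    | Feature(s)  | JSON Document       |
| Field(s)  | Field(s)    | Key:Value           |
| Index     | R tree      | Index               |
| Join      | Join        | Embedding & Linking |
| Partition | -           | Shard               |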
14. GDAL | WKT for Spatial data
WKT (well-known text) was originally defined by the Open Geospatial Consortium (OGC) and is described in its Simple Feature Access and Coordinate Transformation Service specifications.
| Type                | Examples                                                                    |
+---------------------+-----------------------------------------------------------------------------+
| Point               | POINT (30 10)                                                               |
| LineString          | LINESTRING (30 10, 10 30, 40 40)                                            |
| Polygon             | POLYGON ((30 10, 10 20, 20 40, 40 40, 30 10))                               |
| Polygon (with hole) | POLYGON ((35 10, 10 20, 15 40, 45 45, 35 10), (20 30, 35 35, 30 20, 20 30)) |
In total, there are 18 distinct geometric objects that can be represented.
http://en.wikipedia.org/wiki/Well-known_text
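To make the WKT grammar in the examples concrete, here is a minimal sketch of a parser for the three types shown (Point, LineString, Polygon). It is a pure-Python illustration, not how GDAL/OGR parses WKT; the function name `parse_wkt` is hypothetical, and the other geometry types are deliberately not handled.

```python
import re


def parse_wkt(wkt):
    """Parse POINT, LINESTRING, or POLYGON WKT into (type, coordinates).

    Illustrative only: assumes 2D coordinates separated by spaces and
    vertices separated by commas, as in the examples on the slide.
    """
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*\((.*)\)\s*", wkt, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"not a supported WKT string: {wkt!r}")
    geom_type, body = match.group(1).upper(), match.group(2)

    def parse_ring(text):
        # "30 10, 10 30" -> [(30.0, 10.0), (10.0, 30.0)]
        return [tuple(float(v) for v in pair.split()) for pair in text.split(",")]

    if geom_type == "POINT":
        return geom_type, parse_ring(body)[0]
    if geom_type == "LINESTRING":
        return geom_type, parse_ring(body)
    if geom_type == "POLYGON":
        # Each parenthesised group is one ring: the outer ring first, then holes.
        rings = re.findall(r"\(([^()]*)\)", body)
        return geom_type, [parse_ring(ring) for ring in rings]
    raise ValueError(f"unsupported geometry type: {geom_type}")


point = parse_wkt("POINT (30 10)")
line = parse_wkt("LINESTRING (30 10, 10 30, 40 40)")
polygon = parse_wkt(
    "POLYGON ((35 10, 10 20, 15 40, 45 45, 35 10), (20 30, 35 35, 30 20, 20 30))"
)
```

The polygon with a hole comes back as two rings, which is why a document store has to parse the text before it can reason about the shape.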
15. GDAL | WKT for Spatial data
| FID   | Geometry            | Name    | States   | Time Zone |
+-------+---------------------+---------+----------+-----------+
| 10001 | POINT(40.77, 73.98) | NYC     | New York | UTC-05:00 |
| 10002 | POINT(41.90, 87.65) | Chicago | Illinois | UTC-06:00 |
WKT
{
  "GEM": "POINT(41.90, 87.65)",
  "FID": 10002,
  "Name": "Chicago",
  "States": "Illinois",
  "Time Zone": "UTC-06:00"
}
Geospatial Metadata collection
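The row-to-document mapping on this slide can be sketched as follows. This is a minimal illustration using plain dictionaries, with the geometry kept as an opaque WKT string under the `GEM` key shown on the slide. The helper name `row_to_wkt_document` and the `Geometry` column name are assumptions for the example, not part of the driver.

```python
def row_to_wkt_document(row, geometry_field="Geometry", geometry_key="GEM"):
    """Turn one table row (a dict) into a document that stores WKT as a string."""
    # Copy every attribute column except the geometry column.
    document = {key: value for key, value in row.items() if key != geometry_field}
    # Store the geometry text unchanged under the document's geometry key.
    document[geometry_key] = row[geometry_field]
    return document


# The two rows from the slide, with the geometry text copied as shown there.
rows = [
    {"FID": 10001, "Geometry": "POINT(40.77, 73.98)", "Name": "NYC",
     "States": "New York", "Time Zone": "UTC-05:00"},
    {"FID": 10002, "Geometry": "POINT(41.90, 87.65)", "Name": "Chicago",
     "States": "Illinois", "Time Zone": "UTC-06:00"},
]
documents = [row_to_wkt_document(row) for row in rows]
```

Each resulting dictionary has the shape of the document on the slide and could be handed to a MongoDB client for insertion into a layer's collection.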
16. GDAL | WKT for Spatial data
[Diagram: MongoDB Cluster. A Database (U.S.A, Canada) corresponds to a Datasource; a Collection (States, Cities, Roads, G_sys_Metadata) corresponds to a Layer; a WKT document (NYC, Chicago, ……) corresponds to a Feature.]
| c_name | coord_d | src  | type    | Extent  |
+--------+---------+------+---------+---------+
| Cities | 2       | 4326 | Point   | [p1,p2] |
| States | 2       | 4326 | Polygon | [p1,p2] |
No spatial Index
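One way to picture an entry in the metadata table above is the sketch below, which derives the extent from the WKT strings stored in a layer. It rests on assumptions: `[p1,p2]` is read as the lower and upper corners of the bounding box, the function name `layer_metadata` is hypothetical, and coordinates are pulled out with a simple regular expression rather than a real WKT parser.

```python
import re

# Matches plain decimal numbers; scientific notation is not handled.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def layer_metadata(c_name, geom_type, wkt_strings, coord_d=2, src=4326):
    """Build one metadata record for a layer (collection) of WKT features."""
    firsts, seconds = [], []
    for wkt in wkt_strings:
        values = [float(v) for v in NUMBER.findall(wkt)]
        # Every coord_d-th number belongs to the same coordinate axis.
        firsts.extend(values[0::coord_d])
        seconds.extend(values[1::coord_d])
    # Assumed meaning of [p1, p2]: minimum corner and maximum corner.
    extent = [[min(firsts), min(seconds)], [max(firsts), max(seconds)]]
    return {
        "c_name": c_name,
        "coord_d": coord_d,
        "src": src,
        "type": geom_type,
        "Extent": extent,
    }


cities = layer_metadata(
    "Cities", "Point", ["POINT(40.77, 73.98)", "POINT(41.90, 87.65)"]
)
```

Because the slide notes that this approach has no spatial index, a stored extent like this is the only cheap spatial filter available at the layer level.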
17. GDAL | GeoJSON for spatial data
| FID   | Geometry            | Name    | States   | Time Zone |
+-------+---------------------+---------+----------+-----------+
| 10001 | POINT(40.77, 73.98) | NYC     | New York | UTC-05:00 |
| 10002 | POINT(41.90, 87.65) | Chicago | Illinois | UTC-06:00 |
GeoJSON
{
  "type": "Feature",
  "properties": {
    "FID": 10002,
    "Name": "Chicago",
    "States": "Illinois",
    "Time Zone": "UTC-06:00"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [41.90, 87.63]
  }
}
Geospatial Metadata collection
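The GeoJSON layout on this slide can be sketched as a small conversion function. This is an illustration with hypothetical names (`row_to_geojson_feature`, and the `x` and `y` columns), not the driver's code. The GeoJSON specification orders a position as [longitude, latitude]; the example simply copies the slide's two numbers in the slide's order.

```python
import json


def row_to_geojson_feature(row, x_field="x", y_field="y"):
    """Build a GeoJSON Feature dict for a point row.

    Attribute columns go under "properties"; the two coordinate
    columns become the Point geometry.
    """
    properties = {k: v for k, v in row.items() if k not in (x_field, y_field)}
    return {
        "type": "Feature",
        "properties": properties,
        "geometry": {
            "type": "Point",
            "coordinates": [row[x_field], row[y_field]],
        },
    }


row = {"FID": 10002, "Name": "Chicago", "States": "Illinois",
       "Time Zone": "UTC-06:00", "x": 41.90, "y": 87.63}
feature = row_to_geojson_feature(row)
# Serialising confirms the structure is valid JSON.
feature_json = json.dumps(feature, indent=2)
```

Unlike the WKT variant, the geometry here is structured data rather than text, which is what lets the database index it.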
20. GDAL | Terminology
| RDBMS     | MongoDB             | GeoDatabase | WKT                 | GeoJSON             | FTCL*               |
+-----------+---------------------+-------------+---------------------+---------------------+---------------------+
| Database  | Database            | Datasource  | Datasource          | Datasource          | Datasource          |
| Table     | Collection          | Layer       | Layer               | Layer               | Dataset             |
| Row(s)    | JSON Document       | Feature     | Feature             | Feature             | Layer               |
| Index     | Index               | R tree      | -                   | Grid Index          | Grid Index          |
| Join      | Embedding & Linking | Join        | Embedding & Linking | Embedding & Linking | Embedding & Linking |
| Partition | Shard               | -           | Shard               | Shard               | Shard               |
* FeatureCollection for GeoJSON format
21. GDAL | Which is better?
| Features         | WKT              | GeoJSON         | Feature Collection   |
+------------------+------------------+-----------------+----------------------+
| Structure        | Flexible & Tight | Semi-structured | Semi- & unstructured |
| Spatial Index    | NO               | Grid Index      | Grid Index           |
| Efficiency       | SLOW             | FAST            | MEDIUM               |
| Self-explanatory | NO               | YES with semi-  | YES                  |
| Easy-sharing     | MEDIUM           | MEDIUM          | CONVENIENT           |
| Geometry types   | ALL SFA, 18*     | LIMITED, 6**    | LIMITED, 6**         |
* http://en.wikipedia.org/wiki/Well-known_text
** http://geojson.org/geojson-spec.html
22. GDAL | Load all sorts of spatial data
ogr2ogr
– Converts simple features data between file formats
– Supports spatial or attribute selections and reducing the set of attributes
– Supports setting the output coordinate system or even reprojecting
– Extract, Transform, and Load (ETL) tool for MongoDB geospatial data
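The ogr2ogr capabilities listed on this slide map onto a handful of standard command-line options: `-f` for the output format, `-where` for attribute selection, `-select` for reducing the set of attributes, and `-t_srs` for reprojection. The sketch below only assembles such a command and does not run it. The helper `build_ogr2ogr_command` and the file names are hypothetical, and because the slides do not give the MongoDB driver's format name or connection string, the example writes GeoJSON instead.

```python
import shlex


def build_ogr2ogr_command(dst, src, output_format,
                          where=None, select=None, t_srs=None):
    """Assemble an ogr2ogr argument list (destination comes before source)."""
    command = ["ogr2ogr", "-f", output_format]
    if where:
        command += ["-where", where]            # attribute selection
    if select:
        command += ["-select", ",".join(select)]  # keep only these fields
    if t_srs:
        command += ["-t_srs", t_srs]            # reproject to this SRS
    command += [dst, src]
    return command


command = build_ogr2ogr_command(
    "illinois_cities.geojson",   # hypothetical output file
    "cities.shp",                # hypothetical input Shapefile
    "GeoJSON",
    where="States = 'Illinois'",
    select=["Name", "States"],
    t_srs="EPSG:4326",
)
# A shell-safe string form, e.g. for logging before running it.
command_line = shlex.join(command)
```

The list form can be passed to `subprocess.run(command, check=True)` on a machine where GDAL is installed.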
26. A step forward: MongoGIS
– Pave the way for the GIS community to play with MongoDB
3
Evolution of spatial database technology
Comparison of spatial database solutions
Roadmap for paving the way
27. 1st Generation | Hybrid Solution
20th century, late 80s & early 90s
RDBMS for attribute data
File systems for geometry data
A unique feature ID links the two
ESRI Shapefile is one of the most famous examples
Problems with data integrity, multiuser access and editing
[Diagram labels: GIS Application, Standard SQL, Geoprocessing, Attributes, Geometries (files), FID.]
28. 2nd Generation | Spatial Database Engine
20th century, mid 90s
Attributes & geometries in the database
But geometry is stored as a binary large object
SDE as middleware from GIS vendors
Geometries are not understandable
Poor integration, no spatial structured query language
[Diagram labels: IT, GIS Application, SDE, SQL, Attributes, Geometries (blobs).]
29. 3rd Generation | Object-based Spatial Database
20th century, late 90s
Spatial is a native data type
Attributes & geometries all in the database
Rich GIS functions built inside
Supported by major DB vendors
Spatial data queried using E-SQL
DB functionality fully supported
[Diagram labels: GIS, eBusiness, E-SQL, Geometries, Attributes.]
30. BIG DATA Spreading
2008.9 Nature
2009.1 Google: Detecting influenza epidemics using search engine query data
2009.5 UN: Global Pulse Project; "Big Data for Development: Opportunities & Challenges": A Global Pulse White Paper
2009.12 Microsoft: The Fourth Paradigm: Data-Intensive Scientific Discovery
2011.2 Science: Dealing with data, highlighting both the challenges posed by the data deluge and the opportunities that can be realized if we can better organize and access the data
2012.3 The White House: Big Data Initiative, more than $200 million for big data research projects
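31. MongoDB as a High Performance Database
Solutions: the first six columns are PostGIS As A Cluster; the last is MongoDB Cluster.
| Feature | Shared Disk Failover | File System Replication | Transaction Log Shipping | Trigger-Based Master-Standby Replication | Statement-Based Replication Middleware | Asynchronous Multi-Master Replication | MongoDB Cluster |
+---------+----------------------+-------------------------+--------------------------+------------------------------------------+----------------------------------------+---------------------------------------+-----------------+
| Implementation | NAS | DRBD | Streaming | Slony-I | pgpool-II | Bucardo | Sharding |
| Communication | Shared Disk | Disk Blocks | WAL | Table Rows | SQL | Table Rows | oplog |
| No Special Hardware | × | √ | √ | √ | √ | √ | √ |
| Data Synchronous | Sync | Sync | Sync, Async | Async | Sync | Async | Sync |
| Replication Method | × | M-S | M-S | M-S | M-M, M-S | M-M, M-S | M-M |
| No Master Overhead | √ | × | √ | × | √ | √ | √ |
| Failover No Data Loss | √ | √ | With Sync On | × | √ | × | √ |
| Failover for HA | Fast | Fast | Fast with Hot | Manual | Hard to Re-attach | × | Fast |
| Writes Scalability | × | × | × | × | With M-M | √ | Good |
| Reads Scalability | × | × | With Hot | √ | √ | √ | Good |
| Parallel Query | × | × | × | × | With M-M | √ | √ |
| Complexity For Admin | Low | Low | Low | High | Very High | High | Low |
| Load Balancing | × | × | × | × | √ | × | √ |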
33. MongoGIS on GitHub
GDAL driver for MongoDB
– The way that MongoDB plays with the GIS community
– Working with the GDAL community to have it included in the next release
– Open Source: https://github.com/mongogis/mongodb-gdal-driver
MongoGIS
– The Next Generation Infrastructure for the GIS community
– MongoGIS group on GitHub: https://github.com/mongogis
– We may build it together!
34. Appreciate Your Time!
Sponsored by the China Scholarship Council for a one-year program at UIUC, Illinois, USA.
Supported by the Scientific Research Foundation of the Graduate School of Nanjing University.
Many thanks to Craig Wilson and Greg Steinbruner for their valuable advice.