DSD-INT 2018 Earth Science Through Datacubes - Merticariu

Presentation by Vlad Merticariu (Rasdaman) at the Data Science Symposium 2018, during Delft Software Days - Edition 2018. Thursday 15 November 2018, Delft.

Published in: Software

  1. Earth Science through Datacubes
     Vlad Merticariu, Jacobs University | rasdaman GmbH, merticariu@rasdaman.com
     Data Science Symposium, Delft, 2018-Nov-15
     Datacubes :: Data Science Symposium :: ©2018 rasdaman
     [gamingfeeds.com]
  2. Datacubes
     ▪ Spatio-temporal, multi-dimensional
       - Sensor, image [timeseries], simulation, statistics
     ▪ Suitable for humans
       - „one cube says more than a million images“
     ▪ Suitable paradigm for m2m querying
       - Arrays
     ▪ Automatic reorganization during ingest
       - Optimal support for any given workload
  3. Data Homogenization With OGC & ISO Standards
     [diagram labels: sensor feeds, datacube server]
  4. Data model
  5. CIS 1.1 Coverage Definition [OGC 09-146r2]
     [UML diagram labels: «Feature Type» Coverage; «Feature Type» GML::Feature; «Data Type» DomainSet (domainSet); «Data Type» RangeSet (rangeSet); «Data Type» SWE Common::DataRecord (rangeType); metadata 0..1]
  6. A Simple Coverage, in GML
  7. Encoding Coverages
     ▪ Single file encoding:
       - Informationally complete: GML, JSON, RDF, …
       - Efficient formats: GeoTIFF, NetCDF, JPEG2000, GRIB, …
     ▪ Multipart: container( "header" + file1 + file2 + … )
       - Multipart/MIME, zip, GMLJP2, SAFE, GeoPackage, ...
     [diagram, single file: Coverage = Domain set, Range type, Range set, App Metadata]
     [diagram, multipart: Coverage = Domain set, Range type, xlink, App Metadata; plus a NetCDF file]
  8. Service model
  9. Web Coverage Service (WCS)
     ▪ WCS Core: access to spatio-temporal coverages & subsets
       - format encoding on the fly
       - subset = trim | slice
     ▪ WCS Extensions: optional functionality facets
       - Scaling, CRS transformation, …
     Large, growing implementation basis: rasdaman, GDAL, QGIS, OpenLayers, OPeNDAP, MapServer, GeoServer, NASA WorldWind, EOxServer; Pyxis, ERDAS, ArcGIS, ...
  10. OGC Web Coverage Processing Service (WCPS)
      ▪ = high-level spatio-temporal geo analytics language
          for $c in ( M1, M2, M3 )
          where some( $c.nir > 127 )
          return encode( $c.red - $c.nir, "image/tiff" )
          (tiffA, tiffC)
      ▪ "From MODIS scenes M1, M2, M3: difference between red & nir, as TIFF"
        • …but only those where nir exceeds 127 somewhere
      [JacobsU, Fraunhofer; data courtesy BGS, ESA]
  11. OGC Web Coverage Processing Service (WCPS)
  12. OGC Web Coverage Processing Service (WCPS)
  13. Comfort Zone of Well-Known Tools
      ▪ ...via OGC W*S standard APIs:
      ▪ Map navigation: OpenLayers, Leaflet, ...
      ▪ Virtual globe: NASA WorldWind, Cesium, ...
      ▪ GIS: QGIS, ArcGIS, ...
      ▪ Analysis: GDAL, R, Python (OWSLib, Jupyter notebooks), ...
      [screenshots: clients accessing rasdaman]
  14. Implementation: rasdaman
  15. WCS Core Reference Implementation: rasdaman
      ▪ „raster data manager“: SQL + n-D arrays
      ▪ Scalable parallel "tile streaming" architecture
      ▪ Supports R, QGIS, OpenLayers, GDAL, Pyxis, ERDAS, ArcGIS, ...
      ▪ Blueprint for ISO Array SQL standard
      [figure: rasdaman visitors]
  16. rasdaman Scalability
      ▪ Adaptive data partitioning & distribution
        - PB+ datacubes
      ▪ Distributed query processing
        - Heterogeneous hardware, cloud, federation
        - 1 query → 1,000+ cloud nodes
        - [SIGMOD DANAC 2014]
      [diagram: one query over both datasets, and one subquery per dataset]
        select max((A.nir - A.red) / (A.nir + A.red)) - max((B.nir - B.red) / (B.nir + B.red)) from A, B
        Dataset A: select max((A.nir - A.red) / (A.nir + A.red)) from A
        Dataset B: select max((B.nir - B.red) / (B.nir + B.red)) from B
  17. Datacube applications
  18. Sea Ice Monitoring [running rasdaman backend]
  19. The BigPicture Project
      ▪ Big Data Analytics for diagnosing yield variations
        - satellite image derived, site-specific variations
        - Goal: Recommendations to farmers
          • fertilizer placement
          • application of plant protection products
          • choice of species to grow
          • etc.
        - Ground truthing: 500 farmers across Germany
      ▪ Application layer on top of rasdaman
      ▪ Supported by German Federal Ministry of Food and Agriculture
  20. ESA EO Datacube [running rasdaman backend]
  21. ECMWF: River Discharge [running rasdaman backend]
  22. Datacube Federation: ECMWF, NCI Australia [running rasdaman backend]
  23. Startup Examples
      ▪ EOfarm/GR: Big Data Analytics for farmers
        - Data: Landsat8, Sentinels, RapidEye
        - Functionality:
          • Color Composites, Band Ratios and Indices
          • Vegetation Detection
          • Canopy Greenness Estimation
          • Land Surface Temperature
          • Time series over AOI
      ▪ rasdaman via OGC WCS & WCPS
      ▪ similar framework deployed for water quality monitoring
      ▪ Cropmaps.eu: same, SAR-based
      [running rasdaman backend]
  24. Wrap-Up
      ▪ Datacubes for analysis-ready spatio-temporal „Big Data“
        - sensor, image (timeseries), simulation, statistics datacubes
      ▪ rasdaman: pioneered datacube services
        - any query, anytime, on any size, without programming
        - Petascale datacube federation
        - http://rasdaman.org
        - http://rasdaman.com
      ▪ Applications: multi-scale focused analytics
        - giving local insights to farmers
        - sea ice monitoring
        - forecasting
        - ...
      More questions? Write me: merticariu@rasdaman.com
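
The WCPS query on slide 10 can be sent to a server as an ordinary web request. The Python sketch below is one possible way to assemble it, not something shown in the slides: it rebuilds the slide's query text and wraps it in a WCS `ProcessCoverages` key-value request. The endpoint URL, the function names, and the exact parameter layout (`service`, `version`, `request`, `query`) are assumptions for illustration and should be checked against the target server's documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/rasdaman/ows"


def wcps_query(coverages, band_a="red", band_b="nir", threshold=127, fmt="image/tiff"):
    """Rebuild the slide-10 query text for the given coverage names."""
    names = ", ".join(coverages)
    return (
        f"for $c in ( {names} ) "
        f"where some( $c.{band_b} > {threshold} ) "
        f'return encode( $c.{band_a} - $c.{band_b}, "{fmt}" )'
    )


def process_coverages_url(endpoint, query):
    """Wrap a WCPS query in a key-value ProcessCoverages request (assumed layout)."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,  # urlencode percent-encodes "$", quotes, and spaces
    }
    return f"{endpoint}?{urlencode(params)}"


query = wcps_query(["M1", "M2", "M3"])
url = process_coverages_url(ENDPOINT, query)

if __name__ == "__main__":
    print(query)
    print(url)
```

Fetching `url` with any HTTP client would then return one encoded result per coverage that passes the `where` filter, assuming the server holds coverages named M1, M2, and M3.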

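Slide 16 pairs a query over datasets A and B with one subquery per dataset. The Python sketch below imitates that idea on a single machine and is not rasdaman's implementation: each maximum of (nir - red) / (nir + red), the NDVI formula, is computed independently, and only the two scalar results are combined. The function names, the sample pixel values, and the choice to skip pixels where nir + red is zero are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def max_ndvi(pixels):
    """Maximum of (nir - red) / (nir + red) over (nir, red) pairs.

    Pixels with nir + red == 0 are skipped (an assumption of this sketch).
    """
    return max((nir - red) / (nir + red) for nir, red in pixels if nir + red != 0)


def max_ndvi_difference(dataset_a, dataset_b):
    """Evaluate the per-dataset subqueries independently, then subtract."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each submit stands in for a subquery shipped to the node holding that dataset.
        future_a = pool.submit(max_ndvi, dataset_a)
        future_b = pool.submit(max_ndvi, dataset_b)
        return future_a.result() - future_b.result()


# Made-up (nir, red) samples standing in for datasets A and B.
A = [(200, 50), (120, 100), (90, 90)]
B = [(150, 50), (60, 100)]

if __name__ == "__main__":
    print(max_ndvi_difference(A, B))
```

Only one number per dataset crosses the boundary between the workers and the combining step, which is what makes this kind of query cheap to distribute.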