Enabling Access to Big Geospatial Data with LocationTech and Apache projects

LocationPowers OGC BigGeoData 2016. This presentation will discuss tools in the open source landscape that are used to handle big geospatial data. In particular, we will focus on how Apache frameworks such as Spark and Accumulo are "geospatially enabled" by four projects: GeoTrellis, GeoWave, GeoMesa, and GeoJinni. All four projects participate in LocationTech, a working group under the Eclipse Foundation. We will discuss how each of these LocationTech technologies implements spatial indexing (e.g. by using space-filling curves) in order to provide quick access to data, along with other themes common to the four projects. Attendees should walk away from this presentation understanding important parts of the Apache big data ecosystem, a set of LocationTech projects at the cutting edge of enabling those Apache projects' handling of geospatial data, and some solutions to common problems when dealing with large geospatial data.

1. Rob Emanuele ENABLING ACCESS TO BIG GEOSPATIAL DATA WITH LOCATIONTECH & APACHE

2. What we’ll be covering… LocationTech projects that geospatially enable Apache big data frameworks by providing spatial indexing. A discussion of how those four projects approach indexing, focusing on the use of space-filling curves.

3. STORING AND PROCESSING GEOSPATIAL DATA @ SCALE

4. STORING AND PROCESSING GEOSPATIAL DATA @ SCALE

8. WHAT IS ?

13. GEOJINNI (FORMERLY SPATIALHADOOP)

17. SPACE FILLING CURVES

20. [Figure: Hilbert curve with quadrants labeled by 2-bit codes at each level] Hilbert Index (52) = 11 01 00
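The index on this slide can be read as one 2-bit quadrant code per level of subdivision: 52 is 110100 in binary, i.e. 11 01 00. A minimal Python sketch of that reading follows, together with the widely used iterative conversion from a grid cell to its position on a 2D Hilbert curve. The function names and the curve orientation (starting at the origin) are assumptions made for illustration, so the cell that maps to 52 here need not be the one drawn on the slide.

```python
def quadrant_digits(index, levels):
    """Split a curve index into one 2-bit quadrant code per level."""
    bits = format(index, "0{}b".format(2 * levels))
    return [bits[i:i + 2] for i in range(0, len(bits), 2)]


def _rotate(n, x, y, rx, ry):
    """Rotate/flip the coordinates so the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a Hilbert curve that starts at (0, 0)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # contribution of this level's quadrant
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


print(quadrant_digits(52, 3))    # ['11', '01', '00']
print(hilbert_index(8, 0, 0))    # 0
```

Cells that are consecutive along the curve are always neighbours on the grid, which is what makes a range of index values correspond to a compact patch of space.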

24. Geo + accessed through

25. Z curve

26. Z curve (also XZ)
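For readers unfamiliar with the Z curve, a short Python sketch of the usual construction follows: the bits of the two cell coordinates are interleaved into a single integer. The function names and the choice of putting x in the even bit positions are illustrative assumptions, not taken from any of the projects, and the XZ variant mentioned on the slide is not covered here.

```python
def z_index(x, y, bits=16):
    """Interleave the low `bits` bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_decode(z, bits=16):
    """Inverse of z_index: recover (x, y) from an interleaved index."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


# The first four cells trace the "Z" shape that gives the curve its name.
print([z_index(x, y) for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]])  # [0, 1, 2, 3]
print(z_index(3, 5))   # 39
print(z_decode(39))    # (3, 5)
```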

28. Geo + accessed through GEOWAVE

29. Hilbert Curve

30. Range Decomposition 70 -> 75 92 -> 99 116 -> 121
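Range decomposition turns a rectangular query into a small set of contiguous index ranges, like the three listed on this slide. The Python sketch below shows the idea by brute force on a small grid: compute the Hilbert index of every cell in the query box, sort, and merge consecutive values. The grid size, function names, and curve orientation are assumptions for illustration, so the output is not meant to reproduce the slide's numbers, and a production library would more likely decompose quadrants recursively than enumerate cells.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n-by-n grid."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip into the sub-quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decompose(n, xmin, ymin, xmax, ymax):
    """Return inclusive (start, end) index ranges covering the query box."""
    indices = sorted(
        hilbert_index(n, x, y)
        for x in range(xmin, xmax + 1)
        for y in range(ymin, ymax + 1)
    )
    ranges = []
    start = prev = indices[0]
    for d in indices[1:]:
        if d != prev + 1:                # gap: close the current range
            ranges.append((start, prev))
            start = d
        prev = d
    ranges.append((start, prev))
    return ranges


print(decompose(4, 0, 0, 1, 1))   # [(0, 3)]
print(decompose(4, 1, 1, 2, 2))   # [(2, 2), (7, 8), (13, 13)]
```

A box aligned with a curve quadrant collapses to one range, while a box straddling quadrant boundaries needs several, which is why the number of ranges depends on where the query falls.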

32. False positives - secondary filtering
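A small Python sketch of why secondary filtering is needed: when a query box is covered by a coarse index range, the range scan also returns records whose cells lie outside the box, and an exact test removes them afterwards. The sketch uses a Z-order index and a single range from the box's lowest to highest index; the point set and function names are made up for illustration.

```python
def z_index(x, y, bits=8):
    """Interleave the bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def range_scan(points, lo, hi):
    """Primary filter: keep points whose index falls in [lo, hi]."""
    return [p for p in points if lo <= z_index(*p) <= hi]


def in_box(point, box):
    """Secondary filter: exact containment test against the query box."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


points = [(0, 0), (1, 1), (3, 0), (0, 3), (2, 2), (3, 3)]
box = (1, 1, 2, 2)

# One coarse range spanning the box's lower-left and upper-right corners.
lo, hi = z_index(1, 1), z_index(2, 2)          # 3 and 12
candidates = range_scan(points, lo, hi)
results = [p for p in candidates if in_box(p, box)]

print(candidates)   # [(1, 1), (3, 0), (0, 3), (2, 2)]
print(results)      # [(1, 1), (2, 2)]
```

Here (3, 0) and (0, 3) are the false positives: their indices (5 and 10) fall inside the scanned range even though the cells are outside the box.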

34. Geo + Rasters +

38. Z or Hilbert

40. Data Node Data Node Data Node Name Node Master Tablet Server Tablet Server Tablet Server Accumulo BigTable clone (columnar database) Records stored on HDFS Lexicographically sorted table index
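Because the table index is sorted lexicographically, a numeric curve index only supports range scans if its byte encoding sorts the same way the numbers do. The Python sketch below illustrates this with fixed-width big-endian keys in a plain sorted list. It is a stand-in for the idea only: it does not use Accumulo's client API, and the function names are hypothetical.

```python
import bisect
import struct


def encode_key(index):
    """Fixed-width big-endian encoding: byte order matches numeric order."""
    return struct.pack(">Q", index)


def decode_key(key):
    return struct.unpack(">Q", key)[0]


def range_scan(sorted_keys, lo, hi):
    """Return all keys with lo <= index <= hi using binary search."""
    start = bisect.bisect_left(sorted_keys, encode_key(lo))
    stop = bisect.bisect_right(sorted_keys, encode_key(hi))
    return sorted_keys[start:stop]


indices = [5, 70, 100, 9, 1000]

# Plain decimal strings sort lexicographically in the wrong numeric order.
print(sorted(str(i) for i in indices))          # ['100', '1000', '5', '70', '9']

# Fixed-width binary keys keep numeric order, so ranges are contiguous.
table = sorted(encode_key(i) for i in indices)
print([decode_key(k) for k in table])           # [5, 9, 70, 100, 1000]
print([decode_key(k) for k in range_scan(table, 70, 121)])   # [70, 100]
```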

44. partition id split id

45. split id partition id

46. Tiered Indexing

47. Tiered Indexing

48. Periodicity (time dimension) 1997 1998 1999
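One way to read periodicity in the time dimension is that an unbounded timestamp is split into a bin (here a year, matching the 1997, 1998, 1999 labels) and a bounded offset within that bin, so that only the offset has to be placed on the curve. The Python sketch below assumes yearly bins and naive datetimes; the function name is hypothetical.

```python
from datetime import datetime


def year_bin(ts):
    """Return (bin, offset): the year and the fraction of that year elapsed."""
    start = datetime(ts.year, 1, 1)
    end = datetime(ts.year + 1, 1, 1)
    offset = (ts - start) / (end - start)   # in [0, 1)
    return ts.year, offset


print(year_bin(datetime(1998, 1, 1)))         # (1998, 0.0)
print(year_bin(datetime(1997, 7, 2, 12)))     # (1997, 0.5)
```

The bin can then be used as a key prefix, with the offset scaled to the curve's resolution for that dimension.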

50. Periodicity (arbitrary dimensions) Time Elevation Velocity

51. Spatial Indexing: spatial index stored per file on HDFS; Z order (2D and 3D), Hilbert (N-dimensional); Z order (2D and 3D), binned per week for spatiotemporal; N-dimensional Hilbert with arbitrary binning and tiered indexing
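To make "binned per week" concrete, the Python sketch below builds a sortable key from a week number followed by a 2D Z-order index, so that all records from one week are contiguous and ordered spatially within the week. The 2-byte week and 4-byte index widths, the use of weeks since the Unix epoch, and the function names are all assumptions for illustration; this is not the key layout of any particular project.

```python
import struct
from datetime import datetime, timezone

WEEK_SECONDS = 7 * 24 * 60 * 60


def z_index(x, y, bits=16):
    """Interleave the bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def week_bin(ts):
    """Whole weeks elapsed since the Unix epoch (assumed bin origin)."""
    return int(ts.timestamp()) // WEEK_SECONDS


def row_key(ts, x, y):
    """Big-endian key: 2-byte week bin, then 4-byte Z-order index."""
    return struct.pack(">HI", week_bin(ts), z_index(x, y))


t0 = datetime(1970, 1, 1, tzinfo=timezone.utc)
t1 = datetime(1970, 1, 8, tzinfo=timezone.utc)

print(week_bin(t0), week_bin(t1))                  # 0 1
print(row_key(t0, 3, 5) < row_key(t1, 0, 0))       # True
```

With this layout a spatiotemporal query becomes one set of Z-order ranges per week that the time interval touches.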

53. CQL

54. Future integration work ?

55. THANK YOU @lossyrob gitter.im/geotrellis/geotrellis github.com/geotrellis/geotrellis remanuele@azavea.com

56. GeoMesa GeoWave

57. Tiered Indexing