SlideShare a Scribd company logo
1 of 20
Download to read offline
© 201 IBM Corporation
Geospatial Big Data
@normanbarker
Foss4g-NA
© 2015 IBM Corporation‹#›
Overview
▪ CouchDB / Cloudant
▪ A brief history
▪ What is Big Data
▪ NoSQL
▪ Consistent Hashing / Sharding
▪ CAP Theorem
▪ Multi-Master database
▪ Image Classification Example
▪ Geospatial Querying using Secondary Indexes
▪ libspatialindex / geos
© 2015 IBM Corporation‹#›
CouchDB
▪ NoSQL document store with map/reduce initially developed by
Damien Katz
▪ Cloudant/Couchbase were the primary developers
▪ Now a large diverse community
▪ Written in Erlang
▪ GeoCouch, Lucene FTI
▪ JSON/GeoJSON
▪ Cloudant
▪ Donated BigCouch to enable clustering of CouchDB
▪ Cluster Of Unreliable Commodity Hardware
▪ Runs a DBaaS on top of CouchDB
© 2015 IBM Corporation‹#›
NoSQL
▪ Unstructured text
▪ JSON
▪ Schemas are enforced on READs
© 2015 IBM Corporation‹#›
Consistent Hashing
▪ Data is too large to fit on a single node
▪ DB is a collection of partitions
▪ need availability
▪ need to expand cluster horizontally
▪ Based on the Amazon Dynamo Model
▪ Document IDs are hashed
▪ Small number of keys are remapped when a node goes down
© 2015 IBM Corporation‹#›
Consistent hashing for data availability
© 2015 IBM Corporation‹#›
GeoHash as a consistent hash
▪ GeoHash is a Morton Key
▪ Can assign geometries to nodes
▪ Fixed data
▪ Hot spots
© 2015 IBM Corporation‹#›
CAP Theorem
▪ Distributed systems can provide two of the following
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it
succeeded or failed)
Partition Tolerance (the system continues to operate despite arbitrary message loss
or failure of part of the system)
Mathematical proof by contradiction: http://bit.ly/1FbnmrC
© 2015 IBM Corporation‹#›
Multi-Master (Peers)
▪ CouchDB is an eventually consistent distributed system
▪ Pick Two
▪ Availability
▪ Partition Tolerance
▪ Eventual Consistency
▪ Multi-Master system
▪ Sync (Changes Feed)
▪ Data Collection
▪ Online/Offline
▪ Distributed Big Geo Data
© 2015 IBM Corporation‹#›
Multi-Master (Peers)
Distributed geo big data
© 2015 IBM Corporation‹#›
PouchDB is an open-source JavaScript database inspired by Apache
CouchDB that is designed to run well within the browser.
© 2015 IBM Corporation‹#›
Multi-Master (Data Centres)
▪ High availability
▪ High velocity
© 2015 IBM Corporation‹#›
Image Classification
▪ Traditional
▪ Input is a large GeoTiff file where the pixel values represent a
data class (e.g. water, vegetation)
▪ Using ENVI / GRASS / GDAL export as shape file per class
and overlay on a map
▪ Possibly store results in a WFS
© 2015 IBM Corporation‹#›
▪ CouchDB
▪ Split the image into tiles and process classes as vectors as above
▪ note: potrace is a utility for this that exports to geojson
▪ Store the result as GeoJSON within CouchDB
▪ Data is highly available, redundant
▪ Field user can edit the classes and sync back to a server (multi-master)
▪ MVCC
▪ Eventual consistency can be managed
Image Classification
© 2015 IBM Corporation‹#›
Geospatial Indexes
▪ Distributed spatial index
▪ Always going to be slower on writes (redundancy)
▪ Need to make READs quick
R-Tree R*-Tree
© 2015 IBM Corporation‹#›
Temporal Indexes
▪ Index speed is related to index size
▪ When using temporal data, use an appropriate tree!
▪ TPR*-Tree, MVR*-Tree
© 2015 IBM Corporation‹#›
Cloudant Geo
▪ Secondary geospatial indexes
▪ libspatialindex
▪ R*-Tree, MVR-Tree, TPR-Tree
▪ libgeos
▪ CS-Map
▪ CouchDB / BigCouch
▪ Sync
© 2015 IBM Corporation‹#›
Geo Big Data Summary
▪ A CouchDB View of the World
▪ data can and should be distributed
▪ Sync significant data between masters
▪ multi-master, eventually consistent
▪ Secondary indexes enable complex queries
▪ Use a cluster to archive data
© 2015 IBM Corporation‹#›
Issues
▪ Within geospatial systems transactions are assumed
▪ Need a revision model for eventual consistency
▪ Feature data is ok for MVCC
▪ What about big raster data?
▪ Image streaming
▪ Versioning
© 2015 IBM Corporation‹#›
Questions?
@normanbarker
BoF Tonight
7:30 - 9:30pm
Sandpebble CDE

More Related Content

What's hot

MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 StepsScott Cinnamond
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
An introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveAn introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveMike Frampton
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Senthil Kumar
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingCloudera, Inc.
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentationArvind Kumar
 

What's hot (20)

Utilizing HDF4 File Content Maps for the Cloud Computing
Utilizing HDF4 File Content Maps for the Cloud ComputingUtilizing HDF4 File Content Maps for the Cloud Computing
Utilizing HDF4 File Content Maps for the Cloud Computing
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
An introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveAn introduction to Apache Hadoop Hive
An introduction to Apache Hadoop Hive
 
Hadoop
Hadoop Hadoop
Hadoop
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Hadoop by sunitha
Hadoop by sunithaHadoop by sunitha
Hadoop by sunitha
 
Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
 
ArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & RoadmapArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & Roadmap
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 

Similar to Geospatial Big Data - Foss4gNA

Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
Manuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4octManuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4octParadigma Digital
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Mladen Kovacevic
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupMayaData Inc
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015Daniela Zuppini
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
The NoSQL Geospatial Landscape
The NoSQL Geospatial LandscapeThe NoSQL Geospatial Landscape
The NoSQL Geospatial LandscapeRaj Singh
 
As fast as a grid, as safe as a database
As fast as a grid, as safe as a databaseAs fast as a grid, as safe as a database
As fast as a grid, as safe as a databasegojkoadzic
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
 

Similar to Geospatial Big Data - Foss4gNA (20)

Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
Manuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4octManuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4oct
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris Meetup
 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
The NoSQL Geospatial Landscape
The NoSQL Geospatial LandscapeThe NoSQL Geospatial Landscape
The NoSQL Geospatial Landscape
 
As fast as a grid, as safe as a database
As fast as a grid, as safe as a databaseAs fast as a grid, as safe as a database
As fast as a grid, as safe as a database
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
 

Recently uploaded

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 

Recently uploaded (20)

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 

Geospatial Big Data - Foss4gNA

  • 1. © 201 IBM Corporation Geospatial Big Data @normanbarker Foss4g-NA
  • 2. © 2015 IBM Corporation‹#› Overview ▪ CouchDB / Cloudant ▪ A brief history ▪ What is Big Data ▪ NoSQL ▪ Consistent Hashing / Sharding ▪ CAP Theorem ▪ Multi-Master database ▪ Image Classification Example ▪ Geospatial Querying using Secondary Indexes ▪ libspatialindex / geos
  • 3. © 2015 IBM Corporation‹#› CouchDB ▪ NoSQL document store with map/reduce initially developed by Damien Katz ▪ Cloudant/Couchbase were the primary developers ▪ Now a large diverse community ▪ Written in Erlang ▪ GeoCouch, Lucene FTI ▪ JSON/GeoJSON ▪ Cloudant ▪ Donated BigCouch to enable clustering of CouchDB ▪ Cluster Of Unreliable Commodity Hardware ▪ Runs a DBaaS on top of CouchDB
  • 4. © 2015 IBM Corporation‹#› NoSQL ▪ Unstructured text ▪ JSON ▪ Schemas are enforced on READs
  • 5. © 2015 IBM Corporation‹#› Consistent Hashing ▪ Data is too large to fit on a single node ▪ DB is a collection of partitions ▪ need availability ▪ need to expand cluster horizontally ▪ Based on the Amazon Dynamo Model ▪ Document IDs are hashed ▪ Small number of keys are remapped when a node goes down
  • 6. © 2015 IBM Corporation‹#› Consistent hashing for data availability
  • 7. © 2015 IBM Corporation‹#› GeoHash as a consistent hash ▪ GeoHash is a Morton Key ▪ Can assign geometries to nodes ▪ Fixed data ▪ Hot spots
  • 8. © 2015 IBM Corporation‹#› CAP Theorem ▪ Distributed systems can provide two of the following Consistency (all nodes see the same data at the same time) Availability (a guarantee that every request receives a response about whether it succeeded or failed) Partition Tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system) Mathematical proof by contradiction: http://bit.ly/1FbnmrC
  • 9. © 2015 IBM Corporation‹#› Multi-Master (Peers) ▪ CouchDB is an eventually consistent distributed system ▪ Pick Two ▪ Availability ▪ Partition Tolerance ▪ Eventual Consistency ▪ Multi-Master system ▪ Sync (Changes Feed) ▪ Data Collection ▪ Online/Offline ▪ Distributed Big Geo Data
  • 10. © 2015 IBM Corporation‹#› Multi-Master (Peers) Distributed geo big data
  • 11. © 2015 IBM Corporation‹#› PouchDB is an open-source JavaScript database inspired by Apache CouchDB that is designed to run well within the browser.
  • 12. © 2015 IBM Corporation‹#› Multi-Master (Data Centres) ▪ High availability ▪ High velocity
  • 13. © 2015 IBM Corporation‹#› Image Classification ▪ Traditional ▪ Input is a large GeoTiff file where the pixel values represent a data class (e.g. water, vegetation) ▪ Using ENVI / GRASS / GDAL export as shape file per class and overlay on a map ▪ Possibly store results in a WFS
  • 14. © 2015 IBM Corporation‹#› ▪ CouchDB ▪ Split the image into tiles and process classes as vectors as above ▪ note: potrace is a utility for this that exports to geojson ▪ Store the result as GeoJSON within CouchDB ▪ Data is highly available, redundant ▪ Field user can edit the classes and sync back to a server (multi-master) ▪ MVCC ▪ Eventual consistency can be managed Image Classification
  • 15. © 2015 IBM Corporation‹#› Geospatial Indexes ▪ Distributed spatial index ▪ Always going to be slower on writes (redundancy) ▪ Need to make READs quick R-Tree R*-Tree
  • 16. © 2015 IBM Corporation‹#› Temporal Indexes ▪ Index speed is related to index size ▪ When using temporal data, use an appropriate tree! ▪ TPR*-Tree, MVR*-Tree
  • 17. © 2015 IBM Corporation‹#› Cloudant Geo ▪ Secondary geospatial indexes ▪ libspatialindex ▪ R*-Tree, MVR-Tree, TPR-Tree ▪ libgeos ▪ CS-Map ▪ CouchDB / BigCouch ▪ Sync
  • 18. © 2015 IBM Corporation‹#› Geo Big Data Summary ▪ A CouchDB View of the World ▪ data can and should be distributed ▪ Sync significant data between masters ▪ multi-master, eventually consistent ▪ Secondary indexes enable complex queries ▪ Use a cluster to archive data
  • 19. © 2015 IBM Corporation‹#› Issues ▪ Within geospatial systems transactions are assumed ▪ Need a revision model for eventual consistency ▪ Feature data is ok for MVCC ▪ What about big raster data? ▪ Image streaming ▪ Versioning
  • 20. © 2015 IBM Corporation‹#› Questions? @normanbarker BoF Tonight 7:30 - 9:30pm Sandpebble CDE