SlideShare a Scribd company logo
Maintaining Spatial Data Infrastructures (SDIs)
using distributed task queues
Paolo Corti and Ben Lewis
Harvard Center for Geographic Analysis
2017 FOSS4G
Boston
Background
Harvard Center for Geographic Analysis
• WorldMap http://worldmap.harvard.edu
– Biggest GeoNode instance on the planet
– https://github.com/cga-harvard/cga-worldmap
• HHypermap http://hh.worldmap.harvard.edu
– Map service registry
– https://github.com/cga-harvard/HHypermap
Note
Billion Object Platform (BOP)
https://github.com/cga-harvard/hhypermap-bop
Demo of WorldMap / HHypermap
The need for an asynchronous processor
In WorldMap and HHypermap there are operations run by users which are
time consuming and cannot be handled in the context of a web request
● Harvest the metadata of a service and its layers
● Synchronize the metadata of a new or updated layer to the search
engine
● Feed a gazetteer when a new layer is uploaded or updated
● Upload a spatial datasets to the server
● Create a new layer using a table join
HTTP request/response cycle must be fast
● In web applications the HTTP
request/response cycle can be
synchronous as long as there are
very quick interactions between
the client and the server
● unfortunately there are cases
when the cycle become slower
● In these situations the best
practice for a web application is
to process asynchronously these
tasks using a task queue
Task Queues
Asynchronous processing in a web application can be
delegated to a task queue, which is a system for parallel
execution of tasks in a non-blocking fashion
Asynchronous processing model
Asynchronous processing model
● The asynchronous processing model is composed by services that
produce processing tasks (producers) and by services which
consume and process these tasks (consumers) accordingly
● A message queue is a broker which facilitates message passing by
providing a protocol or interface which other services can access.
Work can be distributed across threads or machines
● In the context of a web application the producer is the client
application that creates messages based on the user interaction. The
consumer is a daemon process that can consume the messages and
run the needed process
Glossary
● Task Queue: a system for parallel execution of tasks in a non-blocking
fashion
● Broker or Message Queue: provides a protocol or interface for messages
exchanging between different services and applications
● Producer: the code that places the tasks to be executed later in the broker
● Consumer or Worker: takes tasks from the broker and process them
● Exchange: takes a message from a producer and route it to zero or more
queues (messages routing)
Tasks must be consumed faster than being produced. If not, add more workers
Use cases for task queues
● in web applications some process is taking too much time
and must be processed asynchronously
● heterogeneous applications/services in a given system
architecture need an easy way to reliably communicate
between each other
● periodic operations (vs crontab)
● a way of parallelizing tasks in multi processors
● monitor processes and analyze failing tasks (and execute
them again)
Typical use cases for a task queue in a web application
● Thumbnails generation
● Sending bulk email
● Fetching large amounts of data from APIs
● Performing time-intensive calculations
● Expensive queries
● Search engine index synchronization
● Interaction with another application/service
● Replacing cron jobs (backups, maintenance, etc…)
Typical use cases for a task queue in a GIS Portal/SDI
● Upload a shapefile to the server (GeoNode)
● Thumbnails generation for layers and maps (GeoNode)
● OGC services harvesting (Harvard Hypermap)
● Geoprocessing operations
● Geospatial data maintenance
Producer, broker and consumer architecture
Producer
Consumer
Producer
Broker
Consumer
Producer
Broker
Consumer
Producer
Broker
Consumer
Producer
Message brokers implementations
Most of them are open source!
● RabbitMQ (AMQP, STOMP, JMS)
● Apache ActiveMQ (STOMP, JMS)
● Amazon Simple Queue Service (JMS)
● Apache Kafka
Several standard protocols:
● AMQP, STOMP, JMS, MSMQ (Microsoft .NET)
Tasks (Jobs) queues implementations
● Celery (RabbitMQ, Redis, Amazon SQS, Zookeeper)
● Redis Queue (Redis)
● Resque (Redis)
● Kue (Redis)
And many others!
Celery
● asynchronous task queue based on distributed message
passing
● focused on real-time operation, but supports scheduling
as well
● the execution units, called tasks, are executed
concurrently on a single or more worker servers
● it supports many message brokers (RabbitMQ, Redis,
MongoDB, CouchDB, ...)
● written in Python but it can operate with other languages
● great integration with Django!
● great monitoring tools (Flower, django-celery-results)
RabbitMQ
● RabbitMQ is a message broker: it accepts and
forwards messages
● most widely deployed open source broker (35k+
deployments)
● support many message protocols
● supported by many operating systems and
languages
● Written in Erlang
Architecture of Celery/RabbitMQ
https://tests4geeks.com/python-celery-rabbitmq-tutorial/
A real use case: Harvard Hypermap
HHypermap (Harvard Hypermap)
Registry is a platform that manages
OWS, Esri REST, and other types of
map service harvesting, and
orchestration and maintains uptime
statistics for services and layers.
Where possible, layers are cached
by MapProxy.
HHypermap provides thousands of
remote layers to WorldMap users
Harvard
Hypermap
WorldMap
Architecture
HHypermap interface
Need for a task queue
SLOW!!!
Producer
Is the code that
places the tasks to
be executed later
in the broker
Celery messages
Consumer
Takes tasks from
the broker and
process them in a
worker
Replacing cron jobs
Replacing cron jobs
Workers and threads with htop
Monitoring
Monitoring a task
Thanks!
Question and Answer

More Related Content

What's hot

Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Shubham Tagra
 
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...Flink Forward
 
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTim Ysewyn
 
gRPC Design and Implementation
gRPC Design and ImplementationgRPC Design and Implementation
gRPC Design and ImplementationVarun Talwar
 
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Flink Forward
 
Dynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationDynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationAnamika Gupta
 
Monitoring Cassandra With An EYE
Monitoring Cassandra With An EYEMonitoring Cassandra With An EYE
Monitoring Cassandra With An EYEKnoldus Inc.
 
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksEron Wright
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeTao Feng
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkKnoldus Inc.
 

What's hot (20)

Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
 
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...Flink Forward San Francisco 2019: Developing and operating real-time applicat...
Flink Forward San Francisco 2019: Developing and operating real-time applicat...
 
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
Flink Forward San Francisco 2019: Apache Beam portability in the times of rea...
 
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...Flink Forward San Francisco 2019: The Trade Desk's Year in Flink -  Jonathan ...
Flink Forward San Francisco 2019: The Trade Desk's Year in Flink - Jonathan ...
 
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
gRPC Design and Implementation
gRPC Design and ImplementationgRPC Design and Implementation
gRPC Design and Implementation
 
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...Virtual Flink Forward 2020: Build your next-generation stream platform based ...
Virtual Flink Forward 2020: Build your next-generation stream platform based ...
 
Dynamo db and Cross Region Migration
Dynamo db and Cross Region MigrationDynamo db and Cross Region Migration
Dynamo db and Cross Region Migration
 
Monitoring Cassandra With An EYE
Monitoring Cassandra With An EYEMonitoring Cassandra With An EYE
Monitoring Cassandra With An EYE
 
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
Flink Forward Berlin 2017: Roberto Bentivoglio, Saverio Veltri - NSDB (Natura...
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & Tricks
 
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
Flink Forward San Francisco 2019: Building Financial Identity Platform using ...
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
Flink Forward Berlin 2017: Piotr Wawrzyniak - Extending Apache Flink stream p...
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
 

Similar to Maintaining spatial data infrastructures (SDIs) using distributed task queues

Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Thomas Weise
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward
 
HBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYCHBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYCmlai
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerFederico Palladoro
 
Cloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management SystemCloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management SystemLukas Forer
 
Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Globus
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftRX-M Enterprises LLC
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native EnvironmentsWSO2
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Ari Jolma
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersKumari Surabhi
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
The new (is it really ) api stack
The new (is it really ) api stackThe new (is it really ) api stack
The new (is it really ) api stackRed Hat
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 

Similar to Maintaining spatial data infrastructures (SDIs) using distributed task queues (20)

Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
 
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
Flink Forward San Francisco 2019: Streaming your Lyft Ride Prices - Thomas We...
 
HBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYCHBase Coprocessors @ HUG NYC
HBase Coprocessors @ HUG NYC
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Cloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management SystemCloudgene - A MapReduce based Workflow Management System
Cloudgene - A MapReduce based Workflow Management System
 
Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!Shaping the Future: To Globus Compute and Beyond!
Shaping the Future: To Globus Compute and Beyond!
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments[WSO2Con Asia 2018] Architecting for Container-native Environments
[WSO2Con Asia 2018] Architecting for Container-native Environments
 
Flour
FlourFlour
Flour
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
The new (is it really ) api stack
The new (is it really ) api stackThe new (is it really ) api stack
The new (is it really ) api stack
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 

More from Paolo Corti

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019Paolo Corti
 
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Paolo Corti
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructurePaolo Corti
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016Paolo Corti
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitariePaolo Corti
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Paolo Corti
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Paolo Corti
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demoPaolo Corti
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionPaolo Corti
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...Paolo Corti
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Paolo Corti
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Paolo Corti
 

More from Paolo Corti (13)

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019
 
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data Infrastructure
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze Umanitarie
 
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
Implementing an Open Source Spatiotemporal Search Platform for Spatial Data I...
 
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Pla...
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demo
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk Reduction
 
Geonode 2.0
Geonode 2.0Geonode 2.0
Geonode 2.0
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1
 

Recently uploaded

Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfChristopherTHyatt
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 

Recently uploaded (20)

Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 

Maintaining spatial data infrastructures (SDIs) using distributed task queues

  • 1. Maintaining Spatial Data Infrastructures (SDIs) using distributed task queues Paolo Corti and Ben Lewis Harvard Center for Geographic Analysis 2017 FOSS4G Boston
  • 2. Background Harvard Center for Geographic Analysis • WorldMap http://worldmap.harvard.edu – Biggest GeoNode instance on the planet – https://github.com/cga-harvard/cga-worldmap • HHypermap http://hh.worldmap.harvard.edu – Map service registry – https://github.com/cga-harvard/HHypermap
  • 3. Note Billion Object Platform (BOP) https://github.com/cga-harvard/hhypermap-bop
  • 4. Demo of WorldMap / HHypermap
  • 5.
  • 6. The need for an asynchronous processor In WorldMap and HHypermap there are operations run by users which are time consuming and cannot be handled in the context of a web request ● Harvest the metadata of a service and its layers ● Synchronize the metadata of a new or updated layer to the search engine ● Feed a gazetteer when a new layer is uploaded or updated ● Upload a spatial datasets to the server ● Create a new layer using a table join
  • 7. HTTP request/response cycle must be fast ● In web applications the HTTP request/response cycle can be synchronous as long as there are very quick interactions between the client and the server ● unfortunately there are cases when the cycle become slower ● In these situations the best practice for a web application is to process asynchronously these tasks using a task queue
  • 8. Task Queues Asynchronous processing in a web application can be delegated to a task queue, which is a system for parallel execution of tasks in a non-blocking fashion
  • 10. Asynchronous processing model ● The asynchronous processing model is composed by services that produce processing tasks (producers) and by services which consume and process these tasks (consumers) accordingly ● A message queue is a broker which facilitates message passing by providing a protocol or interface which other services can access. Work can be distributed across threads or machines ● In the context of a web application the producer is the client application that creates messages based on the user interaction. The consumer is a daemon process that can consume the messages and run the needed process
  • 11. Glossary ● Task Queue: a system for parallel execution of tasks in a non-blocking fashion ● Broker or Message Queue: provides a protocol or interface for messages exchanging between different services and applications ● Producer: the code that places the tasks to be executed later in the broker ● Consumer or Worker: takes tasks from the broker and process them ● Exchange: takes a message from a producer and route it to zero or more queues (messages routing) Tasks must be consumed faster than being produced. If not, add more workers
  • 12. Use cases for task queues ● in web applications some process is taking too much time and must be processed asynchronously ● heterogeneous applications/services in a given system architecture need an easy way to reliably communicate between each other ● periodic operations (vs crontab) ● a way of parallelizing tasks in multi processors ● monitor processes and analyze failing tasks (and execute them again)
  • 13. Typical use cases for a task queue in a web application ● Thumbnails generation ● Sending bulk email ● Fetching large amounts of data from APIs ● Performing time-intensive calculations ● Expensive queries ● Search engine index synchronization ● Interaction with another application/service ● Replacing cron jobs (backups, maintenance, etc…)
  • 14. Typical use cases for a task queue in a GIS Portal/SDI ● Upload a shapefile to the server (GeoNode) ● Thumbnails generation for layers and maps (GeoNode) ● OGC services harvesting (Harvard Hypermap) ● Geoprocessing operations ● Geospatial data maintenance
  • 15. Producer, broker and consumer architecture Producer Consumer Producer Broker Consumer Producer Broker Consumer Producer Broker Consumer Producer
  • 16. Message brokers implementations Most of them are open source! ● RabbitMQ (AMQP, STOMP, JMS) ● Apache ActiveMQ (STOMP, JMS) ● Amazon Simple Queue Service (JMS) ● Apache Kafka Several standard protocols: ● AMQP, STOMP, JMS, MSMQ (Microsoft .NET)
  • 17. Tasks (Jobs) queues implementations ● Celery (RabbitMQ, Redis, Amazon SQS, Zookeeper) ● Redis Queue (Redis) ● Resque (Redis) ● Kue (Redis) And many others!
  • 18. Celery ● asynchronous task queue based on distributed message passing ● focused on real-time operation, but supports scheduling as well ● the execution units, called tasks, are executed concurrently on a single or more worker servers ● it supports many message brokers (RabbitMQ, Redis, MongoDB, CouchDB, ...) ● written in Python but it can operate with other languages ● great integration with Django! ● great monitoring tools (Flower, django-celery-results)
  • 19. RabbitMQ ● RabbitMQ is a message broker: it accepts and forwards messages ● most widely deployed open source broker (35k+ deployments) ● support many message protocols ● supported by many operating systems and languages ● Written in Erlang
  • 21. A real use case: Harvard Hypermap HHypermap (Harvard Hypermap) Registry is a platform that manages OWS, Esri REST, and other types of map service harvesting, and orchestration and maintains uptime statistics for services and layers. Where possible, layers are cached by MapProxy. HHypermap provides thousands of remote layers to WorldMap users
  • 24. Need for a task queue SLOW!!!
  • 25. Producer Is the code that places the tasks to be executed later in the broker
  • 27. Consumer Takes tasks from the broker and process them in a worker
  • 30. Workers and threads with htop