Maintaining spatial data infrastructures (SDIs) using distributed task queues


Using a task queue to process requests asynchronously in a geoportal

Published in: Technology

  1. Maintaining Spatial Data Infrastructures (SDIs) using distributed task queues
     Paolo Corti and Ben Lewis
     Harvard Center for Geographic Analysis
     2017 FOSS4G Boston
  2. Background
     Harvard Center for Geographic Analysis
     • WorldMap http://worldmap.harvard.edu
       – Biggest GeoNode instance on the planet
       – https://github.com/cga-harvard/cga-worldmap
     • HHypermap http://hh.worldmap.harvard.edu
       – Map service registry
       – https://github.com/cga-harvard/HHypermap
  3. Note
     Billion Object Platform (BOP) https://github.com/cga-harvard/hhypermap-bop
  4. Demo of WorldMap / HHypermap
  5. The need for an asynchronous processor
     In WorldMap and HHypermap, some user-triggered operations are time consuming and cannot be handled within a web request:
     ● Harvest the metadata of a service and its layers
     ● Synchronize the metadata of a new or updated layer to the search engine
     ● Feed a gazetteer when a new layer is uploaded or updated
     ● Upload spatial datasets to the server
     ● Create a new layer using a table join
  6. HTTP request/response cycle must be fast
     ● In web applications the HTTP request/response cycle can be synchronous as long as interactions between the client and the server are very quick
     ● Unfortunately there are cases when the cycle becomes slower
     ● In these situations the best practice for a web application is to process these tasks asynchronously using a task queue
  7. Task Queues
     Asynchronous processing in a web application can be delegated to a task queue, which is a system for parallel execution of tasks in a non-blocking fashion
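The non-blocking idea can be sketched with Python's standard library alone. This is only an in-process illustration, not a distributed task queue: `harvest_service` and `handle_request` are hypothetical names, and the thread pool stands in for the workers that a real task queue would run in separate processes or machines.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def harvest_service(service_id):
    """Hypothetical slow operation, e.g. harvesting a remote map service."""
    time.sleep(0.5)
    return f"harvested {service_id}"


# The pool plays the role of the workers (assumption: in-process threads).
executor = ThreadPoolExecutor(max_workers=2)


def handle_request(service_id):
    """Queue the slow work and return at once, like a fast HTTP view."""
    future = executor.submit(harvest_service, service_id)
    return {"status": "accepted"}, future


start = time.monotonic()
response, future = handle_request(42)
elapsed = time.monotonic() - start  # time spent inside the "request"

result = future.result()  # block only here, to collect the outcome
executor.shutdown()
```

The request handler returns before the slow function finishes; the caller decides if and when to wait for the result.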
  8. Asynchronous processing model
  9. Asynchronous processing model
     ● The asynchronous processing model is composed of services that produce processing tasks (producers) and services that consume and process these tasks (consumers)
     ● A message queue is a broker that facilitates message passing by providing a protocol or interface which other services can access. Work can be distributed across threads or machines
     ● In the context of a web application the producer is the client application that creates messages based on user interaction. The consumer is a daemon process that consumes the messages and runs the needed process
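A minimal sketch of the producer, broker and consumer roles, assuming an in-memory `queue.Queue` in place of a real message broker and threads in place of daemon processes. The message layout and the `index_layer` task name are hypothetical.

```python
import queue
import threading

broker = queue.Queue()  # stands in for the message broker
results = []
results_lock = threading.Lock()
STOP = None  # sentinel message telling a consumer to shut down


def producer(layer_ids):
    """Create one message per unit of work and hand it to the broker."""
    for layer_id in layer_ids:
        broker.put({"task": "index_layer", "layer_id": layer_id})


def consumer():
    """Take messages from the broker and process them until told to stop."""
    while True:
        message = broker.get()
        if message is STOP:
            broker.task_done()
            break
        with results_lock:
            results.append(f"indexed layer {message['layer_id']}")
        broker.task_done()


workers = [threading.Thread(target=consumer) for _ in range(2)]
for worker in workers:
    worker.start()

producer([1, 2, 3, 4])
for _ in workers:
    broker.put(STOP)  # one sentinel per consumer, queued after the real work

broker.join()  # wait until every message has been processed
for worker in workers:
    worker.join()
```

The producer never calls the processing code directly; it only knows the broker, which is what lets the work move to other threads or machines.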
  10. Glossary
      ● Task Queue: a system for parallel execution of tasks in a non-blocking fashion
      ● Broker or Message Queue: provides a protocol or interface for exchanging messages between different services and applications
      ● Producer: the code that places the tasks to be executed later in the broker
      ● Consumer or Worker: takes tasks from the broker and processes them
      ● Exchange: takes a message from a producer and routes it to zero or more queues (message routing)
      Tasks must be consumed faster than they are produced. If not, add more workers
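The "zero or more queues" behaviour of an exchange can be illustrated with a toy router. This is a sketch of routing by exact key match, not any broker's actual implementation; the class, queue names and routing keys are all hypothetical.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy exchange: copies a message to every queue bound to its routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, routing_key, target_queue):
        self.bindings[routing_key].append(target_queue)

    def publish(self, routing_key, message):
        """Deliver the message and return how many queues received it."""
        targets = self.bindings.get(routing_key, [])
        for target_queue in targets:
            target_queue.append(message)
        return len(targets)


harvest_queue, index_queue, audit_queue = deque(), deque(), deque()

exchange = DirectExchange()
exchange.bind("harvest", harvest_queue)
exchange.bind("harvest", audit_queue)  # the same key can feed two queues
exchange.bind("index", index_queue)

delivered = [
    exchange.publish("harvest", "service 7"),  # two queues
    exchange.publish("index", "layer 3"),      # one queue
    exchange.publish("thumbnail", "map 1"),    # no binding: zero queues
]
```

A message published with an unbound key reaches no queue at all, which is the "zero" case in the glossary entry.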
  11. Use cases for task queues
      ● In web applications, a process that takes too much time and must be processed asynchronously
      ● Heterogeneous applications/services in a given system architecture need an easy way to communicate reliably with each other
      ● Periodic operations (vs crontab)
      ● A way of parallelizing tasks across multiple processors
      ● Monitoring processes and analyzing failing tasks (and executing them again)
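The last use case, re-executing failing tasks, can be sketched as a small retry wrapper. This is an illustration under assumptions: `run_with_retries`, the state labels and the flaky task are hypothetical, and a real task queue would typically also wait between attempts and record the outcome for monitoring.

```python
def run_with_retries(task, args, max_retries=3):
    """Run a task, re-executing it on failure up to max_retries extra times."""
    attempts = 0
    last_error = None
    while attempts <= max_retries:
        attempts += 1
        try:
            return {"state": "SUCCESS", "result": task(*args), "attempts": attempts}
        except Exception as exc:  # keep the error so the failure can be analyzed
            last_error = exc
    return {"state": "FAILURE", "error": repr(last_error), "attempts": attempts}


calls = {"count": 0}


def flaky_check(url):
    """Hypothetical service check that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service unreachable")
    return f"{url} is up"


def always_fail():
    raise ValueError("bad layer")


outcome = run_with_retries(flaky_check, ("http://example.org/wms",))
failed = run_with_retries(always_fail, (), max_retries=2)
```

Keeping the attempt count and the last error in the returned record is what makes failed tasks inspectable afterwards.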
  12. Typical use cases for a task queue in a web application
      ● Thumbnail generation
      ● Sending bulk email
      ● Fetching large amounts of data from APIs
      ● Performing time-intensive calculations
      ● Expensive queries
      ● Search engine index synchronization
      ● Interaction with another application/service
      ● Replacing cron jobs (backups, maintenance, etc.)
  13. Typical use cases for a task queue in a GIS Portal/SDI
      ● Upload a shapefile to the server (GeoNode)
      ● Thumbnail generation for layers and maps (GeoNode)
      ● OGC services harvesting (Harvard Hypermap)
      ● Geoprocessing operations
      ● Geospatial data maintenance
  14. Producer, broker and consumer architecture
      [Diagram: producers, brokers and consumers]
  15. Message broker implementations
      Most of them are open source!
      ● RabbitMQ (AMQP, STOMP, JMS)
      ● Apache ActiveMQ (STOMP, JMS)
      ● Amazon Simple Queue Service (JMS)
      ● Apache Kafka
      Several standard protocols:
      ● AMQP, STOMP, JMS, MSMQ (Microsoft .NET)
  16. Task (job) queue implementations
      ● Celery (RabbitMQ, Redis, Amazon SQS, Zookeeper)
      ● Redis Queue (Redis)
      ● Resque (Redis)
      ● Kue (Redis)
      And many others!
  17. Celery
      ● Asynchronous task queue based on distributed message passing
      ● Focused on real-time operation, but supports scheduling as well
      ● The execution units, called tasks, are executed concurrently on one or more worker servers
      ● Supports many message brokers (RabbitMQ, Redis, MongoDB, CouchDB, ...)
      ● Written in Python, but it can operate with other languages
      ● Great integration with Django!
      ● Great monitoring tools (Flower, django-celery-results)
  18. RabbitMQ
      ● RabbitMQ is a message broker: it accepts and forwards messages
      ● Most widely deployed open source broker (35k+ deployments)
      ● Supports many message protocols
      ● Supported by many operating systems and languages
      ● Written in Erlang
  19. Architecture of Celery/RabbitMQ
      https://tests4geeks.com/python-celery-rabbitmq-tutorial/
  20. A real use case: Harvard Hypermap
      HHypermap (Harvard Hypermap) Registry is a platform that manages the harvesting and orchestration of OWS, Esri REST, and other types of map services, and maintains uptime statistics for services and layers. Where possible, layers are cached by MapProxy. HHypermap provides thousands of remote layers to WorldMap users
  21. Harvard Hypermap / WorldMap architecture
  22. HHypermap interface
  23. Need for a task queue
      SLOW!!!
  24. Producer
      The code that places the tasks to be executed later in the broker
  25. Celery messages
  26. Consumer
      Takes tasks from the broker and processes them in a worker
  27. Replacing cron jobs
  28. Replacing cron jobs
  29. Workers and threads with htop
  30. Monitoring
  31. Monitoring a task
  32. Thanks!
      Question and Answer
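The transcript does not preserve what these two slides showed. As a hedged illustration, assuming Celery's periodic scheduler (celery beat) is what takes the place of cron, a schedule can be declared as a plain dictionary like the one below. The entry names and task paths are hypothetical, and in a real project the dictionary would be assigned to the Celery app's `beat_schedule` setting.

```python
from datetime import timedelta

# Hypothetical periodic tasks; each entry names a registered task and how
# often the scheduler should send it to the broker.
beat_schedule = {
    "check-all-services": {
        "task": "registry.tasks.check_all_services",  # hypothetical task path
        "schedule": timedelta(hours=1),               # run once per hour
    },
    "reindex-layers": {
        "task": "registry.tasks.reindex_layers",      # hypothetical task path
        "schedule": 300.0,                            # run every 300 seconds
        "args": (),
    },
}
```

Unlike a crontab line, each scheduled run becomes an ordinary task message, so it goes through the same workers and monitoring as every other task.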
