MongoDB as a Cloud Queue


USAGE
Ordered execution
Buffering consumer/producer
Work distribution

GOALS OF PROJECT
Leverage Mongo
• Reduce ops overhead by reusing infrastructure
• Map queue semantics to Mongo’s strengths

Reliable
• Durable - support long running process
• Resilient to machine failure
• Narrow the window of failure / data loss

Centralized, distributed:
• Multiple producers
• Multiple consumers

ITERATION 0
Capped collection – not the perfect choice
• Tailing queue seems attractive, but…
• Need external sync to avoid double-consume
• Secondary indexes and updating are anti-pattern

Relaxing FIFO is OK
• No guarantee that first-popped is first done
• Multi-client is negated if they have to sync on execution order
• Race condition for queue insertion has same effect

Conclusion: Project doesn’t use capped collection and
relaxes FIFO.

PARANOID BY DESIGN

Network dies
Process dies
DB dies

Machine dies
Poison letter
Dead letter

ITERATION 1
db.q4foo.save({v:{f:1}})
db.q4foo.findAndModify({query: {}, sort: {_id:1}, remove: true})

Hot: quick and simple
Not: dead client, dead in transit, no trace

ARE WE THERE YET?

Network dies
Process dies
DB dies

Machine dies
Poison letter
Dead letter

QUEUE SEMANTICS
Local / Memory     Distributed
Push               Put
Pop                Get << visibility >>
<< exception >>    Release << retry >>
                   Delete
                   << exception >>
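The distributed column above can be sketched as a small in-memory model. This is not a MongoDB client and not the speaker's code; the class name MemoryQueue and the fields visibleAt and attempts are assumptions chosen for illustration. It only shows how Get hides an item for a visibility window instead of removing it, how Release makes it available for a retry, and how Delete removes it after success.

```javascript
// In-memory model of the distributed queue semantics (not a MongoDB client).
// MemoryQueue, visibleAt and attempts are illustrative names, not from the slides.
class MemoryQueue {
  constructor(maxAttempts = 3) {
    this.items = [];            // kept in insertion order
    this.nextId = 1;
    this.maxAttempts = maxAttempts;
  }

  // Put: add an item that is immediately visible.
  put(value) {
    const item = { id: this.nextId++, value, visibleAt: 0, attempts: 0 };
    this.items.push(item);
    return item.id;
  }

  // Get: hide the oldest visible item for timeoutMs instead of removing it.
  get(now, timeoutMs) {
    const item = this.items.find(
      (i) => i.visibleAt <= now && i.attempts < this.maxAttempts
    );
    if (!item) return null;
    item.visibleAt = now + timeoutMs;
    item.attempts += 1;
    return { id: item.id, value: item.value };
  }

  // Release: processing failed, make the item visible again for a retry.
  release(id) {
    const item = this.items.find((i) => i.id === id);
    if (item) item.visibleAt = 0;
  }

  // Delete: processing succeeded, remove the item for good.
  delete(id) {
    this.items = this.items.filter((i) => i.id !== id);
  }
}

const q = new MemoryQueue();
q.put("job-a");
q.put("job-b");
const first = q.get(1000, 60000);   // consumer 1 takes job-a
const second = q.get(1000, 60000);  // consumer 2 gets job-b, since job-a is hidden
q.release(first.id);                // consumer 1 fails and releases job-a
const retry = q.get(2000, 60000);   // job-a is handed out again
q.delete(retry.id);                 // success: job-a is removed
console.log(first.value, second.value, retry.value, q.items.length);
// job-a job-b job-a 1
```

Passing the clock in as an argument keeps the model deterministic; a real consumer would use the current time.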

ITERATION 2
db.q4foo.save({v:{f:1}, dq: null})
db.q4foo.findAndModify({query: {dq: null}, sort: {_id:1}, update: {$set: {dq: later(60)}}})
…
If processing succeeded => delete.

Hot: If the client dies, the item remains in the queue. Data is not lost.
Not: The index on _id is less useful at high volume.
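The Iteration 2 snippet calls later(60) without defining it, and its query matches only dq: null. The sketch below gives one possible definition of later and a hypothetical query that also matches items whose dq timestamp has passed. Both are assumptions for illustration; the slides elide how a lapsed item becomes visible again.

```javascript
// `later` appears in the slides without a definition; this one is an assumption.
function later(seconds, now = new Date()) {
  return new Date(now.getTime() + seconds * 1000);
}

// Hypothetical query: items never dequeued, or whose visibility window has lapsed.
function visibleQuery(now = new Date()) {
  return { $or: [{ dq: null }, { dq: { $lt: now } }] };
}

// Mongo shell usage, shown as comments because it needs a live server:
// db.q4foo.findAndModify({
//   query: visibleQuery(),
//   sort: { _id: 1 },
//   update: { $set: { dq: later(60) } }
// })

const t0 = new Date(0);
console.log(later(60, t0).toISOString()); // 1970-01-01T00:01:00.000Z
```

With a query like this, an item held by a dead client would be handed out again once its window expires, without anyone having to reset dq.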

ITERATION 3
db.q4foo.save({v:{f:1}, dq: null, pc: 0})
db.q4foo.findAndModify({query: {dq: null, pc: {$lt:3}}, sort: {_id:1}, update: {$set: {dq: later(60)}, $inc: {pc:1}}}) // consume
db.q4foo.findAndModify({query: {_id: "..."}, update: {$set: {dq: null}}}) // release

Hot: An item can be retried automatically (pc) after being released. An exhausted item remains in the queue.
Not: Not strict FIFO.

ARE WE THERE? YES.

Network dies
Process dies
DB dies

Machine dies
Poison letter
Dead letter

ITERATION 4

Ensure your queue writes use applicable durability
• db.q4foo.save() + getLastError(…)
• db.q4foo.findAndModify() + getLastError(…)

Replica sets for durability only. No capacity or speed gain.
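In MongoDB versions newer than the one this talk targeted, durability is usually requested per operation with a write concern document rather than a separate getLastError call. The sketch below builds the consume command from Iteration 3 with such a write concern attached. The specific values (majority, journaled, 5 second timeout) and the helper name consumeCommand are assumptions, not settings from the slides.

```javascript
// Assumed durability setting: acknowledged by a majority of the replica set and journaled.
const durable = { w: "majority", j: true, wtimeout: 5000 };

// Build the argument for a consume call; field names dq and pc follow the slides.
function consumeCommand(visibleUntil, maxAttempts = 3) {
  return {
    query: { dq: null, pc: { $lt: maxAttempts } },
    sort: { _id: 1 },
    update: { $set: { dq: visibleUntil }, $inc: { pc: 1 } },
    writeConcern: durable
  };
}

// Mongo shell usage (needs a live replica set, so shown as comments):
// db.q4foo.insertOne({ v: { f: 1 }, dq: null, pc: 0 }, { writeConcern: durable })
// db.q4foo.findAndModify(consumeCommand(new Date(Date.now() + 60000)))

console.log(JSON.stringify(consumeCommand(new Date(0)).writeConcern));
```

A stricter write concern narrows the window of data loss at the cost of latency, which matches the slide's point that replica sets add durability, not speed.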

OTHER THOUGHTS
Create admin jobs to monitor queues:
• Growth
• Retries exhausted
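The two monitoring checks above could be sketched as follows. The helper names, the retry limit of 3 (taken from the Iteration 3 query), and the alert threshold are assumptions; the shell calls are shown as comments because they need a live server.

```javascript
// Hypothetical monitoring helpers; pc is the slide's retry counter field.
function exhaustedQuery(maxAttempts = 3) {
  return { pc: { $gte: maxAttempts } };
}

// Decide whether the change between two queue-depth samples should raise an alert.
function growthAlert(previousCount, currentCount, threshold) {
  return currentCount - previousCount > threshold;
}

// Mongo shell usage:
// db.q4foo.countDocuments({})                // total depth, sampled periodically
// db.q4foo.countDocuments(exhaustedQuery())  // items that used up their retries

console.log(growthAlert(100, 180, 50)); // true
```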

Consider TTL risks (e.g., client failure before calling Release())

Consider idempotent operations when possible
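Because an item can be delivered more than once (for example after a visibility window lapses), a handler that tolerates duplicates is safer. A minimal sketch, assuming each item carries a unique id; the in-memory Set stands in for whatever persistent record a real consumer would keep.

```javascript
// Illustrative only: an in-memory record of processed item ids.
// A real consumer would persist this record so it survives restarts.
const processed = new Set();
let balance = 0;

function handle(item) {
  if (processed.has(item.id)) return false; // duplicate delivery: skip side effects
  balance += item.amount;
  processed.add(item.id);
  return true;
}

const item = { id: "a1", amount: 10 };
handle(item);
handle(item); // same item delivered a second time
console.log(balance); // 10
```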

Design clients to back off polling
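One common way to back off is to double the wait after each empty poll, up to a cap, and reset once work is found. The sketch below assumes a 100 ms base and a 30 s cap; getItem, process, and shouldStop are hypothetical callbacks supplied by the caller.

```javascript
// Exponential backoff with a cap; baseMs and capMs are assumed defaults.
function nextDelay(emptyPolls, baseMs = 100, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** emptyPolls);
}

// Poll loop sketch: getItem, process and shouldStop are hypothetical callbacks.
async function pollLoop(getItem, process, shouldStop) {
  let emptyPolls = 0;
  while (!shouldStop()) {
    const item = await getItem();
    if (item) {
      emptyPolls = 0;             // work found: poll again right away
      await process(item);
    } else {
      await new Promise((resolve) => setTimeout(resolve, nextDelay(emptyPolls)));
      emptyPolls += 1;
    }
  }
}

console.log([0, 1, 2, 10].map((n) => nextDelay(n))); // 100, 200, 400, 30000
```

Adding random jitter to the delay would help keep many idle clients from polling in lockstep.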

Separate queue vs. extra “topic” field

Consider dedicated DB for write-lock scope

Capped vs. regular collection – capped collections can now have an _id index and support in-place updates.

Q&A

Thank you!

Nuri Halperin

nuri@plusnconsulting.com


QUEUE IN THE CLOUD WITH MONGODB (MongoDB LA 2013, Nuri Halperin)


Editor's Notes

Queue holds elements until they are required. Items in the queue are accessed from the head of the queue only – implied order.
Add capped collection
