ReadConcern and WriteConcern

Speaker: Alex Komyagin

MongoDB replica sets allow you to make the database highly available so that you can keep your applications running even when some of the database nodes are down. In a distributed system, local durability of writes with journaling is no longer enough to guarantee system-wide durability, as the node might go down just before any other node replicates new write operations from it. As such, we need a new concept of cluster-wide durability.

How do you make sure that your write operations are durable within a replica set? How do you make sure that your read operations do not see writes that are not yet durable? This talk will cover the mechanics of ensuring durability of writes via write concern and how to prevent reading of stale data in MongoDB using read concern. We will discuss the decision flow for selecting an appropriate level of write concern, as well as the associated tradeoffs and several practical use cases and examples.

  1. #MDBlocal Alex Komyagin, Senior Consulting Engineer, MongoDB
  2. October 12, 2017 | Bespoke | San Francisco. Who stole my write? Or the story of Write Concern and Read Concern
  3. WHAT ARE WE GOING TO LEARN TODAY?
      • What those things are - Write Concern and Read Concern
      • What you can do with them
      • What you should do with them
  4. TYPICAL WRITE WORKFLOW
      [Diagram: The App sends {name:"Alex"} to the Primary and receives {ok:1}. The Primary is shown with its journal, in-memory structures and oplog, and data files, next to two Secondaries. The steps are numbered 1 to 7.]
  5. WANT TO SEE A MAGIC TRICK?
  6. WRITE SOME DATA
      [Diagram: The App writes {x:1},...,{x:99} and receives {ok:1}. The Primary and both Secondaries hold {x:1} … {x:99}.]
  7. WRITE SOME MORE
      [Diagram: The App writes {x:100} and receives {ok:1}. One node holds {x:1} … {x:100}; the other two still hold {x:1} … {x:99}.]
  8. OOOPSIE!
      [Diagram: same state as slide 7. One node holds {x:1} … {x:100}; the other two hold {x:1} … {x:99}.]
  9. KEEP WRITING
      [Diagram: one of the Secondaries is now the Primary. The App writes {x:101} and receives {ok:1}. The old Primary holds {x:1} … {x:100}; the other two nodes hold {x:1} … {x:99}, {x:101}.]
  10. THE OLD PRIMARY COMES BACK ONLINE
      [Diagram: the old Primary is marked "???" and holds {x:1} … {x:100}; the Primary and the Secondary hold {x:1} … {x:99}, {x:101}.]
  11. HE HAS TO FIX HIS STATE TO RESUME REPLICATION
      [Diagram: the old Primary is in ROLLBACK. {x:99} is the last common point; the rolled-back data goes to <dbpath>/rollback/<...>.bson.]
  12. …AND THINGS ARE BACK TO NORMAL
      [Diagram: the old Primary is now a Secondary. All three nodes hold {x:1} … {x:99}, {x:101}; <dbpath>/rollback/<...>.bson remains on the old Primary.]
      The {x:100} write is not lost per se, but is not accessible for the app
  13. Rollback is entirely unavoidable, but it is not a problem, it’s like self-healing
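
The rollback walked through on slides 9 to 12 can be sketched as a toy model in plain Python. This is only an illustration of the idea (find the last common point, set the diverging entries aside, resume from the new primary), not MongoDB's actual rollback algorithm; the function name `rollback` and the list-based "oplogs" are assumptions made for the example.

```python
def rollback(old_primary_oplog, new_primary_oplog):
    """Toy model: return (synced_oplog, rolled_back_entries)."""
    # Walk both histories until they diverge: that is the last common point.
    common = 0
    for mine, theirs in zip(old_primary_oplog, new_primary_oplog):
        if mine != theirs:
            break
        common += 1
    # Entries after the common point are set aside (the slides show them
    # ending up in a rollback .bson file), not applied.
    rolled_back = old_primary_oplog[common:]
    # The node then resumes replication from the new primary's history.
    synced = list(new_primary_oplog)
    return synced, rolled_back


old = [{"x": i} for i in range(1, 101)]                  # {x:1} .. {x:100}
new = [{"x": i} for i in range(1, 100)] + [{"x": 101}]   # {x:1} .. {x:99}, {x:101}
synced, rolled_back = rollback(old, new)
print(rolled_back)  # [{'x': 100}]
```

The acknowledged {x:100} write survives only in `rolled_back`, which mirrors the point of slide 12: the data is not destroyed, but the application can no longer see it.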
  14. SO WHERE WAS THE PROBLEM?
      [Diagram: same state as slide 7. One node holds {x:1} … {x:100}; the other two hold {x:1} … {x:99}.]
      The App got the "OK" before the write was replicated to any of the secondaries
  15. Solution: write receipt
  16. WRITE CONCERN
      • Form of an intelligent receipt/confirmation that the write operation was replicated to the desired number of nodes
      • Default number is 1
      • Allows us to express how concerned we are with durability of a particular write in a replica set
      • Can be set for individual ops / collections / etc
      • NOT a distributed transaction
          db.test.insert({x:100},{writeConcern:{w:2}})
  17. HOW DOES IT WORK?
      • Different levels
        - {w:<N>}
        - {w:<N>, j:true}: includes secondaries since 3.2
        - {w:"majority"}: implies {j:true} in MongoDB 3.2+; guarantees that confirmed operations won’t be rolled back
      • Supports timeout
        - {w:2, wtimeout:100}
        - Timeout doesn’t imply a write failure - you just get no receipt
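
As a rough illustration of the levels on slide 17, the following toy model decides whether a receipt can be issued for a given `w`. It is a simplification under stated assumptions: `voting_members` is taken to be the size of the replica set, "majority" is computed as more than half of it, and journaling and timeouts are ignored. The function names are made up for this sketch and are not part of any driver.

```python
def required_acks(w, voting_members):
    """Number of nodes that must hold the write before it is acknowledged."""
    if w == "majority":
        return voting_members // 2 + 1
    return int(w)


def write_concern_satisfied(w, nodes_with_write, voting_members):
    """True if a receipt can be issued for this write (toy model)."""
    return nodes_with_write >= required_acks(w, voting_members)


# Slide 14: only the primary holds {x:100} in a three-node replica set.
ok_w1 = write_concern_satisfied(1, nodes_with_write=1, voting_members=3)
ok_w2 = write_concern_satisfied(2, nodes_with_write=1, voting_members=3)
ok_majority = write_concern_satisfied("majority", nodes_with_write=1, voting_members=3)
print(ok_w1, ok_w2, ok_majority)  # True False False
```

With `w:1` the application gets its "OK" in exactly the state that later leads to a rollback; with `w:2` or `w:"majority"` the receipt is withheld until a secondary has the write.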
  18. WRITE CONCERN TRADEOFFS
      • Choose {w:"majority"} for writes that matter
        - The main tradeoff is latency
        - It’s not as bad as you think (within the same DC, AZ or even region)
        - Use multiple threads to get desired throughput
        - Use async frameworks in user-facing applications, if needed
      • For cross-regional deployments choose {w:2}
        - Reasonable compromise between performance and durability
  19. Failures
  20. WHAT HAPPENS IF WRITE CONCERN FAILS?
      • "wtimeout" only generates a write concern failure exception
        - Similar to network exceptions
        - No useful information in a failure
      • App code has to handle exceptions and retry when appropriate
        - Writes need to be made idempotent (e.g. updates with $inc -> $set)
        - When idempotency is not possible, at least log the failures
      • Retryable writes: Coming soon!
          db.test.insert({name:"Alex"}, {writeConcern:{w:2,wtimeout:1000}})
      [Diagram: the Primary returns a writeConcernError; the Secondary is shown next to it.]
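
The "$inc -> $set" hint on slide 20 is easier to see with a small experiment. The sketch below applies a tiny, hand-written subset of update operators to a Python dict; `apply_update` is a hypothetical helper written for this example, not a MongoDB or driver function. It shows what happens when a write whose receipt was lost is retried.

```python
def apply_update(doc, update):
    """Apply a toy subset of update operators ($inc, $set) to a copy of doc."""
    result = dict(doc)
    for field, delta in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + delta
    for field, value in update.get("$set", {}).items():
        result[field] = value
    return result


doc = {"_id": 1, "views": 10}
inc_update = {"$inc": {"views": 1}}    # not idempotent
set_update = {"$set": {"views": 11}}   # idempotent

# The first attempt succeeded but the receipt never arrived, so the app retries.
after_inc_retry = apply_update(apply_update(doc, inc_update), inc_update)
after_set_retry = apply_update(apply_update(doc, set_update), set_update)
print(after_inc_retry["views"], after_set_retry["views"])  # 12 11
```

The retried `$inc` counts the view twice, while the retried `$set` leaves the document in the intended state, which is why the slide recommends rewriting updates into an idempotent form before adding retries.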
  21. BEST EFFORT WRITE CODE EXAMPLE
      • Replica set with 2 data nodes and an arbiter
      • One node goes down every 90 seconds
      • Inserting 2 million records
      • w:1 - only 1999911 records were actually there in the end!

          from pymongo import MongoClient
          from pymongo.errors import ConnectionFailure, DuplicateKeyError

          client = MongoClient("mongodb://a,b,c/?replicaSet=rs")
          coll = client.test_db.test_col  # w:1, the default per slide 16
          i = 0
          while i < 2000000:
              my_object = {'number': i}
              try:
                  # insert_one() adds an _id to my_object, so a retry of an
                  # already-applied insert raises DuplicateKeyError
                  coll.insert_one(my_object)
              except Exception:
                  while True:  # repeat until success or we hit a dup key error
                      try:
                          coll.insert_one(my_object)
                          break
                      except DuplicateKeyError:
                          break
                      except ConnectionFailure:
                          pass
              i += 1
  22. HOW TO MAKE IT BETTER?
      • Use write concern to know if writes are durable
      • We’ll pay with additional latency for writes that might never be rolled back (but we don’t know that!)
      • It’s not practical to wait for every write
        - Use bulk inserts
      [Code: the same best-effort insert loop as on slide 21.]
  23. BETTER, SAFER CODE
      • db.test.count() is 2000000 after the test
      • Takes the same amount of time with w:2 as w:1

          from pymongo import InsertOne, MongoClient, WriteConcern
          from pymongo.errors import BulkWriteError, ConnectionFailure

          client = MongoClient("mongodb://a,b,c/?replicaSet=rs")
          coll = client.test_db.test_col.with_options(write_concern=WriteConcern(w=2))
          i = 0
          while i < 20000:
              # build a batch of 100 inserts
              requests = []
              for j in range(0, 100):
                  requests.append(InsertOne({"number": i * 100 + j}))
              while True:  # repeat until success or write concern is satisfied
                  try:
                      coll.bulk_write(requests, ordered=False)
                      break
                  except BulkWriteError as bwe:
                      # only non-write-concern errors (e.g. dup keys from a retry) remain
                      if bwe.details.get('writeConcernErrors') == []:
                          break
                  except ConnectionFailure:
                      pass
              i += 1
      [Flowchart: Insert batch. On success, go to the next batch. On problems, go to the next batch if there are no write concern errors; otherwise insert the batch again.]
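
The loops on slides 21 and 23 retry forever. One possible variation, shown here as a sketch rather than as the speaker's method, is to bound the number of attempts and back off between them. `TransientError`, `retry` and `flaky_insert_batch` are hypothetical names invented for this example; in real code the caught exception would be whatever retryable error the driver raises.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a lost connection."""


def retry(operation, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller log or escalate
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


calls = {"n": 0}


def flaky_insert_batch():
    """Fails twice, then succeeds, imitating a short failover."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("primary stepped down")
    return "acknowledged"


result = retry(flaky_insert_batch, sleep=lambda seconds: None)
print(result, calls["n"])  # acknowledged 3
```

Bounded retries are only safe when the retried operation is idempotent, which the batch on slide 23 achieves by letting duplicate-key errors from an already-applied insert count as success.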
  24. Let’s look at the reads now
  25. WHAT IS A DIRTY READ?
      [Diagram: The App runs db.test.find({x:100}) and gets {x:100}. One node holds {x:1} … {x:100}; the other two hold {x:1} … {x:99}.]
  26. WHAT IS A DIRTY READ?
      [Diagram: after the rollback, all three nodes hold {x:1} … {x:99}, {x:101}, and {x:100} sits in <dbpath>/rollback/<...>.bson. The App runs db.test.find({x:100}) and gets null.]
  27. READ CONCERN
      • Determines which data to return from a query
      • Different modes:
        - Local
        - Majority (3.2)
        - Linearizable (3.4)
      • NOT related to read preferences
      [Diagram: on the Primary, {x:100} is "local" and {x:99} is "majority". On one Secondary, {x:99} is both "majority" and "local". On the other Secondary, {x:99} is "local" and {x:98} is "majority".]
  28. READ CONCERN
      • db.test.find( { x:100 } ) - WORKS
      • db.test.find( { x:100 } ).readConcern("majority") - RETURNS "null"
      • db.test.find( { x:100 } ).readConcern("linearizable") - BLOCKS until the last write is replicated
        - Use the maxTimeMS() option to avoid blocking forever
      [Diagram: on the Primary, {x:100} is "local" and {x:99} is "majority". On the Secondary, {x:99} is "local" and {x:98} is "majority".]
  29. MAJORITY VS. LINEARIZABLE
      • Both return data that won’t be rolled back
      • "Majority" returns the most recent data replicated to a majority of nodes that this particular node knows about
        - Each node maintains and advances a separate "majority-committed" pointer/snapshot
      • "Linearizable" ensures that this data is the most recent
        - Enables multiple threads to perform reads and writes on a single document as if a single thread performed these operations in real time
        - Only on Primary
        - Significantly slower than "majority"
      • In most applications dirty reads are not a big problem
        - If write failures are handled correctly, the "dirtiness" is temporary
        - Twitter vs. Changing your password
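
The "majority-committed" pointer from slide 29 can be approximated with a few lines of arithmetic. This is a toy calculation, assuming each node's progress is a single integer (the highest x it has applied) and that every node counts toward the majority; `majority_commit_point` is a name chosen for this sketch, and real nodes may additionally lag in what they know about each other, as the diagram on slide 27 shows.

```python
def majority_commit_point(last_applied):
    """Highest position that a majority of the nodes has already applied."""
    majority = len(last_applied) // 2 + 1
    # The majority-th highest value is held by at least `majority` nodes.
    return sorted(last_applied, reverse=True)[majority - 1]


# Slide 27: the primary has applied {x:100}, both secondaries are at {x:99}.
nodes = {"primary": 100, "secondary1": 99, "secondary2": 99}
commit_point = majority_commit_point(list(nodes.values()))

local_read_on_primary = nodes["primary"]                         # may be rolled back
majority_read_on_primary = min(nodes["primary"], commit_point)   # will not be rolled back
print(local_read_on_primary, majority_read_on_primary)  # 100 99
```

This is why `find({x:100})` works with the default read concern but returns nothing with `readConcern("majority")` on slide 28: {x:100} has not yet reached the commit point.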
  30. DID WE FORGET ANYTHING?
      • Read preference controls where we are reading from
      • Read concern controls what we are reading
      • Causal consistency, new in 3.6, allows us to read what we wrote from any node
        - Extension for read concern (read-after-optime)
        - Compatible with read concern "majority"
        - Enabled on the session level
          db.getMongo().setCausalConsistency(true)
      [Diagram: The App sends writes and reads to a Primary, where {x:100} is "local" and {x:99} is "majority", and reads to a Secondary, where {x:99} is "local" and {x:98} is "majority".]
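
The "read-after-optime" idea behind causal consistency can be illustrated with a toy session object. Everything here is an assumption made for the sketch: optimes are modelled as positions in a Python list, `CausalSession` and `NotCaughtUp` are invented names, and the toy raises an error where a real server would wait for replication to catch up.

```python
class NotCaughtUp(Exception):
    """Raised by this toy model when a node is behind the session's last write."""


class CausalSession:
    def __init__(self):
        self.operation_time = 0  # toy optime of the last write in this session

    def write(self, primary_oplog, doc):
        primary_oplog.append(doc)
        self.operation_time = len(primary_oplog)

    def read_latest(self, node_oplog):
        # Only serve the read once the node has applied our last write.
        if len(node_oplog) < self.operation_time:
            raise NotCaughtUp("node is behind this session's last write")
        return node_oplog[-1]


primary = [{"x": 99}]
secondary = [{"x": 99}]
session = CausalSession()
session.write(primary, {"x": 100})

try:
    session.read_latest(secondary)
    stale_read_allowed = True
except NotCaughtUp:
    stale_read_allowed = False

secondary.append({"x": 100})  # replication catches up
latest = session.read_latest(secondary)
print(stale_read_allowed, latest)  # False {'x': 100}
```

The session, not the node, carries the "what have I written so far" marker, which matches the slide's note that causal consistency is enabled at the session level.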
  31. Successes
  32. HOW TO CHOOSE THE RIGHT CONCERN? THINK WHAT YOUR USERS CARE ABOUT
      • Writing important data that has to be durable?
        - Example: ETL process for reporting
        - Use {w:2}* or {w:"majority"}
      • Reads must see the most recent durable state (can’t be stale or uncommitted)?
        - Example: Credentials Management Application
        - Use {w:"majority"} and "linearizable" read concern
      • Mission-critical data where dirty reads are not allowed?
        - Example: Config servers in sharding
        - Use {w:"majority"} and "majority" read concern
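
The decision flow on slide 32 can be written down as a small lookup function. This is one possible encoding, not an official rule: the function name, its parameters, and the order in which the questions are checked are assumptions, and the fallback of `w:1` with "local" reads simply reflects the default mentioned on slide 16 and the plain find() on slide 28.

```python
def choose_concerns(durable_writes=False, reads_need_latest_durable=False,
                    no_dirty_reads=False):
    """Map the three questions from slide 32 to a write and read concern."""
    if reads_need_latest_durable:
        # e.g. a credentials management application
        return {"w": "majority", "readConcern": "linearizable"}
    if no_dirty_reads:
        # e.g. config servers in sharding
        return {"w": "majority", "readConcern": "majority"}
    if durable_writes:
        # e.g. an ETL process; the slide also allows {w:2} here
        return {"w": "majority", "readConcern": "local"}
    return {"w": 1, "readConcern": "local"}


etl = choose_concerns(durable_writes=True)
credentials = choose_concerns(reads_need_latest_durable=True)
print(etl["w"], credentials["readConcern"])  # majority linearizable
```

Checking the strictest requirement first means an application that answers "yes" to several questions ends up with the strongest combination it asked for.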
  33. DOES MY DRIVER SUPPORT THIS??
      • Java
        - https://mongodb.github.io/mongo-java-driver/3.4/javadoc/com/mongodb/WriteConcern.html
        - https://mongodb.github.io/mongo-java-driver/3.4/javadoc/com/mongodb/ReadConcern.html
      • C#
        - https://mongodb.github.io/mongo-csharp-driver/2.3/apidocs/html/T_MongoDB_Driver_WriteConcern.htm
        - https://mongodb.github.io/mongo-csharp-driver/2.3/apidocs/html/T_MongoDB_Driver_ReadConcern.htm
      • PHP
        - http://php.net/manual/en/mongo.writeconcerns.php
        - http://php.net/manual/en/class.mongodb-driver-readconcern.php
      • Others do, too!
  34. THANK YOU! TIME FOR YOUR QUESTIONS
      My name is Alex
      Don’t email me here: alex@mongodb.com
  35. MORE RESOURCES
      • Documentation is where we all start:
        - https://docs.mongodb.com/manual/reference/write-concern/
        - https://docs.mongodb.com/manual/reference/read-concern/
      • Great presentation by Jesse Davis on resilient operations:
        - https://www.slideshare.net/mongodb/mongodb-world-2016-smart-strategies-for-resilient-applications
