Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Building Real Time Systems on MongoDB Using the Oplog at Stripe

5,837 views

Published on

MongoDB's oplog is possibly its most underrated feature. The oplog is vital as the basis on which replication is built, but its value doesn't stop there. Unlike the MySQL binlog, which is poorly documented and not directly exposed to MySQL clients, the oplog is a well-documented, structured format for changes that is query-able through the same mechanisms as your data. This allows many types of powerful, application-driven streaming or transformation. At Stripe, we've used the MongoDB oplog to create PostgresSQL, HBase, and ElasticSearch mirrors of our data. We've built a simple real-time trigger mechanism for detecting new data. And we've even used it to recover data. In this talk, we'll show you how we use the MongoDB oplog, and how you can build powerful reactive streaming data applications on top of it.

If you'd like to see the presentation with presenter's notes, I've published my Google Docs presentation at https://docs.google.com/presentation/d/19NcoFI9BG7PwLoBV7zvidjs2VLgQWeVVcUd7Xc7NoV0/pub

Originally given at MongoDB World 2014 in New York

  • Login to see the comments

Building Real Time Systems on MongoDB Using the Oplog at Stripe

  1. 1. MongoDB and the Oplog EVAN BRODER @ebroder
  2. 2. AGENDA INTRO TO THE OPLOG EXAMPLE APPLICATIONS
  3. 3. INTRO TO THE OPLOG
  4. 4. PRIMARY SECONDARIES APPLICATION
  5. 5. APPLICATION save {_id: 1, a: 2}
  6. 6. THINGS I’VE DONE: - save {_id: 1, a: 2}
  7. 7. APPLICATION update where {a: 2}, {$set: {a: 3}}
  8. 8. THINGS I’VE DONE: - save {_id: 1, a: 2} - update {_id: 1}, {$set: {a: 3}}
  9. 9. THINGS I’VE DONE: - save {_id: 1, a: 2} - update {_id: 1}, {$set: {a: 3}} - insert… - delete… - delete… - save… - update…
  10. 10. THINGS I’VE DONE: save … THINGS I’VE DONE:
  11. 11. THINGS I’VE DONE: save … THINGS I’VE DONE: save …
  12. 12. TRIGGERS
  13. 13. GOAL: EVENT PROCESSING
  14. 14. GOAL: DETECT INSERTIONS
  15. 15. WARNING THIS CODE IS NOT PRODUCTION-READY
  16. 16. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  17. 17. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  18. 18. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  19. 19. oplog = mongo_connection['local']['oplog.rs'] ns = 'eventdb.events' oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.each do |op| puts op['o']['_id'] end end
  20. 20. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  21. 21. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  22. 22. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  23. 23. oplog.find({'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  24. 24. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end
  25. 25. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id']
  26. 26. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  27. 27. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  28. 28. DATA TRANSFORMATIONS
  29. 29. GOAL: MONGODB TO POSTGRESQL
  30. 30. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  31. 31. start_entry = oplog.find_one({}, {:sort => {'$natural' => -1}}) start = start_entry['ts'] oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE) cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA) loop do cursor.each do |op| puts op['o']['_id'] end end end
  32. 32. cursor.each do |op| puts op['o']['_id'] end
  33. 33. cursor.each do |op| case op['op'] when 'i' puts op['o']['_id'] else # ¯_(ツ)_/¯ end end
  34. 34. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' else # ¯_(ツ)_/¯ end end
  35. 35. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' when 'd' query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect else # ¯_(ツ)_/¯ end end
  36. 36. query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect when 'u' query = "UPDATE #{op['ns']} SET" updates = op['o']['$set'] ? op['o']['$set'] : op['o'] updates.each do |k, v| query += " #{k}=#{v.inspect}" end query += " WHERE _id=" query += op['o2']['_id'].inspect else # ¯_(ツ)_/¯ end end
  37. 37. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' when 'd' query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect when 'u' query = "UPDATE #{op['ns']} SET" updates = op['o']['$set'] ? op['o']['$set'] : op['o'] updates.each do |k, v| query += " #{k}=#{v.inspect}" end query += " WHERE _id=" + op['o2']['_id'].inspect else # ¯_(ツ)_/¯ end end
  38. 38. github.com/stripe/mosql
  39. 39. github.com/stripe/zerowing
  40. 40. cursor.each do |op| case op['op'] when 'i' query = "INSERT INTO #{op['ns']} (" + op['o'].keys.join(', ') + ') VALUES (' + op['o'].values.map(&:inspect).join(', ') + ')' when 'd' query = "DELETE FROM #{op['ns']} WHERE _id=" + op['o']['_id'].inspect when 'u' query = "UPDATE #{op['ns']} SET" updates = op['o']['$set'] ? op['o']['$set'] : op['o'] updates.each do |k, v| query += " #{k}=#{v.inspect}" end query += " WHERE _id=" + op['o2']['_id'].inspect else # ¯_(ツ)_/¯ end end
  41. 41. DISASTER RECOVERY
  42. 42. task = collection.find_one({'finished' => nil} # do something with task… collection.update({'_id' => task.id}, {'$set' => {'finished' => Time.now.to_i}})
  43. 43. loop do collection.remove( {'finished' => {'$lt' => Time.now.to_i - 30}}) sleep(10) end
  44. 44. evan@caron:~$ mongo MongoDB shell version: 2.4.10 connecting to: test normal:PRIMARY> null < (Date.now() / 1000) - 30 true
  45. 45. THINGS I’VE DONE: insert delete … THINGS I’VE DONE:
  46. 46. > db.getReplicationInfo() { "logSizeMB" : 48964.3541015625, "usedMB" : 46116.4, "timeDiff" : 316550, "timeDiffHours" : 87.93, "tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)", "tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)", "now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)" }
  47. 47. > db.getReplicationInfo() { "logSizeMB" : 48964.3541015625, "usedMB" : 46116.4, "timeDiff" : 316550, "timeDiffHours" : 87.93, "tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)", "tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)", "now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)" }
  48. 48. new_oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.each do |op| if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks' old_task = old_tasks.find_one({'_id' => op['o']['_id']}) if old_task['finished'] == nil # found one! # save old_task to a file, and we'll re-queue it later end end old_connection['admin'].command({'applyOps' => [op]}) end end
  49. 49. new_oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.each do |op| if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks' old_task = old_tasks.find_one({'_id' => op['o']['_id']}) if old_task['finished'] == false # found one! # save old_task to a file, and we'll re-queue it later end end old_connection['admin'].command({'applyOps' => [op]}) end end
  50. 50. new_oplog.find({'ts' => {'$gt' => start}}) do |cursor| cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY) cursor.each do |op| if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks' old_task = old_tasks.find_one({'_id' => op['o']['_id']}) if old_task['finished'] == false # found one! # save old_task to a file, and we'll re-queue it later end end old_connection['admin'].command({'applyOps' => [op]}) end end
  51. 51. THINGS I’VE DONE: save … THINGS I’VE DONE: save …
  52. 52. QUESTIONS?

×