Practical Ruby Projects with MongoDB - Ruby Kaigi 2010

  1. 1. PRACTICAL PROJECTS WITH MONGODB
  2. 2. WHO AM I? Alex Sharp Lead Developer at OptimisDev @ajsharp (on the twitters) alexjsharp.com (on the ‘tubes) github.com/ajsharp (on the ‘hubs)
  3. 3. MAJOR POINTS • Making the case for Mongo • Brief intro to MongoDB • Practical Projects
  4. 4. MAKING THE CASE
  5. 5. THE PROBLEMS OF SQL
  6. 6. A BRIEF HISTORY OF SQL • Developed at IBM in the early 1970s. • Designed to manipulate and retrieve data stored in relational databases
  7. 7. A BRIEF HISTORY OF SQL • Developed at IBM in the early 1970s. • Designed to manipulate and retrieve data stored in relational databases (this is a problem)
  8. 8. WE DON’T WORK WITH DATA...
  9. 9. WE WORK WITH OBJECTS
  10. 10. WE DON’T CARE ABOUT STORING DATA
  11. 11. WE CARE ABOUT PERSISTING STATE
  12. 12. DATA != OBJECTS
  13. 13. THEREFORE...
  14. 14. BOLD STATEMENT FORTHCOMING...
  15. 15. relational DBs are an antiquated tool for our needs
  16. 16. OK, SO WHAT? Relational schemas are designed to store and query data, not persist objects.
  17. 17. To reconcile this mismatch, we have ORMs
  18. 18. Object Relational Mappers
  19. 19. ORMs enable us to create a mapping between our native object model and a relational schema
  20. 20. We’re forced to create relationships for data when what we really want is properties for objects.
  21. 21. THIS IS NOT EFFICIENT
  22. 22. A WEIRD EXAMPLE...
  23. 23. @alex = Person.new( :name => "alex", :stalkings => [ Friend.new("Jim"), Friend.new("Bob")] )
  24. 24. NATIVE RUBY OBJECT #<Person:0x10017d030 @name="alex", @stalkings=[#<Friend:0x10017d0a8 @name="Jim">, #<Friend:0x10017d058 @name="Bob">]>
  25. 25. JSON REPRESENTATION @alex.to_json => { name: "alex", stalkings: [{ name: "Jim" }, { name: "Bob" }] }
  26. 26. Relational Schema Representation people: - name stalkings: - name - stalker_id - stalkee_id
  27. 27. SQL Schema Representation people: - name stalkings: - name - stalker_id - stalkee_id (What!?)
  28. 28. Ruby -> JSON -> SQL • Ruby: #<Person:0x10017d030 @name="alex", @stalkings=[#<Friend:0x10017d0a8 @name="Jim">, #<Friend:0x10017d058 @name="Bob">]> • JSON (@alex.to_json): { name: "alex", stalkings: [{ name: "Jim" }, { name: "Bob" }] } • SQL: people: - name stalkings: - name - stalker_id - stalkee_id
  29. 29. Ruby -> JSON -> SQL • Ruby: #<Person:0x10017d030 @name="alex", @stalkings=[#<Friend:0x10017d0a8 @name="Jim">, #<Friend:0x10017d058 @name="Bob">]> • JSON (@alex.to_json): { name: "alex", stalkings: [{ name: "Jim" }, { name: "Bob" }] } • SQL: people: - name stalkings: - name - stalker_id - stalkee_id (Feels like we’re working too hard here)
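
The Ruby-to-JSON step above can be reproduced with only the standard library. This is a minimal sketch: the slides never show how `Person` and `Friend` are defined, so both class bodies and the `to_h` helpers here are assumptions made for illustration.

```ruby
require 'json'

# Hypothetical class: the talk does not show how Friend is defined.
Friend = Struct.new(:name) do
  def to_h
    { 'name' => name }
  end
end

# Hypothetical class: the talk does not show how Person is defined.
class Person
  attr_reader :name, :stalkings

  def initialize(attrs)
    @name = attrs[:name]
    @stalkings = attrs[:stalkings]
  end

  # A nested hash that mirrors the object graph: no join table, no foreign keys.
  def to_h
    { 'name' => name, 'stalkings' => stalkings.map(&:to_h) }
  end

  def to_json(*args)
    to_h.to_json(*args)
  end
end

alex = Person.new(:name => 'alex',
                  :stalkings => [Friend.new('Jim'), Friend.new('Bob')])
puts alex.to_json
# prints {"name":"alex","stalkings":[{"name":"Jim"},{"name":"Bob"}]}
```

The nested structure is the point: the object and its serialized form have the same shape, while the relational version needs a second table and two foreign keys.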
  30. 30. You’re probably thinking... “SQL isn’t that bad”
  31. 31. Maybe.
  32. 32. But we can do better.
  33. 33. THE IDEAL PERSISTENCE LAYER:
  34. 34. THE IDEAL PERSISTENCE LAYER: 1. Persists objects in their native state
  35. 35. THE IDEAL PERSISTENCE LAYER: 1. Persists objects in their native state 2. Does not interfere with application development
  36. 36. THE IDEAL PERSISTENCE LAYER: 1. Persists objects in their native state 2. Does not interfere with application development 3. Provides features needed to build modern web apps
  37. 37. MONGO TO THE RESCUE!
  38. 38. WHAT IS MONGODB? MongoDB is a high-performance, schema-less, scalable, document-oriented database
  39. 39. STRENGTHS
  40. 40. SCHEMA-LESS
  41. 41. SCHEMA-LESS • Great for rapid, iterative, agile development
  42. 42. SCHEMA-LESS • Great for rapid, iterative, agile development • Makes things possible that in an RDBMS are either a. impossibly hard b. virtually impossible c. way harder than they should be
  43. 43. DOCUMENT-ORIENTED
  44. 44. DOCUMENT-ORIENTED • Mongo stores documents, not rows • Documents are stored as binary JSON (BSON)
  45. 45. DOCUMENT-ORIENTED • Mongo stores documents, not rows • Documents are stored as binary JSON (BSON) • Rich, familiar query syntax
  46. 46. DOCUMENT-ORIENTED • Mongo stores documents, not rows • Documents are stored as binary JSON (BSON) • Rich, familiar query syntax • Embedded documents eliminate many modeling headaches
  47. 47. QUERY EXAMPLES sample mongo document {'status': 500, 'request_method': 'post', 'controller_action': 'notes#create', 'user': {'first_name': 'John', 'last_name': 'Doe'} }
  48. 48. QUERY EXAMPLES logs.find('status' => 500) # SQL equivalent: # select * from logs WHERE logs.status = 500 {'status': 500, 'request_method': 'post', 'controller_action': 'notes#create', 'user': {'first_name': 'John', 'last_name': 'Doe'} }
  49. 49. QUERY EXAMPLES logs.find('status' => 500, 'user.last_name' => 'Doe') # SQL equivalent: none {'status': 500, 'request_method': 'post', 'controller_action': 'notes#create', 'user': {'first_name': 'John', 'last_name': 'Doe'} }
  50. 50. QUERY EXAMPLES logs.find('status' => 500, 'user.last_name' => 'Doe') {'status': 500, 'request_method': 'post', 'controller_action': 'notes#create', 'user': {'first_name': 'John', 'last_name': 'Doe'} } (user is an embedded document)
  51. 51. QUERY EXAMPLES logs.find('status' => 500, 'user.last_name' => 'Doe') In SQL, we would have to JOIN the ‘users’ table to run this query
  52. 52. QUERY EXAMPLES # also works w/ regex logs.find('request_method' => /p/i) # SQL equivalent: # select * from logs where request_method LIKE '%p%' {'status': 500, 'request_method': 'post', 'controller_action': 'notes#create', 'user': {'first_name': 'John', 'last_name': 'Doe'} }
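
To make the matching rules in the query slides concrete, here is a plain-Ruby sketch that applies the same three conditions (exact value, dotted path into an embedded document, regex) to an in-memory array of hashes. It is an illustration only: `dig_path`, `matches?`, `find_docs`, and the sample `LOGS` data are hypothetical names invented here, and this is not how MongoDB or the Ruby driver evaluates queries.

```ruby
# Illustration only: mimics the matching rules in plain Ruby.

# Walk a dotted path such as 'user.last_name' into nested hashes.
def dig_path(doc, path)
  path.split('.').reduce(doc) do |node, key|
    node.is_a?(Hash) ? node[key] : nil
  end
end

# A document matches when every condition matches. Using === means a Regexp
# condition does pattern matching and a plain value does equality.
def matches?(doc, conditions)
  conditions.all? { |path, expected| expected === dig_path(doc, path) }
end

def find_docs(docs, conditions)
  docs.select { |doc| matches?(doc, conditions) }
end

# Hypothetical sample data in the shape of the slide's log document.
LOGS = [
  { 'status' => 500, 'request_method' => 'post',
    'controller_action' => 'notes#create',
    'user' => { 'first_name' => 'John', 'last_name' => 'Doe' } },
  { 'status' => 200, 'request_method' => 'get',
    'controller_action' => 'notes#index',
    'user' => { 'first_name' => 'Jane', 'last_name' => 'Roe' } },
  { 'status' => 500, 'request_method' => 'put',
    'controller_action' => 'notes#update',
    'user' => { 'first_name' => 'Jane', 'last_name' => 'Roe' } }
].freeze

puts find_docs(LOGS, 'status' => 500).size                              # prints 2
puts find_docs(LOGS, 'status' => 500, 'user.last_name' => 'Doe').size   # prints 1
puts find_docs(LOGS, 'request_method' => /p/i).size                     # prints 2
```

The second query is the one with no direct single-table SQL equivalent: the dotted path reaches into the embedded `user` document instead of joining another table.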
  53. 53. BIG DATA
  54. 54. BIG DATA • Built to scale horizontally into the cloudz
  55. 55. BIG DATA • Built to scale horizontally into the cloudz • Auto-sharding • set a shard-key, a cluster of nodes, go
  56. 56. FAST WRITES
  57. 57. FAST WRITES • In-place updates • i.e. upsert
  58. 58. FAST WRITES • In-place updates • i.e. upsert • Lots of db commands for fast updates • $set, $unset, $inc, $push
  59. 59. UPSERT • Update if present, insert otherwise • careful: update does a hard replace of the entire document # initial insert posts.save(:title => 'my post', :body => '...') # update-in-place "upsert" posts.save(:_id => 'post_id', :title => 'new title', :body => '...')
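
The "hard replace" warning on the upsert slide is easy to miss, so here is a small in-memory sketch of the save semantics it describes: insert when the `_id` is new, otherwise replace the stored document wholesale. `TinyCollection` is a hypothetical stand-in written for this example, not part of any MongoDB library.

```ruby
require 'securerandom'

# Hypothetical stand-in for a collection; it only models save semantics.
class TinyCollection
  def initialize
    @docs = {}
  end

  # Insert when the document carries no _id, otherwise replace the stored
  # document wholesale. Fields missing from the new document are lost.
  def save(doc)
    id = doc[:_id] || SecureRandom.hex(12)
    @docs[id] = doc.merge(:_id => id)
    id
  end

  def find_one(id)
    @docs[id]
  end

  def count
    @docs.size
  end
end

posts = TinyCollection.new
id = posts.save(:title => 'my post', :body => '...')

# Saving again with the same _id replaces the whole document.
posts.save(:_id => id, :title => 'new title')

puts posts.count                       # prints 1
puts posts.find_one(id).key?(:body)    # prints false
```

Because the second `save` omitted `:body`, the field is gone afterwards. To change one field and keep the rest, the slides point to the update operators such as `$set`.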
  60. 60. AGGREGATION Map/reduce for aggregation
  61. 61. AGGREGATION // map function() { emit(this.controller_action, { count: 1 }); } // reduce function(key, values) { var sum = 0; values.forEach(function(doc) { sum += doc.count; }); return { count: sum }; }
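
For readers more at home in Ruby than JavaScript, the same map and reduce functions can be traced in plain Ruby over an array of log documents. This is a sketch of the idea only: `map_log`, `reduce_counts`, and `map_reduce` are hypothetical helper names, and the grouping step stands in for work the database does between the two phases.

```ruby
# map: emit one [key, value] pair per log document.
def map_log(doc)
  [doc['controller_action'], { 'count' => 1 }]
end

# reduce: fold all values emitted for one key into a single value.
def reduce_counts(_key, values)
  { 'count' => values.sum { |value| value['count'] } }
end

# Group the emitted pairs by key, then reduce each group.
def map_reduce(docs)
  emitted = docs.map { |doc| map_log(doc) }
  grouped = emitted.group_by(&:first)
  grouped.each_with_object({}) do |(key, pairs), result|
    result[key] = reduce_counts(key, pairs.map(&:last))
  end
end

logs = [
  { 'controller_action' => 'notes#create' },
  { 'controller_action' => 'notes#create' },
  { 'controller_action' => 'users#show' }
]

p map_reduce(logs)
```

Note that the reduce output has the same shape as the values emitted by map, which is what allows a reduce step to be re-applied to partial results.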
  62. 62. FAST READS Optimize reads with indexes, just like an RDBMS
  63. 63. # index is only created if it does not exist logs.create_index('status') # can index on embedded documents logs.create_index('user.first_name') # create compound indexes logs.create_index( [['user.first_name', 1], ['user.last_name', 1]] )
  64. 64. WEAKNESSES
  65. 65. JOINS nope.
  66. 66. MULTI-DOCUMENT TRANSACTIONS no-sir-ee.
  67. 67. RELATIONAL INTEGRITY not a chance.
  68. 68. PRACTICAL PROJECTS
  69. 69. 1. Accounting Application 2. Capped Collection Logging 3. Blogging app
  70. 70. GENERAL LEDGER ACCOUNTING APPLICATION
  71. 71. THE OBJECT MODEL ledger * transactions * entries
  72. 72. THE OBJECT MODEL Ledger entries by line item • Line item 1: debit { :account => "Cash", :amount => 100.00 }, credit { :account => "Notes Pay.", :amount => 100.00 } • Line item 2: debit { :account => "A/R", :amount => 25.00 }, credit { :account => "Gross Revenue", :amount => 25.00 }
  73. 73. THE OBJECT MODEL
  74. 74. THE OBJECT MODEL • Each ledger line item belongs to a ledger
  75. 75. THE OBJECT MODEL • Each ledger line item belongs to a ledger • Each ledger line item has two (or more) ledger entries • must have at least one credit and one debit • credits and debits must balance
  76. 76. THE OBJECT MODEL • Each ledger line item belongs to a ledger • Each ledger line item has two (or more) ledger entries • must have at least one credit and one debit • credits and debits must balance • These objects must be transactional
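
The two rules above (at least one credit and one debit, and the two sides must balance) can be checked in a few lines of Ruby. This is a minimal sketch under stated assumptions: `balanced?` is a hypothetical helper, entries are plain hashes with `:type` and `:amount` keys as in the later modeling slides, and amounts are converted to `Rational` to avoid floating-point drift when summing.

```ruby
# Returns true when the entries contain at least one debit and one credit
# and the debit total equals the credit total.
def balanced?(entries)
  types = entries.map { |entry| entry[:type] }
  return false unless types.include?('debit') && types.include?('credit')

  total = lambda do |type|
    entries.select { |entry| entry[:type] == type }
           .sum { |entry| Rational(entry[:amount].to_s) }
  end

  total.call('debit') == total.call('credit')
end

entries = [
  { :account => 'Cash',       :type => 'debit',  :amount => 100.00 },
  { :account => 'Notes Pay.', :type => 'credit', :amount => 100.00 }
]

puts balanced?(entries)   # prints true
```

A check like this would naturally live in a model validation, so an unbalanced line item is rejected before it is ever persisted.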
  77. 77. SQL-BASED OBJECT MODEL @debit = Entry.new(:account => 'Cash', :amount => 100.00, :type => 'debit') @credit = Entry.new(:account => 'Notes Pay.', :amount => 100.00, :type => 'credit') @line_item = LineItem.new(:ledger_id => 1, :entries => [@debit, @credit])
  78. 78. SQL-BASED OBJECT MODEL @debit = Entry.new(:account => 'Cash', :amount => 100.00, :type => 'debit') @credit = Entry.new(:account => 'Notes Pay.', :amount => 100.00, :type => 'credit') @line_item = LineItem.new(:ledger_id => 1, :entries => [@debit, @credit]) In a relational schema, we need a database transaction to ensure multi-row atomicity
  79. 79. SQL-BASED OBJECT MODEL @debit = Entry.new(:account => 'Cash', :amount => 100.00, :type => 'debit') @credit = Entry.new(:account => 'Notes Pay.', :amount => 100.00, :type => 'credit') @line_item = LineItem.new(:ledger_id => 1, :entries => [@debit, @credit]) Because we need to create 3 new records in the database here
  80. 80. MONGO-FIED MODELING @line_item = LineItem.new(:ledger_id => 1, :entries => [ {:account => 'Cash', :type => 'debit', :amount => 100}, {:account => 'Notes pay.', :type => 'credit', :amount => 100} ] )
  81. 81. MONGO-FIED MODELING @line_item = LineItem.new(:ledger_id => 1, :entries => [ {:account => 'Cash', :type => 'debit', :amount => 100}, {:account => 'Notes pay.', :type => 'credit', :amount => 100} ] ) This is the perfect case for embedded documents.
  82. 82. MONGO-FIED MODELING @line_item = LineItem.new(:ledger_id => 1, :entries => [ {:account => 'Cash', :type => 'debit', :amount => 100}, {:account => 'Notes pay.', :type => 'credit', :amount => 100} ] ) We will never have more than a few entries (usually two)
  83. 83. MONGO-FIED MODELING @line_item = LineItem.new(:ledger_id => 1, :entries => [ {:account => 'Cash', :type => 'debit', :amount => 100}, {:account => 'Notes pay.', :type => 'credit', :amount => 100} ] ) Embedded docs are perfect for “contains many” relationships
  84. 84. MONGO-FIED MODELING @line_item = LineItem.new(:ledger_id => 1, :entries => [ {:account => 'Cash', :type => 'debit', :amount => 100}, {:account => 'Notes pay.', :type => 'credit', :amount => 100} ] ) DB-level transactions are no longer needed. Mongo has single-document atomicity.
  85. 85. WINS
  86. 86. WINS • Simplified, de-normalized data model • Objects modeled differently in Mongo than SQL
  87. 87. WINS • Simplified, de-normalized data model • Objects modeled differently in Mongo than SQL • Simpler data model + no transactions = WIN
  88. 88. LOGGING WITH CAPPED COLLECTIONS
  89. 89. BASELESS STATISTIC 95% of the time logs are NEVER used.
  90. 90. MOST LOGS ARE ONLY USED WHEN SOMETHING GOES WRONG
  91. 91. CAPPED COLLECTIONS • Fixed-size, limited-operation, auto age-out collections (kinda like memcached, except persistent) • Fixed insertion order • Super fast (faster than normal writes) • Ideal for logging and caching
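
The age-out behavior described above can be pictured as a fixed-length buffer that drops its oldest entry when a new one arrives. The class below is a plain-Ruby analogy, not MongoDB code: `CappedLog` is a hypothetical name, and it caps by document count for simplicity, whereas a real capped collection is sized in bytes.

```ruby
# Analogy only: an in-memory buffer with capped-collection-like behavior.
class CappedLog
  def initialize(max_docs)
    @max_docs = max_docs
    @docs = []
  end

  # Append in insertion order and evict the oldest documents once full.
  def insert(doc)
    @docs.push(doc)
    @docs.shift while @docs.size > @max_docs
    doc
  end

  # Documents come back in the order they were inserted.
  def to_a
    @docs.dup
  end
end

log = CappedLog.new(3)
(1..5).each { |n| log.insert('request' => n) }

p log.to_a.map { |doc| doc['request'] }   # prints [3, 4, 5]
```

There is no delete or cleanup job in this model: old entries simply fall off the end, which is why the structure suits logs that are rarely read.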
  92. 92. GREAT. WHAT’S THAT MEAN? We can log additional pieces of arbitrary data effortlessly due to Mongo’s schemaless nature.
  93. 93. THIS IS AWESOME
  94. 94. CAPTURE. QUERY. ANALYZE. PROFIT.
  95. 95. ALSO A REALLY HANDY TROUBLESHOOTING TOOL
  96. 96. User-reported errors are difficult to diagnose
  97. 97. Wouldn’t it be useful to see the complete click-path?
  98. 98. BUNYAN http://github.com/ajsharp/bunyan Thin ruby layer around a MongoDB capped collection
  99. 99. BUNYAN http://github.com/ajsharp/bunyan Still needs a middleware api. Want to contribute?
  100. 100. require 'bunyan' Bunyan::Logger.configure do database 'my_bunyan_db' collection "#{Rails.env}_bunyan_log" size 1073741824 # 1.gigabyte end
  101. 101. PAPERMILL http://github.com/ajsharp/papermill SINATRA FRONT-END TO BUNYAN
  102. 102. PAPERMILL http://github.com/ajsharp/papermill DEMO TIME
  103. 103. BLOGGING APPLICATION
  104. 104. BLOGGING APPLICATION Much easier to model with Mongo than a relational database
  105. 105. BLOGGING APPLICATION A post has an author
  106. 106. BLOGGING APPLICATION A post has an author A post has many tags
  107. 107. BLOGGING APPLICATION A post has an author A post has many tags A post has many comments
  108. 108. BLOGGING APPLICATION A post has an author A post has many tags A post has many comments Instead of joining separate tables, we can use embedded documents.
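
As a rough picture of what "embedded documents instead of joins" means for this model, the sketch below holds a whole post, including its author, tags, and comments, in one nested hash and filters posts by tag with plain Ruby. The field names, sample data, and the `tagged` helper are assumptions for illustration; only the three relationships come from the slides.

```ruby
# Hypothetical sample data: one document carries the post and everything
# it "has", so reading a post needs no joins.
POSTS = [
  { 'title' => 'Practical Mongo',
    'author' => { 'name' => 'alex' },
    'tags' => %w[ruby mongodb],
    'comments' => [
      { 'author' => 'Jim', 'body' => 'Nice talk' },
      { 'author' => 'Bob', 'body' => 'What about joins?' }
    ] },
  { 'title' => 'Hello World',
    'author' => { 'name' => 'alex' },
    'tags' => %w[misc],
    'comments' => [] }
].freeze

# Posts whose tags array contains the given tag.
def tagged(posts, tag)
  posts.select { |post| post['tags'].include?(tag) }
end

puts tagged(POSTS, 'ruby').map { |post| post['title'] }.inspect
# prints ["Practical Mongo"]
puts POSTS.first['comments'].size
# prints 2
```

In a relational schema the same read would touch a posts table, a users table, a tags table with a join table, and a comments table.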
  109. 109. MONGOMAPPER
  110. 110. MONGOMAPPER •MongoDB “ORM” developed by John Nunemaker
  111. 111. MONGOMAPPER •MongoDB “ORM” developed by John Nunemaker •Very similar syntax to DataMapper
  112. 112. MONGOMAPPER •MongoDB “ORM” developed by John Nunemaker •Very similar syntax to DataMapper •Declarative rather than inheritance-based
  113. 113. MONGOMAPPER •MongoDB “ORM” developed by John Nunemaker •Very similar syntax to DataMapper •Declarative rather than inheritance-based •Very easy to drop into Rails
  114. 114. MONGOMAPPER require 'mongo_mapper' MongoMapper.database = "blog_app" class Post include MongoMapper::Document belongs_to :author, :class_name => "User" key :title, String, :required => true key :body, String, :required => true key :author_id, Integer, :required => true key :published_at, Time key :published, Boolean, :default => false key :tags, Array, :default => [] timestamps! end
  115. 115. INDEXES! Post.collection.create_index('tags') Post.collection.create_index('title')
  116. 116. WINS
  117. 117. WINS Simpler persistence model matches simple object model
  118. 118. NEXT STEPS
  119. 119. OBJECT-DOCUMENT MAPPERS • Ruby mongo driver • github.com/mongodb/mongo-ruby-driver • MongoMapper • github.com/jnunemaker/mongomapper • Mongoid • github.com/durran/mongoid
  120. 120. MONGOMAPPER • DataMapper-like API • good for moving from a SQL schema to Mongo • easy to drop into Rails • works with Rails 3
  121. 121. MONGOID • DataMapper-like API • uses ActiveModel, so familiar validation syntax • more opinionated towards embedded docs • easy to drop into Rails • Rails 2 - Mongoid 1.x • Rails 3 - Mongoid 2.x
  122. 122. mongodb.org
  123. 123. MOST IMPORTANTLY ...
  124. 124. WE MUST ARRIVE AT THE WHISKEY BAR EARLIER TOMORROW
  125. 125. BECAUSE IT CLOSES AT MIDNIGHT ;-)
  126. 126. QUESTIONS?
  127. 127. THANKS!
