Apéro RubyBdx - MongoDB - 8-11-2011

Published in: Technology

  1. 1. Pierre-Louis Gottfrois, Bastien Murzeau. Apéro Ruby Bordeaux, November 8, 2011
  2. 2.
     • Brief introduction
     • Case study
     • Map / Reduce
  3. 3. What is MongoDB? MongoDB is a NoSQL database: schemaless and document-oriented.
  4. 4. Schemaless
     • Very useful in "agile" development (iterations, quick changes, flexibility for developers)
     • Supports features that, in relational databases, would be:
       • nearly impossible (storing open-ended data, e.g. tags)
       • too complex for what they are (migrations)
  5. 5. Document-oriented
     • MongoDB stores documents, not rows
       • documents are stored as JSON, more precisely binary JSON (BSON)
     • the query syntax is as rich as SQL's
     • embedded documents solve many commonly encountered problems
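To make the embedded-document idea concrete, here is a minimal sketch using a plain JavaScript object. The `conversation` object, its field names and its values are hypothetical and only illustrate the shape such a document could take; MongoDB would store the BSON equivalent.

```javascript
// Hypothetical document: a conversation that embeds its messages
// instead of referencing rows in a separate table.
const conversation = {
  _id: 1,
  public: false,
  tags: ["ruby", "mongodb"], // open-ended list, no join table needed
  messages: [
    // embedded documents
    { author: "alice", body: "Hello" },
    { author: "bob", body: "Hi!" },
  ],
};

// Embedded data travels with its parent document, so reading it needs no join.
const authors = conversation.messages.map((message) => message.author);
console.log(authors.join(", "));
```

The embedded `messages` array plays the role that a separate table plus a foreign key would play in a relational schema.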
  6. 6. Document-oriented
     • Documents are stored in a collection; in Rails, a collection corresponds to a model
     • part of the data is indexed to optimize performance
     • a document is not a dumping ground!
  7. 7. Storing large volumes of data
     • MongoDB (like other NoSQL stores) is better suited to horizontal scaling than relational databases
       • servers are added to increase storage capacity ("sharding")
       • which also ensures better availability
       • optimized load balancing between nodes
       • growth is transparent to the application
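The idea of spreading documents over several servers can be pictured with a toy router. This is only an illustration under a strong assumption: the modulo placement and the `shardFor` helper are hypothetical and are not how MongoDB actually assigns documents to shards.

```javascript
// Toy shard router: NOT MongoDB's placement algorithm, only an
// illustration of distributing documents by a shard key.
const shards = [[], [], []]; // three hypothetical servers

// Hypothetical helper: simplistic modulo placement on a numeric key.
function shardFor(key, shardCount) {
  return key % shardCount;
}

for (let id = 1; id <= 7; id++) {
  shards[shardFor(id, shards.length)].push({ _id: id });
}

// Number of documents held by each server.
const sizes = shards.map((shard) => shard.length);
console.log(sizes);
```

Adding a server changes only the routing step, which is why the application code that reads and writes documents does not need to change.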
  8. 8. Case study
     • The ORM becomes an ODM; the reference gem is mongoid
       • alternatives: MongoMapper, DataMapper
     • Creating an application backed by MongoDB
       • rails new nosql
       • edit the Gemfile
         • gem 'mongoid'
         • gem 'bson_ext'
       • bundle install
       • rails generate mongoid:config
  9. 9. Case study
     • edit config/application.rb: comment out rails/all (which would load ActiveRecord) and require the needed railties one by one
       • # require 'rails/all'
       • require "action_controller/railtie"
       • require "action_mailer/railtie"
       • require "active_resource/railtie"
       • require "rails/test_unit/railtie"
  10. 10. Case study
      class Subject
        include Mongoid::Document
        include Mongoid::Timestamps

        has_many :scores, :as => :scorable, :dependent => :delete, :autosave => true
        has_many :requests, :dependent => :delete
        belongs_to :author, :class_name => "User"
      end

      class Conversation
        include Mongoid::Document
        include Mongoid::Timestamps

        field :public, :type => Boolean, :default => false

        has_many :scores, :as => :scorable, :dependent => :delete
        has_and_belongs_to_many :subjects
        belongs_to :timeline
        embeds_many :messages
      end
  11. 11. Map Reduce
  12. 12. Example: a "ticket" collection
      { "id" : 1, "day" : 20111017, "checkout" : 100 }
      { "id" : 2, "day" : 20111017, "checkout" : 42 }
      { "id" : 3, "day" : 20111017, "checkout" : 215 }
      { "id" : 4, "day" : 20111017, "checkout" : 73 }
  13. 13. The problem
      • We want to
        • compute the sum of 'checkout' over all objects in our tickets collection
        • be able to distribute this operation over the network
        • be fast!
      • We don't want to
        • go over all objects again when an update is made
  14. 14. Map: emit(checkout)
      The 'map' function emits (selects) the checkout value of each object in our collection.
      ticket 1: 100
      ticket 2: 42
      ticket 3: 215
      ticket 4: 73
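The map step can be emulated outside of MongoDB with a few lines of plain JavaScript. In this sketch the `emit` function and the `emitted` array are stand-ins written for the example (in the mongo shell, `emit` is provided by the server), and `map` is called with each ticket bound to `this`, which mirrors how the shell version reads `this.checkout`.

```javascript
// The four tickets from the example collection.
const tickets = [
  { id: 1, day: 20111017, checkout: 100 },
  { id: 2, day: 20111017, checkout: 42 },
  { id: 3, day: 20111017, checkout: 215 },
  { id: 4, day: 20111017, checkout: 73 },
];

// Stand-in for the server-provided emit(): collect every emitted pair.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Same body as the map function used in the mongo shell.
function map() {
  emit(null, this.checkout);
}

// Run map once per document, with the document bound to `this`.
for (const ticket of tickets) map.call(ticket);

const values = emitted.map((pair) => pair.value);
console.log(values);
```

Every pair is emitted under the same key (`null`), so a later reduce step sees all four values as one group.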
  15. 15. Reduce: sum(checkout)
      430 = 142 + 288
      142 = 100 + 42 (tickets 1 and 2)
      288 = 215 + 73 (tickets 3 and 4)
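  16. 16. Reduce function
      The 'reduce' function applies the aggregation logic to each key and the list of values received from the 'map' function.
      Because it may be called again on its own output, or on several workers in a distributed system, it has to be commutative, associative and 'idempotent':
      reduce(k, [A, B]) == reduce(k, [B, A])
      reduce(k, [A, B, C]) == reduce(k, [reduce(k, [A, B]), C])
      reduce(k, [reduce(k, [A, B])]) == reduce(k, [A, B])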
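The requirements on a reduce function can be checked on a small example. The sketch below tests a summing reduce against three properties (order independence, re-reducing a partial result, and re-reducing its own output), then shows a naive `average` function, written here purely as a counter-example, that fails the second one.

```javascript
// Summing reduce, equivalent to the one used in the slides.
function reduce(key, values) {
  let sum = 0;
  for (const value of values) sum += value;
  return sum;
}

const [A, B, C] = [100, 42, 215];

// Order of the values must not matter.
const commutative = reduce(null, [A, B]) === reduce(null, [B, A]);

// Feeding a partial result back in must give the same answer.
const associative =
  reduce(null, [C, reduce(null, [A, B])]) === reduce(null, [C, A, B]);

// Reducing an already reduced value must leave it unchanged.
const idempotent =
  reduce(null, [reduce(null, [A, B, C])]) === reduce(null, [A, B, C]);

// Counter-example (hypothetical): a naive average cannot be re-reduced,
// because a partial average forgets how many values it came from.
function average(key, values) {
  return reduce(key, values) / values.length;
}
const averageReReducible =
  average(null, [C, average(null, [A, B])]) === average(null, [C, A, B]);

console.log({ commutative, associative, idempotent, averageReReducible });
```

A common way around the average problem is to reduce a pair such as `{ sum, count }` and divide only at the end.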
  17. 17. Inherently Distributed
      430 = 142 + 288
      142 = 100 + 42 and 288 = 215 + 73 can be computed independently of each other
  18. 18. Distributed
      Since the 'map' function emits the objects to be reduced, and the 'reduce' function processes each group of emitted objects independently, the work can be distributed across multiple workers: map first, then reduce.
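A minimal sketch of that distribution, with two hypothetical workers simulated as two slices of the same array: each worker reduces its own slice, and a final reduce combines the partial results.

```javascript
const checkouts = [100, 42, 215, 73];

// Summing reduce, written with Array.prototype.reduce for brevity.
function reduce(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Each hypothetical worker only sees half of the emitted values.
const workerA = reduce(null, checkouts.slice(0, 2)); // tickets 1 and 2
const workerB = reduce(null, checkouts.slice(2)); // tickets 3 and 4

// The partial results are reduced again to get the final total.
const total = reduce(null, [workerA, workerB]);
console.log(workerA, workerB, total);
```

The final call works only because the output of `reduce` has the same shape as its input values, which is what allows it to be fed back in.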
  19. 19. Logarithmic Update
      For the same reason, when an object is updated we don't have to reprocess every object. We call the 'map' function only on the updated object, then recompute the partial sums that depend on it.
  20. 20. Logarithmic Update: the checkout of ticket 3 changes from 215 to 210; the stored sums are still 430 = 142 + 288, with 288 = 215 + 73
  21. 21. Logarithmic Update: 'map' is re-run on ticket 3 only and emits 210; the sums above it (288 and 430) are now stale
  22. 22. Logarithmic Update: the partial sum is re-reduced: 283 = 210 + 73
  23. 23. Logarithmic Update: the total is re-reduced: 425 = 142 + 283
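The update walk-through can be sketched as a binary tree of partial sums stored in an array. This is a conceptual model of the diagram, not a description of MongoDB internals; `buildTree` and `update` are hypothetical helpers, and the number of leaves is assumed to be a power of two.

```javascript
// Partial sums as a complete binary tree in an array: index 1 is the root,
// node i has children 2i and 2i + 1, and the leaves start at index n.
function buildTree(leaves) {
  const n = leaves.length; // assumed to be a power of two
  const tree = new Array(2 * n).fill(0);
  for (let i = 0; i < n; i++) tree[n + i] = leaves[i];
  for (let i = n - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
  return tree;
}

// Change one leaf, then recompute only its ancestors.
// Returns how many partial sums had to be recomputed.
function update(tree, leafIndex, value) {
  const n = tree.length / 2;
  let i = n + leafIndex;
  tree[i] = value;
  let touched = 0;
  for (i = Math.floor(i / 2); i >= 1; i = Math.floor(i / 2)) {
    tree[i] = tree[2 * i] + tree[2 * i + 1];
    touched++;
  }
  return touched;
}

const tree = buildTree([100, 42, 215, 73]);
const before = tree[1]; // total before the update
const touched = update(tree, 2, 210); // ticket 3: 215 becomes 210
const after = tree[1]; // total after the update
console.log(before, after, touched);
```

With four leaves, two sums are recomputed (283, then 425) instead of re-adding all four values; with n leaves the count grows as log2(n), hence the name.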
  24. 24. Let’s do some code!
  25. 25.
      $> mongo
      > db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
      > db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
      > db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
      > db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })
      > db.tickets.count()
      4
      > db.tickets.find()
      { "_id" : 1, "day" : 20111017, "checkout" : 100 }
      ...
      > db.tickets.find({ "_id": 1 })
      { "_id" : 1, "day" : 20111017, "checkout" : 100 }
  26. 26.
      > var map = function() {
      ...   emit(null, this.checkout)
      ... }
      > var reduce = function(key, values) {
      ...   var sum = 0
      ...   for (var index in values) sum += values[index]
      ...   return sum
      ... }
  27. 27. Temporary Collection
      > sumOfCheckouts = db.tickets.mapReduce(map, reduce)
      {
        "result" : "tmp.mr.mapreduce_123456789_4",
        "timeMillis" : 8,
        "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
        "ok" : 1
      }
      > db.getCollectionNames()
      [ "tickets", "tmp.mr.mapreduce_123456789_4" ]
      > db[sumOfCheckouts.result].find()
      { "_id" : null, "value" : 430 }
  28. 28. Persistent Collection
      > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
      > db.getCollectionNames()
      [ "sumOfCheckouts", "tickets", "tmp.mr.mapreduce_123456789_4" ]
      > db.sumOfCheckouts.find()
      { "_id" : null, "value" : 430 }
      > db.sumOfCheckouts.findOne().value
      430
  29. 29. Reduce by Date
  30. 30.
      > var map = function() {
      ...   emit(this.day, this.checkout)
      ... }
      > var reduce = function(key, values) {
      ...   var sum = 0
      ...   for (var index in values) sum += values[index]
      ...   return sum
      ... }
  31. 31.
      > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
      > db.sumOfCheckouts.find()
      { "_id" : 20111017, "value" : 430 }
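Grouping by key is easier to see when more than one key is present. The sketch below is an in-memory emulation, not the MongoDB API: the `mapReduce` helper is hypothetical, `emit` is passed to `map` as an argument instead of being a global, and the fifth ticket (day 20111018) is invented to produce a second group.

```javascript
const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
  { _id: 5, day: 20111018, checkout: 10 }, // hypothetical extra ticket
];

// Hypothetical in-memory emulation: group emitted values by key,
// then reduce each group once.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

const result = mapReduce(
  tickets,
  function (emit) {
    emit(this.day, this.checkout); // key: the day, value: the checkout
  },
  (key, values) => values.reduce((sum, value) => sum + value, 0)
);
console.log(result);
```

Each distinct day becomes one output document, shaped like the `{ "_id", "value" }` documents of the output collection.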
  32. 32. What we can do
  33. 33. Scored Subjects per User
      Subject  User  Score
      1        1     2
      1        1     2
      1        2     2
      2        1     2
      2        2     10
      2        2     5
  34. 34. Scored Subjects per User (reduced)
      Subject  User  Score
      1        1     4
      1        2     2
      2        1     2
      2        2     15
  35. 35.
      $> mongo
      > db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
      > db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
      > db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })
      > db.scores.count()
      6
      > db.scores.find()
      { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
      ...
      > db.scores.find({ "_id": 1 })
      { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
  36. 36.
      > var map = function() {
      ...   emit([this.user_id, this.subject_id].join("-"),
      ...        { subject_id: this.subject_id, user_id: this.user_id, score: this.score });
      ... }
      > var reduce = function(key, values) {
      ...   var result = { user_id: "", subject_id: "", score: 0 };
      ...   values.forEach(function (value) {
      ...     result.score += value.score;
      ...     result.user_id = value.user_id;
      ...     result.subject_id = value.subject_id;
      ...   });
      ...   return result
      ... }
  37. 37. ReducedScores Collection
      > db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })
      > db.getCollectionNames()
      [ "reduced_scores", "scores" ]
      > db.reduced_scores.find()
      { "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
      { "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
      { "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
      { "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }
      > db.reduced_scores.findOne().value.score
      4
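One detail of the scores example is worth checking: the object returned by `reduce` has the same shape as the values emitted by `map`, so a partial result can be reduced again together with new values. The sketch below runs the same reduce logic in plain JavaScript on the two scores of user 2 for subject 2; the variable names are chosen for the example.

```javascript
// Same logic as the reduce function of the scores example.
function reduce(key, values) {
  const result = { user_id: "", subject_id: "", score: 0 };
  values.forEach((value) => {
    result.score += value.score;
    result.user_id = value.user_id;
    result.subject_id = value.subject_id;
  });
  return result;
}

// Values emitted under the key "2-2" (user 2, subject 2).
const values = [
  { subject_id: 2, user_id: 2, score: 10 },
  { subject_id: 2, user_id: 2, score: 5 },
];

// Reducing everything at once...
const direct = reduce("2-2", values);

// ...gives the same result as reducing a partial result again.
const partial = reduce("2-2", [values[0]]);
const rereduced = reduce("2-2", [partial, values[1]]);

console.log(direct, rereduced);
```

If `reduce` returned a bare number here while `map` emitted objects, the second call would read `score` from a number and the sum would break.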
  38. 38. Querying from Rails
      ruby-1.9.2-p180 :007 > ReducedScores.first
       => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId(...), "subject_id"=>BSON::ObjectId(...), "score"=>4.0}>
      ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
       => 2
      ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value["score"]
       => 4.0
      ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value["score"]
       => 2.0
  39. 39. Questions?
