Apéro RubyBdx - MongoDB - 8-11-2011

Transcript

  • 1. Pierre-Louis Gottfrois, Bastien Murzeau. Apéro Ruby Bordeaux, 8 November 2011
  • 2. Outline
    • A brief introduction
    • A practical case
    • Map / Reduce
  • 3. What is MongoDB? MongoDB is a schema-less, document-oriented NoSQL database.
  • 4. Schema-less
    • Very useful in "agile" development (iterations, fast changes, flexibility for developers)
    • Supports features that, in a relational database, would be:
      • nearly impossible (storing open-ended lists of items, e.g. tags)
      • too complex for what they achieve (migrations)
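To make the tags example concrete, here is a minimal plain-JavaScript sketch (no MongoDB involved). It assumes a hypothetical `articles` collection whose two documents have different fields, and a hypothetical `matches` helper that mimics only one rule of MongoDB's equality queries: a query such as `{ tags: "ruby" }` matches a document whose `tags` field is an array if any element equals the value.

```javascript
// Two documents in the same (hypothetical) "articles" collection:
// no shared schema is required, and "tags" is an open-ended array.
const articles = [
  { _id: 1, title: "MongoDB at Apéro Ruby", tags: ["ruby", "mongodb", "nosql"] },
  { _id: 2, title: "Untagged draft", author: "anonymous" },
];

// Hypothetical helper imitating simple equality matching:
// a scalar value matches an array field if any element is equal to it,
// otherwise the field must be strictly equal to the value.
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    if (Array.isArray(actual)) return actual.includes(expected);
    return actual === expected;
  });
}

const rubyArticles = articles.filter((doc) => matches(doc, { tags: "ruby" }));
console.log(rubyArticles.map((doc) => doc._id)); // [ 1 ]
```

In a relational schema the same data would typically need a separate tags table plus a join table.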
  • 5. Document-oriented
    • MongoDB stores documents, not rows
    • documents are stored as JSON, or more precisely BSON (binary JSON)
    • the query syntax is as rich as SQL's
    • the "embedded" document mechanism solves many commonly encountered problems
  • 6. Document-oriented
    • Documents are stored in a collection, which in RoR corresponds to a model
    • part of this data is indexed to optimize performance
    • a document is not a dumping ground!
  • 7. Storing large volumes of data
    • MongoDB (like other NoSQL databases) performs better at horizontal scaling:
      • servers are added to increase storage capacity ("sharding")
      • which also provides better availability
      • load balancing between nodes is optimized
      • the growth is transparent to the application
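Sharding splits a collection into ranges of a shard key, each range ("chunk") living on one server. The sketch below is a hypothetical, simplified router written for illustration, not MongoDB's implementation: the chunk boundaries, the shard names and the choice of `day` as the shard key are all assumptions.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in [min, max)
// and lives on one shard. In a real cluster this metadata is managed by MongoDB.
const chunks = [
  { min: -Infinity, max: 20111001, shard: "shard-a" },
  { min: 20111001, max: 20111101, shard: "shard-b" },
  { min: 20111101, max: Infinity, shard: "shard-c" },
];

// Route a document to the shard whose chunk contains its shard-key value.
function shardFor(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return chunk.shard;
}

// Adding capacity means adding a shard and migrating some chunks to it;
// callers of shardFor do not change, which is the "transparent" part.
console.log(shardFor({ _id: 1, day: 20111017, checkout: 100 }, "day")); // shard-b
```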
  • 8. Practical case
    • The ORM becomes an ODM; the reference gem is mongoid
      • alternatives: MongoMapper, DataMapper
    • Creating an application backed by the NoSQL database MongoDB:
      • rails new nosql
      • edit the Gemfile and add:
          gem 'mongoid'
          gem 'bson_ext'
      • bundle install
      • rails generate mongoid:config
  • 9. Practical case
    • edit config/application.rb so that ActiveRecord is no longer loaded, by replacing the rails/all require with the individual railties:
          # require "rails/all"
          require "action_controller/railtie"
          require "action_mailer/railtie"
          require "active_resource/railtie"
          require "rails/test_unit/railtie"
  • 10. Practical case: Mongoid models
      class Subject
        include Mongoid::Document
        include Mongoid::Timestamps

        # :as => :scorable makes the relation polymorphic:
        # a Score can belong to a Subject or to a Conversation
        has_many :scores, :as => :scorable, :dependent => :delete, :autosave => true
        has_many :requests, :dependent => :delete
        belongs_to :author, :class_name => "User"
      end

      class Conversation
        include Mongoid::Document
        include Mongoid::Timestamps

        field :public, :type => Boolean, :default => false

        has_many :scores, :as => :scorable, :dependent => :delete
        has_and_belongs_to_many :subjects
        belongs_to :timeline
        # messages are stored inside the conversation document itself
        embeds_many :messages
      end
  • 11. Map Reduce
  • 12. Example: a "ticket" collection
      { "id" : 1, "day" : 20111017, "checkout" : 100 }
      { "id" : 2, "day" : 20111017, "checkout" : 42 }
      { "id" : 3, "day" : 20111017, "checkout" : 215 }
      { "id" : 4, "day" : 20111017, "checkout" : 73 }
  • 13. Problem
    • We want to:
      • calculate the sum of "checkout" over every object in our tickets collection
      • be able to distribute this operation over the network
      • be fast!
    • We don't want to:
      • go over all objects again when an update is made
  • 14. Map: emit(checkout)
    The "map" function emits (selects) the checkout value of each object in the collection. For the four tickets of slide 12, it emits 100, 42, 215 and 73.
  • 15. Reduce: sum(checkout)
    The emitted values are summed as a tree: 100 + 42 = 142, 215 + 73 = 288, then 142 + 288 = 430.
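  • 16. Reduce function
    The "reduce" function applies the aggregation logic to each key and the list of values that "map" emitted for it. Because it may be called several times on partial results, recursively or on several machines, it must give the same result whatever the order and grouping of its inputs (commutative, associative) and when re-applied to its own output (idempotent):
      reduce(k, [A, B]) == reduce(k, [B, A])
      reduce(k, [C, reduce(k, [A, B])]) == reduce(k, [C, A, B])
      reduce(k, [reduce(k, [A, B])]) == reduce(k, [A, B])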
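These properties can be checked on the sum reducer used later in the talk. This is a plain-JavaScript sketch with sample values taken from the tickets; MongoDB itself is not involved, and passing the checks on a few inputs is only a sanity check, not a proof.

```javascript
// Sum reducer with the same shape as a MongoDB reduce function:
// it receives a key and an array of values, and returns one value.
function reduce(key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
}

const k = null;
const A = 100, B = 42, C = 215;

// Commutative: the order of values does not matter.
const commutative = reduce(k, [A, B]) === reduce(k, [B, A]);
// Associative: a partial result can be mixed with raw values.
const associative = reduce(k, [C, reduce(k, [A, B])]) === reduce(k, [C, A, B]);
// Idempotent: re-reducing an already reduced value changes nothing.
const idempotent = reduce(k, [reduce(k, [A, B])]) === reduce(k, [A, B]);

console.log(commutative, associative, idempotent); // true true true
```

A reducer returning an average (`sum / values.length`) would fail the associativity check; the usual workaround is to reduce `{ sum, count }` pairs and divide only at the end.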
  • 17. Inherently distributed
    Same tree: 100 + 42 = 142 and 215 + 73 = 288 can be computed independently, then combined into 142 + 288 = 430.
  • 18. Distributed
    Since the "map" function emits the values to be reduced, and the "reduce" function processes the values emitted for each key independently, the work can be distributed across multiple workers, from map to reduce.
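A single-process simulation of this idea, assuming the four tickets are split between two hypothetical workers: each worker maps and reduces its own partition, then the partial results are reduced again. The `runWorker` helper is illustrative only and is not part of MongoDB's API.

```javascript
const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
];

const map = (doc) => [[null, doc.checkout]]; // like emit(null, this.checkout)
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

// Run map + reduce on one partition; returns a Map from key to partial result.
function runWorker(partition) {
  const groups = new Map();
  for (const doc of partition) {
    for (const [key, value] of map(doc)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const partial = new Map();
  for (const [key, values] of groups) partial.set(key, reduce(key, values));
  return partial;
}

// Hypothetical split across two workers, as in the tree of slide 17.
const partials = [runWorker(tickets.slice(0, 2)), runWorker(tickets.slice(2))];

// Final step: reduce the partial results of each key together.
const merged = new Map();
for (const partial of partials) {
  for (const [key, value] of partial) {
    merged.set(key, merged.has(key) ? reduce(key, [merged.get(key), value]) : value);
  }
}
console.log(partials.map((p) => p.get(null)), merged.get(null)); // [ 142, 288 ] 430
```

The final merge is only correct because the sum reducer is commutative, associative and idempotent.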
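  • 19. Logarithmic update
    For the same reason, when an object is updated we don't have to reprocess every object: we call the "map" function only on the updated objects. Only the reduce nodes on the path from that object to the root of the tree then need recomputing, i.e. about log2(n) reductions for n objects, hence "logarithmic".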
  • 20. Logarithmic update: ticket 3's checkout changes from 215 to 210; the tree still holds 100, 42, 215, 73, then 142, 288, then 430
  • 21. Logarithmic update: only ticket 3 is mapped again, so its leaf becomes 210
  • 22. Logarithmic update: its parent is reduced again, 210 + 73 = 283
  • 23. Logarithmic update: the root is reduced again, 142 + 283 = 425
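The update walk-through of slides 20 to 23 behaves like a segment tree over the emitted values. A minimal sketch, assuming a complete binary tree over a power-of-two number of leaves (four here); the array layout and the `buildTree`/`update` helpers are hypothetical illustrations, not how MongoDB stores map/reduce results.

```javascript
// Leaves are the emitted checkout values; internal nodes hold partial sums.
// Array layout: node i has children 2i and 2i+1; leaves start at index n.
function buildTree(values) {
  const n = values.length; // assumed to be a power of two
  const tree = new Array(2 * n).fill(0);
  for (let i = 0; i < n; i++) tree[n + i] = values[i];
  for (let i = n - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
  return tree;
}

// Change one leaf and recompute only its ancestors: log2(n) reductions.
function update(tree, index, newValue) {
  const n = tree.length / 2;
  let i = n + index;
  tree[i] = newValue;
  let reductions = 0;
  for (i = Math.floor(i / 2); i >= 1; i = Math.floor(i / 2)) {
    tree[i] = tree[2 * i] + tree[2 * i + 1];
    reductions++;
  }
  return reductions;
}

const tree = buildTree([100, 42, 215, 73]);
console.log(tree[1], tree[2], tree[3]); // 430 142 288

// Ticket 3 (index 2) goes from 215 to 210.
const reductions = update(tree, 2, 210);
console.log(tree[1], tree[3], reductions); // 425 283 2
```

With about a million values, the same update would recompute around 20 nodes instead of re-reducing every value.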
  • 24. Let's write some code!
  • 25. In the mongo shell:
      $> mongo
      > db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
      > db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
      > db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
      > db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })
      > db.tickets.count()
      4
      > db.tickets.find()
      { "_id" : 1, "day" : 20111017, "checkout" : 100 }
      ...
      > db.tickets.find({ "_id": 1 })
      { "_id" : 1, "day" : 20111017, "checkout" : 100 }
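  • 26. Defining map and reduce:
      > var map = function() {
      ... // one value per ticket, all under the same null key
      ... emit(null, this.checkout)
      ... }
      > var reduce = function(key, values) {
      ... var sum = 0
      ... for (var i = 0; i < values.length; i++) sum += values[i]
      ... return sum
      ... }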
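In MongoDB, `map` is called with `this` bound to the current document and emits through a global `emit` function; `reduce` is then called per key with the collected values. The plain-JavaScript harness below imitates that calling convention so the functions of slide 26 can be tried without a server. The `runMapReduce` helper is hypothetical and skips details such as re-reducing partial results.

```javascript
// Hypothetical harness imitating MongoDB's calling convention for mapReduce.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  // emit() collects values per key (keys compared with ===,
  // which is fine for numbers, strings and null).
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc); // `this` is the current document
  delete globalThis.emit;

  const results = [];
  for (const [key, values] of groups) {
    // Like MongoDB, a key with a single value is not passed to reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

// The map and reduce functions of slide 26.
var map = function () {
  emit(null, this.checkout);
};
var reduce = function (key, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) sum += values[i];
  return sum;
};

const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
];
console.log(runMapReduce(tickets, map, reduce)); // [ { _id: null, value: 430 } ]
```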
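  • 27. Temporary collection
      > sumOfCheckouts = db.tickets.mapReduce(map, reduce)
      {
        "result" : "tmp.mr.mapreduce_123456789_4",
        "timeMillis" : 8,
        "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
        "ok" : 1
      }
      > db.getCollectionNames()
      [ "tickets", "tmp.mr.mapreduce_123456789_4" ]
      > db[sumOfCheckouts.result].find()
      { "_id" : null, "value" : 430 }
    Writing to a temporary collection when "out" is omitted is the behavior of MongoDB versions before 1.8; from 1.8 on, "out" is required, and { "out" : { "inline" : 1 } } returns the result directly.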
  • 28. Persistent collection
      > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
      > db.getCollectionNames()
      [ "sumOfCheckouts", "tickets", "tmp.mr.mapreduce_123456789_4" ]
      > db.sumOfCheckouts.find()
      { "_id" : null, "value" : 430 }
      > db.sumOfCheckouts.findOne().value
      430
  • 29. Reduce by Date
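  • 30. Grouping by day:
      > var map = function() {
      ... // one key per day: the tickets store their date in the "day" field
      ... emit(this.day, this.checkout)
      ... }
      > var reduce = function(key, values) {
      ... var sum = 0
      ... for (var i = 0; i < values.length; i++) sum += values[i]
      ... return sum
      ... }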
  • 31.
      > db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
      > db.sumOfCheckouts.find()
      { "_id" : 20111017, "value" : 430 }
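With a single day the grouping is not visible, so this plain-JavaScript sketch imitates grouping on the tickets' `day` field with one extra ticket on another day. That fifth ticket (day 20111018, checkout 50) is made up for illustration; the loop stands in for MongoDB and is not its implementation.

```javascript
// Tickets from slide 25, plus one hypothetical ticket on another day.
const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
  { _id: 5, day: 20111018, checkout: 50 }, // hypothetical
];

// Map step: emit(day, checkout), collected into day -> list of checkouts.
const groups = new Map();
for (const ticket of tickets) {
  const key = ticket.day;
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(ticket.checkout);
}

// Reduce step: one sum per day, shaped like MongoDB's { _id, value } output.
const sumOfCheckouts = [...groups].map(([day, values]) => ({
  _id: day,
  value: values.reduce((sum, v) => sum + v, 0),
}));

console.log(sumOfCheckouts);
// two results: day 20111017 with 430, day 20111018 with 50
```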
  • 32. What we can do
  • 33. Scored Subjects per User
      Subject | User | Score
      --------+------+------
      1       | 1    | 2
      1       | 1    | 2
      1       | 2    | 2
      2       | 1    | 2
      2       | 2    | 10
      2       | 2    | 5
  • 34. Scored Subjects per User (reduced)
      Subject | User | Score
      --------+------+------
      1       | 1    | 4
      1       | 2    | 2
      2       | 1    | 2
      2       | 2    | 15
  • 35. In the mongo shell:
      $> mongo
      > db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
      > db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
      > db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
      > db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })
      > db.scores.count()
      6
      > db.scores.find()
      { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
      ...
      > db.scores.find({ "_id": 1 })
      { "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
  • 36. Map and reduce for scores:
      > var map = function() {
      ... // key "user-subject"; the value already has the shape of the final result
      ... emit([this.user_id, this.subject_id].join("-"),
      ...      { user_id: this.user_id, subject_id: this.subject_id, score: this.score });
      ... }
      > var reduce = function(key, values) {
      ... var result = { user_id: "", subject_id: "", score: 0 };
      ... values.forEach(function (value) {
      ...   result.score += value.score;
      ...   result.user_id = value.user_id;
      ...   result.subject_id = value.subject_id;
      ... });
      ... return result
      ... }
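A plain-JavaScript check of the map and reduce of slide 36 on the six score documents of slide 35. The grouping loop stands in for MongoDB; as in MongoDB, reduce is called only for keys with more than one value, so single-value keys keep the emitted object unchanged. Because reduce may also be fed its own output, it returns an object with the same shape as the emitted values. The names `mapScore` and `reduceScores` are hypothetical.

```javascript
const scores = [
  { _id: 1, subject_id: 1, user_id: 1, score: 2 },
  { _id: 2, subject_id: 1, user_id: 1, score: 2 },
  { _id: 3, subject_id: 1, user_id: 2, score: 2 },
  { _id: 4, subject_id: 2, user_id: 1, score: 2 },
  { _id: 5, subject_id: 2, user_id: 2, score: 10 },
  { _id: 6, subject_id: 2, user_id: 2, score: 5 },
];

// Map: key "user-subject", value shaped like the final result.
function mapScore(doc) {
  const key = [doc.user_id, doc.subject_id].join("-");
  return [key, { user_id: doc.user_id, subject_id: doc.subject_id, score: doc.score }];
}

// Reduce: add up the scores; the output has the same shape as the map values.
function reduceScores(key, values) {
  const result = { user_id: "", subject_id: "", score: 0 };
  values.forEach((value) => {
    result.score += value.score;
    result.user_id = value.user_id;
    result.subject_id = value.subject_id;
  });
  return result;
}

const groups = new Map();
for (const doc of scores) {
  const [key, value] = mapScore(doc);
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

const reduced = new Map();
for (const [key, values] of groups) {
  // A single value is stored as-is, without calling reduce.
  reduced.set(key, values.length === 1 ? values[0] : reduceScores(key, values));
}

for (const [key, value] of reduced) console.log(key, value.score);
// 1-1 4
// 2-1 2
// 1-2 2
// 2-2 15
```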
  • 37. ReducedScores collection
      > db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })
      > db.getCollectionNames()
      [ "reduced_scores", "scores" ]
      > db.reduced_scores.find()
      { "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
      { "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
      { "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
      { "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }
      > db.reduced_scores.findOne().value.score
      4
  • 38. Querying from Rails
      ruby-1.9.2-p180 :007 > ReducedScores.first
       => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId(...), "subject_id"=>BSON::ObjectId(...), "score"=>4.0}>
      ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
       => 2
      ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value["score"]
       => 4.0
      ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value["score"]
       => 2.0
  • 39. Questions?