MongoDB - A Document NoSQL Database
Presentation Transcript

  • MongoDB: A NoSQL Document-Oriented Database
  • Agenda
      ● Relational DBs
      ● NoSQL
          – What, Why
          – History
          – Features
          – Types
      ● MongoDB
          – Indexes
          – Replication
          – Sharding
          – Querying
          – Mapping
          – MapReduce
      ● Use Case: RealNetworks
  • Relational DBs
      ● Born in the 70s
          – storage is expensive
          – schemas are simple
      ● Based on the Relational Model
          – Mathematical model for describing data structure
          – Data represented in "tuples", grouped into "relations"
      ● Queries based on Relational Algebra
          – union, intersection, difference, cartesian product, selection, projection, join, division
      ● Constraints
          – Foreign Keys, Primary Keys, Indexes
          – Domain Integrity (Data Types)
  • Joins
  • Relational DBs
      ● Normalization
          – minimize redundancy
          – avoid duplication
  • Normalization
  • Relational DBs - Transactions
      ● Atomicity
          – If one part of the transaction fails, the whole transaction fails
      ● Consistency
          – A transaction leaves the DB in a valid state
      ● Isolation
          – One transaction doesn't see an intermediate state of another
      ● Durability
          – A committed transaction gets persisted
  • Relational DBs - Use
  • NoSQL – Why?
      ● Web 2.0
          – Huge data volumes
          – Need for speed
          – Accessibility
      ● RDBMS are difficult to scale
      ● Storage gets cheap
      ● Commodity machines get cheap
  • NoSQL – What?
      ● Simple storage of data
      ● Looser consistency model (eventual consistency), in order to achieve:
          – higher availability
          – horizontal scaling
      ● No JOINs
      ● Optimized for big data, when no relational features are needed
  • Vertical Scale / Horizontal Scale
  • Vertical Scale / Horizontal Scale: enforces parallel computing
  • Eventual Consistency
      ● RDBMS: all users see a consistent view of the data
      ● ACID gets difficult when distributing data across nodes
      ● Eventual Consistency: inconsistencies are transitory. The DB may be inconsistent at a given point in time, but will eventually become consistent.
      ● BASE (in contrast to ACID)
          – Basically Available, Soft state, Eventually consistent
  • CAP Theorem
      ● Consistency: all nodes see the same data at the same time
      ● Availability: requests always get an immediate response
      ● Partition tolerance: the system continues to work, even if a part of it breaks
  • NoSQL - History
      ● Term first used in 1998 by C. Strozzi to name his relational DB that didn't use SQL
      ● Term reused in 2009 by E. Evans to name the distributed DBs that didn't provide ACID
      ● Some people read it as "Not Only SQL"
      ● Should actually be called "NoRel" (no relational)
  • NoSQL – Some Features
      ● Auto-Sharding
      ● Replication
      ● Caching
      ● Dynamic Schema
  • NoSQL - Types
      ● Document
          – Key-value "map" with a "document" (xml, json, pdf, ..) as value
          – MongoDB, CouchDB
      ● Key-Value
          – Key-value "map" with an "object" (Integer, String, Order, ..) as value
          – Cassandra, Dynamo, Voldemort
      ● Graph
          – Data stored in a graph structure: nodes have pointers to adjacent ones
          – Neo4J
  • MongoDB
      ● Open source NoSQL document DB written in C++
      ● First released in 2009
      ● Commercial support by 10gen
      ● Name comes from "humongous" (huge)
      ● http://www.mongodb.org/
  • MongoDB – Document Oriented
      ● No fixed document structure: schemaless
      ● Atomicity: only at document level (no transactions across documents)
      ● Normalization is not easy to achieve:
          – Embed: +duplication, +performance
          – Reference: -duplication, +roundtrips
  • MongoDB
      ● > db.users.save({ name: "ruben", surname: "inoto", age: 36 })
      ● > db.users.find()
          – { "_id" : ObjectId("519a3dd65f03c7847ca5f560"), "name" : "ruben", "surname" : "inoto", "age" : 36 }
      ● > db.users.update({ name: "ruben" }, { $set: { age: 24 } })
      ● Documents are stored in BSON format
  • MongoDB - Querying
      ● find(): returns a cursor over the matching documents
          – All users
              db.users.find()
          – User with id 42
              db.users.find({ _id: 42 })
          – Age between 20 and 30
              db.users.find({ age: { $gt: 20, $lt: 30 } })
          – Subdocuments: ZIP 5026 (a dotted field name must be quoted)
              db.users.find({ "address.zip": "5026" })
          – OR: ruben or younger than 30
              db.users.find({ $or: [ { name: "ruben" }, { age: { $lt: 30 } } ] })
          – Projection: deliver only name and age
              db.users.find({ }, { name: 1, age: 1 })
      ● Example document
          {
            "_id": 42,
            "name": "ruben",
            "surname": "inoto",
            "age": 36,
            "address": {
              "street": "Glaserstraße",
              "zip": "5026"
            }
          }
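To make the filter semantics above concrete without a running server, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB evaluates queries; the helper names (`getPath`, `matchesCondition`, `matches`, `find`) and the extra users "anna" and "max" are hypothetical, and only equality, `$gt`, `$lt`, `$or` and dotted paths are covered.

```javascript
// Sample data modelled on the slide's example document; "anna" and "max" are made up.
const users = [
  { _id: 42, name: "ruben", age: 36, address: { zip: "5026" } },
  { _id: 43, name: "anna", age: 25, address: { zip: "5020" } },
  { _id: 44, name: "max", age: 41, address: { zip: "5026" } },
];

// Resolve a dotted path such as "address.zip" against a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// A condition is either a plain value (equality) or an object of operators.
function matchesCondition(value, condition) {
  if (condition !== null && typeof condition === "object") {
    return Object.entries(condition).every(([op, operand]) => {
      if (op === "$gt") return value > operand;
      if (op === "$lt") return value < operand;
      throw new Error("operator not covered by this sketch: " + op);
    });
  }
  return value === condition;
}

// All fields of a filter must match; $or needs at least one sub-filter to match.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, condition]) =>
    key === "$or"
      ? condition.some((sub) => matches(doc, sub))
      : matchesCondition(getPath(doc, key), condition)
  );
}

// Rough stand-in for db.users.find(filter): returns an array instead of a cursor.
function find(docs, filter) {
  return docs.filter((doc) => matches(doc, filter));
}

console.log(find(users, { age: { $gt: 20, $lt: 30 } }));
console.log(find(users, { "address.zip": "5026" }));
console.log(find(users, { $or: [{ name: "ruben" }, { age: { $lt: 30 } }] }));
```

Note that the ZIP lookup only matches because the filter value is the string "5026", the same type as in the stored document.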
  • MongoDB - Saving
      ● Insert
          – db.test.save({ _id: "42", name: "ruben" })
      ● Update (a plain document as second argument replaces the whole document)
          – db.test.update({ _id: "42" }, { name: "harald" })
          – db.test.update({ _id: "42" }, { name: "harald", age: 39 })
      ● Atomic operators ($inc)
          – db.test.update({ _id: "42" }, { $inc: { age: 1 } })
      ● Arrays
          – { _id: "48", name: "david", hobbies: [ "bike", "judo" ] }
          – Atomically add an element to an array ($push)
              ● db.test.update({ _id: "48" }, { $push: { hobbies: "swimming" } })
          – $each, $pop, $pull, $addToSet, ...
  • MongoDB - Delete
      ● db.test.remove({ _id: "42" })
  • MongoDB – Indexes
      ● Indexes on any attribute
          – > db.users.ensureIndex({ age: 1 })
      ● Compound indexes (all keys go into the first argument)
          – > db.users.ensureIndex({ age: 1, name: 1 })
      ● Unique indexes
      ● v2.4 and later: text indexing (search)
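The difference between a replacement update and an operator update is easy to miss on the slide above. The following plain JavaScript sketch mimics it in memory; `applyUpdate` is a hypothetical helper, not part of any MongoDB driver, and it covers only `$set`, `$inc` and `$push` on top-level fields.

```javascript
// Apply a MongoDB-style update document to a copy of `doc` (in-memory sketch).
function applyUpdate(doc, update) {
  const operators = Object.keys(update).filter((key) => key.startsWith("$"));

  // No operators: the update is a replacement document; only _id survives.
  if (operators.length === 0) {
    return { _id: doc._id, ...update };
  }

  for (const op of operators) {
    if (!["$set", "$inc", "$push"].includes(op)) {
      throw new Error("operator not covered by this sketch: " + op);
    }
  }

  const result = { ...doc };
  // $set: overwrite or add single fields, keep everything else.
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  // $inc: add to a numeric field (a missing field counts as 0).
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  // $push: append one element to an array field.
  for (const [field, item] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), item];
  }
  return result;
}

const ruben = { _id: "42", name: "ruben", age: 36 };
console.log(applyUpdate(ruben, { name: "harald" }));           // age is gone
console.log(applyUpdate(ruben, { $set: { name: "harald" } })); // age is kept
console.log(applyUpdate(ruben, { $inc: { age: 1 } }));
```

The first call drops the `age` field because the whole document is replaced, while the `$set` variant keeps it.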
  • SQL → Mongo Mapping (I)
      SQL Statement                                     Mongo Query Language
      CREATE TABLE USERS (a Number, b Number)           implicit
      INSERT INTO USERS VALUES(1,1)                     db.users.insert({a:1,b:1})
      SELECT a,b FROM users                             db.users.find({}, {a:1,b:1})
      SELECT * FROM users                               db.users.find()
      SELECT * FROM users WHERE age=33                  db.users.find({age:33})
      SELECT * FROM users WHERE age=33 ORDER BY name    db.users.find({age:33}).sort({name:1})
  • SQL → Mongo Mapping (II)
      SQL Statement                                     Mongo Query Language
      SELECT * FROM users WHERE age>33                  db.users.find({age:{$gt:33}})
      CREATE INDEX myindexname ON users(name)           db.users.ensureIndex({name:1})
      SELECT * FROM users WHERE a=1 AND b='q'           db.users.find({a:1,b:'q'})
      SELECT * FROM users LIMIT 10 SKIP 20              db.users.find().limit(10).skip(20)
      SELECT * FROM users LIMIT 1                       db.users.findOne()
      EXPLAIN PLAN FOR SELECT * FROM users WHERE z=3    db.users.find({z:3}).explain()
      SELECT DISTINCT last_name FROM users              db.users.distinct("last_name")
      SELECT COUNT(*) FROM users WHERE age > 30         db.users.find({age: {$gt: 30}}).count()
  • Embed vs Reference
  • Relational
  • Document
      user: {
        id: "1",
        name: "ruben"
      }
      order: {
        id: "a",
        user_id: "1",      (referenced)
        items: [           (embedded)
          { product_id: "x", quantity: 10, price: 300 },
          { product_id: "y", quantity: 5, price: 300 }
        ]
      }
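The "+roundtrips" cost of referencing can be sketched in plain JavaScript with two in-memory maps standing in for collections. The names `usersById`, `ordersById`, `findById` and `loadOrderWithUser` are hypothetical, and the lookup counter is only a stand-in for real network roundtrips.

```javascript
// Two "collections" keyed by id, using the documents from the slide.
const usersById = new Map([["1", { id: "1", name: "ruben" }]]);
const ordersById = new Map([
  [
    "a",
    {
      id: "a",
      user_id: "1", // reference to a user document
      items: [
        { product_id: "x", quantity: 10, price: 300 }, // embedded
        { product_id: "y", quantity: 5, price: 300 },  // embedded
      ],
    },
  ],
]);

// Count lookups to make the extra roundtrip visible.
let lookups = 0;
function findById(collection, id) {
  lookups += 1;
  return collection.get(id);
}

// Embedded items arrive with the order; the referenced user needs a second lookup.
function loadOrderWithUser(orderId) {
  const order = findById(ordersById, orderId);
  const user = findById(usersById, order.user_id);
  return { ...order, user };
}

const loaded = loadOrderWithUser("a");
console.log(loaded.items.length, loaded.user.name, lookups);
```

Embedding the user inside the order would save the second lookup, at the price of duplicating user data in every order.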
  • MongoDB – Replication (I)
      ● Master-slave replication: primary and secondary nodes
      ● Replica set: cluster of mongod instances that replicate amongst one another and ensure automated failover
      ● WriteConcern
  • MongoDB – Replication (II)
      ● adds redundancy
      ● helps to ensure high availability: automatic failover
      ● simplifies backups
  • WriteConcerns
      ● Errors Ignored
          – even network errors are ignored
      ● Unacknowledged
          – at least network errors are handled
      ● Acknowledged
          – constraints are handled (default)
      ● Journaled
          – persisted to journal log
      ● Replica ACK
          – 1..n
          – or majority
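For the "Replica ACK" level, the number of acknowledgements a write waits for can be sketched as below. This is an illustration in plain JavaScript, not driver code: `requiredAcks` and `isWriteSatisfied` are hypothetical names, and treating "majority" as a simple majority of the replica set size is an assumption (a real deployment may count only certain members).

```javascript
// How many members must acknowledge a write for a given w value.
function requiredAcks(w, replicaSetSize) {
  if (w === "majority") {
    return Math.floor(replicaSetSize / 2) + 1;
  }
  if (Number.isInteger(w) && w >= 1 && w <= replicaSetSize) {
    return w;
  }
  throw new RangeError("w must be 1..n or 'majority'");
}

// A write is considered safe once enough acknowledgements have arrived.
function isWriteSatisfied(w, replicaSetSize, acksReceived) {
  return acksReceived >= requiredAcks(w, replicaSetSize);
}

console.log(requiredAcks("majority", 3));
console.log(isWriteSatisfied("majority", 3, 1));
console.log(isWriteSatisfied(1, 3, 1));
```

A higher `w` trades write latency for a smaller window in which a failover can lose the write.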
  • MongoDB – Sharding (I)
      ● Scale out
      ● Distributes data to nodes automatically
      ● Balances data and load across machines
  • MongoDB – Sharding (II)
      ● A sharded cluster is composed of:
          – Shards: hold the data
              ● Either one mongod instance (primary daemon process, handles data requests) or a replica set
          – Config servers:
              ● mongod instances holding cluster metadata
          – mongos instances:
              ● route application calls to the shards
      ● No single point of failure
  • MongoDB – Sharding (III)
  • MongoDB – Sharding (IV)
  • MongoDB – Sharding (V)
      ● A collection has a shard key: existing field(s) in all documents
      ● Documents get distributed according to ranges of the shard key
      ● In a shard, documents are partitioned into chunks
      ● Mongo tries to keep all chunks at the same size
  • MongoDB – Sharding (VI)
      ● Shard balancing
          – When a shard has too many chunks, mongo moves chunks to other shards
      ● Only makes sense with a huge amount of data
  • Object Mappers
      ● C#, PHP, Scala, Erlang, Perl, Ruby
      ● Java
          – Morphia
          – Spring MongoDB
          – mongo-jackson-mapper
          – jongo
      ● ..
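Range-based distribution can be illustrated with a small routing table in plain JavaScript. The chunk boundaries, the shard names and the choice of `age` as shard key are made-up assumptions for this sketch; a real cluster keeps this metadata on the config servers and routes through mongos.

```javascript
// Hypothetical chunks over the shard key "age"; each covers the range [min, max).
const chunks = [
  { min: -Infinity, max: 30, shard: "shard0" },
  { min: 30, max: 60, shard: "shard1" },
  { min: 60, max: Infinity, shard: "shard0" },
];

// Find the shard whose chunk range contains the given shard key value.
function shardFor(keyValue) {
  const chunk = chunks.find((c) => keyValue >= c.min && keyValue < c.max);
  if (!chunk) {
    throw new RangeError("no chunk covers shard key value " + keyValue);
  }
  return chunk.shard;
}

console.log(shardFor(36));
console.log(shardFor(10));
```

Because one shard can own several chunks, balancing can move a single chunk to another shard without touching the rest of the ranges.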
  • Jongo - Example
      DB db = new MongoClient().getDB("jongo");
      Jongo jongo = new Jongo(db);
      MongoCollection users = jongo.getCollection("users");

      User user = new User("ruben", "inoto", new Address("Musterstraße", "5026"));
      users.save(user);

      User ruben = users.findOne("{name: 'ruben'}").as(User.class);

      public class User {
          private String name;
          private String surname;
          private Address address;
      }

      public class Address {
          private String street;
          private String zip;
      }

      Stored document:
      {
        "_id" : ObjectId("51b0e1c4d78a1c14a26ada9e"),
        "name" : "ruben",
        "surname" : "inoto",
        "address" : {
          "street" : "Musterstraße",
          "zip" : "5026"
        }
      }
  • TTL (TimeToLive)
      ● Data with an expiry date
      ● After the specified time to live, the data is removed from the DB
      ● Implemented as an index
      ● Useful for logs, sessions, ..
      ● db.broadcastMessages.ensureIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
  • MapReduce
      ● Programming model for processing large data sets with a parallel, distributed algorithm
      ● Handles complex aggregation tasks
      ● The problem is split into smaller tasks that are distributed across nodes
      ● map phase: selects the data
          – Emits key-value pairs: associates a value with a key
          – Values are grouped by key and passed to the reduce function
      ● reduce phase: transforms the data
          – Accepts two arguments: key and values
          – Reduces all the values associated with the key to a single object
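The expiry rule behind a TTL index can be sketched in plain JavaScript. `removeExpired` is a hypothetical helper that applies the rule once to an in-memory array; the sample messages and timestamps are made up, and in the database the removal is done by a background task, so expired documents may linger briefly.

```javascript
// Keep only documents younger than expireAfterSeconds, measured from createdAt.
function removeExpired(docs, expireAfterSeconds, now) {
  const maxAgeMs = expireAfterSeconds * 1000;
  return docs.filter((doc) => now.getTime() - doc.createdAt.getTime() < maxAgeMs);
}

const now = new Date("2013-06-01T12:00:00Z");
const broadcastMessages = [
  { text: "old", createdAt: new Date("2013-06-01T10:30:00Z") },   // 90 minutes old
  { text: "fresh", createdAt: new Date("2013-06-01T11:30:00Z") }, // 30 minutes old
];

// With expireAfterSeconds: 3600, only the 30-minute-old message survives.
const remaining = removeExpired(broadcastMessages, 3600, now);
console.log(remaining.map((m) => m.text));
```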
  • MapReduce
  • MapReduce Use Example
      ● Problem: compute how much money each customer has paid across all of their orders
  • Solution - Relational
      select customer_id, sum(price * quantity)
      from orders
      group by customer_id

      orders:
      order_id  customer_id  price  quantity
      a         1            350    2
      b         2            100    2
      c         1            20     1

      result:
      customer_id  total
      1            720
      2            200
  • Solution - Sequential
      var customerTotals = new HashMap<String, Integer>();
      for (Order order : orders) {
          var newTotal = order.price * order.quantity;
          if (customerTotals.containsKey(order.customerId)) {
              newTotal += customerTotals.get(order.customerId);
          }
          customerTotals.put(order.customerId, newTotal);
      }

      Input:
      [
        { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
        { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
        { order_id: "c", customer_id: "1", price: 20, quantity: 1 }
      ]

      Result:
      { "1": 720 }
      { "2": 200 }
  • Solution - MapReduce
      db.orders.insert([
        { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
        { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
        { order_id: "c", customer_id: "1", price: 20, quantity: 1 }
      ]);

      var mapOrders = function() {
        var totalPrice = this.price * this.quantity;
        emit(this.customer_id, totalPrice);
      };

      var reduceOrders = function(customerId, tempTotal) {
        return Array.sum(tempTotal);
      };

      db.orders.mapReduce(mapOrders, reduceOrders, { out: "map_reduce_orders" });

      > db.map_reduce_orders.find().pretty();
      { "_id" : "1", "value" : 720 }
      { "_id" : "2", "value" : 200 }
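The map, group and reduce steps can also be followed without a database. The sketch below is a single-process imitation in plain JavaScript: the `mapReduce` helper is hypothetical, `emit` is passed in as an argument instead of being a global as in the mongo shell, and reduce is called exactly once per key, which a distributed implementation does not guarantee.

```javascript
const orders = [
  { order_id: "a", customer_id: "1", price: 350, quantity: 2 },
  { order_id: "b", customer_id: "2", price: 100, quantity: 2 },
  { order_id: "c", customer_id: "1", price: 20, quantity: 1 },
];

// Run map over every document, group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit)); // `this` is the current document
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

// map phase: emit (customer, order total) for each order.
const mapOrders = function (emit) {
  emit(this.customer_id, this.price * this.quantity);
};

// reduce phase: sum all order totals of one customer.
const reduceOrders = (customerId, orderTotals) =>
  orderTotals.reduce((sum, total) => sum + total, 0);

const customerTotals = mapReduce(orders, mapOrders, reduceOrders);
console.log(customerTotals);
// [ { _id: '1', value: 720 }, { _id: '2', value: 200 } ]
```

Since addition is associative and commutative, this reduce function would give the same totals if it were applied to partial groups first, which is what makes the problem distributable.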
  • MapReduce
  • Who is using Mongo?
      ● Craigslist
      ● SourceForge
      ● Disney
      ● The Guardian
      ● Forbes
      ● CERN
      ● ...
  • "Real" Use Case – Android Notifications
      ● App to send "notifications" (messages) to devices with an installed RealNetworks application (Music, RBT)
      ● Scala, Scalatra, Lift, Jersey, Guice, ProtocolBuffers
      ● MongoDB, Casbah, Salat
      ● Mongo collections
          – Devices: deviceId, msisdn, application
          – Messages: message, audience
          – SentMessages: deviceId, message, status
  • Criticism
      ● Loss of data
          – Especially in a cluster
  • Conclusion
      ● Not a silver bullet
      ● Makes sense when:
          – Eventual consistency is acceptable
          – Prototyping
          – Performance matters
          – The object model doesn't fit well in a relational DB
      ● Easy to learn