3. Relational DBs
● Born in the 1970s
– storage was expensive
– schemas were simple
● Based on Relational Model
– Mathematical model for describing data structure
– Data represented as "tuples", grouped into "relations"
● Queries based on Relational Algebra
– union, intersection, difference, cartesian product, selection,
projection, join, division
● Constraints
– Foreign Keys, Primary Keys, Indexes
– Domain Integrity (DataTypes)
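The relational-algebra operators listed above can be pictured in a few lines of code. A minimal sketch (assuming relations are plain arrays of objects standing in for tuples; all names here are illustrative, not from any library):

```javascript
// selection: keep tuples matching a predicate
const select = (relation, pred) => relation.filter(pred);

// projection: keep only the named attributes (duplicates removed, as in set semantics)
const project = (relation, attrs) => {
  const seen = new Set();
  const out = [];
  for (const tuple of relation) {
    const projected = {};
    for (const a of attrs) projected[a] = tuple[a];
    const key = JSON.stringify(projected);
    if (!seen.has(key)) { seen.add(key); out.push(projected); }
  }
  return out;
};

// natural join on one shared attribute
const join = (r, s, attr) =>
  r.flatMap(t => s.filter(u => u[attr] === t[attr]).map(u => ({ ...t, ...u })));

const users  = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }];
const orders = [{ id: 1, total: 10 }, { id: 1, total: 20 }];

console.log(select(users, u => u.id === 1)); // [{ id: 1, name: "ann" }]
console.log(project(orders, ["id"]));        // [{ id: 1 }]
console.log(join(users, orders, "id").length); // 2
```

Union, intersection, and difference follow the same pattern over the tuple arrays.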
8. Relational DBs - Transactions
● Atomicity
– If one part of the transaction fails, the whole transaction fails
● Consistency
– Transaction leaves the DB in a valid state
● Isolation
– One transaction doesn't see an intermediate state of the other
● Durability
– Transaction gets persisted
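Atomicity in particular can be pictured with a small sketch (plain JavaScript, not database code; the copy-then-commit scheme is one illustrative strategy, not how any specific RDBMS implements it):

```javascript
// All operations run against an isolated copy; only a fully successful
// transaction replaces the real state, so a failure changes nothing.
function runTransaction(state, operations) {
  const draft = { ...state };            // work on a copy (isolation of the draft)
  try {
    for (const op of operations) op(draft);
    return { ok: true, state: draft };   // commit: publish the new state
  } catch (e) {
    return { ok: false, state };         // rollback: original state untouched
  }
}

const account = { balance: 100 };

// debit more than the balance -> the whole transaction fails
const result = runTransaction(account, [
  (s) => { s.balance -= 150; },
  (s) => { if (s.balance < 0) throw new Error("insufficient funds"); },
]);

console.log(result.ok, result.state.balance); // false 100
```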
10. NoSQL – Why?
● Web 2.0
– Huge data volumes
– Need for speed
– Accessibility
● RDBMS are difficult to scale
● Storage is getting cheap
● Commodity machines are getting cheap
11. NoSQL – What?
● Simple storage of data
● Looser consistency model (eventual consistency), in
order to achieve:
– higher availability
– horizontal scaling
● No JOINs
● Optimized for big data, when no relational features are
needed
14. Eventual Consistency
● RDBMS: all users see a consistent view
of the data
● ACID gets difficult when distributing
data across nodes
● Eventual Consistency: inconsistencies
are transitory. The DB may be inconsistent
at a given point in time, but will
eventually become consistent.
● BASE (in contrast to ACID) – Basically
Available, Soft state, Eventually consistent
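A toy sketch of eventual consistency (plain JavaScript, not a real database; the two-replica setup and names are made up): a write is acknowledged by one replica immediately and reaches the other only when the replication log is drained, so reads are briefly inconsistent but eventually converge.

```javascript
const replicaA = new Map();
const replicaB = new Map();
const replicationLog = [];

function write(key, value) {
  replicaA.set(key, value);          // acknowledged right away on A
  replicationLog.push([key, value]); // shipped to B later, asynchronously
}

function replicate() {
  while (replicationLog.length > 0) {
    const [key, value] = replicationLog.shift();
    replicaB.set(key, value);
  }
}

write("user:1", "ruben");
console.log(replicaA.get("user:1"), replicaB.get("user:1")); // "ruben" undefined (stale read on B)
replicate();
console.log(replicaB.get("user:1")); // "ruben" (converged)
```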
15. CAP Theorem
● Consistency: all nodes see the same data
at the same time
● Availability: requests always get an
immediate response
● Partition tolerance: the system continues to
work, even if a part of it breaks
● A distributed system can guarantee at most
two of these three properties at once
16. NoSQL - History
● Term first used in 1998 by C. Strozzi to name
his relational DB that didn't use SQL
● Term reused in 2009 by E. Evans to name the
distributed DBs that didn't provide ACID
● Some people read it as "Not Only SQL"
● Should actually be called "NoRel" (not
relational)
17. NoSQL – Some Features
● Auto-Sharding
● Replication
● Caching
● Dynamic Schema
18. NoSQL - Types
● Document
– "Map" of key-value pairs, with a "Document" (XML, JSON, PDF, ..) as
value
– MongoDB, CouchDB
● Key-Value
– "Map" of key-value pairs, with an "Object" (Integer, String, Order, ..)
as value
– Cassandra, Dynamo, Voldemort
● Graph
– Data stored in a graph structure – nodes have pointers to
adjacent ones
– Neo4J
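The difference between the first two types can be sketched in a few lines (plain JavaScript, illustrative only; the stores and data are made up): both are a key → value map, but a document store understands the structure of its values, so it can filter on fields inside them.

```javascript
const keyValueStore = new Map();   // value is opaque to the store
keyValueStore.set("session:42", "dXNlcj1ydWJlbg==");

const documentStore = new Map();   // value is a structured document
documentStore.set("u1", { name: "ruben", age: 33 });
documentStore.set("u2", { name: "eva",   age: 28 });

// a "query" over document contents, impossible for opaque values
const age33 = [...documentStore.values()].filter(d => d.age === 33);
console.log(age33.length); // 1
```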
19. MongoDB
● Open-source NoSQL document DB written in
C++
● Started in 2009
● Commercial Support by 10gen
● Name comes from "humongous" (huge)
● http://www.mongodb.org/
20. MongoDB – Document Oriented
● No fixed document structure – schemaless
● Atomicity: only at document level (no
transactions across documents)
● Normalization is not easy to achieve; two modeling options:
– Embed: more duplication, better read performance
– Reference: less duplication, more round trips
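A sketch of the two modeling options, using plain objects in place of MongoDB documents (field names and ids are made up for illustration):

```javascript
// Embedding: the address lives inside the user document.
// One read fetches everything, but shared addresses get duplicated.
const embeddedUser = {
  _id: "u1",
  name: "ruben",
  address: { street: "Musterstraße", zip: "5026" },
};

// Referencing: the user stores only an address id.
// No duplication, but reading the address costs an extra lookup (round trip).
const addresses = new Map([["a1", { street: "Musterstraße", zip: "5026" }]]);
const referencingUser = { _id: "u1", name: "ruben", addressId: "a1" };

console.log(embeddedUser.address.zip);                     // "5026" (one read)
console.log(addresses.get(referencingUser.addressId).zip); // "5026" (two reads)
```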
26. SQL → Mongo Mapping (I)
SQL Statement → Mongo Query Language
– CREATE TABLE USERS (a Number, b Number) → implicit
– INSERT INTO USERS VALUES(1,1) → db.users.insert({a:1,b:1})
– SELECT a,b FROM users → db.users.find({}, {a:1,b:1})
– SELECT * FROM users → db.users.find()
– SELECT * FROM users WHERE age=33 → db.users.find({age:33})
– SELECT * FROM users WHERE age=33 ORDER BY name → db.users.find({age:33}).sort({name:1})
27. SQL → Mongo Mapping (II)
SQL Statement → Mongo Query Language
– SELECT * FROM users WHERE age>33 → db.users.find({age:{$gt:33}})
– CREATE INDEX myindexname ON users(name) → db.users.ensureIndex({name:1})
– SELECT * FROM users WHERE a=1 and b='q' → db.users.find({a:1,b:'q'})
– SELECT * FROM users LIMIT 10 SKIP 20 → db.users.find().limit(10).skip(20)
– SELECT * FROM users LIMIT 1 → db.users.findOne()
– EXPLAIN PLAN FOR SELECT * FROM users WHERE z=3 → db.users.find({z:3}).explain()
– SELECT DISTINCT last_name FROM users → db.users.distinct('last_name')
– SELECT COUNT(*) FROM users WHERE age > 30 → db.users.find({age:{$gt:30}}).count()
31. MongoDB – Replication (I)
● Master-slave replication: primary and secondary nodes
● replica set: cluster of mongod instances that replicate amongst one
another and ensure automated failover
32. MongoDB – Replication (II)
● adds redundancy
● helps to ensure high availability – automatic
failover
● simplifies backups
33. WriteConcerns
● Errors Ignored
– even network errors are ignored
● Unacknowledged
– at least network errors are handled
● Acknowledged
– the server confirms the write and reports errors such as duplicate keys (default)
● Journaled
– persisted to journal log
● Replica ACK
– 1..n
– Or 'majority'
34. MongoDB – Sharding (I)
● Scale Out
● Distributes data to nodes automatically
● Balances data and load across machines
35. MongoDB – Sharding (II)
● A sharded cluster is composed of:
– Shards: hold the data
● Either a single mongod instance (the primary daemon process –
handles data requests) or a replica set
– Config servers:
● mongod instances holding the cluster metadata
– mongos instances:
● route application calls to the shards
● No single point of failure
38. MongoDB – Sharding (V)
● A collection has a shard key: existing field(s)
present in all documents
● Documents get distributed according to shard-key ranges
● Within a shard, documents are partitioned into
chunks
● Mongo tries to keep all chunks at roughly the same size
39. MongoDB – Sharding (VI)
● Shard Balancing
– When a shard has too many chunks, mongo moves
chunks to other shards
● Only makes sense with huge amounts of data
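Range-based distribution can be sketched as follows (plain JavaScript, not MongoDB internals; the chunk boundaries and shard names are made up): each chunk owns a half-open range of shard-key values, and a document is routed to the shard owning the chunk its key falls into.

```javascript
// chunk boundaries over a string shard key ("" and "\uffff" act as -inf/+inf)
const chunks = [
  { min: "",  max: "m",      shard: "shard0" },  // keys below "m"
  { min: "m", max: "\uffff", shard: "shard1" },  // keys "m" and above
];

// route a document to the shard owning the chunk its key falls into
function routeToShard(shardKey) {
  const chunk = chunks.find(c => shardKey >= c.min && shardKey < c.max);
  return chunk.shard;
}

console.log(routeToShard("anna"));  // "shard0"
console.log(routeToShard("ruben")); // "shard1"
```

When a chunk grows too big it is split, and the balancer migrates chunks between shards.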
41. Jongo - Example
DB db = new MongoClient().getDB("jongo");
Jongo jongo = new Jongo(db);
MongoCollection users = jongo.getCollection("users");
User user = new User("ruben", "inoto", new Address("Musterstraße", "5026"));
users.save(user);
User ruben = users.findOne("{name: 'ruben'}").as(User.class);
public class User {
    private String name;
    private String surname;
    private Address address;
    // constructor and getters omitted
}
public class Address {
    private String street;
    private String zip;
    // constructor and getters omitted
}
The stored document:
{
"_id" : ObjectId("51b0e1c4d78a1c14a26ada9e"),
"name" : "ruben",
"surname" : "inoto",
"address" : {
"street" : "Musterstraße",
"zip" : "5026"
}
}
42. TTL (TimeToLive)
● Data with an expiry date
● After the specified TimeToLive, the data will be
removed from the DB
● Implemented as an Index
● Useful for logs, sessions, ..
db.broadcastMessages.ensureIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
43. MapReduce
● Programming model for processing large data sets with a
parallel, distributed algorithm.
● Handles complex aggregation tasks
● Problem can be split into smaller tasks, distributed across
nodes
● map phase: selects the data
– Emits key-value pairs
– Values get grouped by key and passed to the reduce function
● reduce phase: transforms the data
– Accepts two arguments: key and values
– Reduces to a single object all the values associated with the key
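A word count is the classic example. A minimal sketch in plain JavaScript (not MongoDB's mapReduce API; the explicit group-by step plays the role of the shuffle between the two phases):

```javascript
const docs = ["to be or not to be", "be fast"];

// map phase: emit a (key, value) pair per word
const emitted = docs.flatMap(doc => doc.split(" ").map(word => [word, 1]));

// shuffle: group values by key
const grouped = new Map();
for (const [key, value] of emitted) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// reduce phase: collapse each key's values into a single result
const counts = {};
for (const [key, values] of grouped) {
  counts[key] = values.reduce((a, b) => a + b, 0);
}

console.log(counts); // { to: 2, be: 3, or: 1, not: 1, fast: 1 }
```

Because each key's values are reduced independently, the per-key work can run in parallel on different nodes.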
53. Conclusion
● Not a silver bullet
● Makes sense when:
– Eventual consistency is acceptable
– Prototyping
– Performance
– Object model doesn't fit well in a relational DB
● Easy to learn