Mongo DB schema design patterns

MongoDB
PRESENTED BY
Jörg Reichert
Licensed under cc-by v3.0 (any jurisdiction)

Introduction
● Name derived from humongous (= gigantic)
● NoSQL (= not only SQL) database
● Document oriented database
– documents stored as binary JSON (BSON)
● Ad-hoc queries
● Server-side JavaScript execution
● Aggregation / MapReduce
● High performance, availability, scalability

Relational vs. document based: concepts

SQL

Person
Id | Name    | AddressId
1  | Mueller | 1
2  | <null>  | 2

Address
Id | City    | Street
1  | Leipzig | Burgstr. 1
2  | Dresden | <null>

MongoDB

Person
{
  _id: ObjectId("..."),
  Name: "Mueller",
  Address: {
    City: "Leipzig",
    Street: "Burgstr. 1"
  }
}, {
  Address: {
    City: "Dresden"
  }
}

SQL          | MongoDB
DB           | DB
Table        | Collection
Row          | Document
Column       | Field (Key: Value)
PK           | PK
FK, Relation | Embedded document
PK: primary key, FK: foreign key
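
The mapping above can be tried without a database. The following is a minimal sketch in plain JavaScript; the helper names `withoutNulls` and `toDocuments` and the in-memory row arrays are assumptions made for illustration, not part of the slides. It turns the two relational tables into documents with an embedded Address and leaves out NULL columns instead of storing them.

```javascript
// Rows as they would come from the relational tables on the slide.
const personRows = [
  { id: 1, name: "Mueller", addressId: 1 },
  { id: 2, name: null, addressId: 2 },
];
const addressRows = [
  { id: 1, city: "Leipzig", street: "Burgstr. 1" },
  { id: 2, city: "Dresden", street: null },
];

// Copy only the non-null columns, skipping the listed key columns.
function withoutNulls(row, skip) {
  const doc = {};
  for (const [key, value] of Object.entries(row)) {
    if (value !== null && !skip.includes(key)) doc[key] = value;
  }
  return doc;
}

// Hypothetical helper: resolve each foreign key once and embed the address.
function toDocuments(persons, addresses) {
  const byId = new Map(addresses.map((a) => [a.id, a]));
  return persons.map((p) => {
    const doc = withoutNulls(p, ["id", "addressId"]);
    const address = byId.get(p.addressId);
    if (address) doc.Address = withoutNulls(address, ["id"]);
    return doc;
  });
}

const documents = toDocuments(personRows, addressRows);
console.log(JSON.stringify(documents, null, 2));
```

The foreign key disappears from the result: the relation is expressed by nesting, and a missing value is simply an absent field.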

Relational vs. document based: syntax (1/3)

SQL:     SELECT * FROM Person;
MongoDB: db.getCollection("Person").find();

SQL:     SELECT * FROM Person
         WHERE name = 'Mueller';
MongoDB: db.Person.find({ "name": "Mueller" });

SQL:     SELECT * FROM Person
         WHERE name LIKE 'M%';
MongoDB: db.Person.find({ "name": /^M/ });

SQL:     SELECT name FROM Person;
MongoDB: db.Person.find({}, { name: 1, _id: 0 });

SQL:     SELECT DISTINCT name
         FROM Person
         WHERE name = 'Mueller';
MongoDB: db.Person.distinct(
           "name", { "name": "Mueller" });

Relational vs. document based: syntax (2/3)

SQL:     SELECT * FROM Person
         WHERE id > 10
         AND name <> 'Mueller';
MongoDB: db.Person.find({ $and: [
           { _id: { $gt: ObjectId("...") }},
           { name: { $ne: "Mueller" }}]});

SQL:     SELECT p.name FROM Person p
         JOIN Address a
         ON p.address = a.id
         WHERE a.city = 'Leipzig'
         ORDER BY p.name DESC;
MongoDB: db.Person.find(
           { "Address.city": "Leipzig" },
           { name: 1, _id: 0 }
         ).sort({ name: -1 });

SQL:     SELECT * FROM Person
         WHERE name IS NOT NULL;
MongoDB: db.Person.find({ name: {
           $not: { $type: 10 }, $exists: true }});

SQL:     SELECT COUNT(*) FROM Person
         WHERE name = 'Mueller';
MongoDB: db.Person.count({ name: "Mueller" });
         db.Person.find(
           { name: "Mueller" }).count();

Relational vs. document based: syntax (3/3)

SQL:     UPDATE Person
         SET name = 'Müller'
         WHERE name = 'Mueller';
MongoDB: db.Person.updateMany(
           { name: "Mueller" },
           { $set: { name: "Müller" } });

SQL:     DELETE FROM Person
         WHERE name = 'Mueller';
MongoDB: db.Person.remove({ name: "Mueller" });

SQL:     INSERT INTO Person (name, address)
         VALUES ('Mueller', 3);
MongoDB: db.Person.insert(
           { name: "Mueller", Address: { … } });

SQL:     ALTER TABLE Person
         DROP COLUMN name;
MongoDB: db.Person.updateMany({},
           { $unset: { name: 1 } });

SQL:     DROP TABLE Person;
MongoDB: db.Person.drop();

schema design principles
● Principle of least cardinality
● Store what you query for

schema design: embedded document
● applicable for 1:1 and 1:n when n can't get too large
● embedded document cannot get too large
● embedded document not very likely to change
● arrays that grow without bound should never be embedded

Address
{
  Person: [
    {
    },
    {
      Name: "Schneider"
    }
  ]
}

schema design: referencing
● applicable for :n when n can't get too large
● referenced document likely to change often in future
● many referenced documents are expected, so storing only the references is cheaper
● large referenced documents are expected, so storing only the reference is cheaper
● arrays that grow without bound should never be embedded
● Address should be accessible on its own

Address
{
  Person: [
    ObjectId("..."), ObjectId("...")
  ]
}

Person
{
}

schema design: parent-referencing
● applicable for :n relations when n can get very large (note: a MongoDB document isn't allowed to exceed 16 MB)
● joins are done at application level

Address
{
  City: "Dubai",
  Street: "1 Sheikh Mohammed bin Rashid Blvd"
}

Person
{
  Address: ObjectId("...")
}
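
"Joins are done at application level" can be sketched as follows. This is an illustration under assumptions: the two arrays stand in for the Address and Person collections, string ids replace ObjectIds, and `personsInCity` is a hypothetical helper name.

```javascript
// In-memory stand-ins for the two collections; string ids replace ObjectIds.
const addresses = [
  { _id: "a1", City: "Dubai", Street: "1 Sheikh Mohammed bin Rashid Blvd" },
  { _id: "a2", City: "Leipzig", Street: "Burgstr. 1" },
];
const persons = [
  { _id: "p1", Name: "Mueller", Address: "a1" },
  { _id: "p2", Name: "Schneider", Address: "a1" },
  { _id: "p3", Name: "Schmidt", Address: "a2" },
];

// Application-level join: first find the parents, then all children pointing to them.
function personsInCity(city) {
  const ids = addresses.filter((a) => a.City === city).map((a) => a._id);
  return persons.filter((p) => ids.includes(p.Address));
}

const names = personsInCity("Dubai").map((p) => p.Name);
console.log(names);
```

Against a real database the two filter steps would be two queries, for example db.Address.find({ City: "Dubai" }) followed by db.Person.find({ Address: { $in: ids } }).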

schema design: two way referencing
● applicable for m:n when n and m can't get too large and the application needs to navigate from both ends
● disadvantage: two update operations are needed when changing references

Address
{
  Person: [
    ObjectId("..."), ...
  ]
}

Person
{
  Address: [
    ObjectId("..."), ...
  ]
}
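
The cost of keeping both ends in sync can be shown in a few lines. This is a sketch only: `link` and `unlink` are hypothetical helpers working on in-memory objects, with string ids in place of ObjectIds.

```javascript
const address = { _id: "a1", Person: [] };
const person = { _id: "p1", Address: [] };

// A link is only consistent if both sides are updated.
function link(person, address) {
  if (!person.Address.includes(address._id)) person.Address.push(address._id);
  if (!address.Person.includes(person._id)) address.Person.push(person._id);
}

// Removing the link again also touches both documents.
function unlink(person, address) {
  person.Address = person.Address.filter((id) => id !== address._id);
  address.Person = address.Person.filter((id) => id !== person._id);
}

link(person, address);
link(person, address); // calling it twice does not create duplicates
console.log(person.Address, address.Person);
```

In MongoDB each helper would correspond to two update operations, one per collection, for example with $addToSet to add and $pull to remove a reference.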

schema design: denormalization
● queries are expected to filter by certain fields of the referenced document, so duplicating these fields in the referencing document saves an additional query at application level
● disadvantage: two update operations for each duplicated field
● disadvantage: additional memory consumption

Address
{
}

Person
{
  Address: [
    {
      id: ObjectId("..."),
      city: "Leipzig"
    }, ...
  ]
}

schema design: bucketing
● applicable for :n relations when n can get very large and the application is expected to use pagination anyway
● the schema already stores the chunks (pages) that the application will later query for

Address
{
}

Person
{
  Address: ObjectId("..."),
  Page: 13,
  Count: 50,
  Persons: [
    { Name: "Mueller" }, ...
  ]
}
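
How such bucket documents could be produced is sketched below. The helper name `toBuckets`, the 1-based page numbering and the page size of 50 are assumptions for illustration; the slide only shows the resulting document shape.

```javascript
// Hypothetical helper: split one address's persons into page-sized bucket documents.
function toBuckets(addressId, persons, pageSize) {
  const buckets = [];
  for (let i = 0; i < persons.length; i += pageSize) {
    const slice = persons.slice(i, i + pageSize);
    buckets.push({
      Address: addressId,
      Page: buckets.length + 1, // assumed 1-based page numbers
      Count: slice.length,
      Persons: slice,
    });
  }
  return buckets;
}

// 120 generated persons for one address, stored 50 per bucket.
const persons = Array.from({ length: 120 }, (_, i) => ({ Name: "Person " + (i + 1) }));
const buckets = toBuckets("a1", persons, 50);
console.log(buckets.map((b) => [b.Page, b.Count]));
```

Fetching one page then means reading a single bucket document by Address and Page instead of skipping over many small documents.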

Aggregation Framework
● Aggregation pipeline consisting of (processing) stages
– $match, $group, $project, $redact, $unwind, $lookup, $sort, ...
● Aggregation operators
– Boolean: $and, $or, $not
– Comparison: $eq, $lt, $lte, $gt, $gte, $ne, $cmp
– Arithmetic: $add, $subtract, $multiply, $divide, ...
– String: $concat, $substr, …
– Array: $size, $arrayElemAt, ...
– Variable: $map, $let
– Group accumulator: $sum, $avg, $addToSet, $push, $min, $max, $first, $last, …
– ...

Aggregation Framework
db.Person.aggregate( [
  { $match: { name: { $ne: "Fischer" } } },
  { $group: {
    _id: "$name",
    city_occurs: { $addToSet: "$Address.city" }
  } },
  { $project: {
    _id: "$_id",
    city_count: { $size: "$city_occurs" }
  } },
  // after $project the name is only available as _id
  { $sort: { _id: 1 } },
  { $match: { city_count: { $gt: 1 } } },
  { $out: "PersonCityCount" }
] );

PersonCityCount
{
  _id: "Mueller",
  city_count: 2
},
{
  _id: "Schmidt",
  city_count: 3
}, ...
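
The pipeline above can be traced without a database. The following plain JavaScript sketch mirrors the stages on a small in-memory array; the sample data is made up for illustration, and array methods stand in for the pipeline stages.

```javascript
const persons = [
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Mueller", Address: { city: "Dresden" } },
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Schmidt", Address: { city: "Berlin" } },
  { name: "Fischer", Address: { city: "Leipzig" } },
  { name: "Fischer", Address: { city: "Dresden" } },
];

// $match: drop "Fischer"
const matched = persons.filter((p) => p.name !== "Fischer");

// $group with $addToSet: one set of distinct cities per name
const groups = new Map();
for (const p of matched) {
  if (!groups.has(p.name)) groups.set(p.name, new Set());
  groups.get(p.name).add(p.Address.city);
}

// $project with $size, then $sort by _id, then $match on city_count
const result = [...groups]
  .map(([name, cities]) => ({ _id: name, city_count: cities.size }))
  .sort((a, b) => a._id.localeCompare(b._id))
  .filter((doc) => doc.city_count > 1);

console.log(result);
```

Only names that occur with more than one distinct city survive the last step, which is what ends up in the output collection.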

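Map-Reduce
● More control than aggregation framework, but slower
var map = function() {
  // emit the city wrapped in an object so that reduce can return the same shape
  if (this.name != "Fischer") emit(this.name, { cities: [this.Address.city] });
};
var reduce = function(key, values) {
  var distinct = [];
  values.forEach(function(value) {
    value.cities.forEach(function(city) {
      if (distinct.indexOf(city) == -1) distinct.push(city);
    });
  });
  return { cities: distinct };
};
// finalize runs once per key and turns the distinct list into a count
var finalize = function(key, reduced) { return reduced.cities.length; };
db.Person.mapReduce(map, reduce,
  {
    out: "PersonCityCount2",
    finalize: finalize
  });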
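
MongoDB may call reduce again on its own partial results and does not call it at all for keys with a single emitted value, so a reduce function should return the same shape it receives and leave the final counting to a finalize step. The sketch below illustrates this in plain JavaScript. `runMapReduce` is a hypothetical mini runner written for this example, not a MongoDB API, and `emit` is passed in as a parameter instead of being a global.

```javascript
// Same idea as the shell functions, but emit is passed in explicitly.
function map(doc, emit) {
  if (doc.name !== "Fischer") emit(doc.name, { cities: [doc.Address.city] });
}

// Merges partial results; input and output have the same shape.
function reduce(key, values) {
  const distinct = [];
  for (const value of values) {
    for (const city of value.cities) {
      if (!distinct.includes(city)) distinct.push(city);
    }
  }
  return { cities: distinct };
}

function finalize(key, reduced) {
  return reduced.cities.length;
}

// Hypothetical mini runner: reduces in small chunks to mimic re-reduce.
function runMapReduce(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const out = {};
  for (const [key, values] of emitted) {
    let partial = values[0]; // a single value is never passed to reduce
    for (let i = 1; i < values.length; i += 2) {
      partial = reduce(key, [partial, ...values.slice(i, i + 2)]);
    }
    out[key] = finalize(key, partial);
  }
  return out;
}

const counts = runMapReduce([
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Mueller", Address: { city: "Dresden" } },
  { name: "Mueller", Address: { city: "Leipzig" } },
  { name: "Mueller", Address: { city: "Berlin" } },
  { name: "Schmidt", Address: { city: "Berlin" } },
  { name: "Fischer", Address: { city: "Leipzig" } },
]);
console.log(counts);
```

Because reduce only merges lists, feeding it its own output together with new values gives the same result as reducing everything at once.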

Indexes
● Default _id index, assuring uniqueness
● Single field index: db.Person.createIndex( { name: 1 } );
● Compound index: db.Address.createIndex( { city: 1, street: -1 } );
– index sorts first asc. by city, then desc. by street
– index is also used when a query filters only by a prefix of the indexed fields (here: city)
● Multikey index: db.Person.createIndex( { "Address.city": 1 } )
– indexes content stored in arrays; an index entry is created for each array element
● Geospatial index
● Text index
● Hashed index

Index properties
● unique index: insertion of a duplicate field value will be rejected
● partial index: indexes only documents matching certain filter criteria
● sparse index: indexes only documents having the indexed field
● TTL index: automatically removes documents after a certain time
● Query optimization: use db.MyCollection.find({ … }).explain() to check whether a query is answered using an index, and how many documents still had to be scanned
● Covered queries: if a query filters on and returns only indexed fields, the results are delivered directly from the index without scanning or materializing any documents
● Index intersection: different indexes can be combined to cover parts of a query

Storage engines
● WiredTiger: available since MongoDB 3.0, default storage engine since MongoDB 3.2
– locking at document level enables concurrent writes on a collection
– durability ensured via write-ahead transaction log and checkpoints (journaling)
– supports compression of collections and indexes (via snappy or zlib)
● MMAPv1 was the default storage engine up to MongoDB 3.0
– since MongoDB 3.0 supports locking at collection level, before only at database level
– useful for selective updates, as WiredTiger always replaces the whole document in an update operation

Clustering, Sharding, Replication
[Diagram: Client app (driver), App server (mongos), Config server (replica set) and Shard 1, which consists of one Primary and two Secondaries (each a mongod). Arrow labels: writes, reads, Replication (two arrows), Heartbeat.]

Shard key selection
Sharded Collection → (Hash function) → shards

Shard   | Key range       | Example document
Shard 1 | min <= key < 15 | { key: 12, ... }
Shard 2 | 15 <= key < 30  | { key: 21, ... }
Shard 3 | 30 <= key < max | { key: 35, ... }
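
The range lookup behind this picture can be sketched in a few lines. This is an illustration only: the `chunks` table and the `shardFor` function are hypothetical names, and min and max are modelled as -Infinity and Infinity. In a real cluster the routing is done by mongos using the metadata on the config servers.

```javascript
// Chunk boundaries from the slide; min/max are modelled as -Infinity/Infinity.
const chunks = [
  { shard: "Shard 1", min: -Infinity, max: 15 },
  { shard: "Shard 2", min: 15, max: 30 },
  { shard: "Shard 3", min: 30, max: Infinity },
];

// Hypothetical router: lower bound inclusive, upper bound exclusive.
function shardFor(key) {
  const chunk = chunks.find((c) => c.min <= key && key < c.max);
  return chunk ? chunk.shard : undefined;
}

const placement = [12, 21, 35, 15].map((key) => [key, shardFor(key)]);
console.log(placement);
```

The boundary value 15 belongs to Shard 2 because each range includes its lower bound and excludes its upper bound.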

transactions
● ACID → MongoDB complies with this only at document level
– Atomicity
– Consistency
– Isolation
– Durability
● CAP → MongoDB assures CP
– Consistency
– Availability
– Partition tolerance
● BASE: Basically Available, Soft state, Eventual consistency
● MongoDB doesn't support multi-document transactions (as of MongoDB 3.x; they were added in MongoDB 4.0)
● multi-document updates can be performed via two-phase commit

Driver
● JavaScript: Mongo Node.js driver
● Java: Java MongoDB Driver
● Python: PyMongo, Motor (async)
● Ruby: MongoDB Ruby Driver
● C#: Mongo Csharp Driver
● ...

Object-document mappers
● JavaScript: mongoose, Camo, MEAN.JS
● Java: Morphia, SpringData MongoDB
● Python: Django MongoDB engine
● Ruby: MongoMapper, Mongoid
● C#: LINQ
● ...

Extensions and connectors
● CKAN
● MongoDB-Hadoop connector
● MongoDB Spark connector
● MongoDB ElasticSearch/Solr connector
● ...

Tool support
● Robomongo
● MongoExpress
● ...

Real world examples
● Who uses MongoDB
● Case studies
● Arctic TimeSeries and Tick store
● uptime

MongoDB in Code For Germany projects
● Politik bei uns (open council information system): scraped city council data is stored in a MongoDB according to the OParl format; see also data, web API and OParl client

When to choose, when not to choose
● Choose
– mass data processing, like event data
– dynamic schema
● Not to choose
– static schema with lots of relations
– strict transaction requirements

Links
● MongoDB Schema Simulation
● 6 Rules of Thumb for MongoDB Schema Design
● MongoDB Aggregation
● MongoDB Indexes
● Sharding
● MongoDB University
● Why Relational Databases are not the Cure-All
