Mongo DB schema design patterns

MongoDB
PRESENTED BY
Jörg Reichert
Licensed under cc-by v3.0 (any jurisdiction)

Introduction
● Name derived from humongous (= gigantic)
● NoSQL (= not only SQL) database
● Document oriented database
– documents stored as binary JSON (BSON)
● Ad-hoc queries
● Server side Javascript execution
● Aggregation / MapReduce
● High performance, availability, scalability

MongoDB
Relational vs. document based: concepts
SQL
Person
Name AddressId
MongoDB
1
2
Mueller 1
Id
Address
City Street
1
2
<null> 2
Leipzig Burgstr. 1
Dresden <null>
Person
{
_id: ObjectId(“...“),
Name: “Mueller“,
Address: {
City: “Leipzig“,
Street: “Burgstr. 1“,
},
}, {
Address: {
},
}
DB DB
Table CollectionColumn
Row
Document
Key: Value
FieldPK
FK
Relation
Embedded document
PK
PK: primary key, FK: foreign key

MongoDB
SELECT * FROM Person;
SELECT * FROM Person
WHERE name = “Mueller“;
WHERE name like “M%“;
SELECT name FROM Person;
SELECT distinct(name)
FROM Person
Relational vs. document based: syntax (1/3)
db.getCollection(“Person“).find();
db.Person.find({ “name“: "Mueller“ });
db.Person.find({ “name“: /M.*/ });
db.Person.find({}, {name: 1, _id: 0});
db.Person.distinct(
“name“, { “name“: "Mueller“ });

MongoDB
WHERE id > 10
AND name <> “Mueller“;
SELECT p.name FROM Person p
JOIN Address a
ON p.address = a.id
WHERE a.city = “Leipzig“
ORDER BY p.name DESC;
SELECT * FROM
WHERE name IS NOT NULL;
SELECT COUNT(*) FROM PERSON
db.Person.find({ $and: [
{ _id: { $gt: ObjectId("...") }},
{ name: { $ne: "Mueller" }}]});
db.Person.find(
{ Address.city: “Leipzig“ },
{ name: 1, _id: 0 }
).sort({ name: -1 });
db.Person.find( { name: {
$not: { $type: 10 }, $exists: true }});
db.Person.count({ name: “Mueller“ });
db.Person.find(
{ name: “Mueller“ }).count();

MongoDB
UPDATE Person
SET name = “Müller“
DELETE Person
INSERT Person (name, address)
VALUES (“Mueller“, 3);
ALTER TABLE PERSON
DROP COLUMN name;
DROP TABLE PERSON;
db.Person.updateMany(
{ name: “Mueller“ },
{ $set: { name: “Müller“} });
db.Person.remove( { name: “Mueller“ } );
db.Person.insert(
{ name: “Mueller“, Address: { … } });
db.Person.updateMany( {},
{ $unset: { name: 1 }} );
db.Person.drop();

MongoDB
● principle of least cardinality
● Store what you query for
schema design principles

MongoDB
● applicable for 1:1 and 1:n when
n can‘t get to large
● Embedded document cannot get
too large
● Embedded document not very
likely to change
● arrays that grow without bound
should never be embedded
schema design: embedded document
{
Person: [
{
},
{
Name: “Schneider“,
},
]
}
Address

MongoDB
● applicable for :n when n can‘t
get to large
● Referenced document likely to
change often in future
● there are many referenced
documents expected, so storing
only the reference is cheaper
● there are large referenced
documents expected, so storing
only the reference is cheaper
● arrays that grow without bound
should never be embedded
● Address should be accessible on
its own
schema design: referencing
{
Person: [
ObjectId(“...“), ObjectId(“...“),
]
}
{
}
Address
Person

MongoDB
● applicable for :n relations when
n can get very large (note: a
MongoDB document isn‘t
allowed to exceed 16MB)
● Joins are done on application
level
schema design: parent-referencing
{
City: “Dubai“,
Street: “1 Sheikh Mohammed
bin Rashid Blvd“,
}
{
Address: ObjectId(“...“),
}
Address
Person

MongoDB
● applicable for m:n when n and m
can‘t get to large and application
requires to navigate both ends
● disadvantage: need to update
operations when changing
references
schema design: two way referencing
{
Person: [
]
}
{
Address: [
]
}
Address
Person

MongoDB
● queries expected to filter by
certain fields of the referenced
document, so including this field
already in the hosts saves an
additional query at application
level
● disadvantage: two update
operations for duplicated field
● disadvantage: additional
memory consumption
schema design: denormalization
{
}
{
Address: [
{
id: ObjectId(“...“),
city: “Leipzig“,
}, ...
]
}
Address
Person

MongoDB
● applicable for :n relations when
n can get very large and it‘s
expected that application will
use pagination anyway
● DB schema will already create
the chunks, the application will
later query for
schema design: bucketing
{
}
{
Address: ObjectId(“...“),
Page: 13,
Count: 50,
Persons: [
{ Name: “Mueller“ }, ...
]
}
Address
Person

MongoDB
Aggregation Framework
● Aggregation pipeline consisting of (processing) stages
– $match, $group, $project, $redact, $unwind, $lookup, $sort, ...
● Aggregation operators
– Boolean: $and, $or, $not
– Aggregation: $eq, $lt, $lte, $gt, $gte, $ne, $cmp
– Arithmetic: $add, $substract, $multiply, $divide, ...
– String: $concat, $substr, …
– Array: $size, $arrayElemAt, ...
– Aggregation variable: $map, $let
– Group Accumulator: $sum, $avg, $addToSet, $push, $min, $max
$first, $last, …
– ...

MongoDB
Aggregation Framework
db.Person.aggregate( [
{ $match: { name: { $ne: "Fischer" } } },
{ $group: {
_id: "$name",
city_occurs: { $addToSet: "$Address.city" }
} },
{ $project: {
_id: "$_id",
city_count: { $size: "$city_occurs" }
}},
{ $sort: { name: 1 } }
{ $match: { city_count: { $gt: 1 } }},
{ $out: "PersonCityCount"}
] );
PersonCityCount
{
_id: Mueller,
city_count: 2,
},
{
_id: Schmidt,
city_count: 3,
}, ...

MongoDB
Map-Reduce
● More control than aggregation framework, but slower
var map = function() {
if(this.name != "Fischer") emit(this.name, this.Address.city);
}
var reduce = function(key, values) {
var distinct = [];
for(value in values) {
if(distinct.indexOf(value) == -1) distinct.push(value);
}
return distinct.length;
}
db.Person.mapReduce(map, reduce,
{
out: "PersonCityCount2"
});

MongoDB
● Default _id index, assuring uniqueness
● Single field index: db.Person.createIndex( { name: 1 } );
● Compound index: db.Address.createIndex( { city: 1, street: -1 } );
– index sorts first asc. by city then desc. by street
– Index will also used when query only filters by one of the fields
● Multikey index: db.Person.createIndex( { Address.city: 1 } )
– Indexes content stored in arrays, an index entry is created foreach
● Geospatial index
● Text index
● Hashed index
Indexes

MongoDB
● uniqueness: insertion of duplicate field value will be rejected
● partial index: indexes only documents matching certain filter criteria
● sparse index: indexes only documents having the indexed field
● TTL index: automatically removes documents after certain time
● Query optimization: use db.MyCollection.find({ … }).explain() to check
whether query is answered using an index, and how many documents had
still to be scanned
● Covered queries: if a query only contains indexed fields, the results will
delivered directly from index without scanning or materializing any
documents
● Index intersection: can apply different indexes to cover query parts
Index properties

MongoDB
● Since MongoDB 3.0 WiredTiger is the default storage engine
– locking at document level enables concurrent writes on collection
– durability ensured via write-ahead transaction log and checkpoints (
Journaling)
– supports compression of collections and indexes (via snappy or zlib)
● MMAPv1 was the default storage until MongoDB 3.0
– since MongoDB 3.0 supports locking at collection level, before only
database level
– useful for selective updates, as WiredTiger always replace the hole
document in a update operation
Storage engines

MongoDB
Clustering, Sharding, Replication
Shard 1
Primary
(mongod)
Secondary
(mongod)
Secondary
(mongod)
Config server
(replica set)
App server
(mongos)
Client app
(driver)
Heartbeat
Replication Replication
writes
reads

MongoDB
Shard key selection
Shard 1 Shard 2 Shard 3
{
key: 12,
...
}
{
key: 21,
...
}
{
key: 35,
...
}
min <= key < 15 15 <= key < 30 30 <= key < max
Sharded Collection
(Hash function)

MongoDB
● ACID → MongoDB is compliant to this only at document level
– Atomicity
– Consistency
– Isolation
– Durability
● CAP → MongoDB assures CP
– Consistency
– Availability
– Partition tolerance
transactions
BASE:
Basically Available, Soft state,
Eventual consistency
MongoDB doesn't support transactions
multi document updates can be
performed via Two-Phase-Commit

MongoDB
● Javascript: Mongo Node.js driver
● Java: Java MongoDB Driver
● Python: PyMongo, Motor (async)
● Ruby: MongoDB Ruby Driver
● C#: Mongo Csharp Driver
● ...
Driver
Object-document mappers
● Javascript: mongoose, Camo, MEAN.JS
● Java: Morphia, SpringData MongoDB
● Python: Django MongoDB engine
● Ruby: MongoMapper, Mongoid
● C#: LinQ
● ...

MongoDB
● CKAN
● MongoDB-Hadoop connector
● MongoDB Spark connector
● MongoDB ElasticSearch/Solr connector
● ...
Extensions and connectors
Tool support
● Robomongo
● MongoExpress
● ...

MongoDB
● Who uses MongoDB
● Case studies
● Arctic TimeSeries and Tick store
● uptime
Real world examples
MongoDB in Code For Germany projects
● Politik bei uns (Offenes Ratsinformationssystem), gescrapte Stadtratsdaten
werden gemäß dem OParl-Format in einer MongoDB gespeichert, siehe
auch Daten, Web-API und Oparl-Client

MongoDB
●
Choose
– mass data processing, like event data
– dynamic scheme
●
Not to choose
– static scheme with lot of relations
– strict transaction requirements
When to choose, when not to choose

MongoDB
●
MongoDB Schema Simulation
●
6 Rules of Thumb for MongoDB Schema Design
●
MongoDB Aggregation
●
MongoDB Indexes
●
Sharding
●
MongoDB University
●
Why Relational Databases are not the Cure-All
Links

Mongo DB schema design patterns

More Related Content

What's hot

Similar to Mongo DB schema design patterns

More from joergreichert

Recently uploaded

Mongo DB schema design patterns