Agenda • Data, queries, etc. • Concurrency • Aggregation • Deployment • Durability • Things to be aware of
MongoDB • Document database • Currently at version 2.0.4 • Developed by 10gen • Open source – server is GNU AGPL v3 – official client drivers are Apache v2 • Absolutely free to use – a commercial version of the database is also available, with support, SSL, and extra security features
Conceptual data organization • MongoDB: process → database → collection → document • Relational: process → database → table → row
Example 1 • Install • Mongo Shell • Show database contents • Add and show a document
Queries • Many other query operators exist: $gt, $gte, $lt, $lte, $exists, $all, and more
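To make the operator semantics concrete, here is a minimal in-memory sketch of how a query document such as {"age": {"$gte": 18}} could be evaluated against one document. It is not MongoDB's matcher. The helper matches() is hypothetical and deliberately simplified: top-level fields only, no array-element equality, and it assumes the compared values have compatible Python types.

```python
from typing import Any


# Hypothetical helper, NOT MongoDB's matcher: evaluates a flat query such as
# {"age": {"$gte": 18}} against one document. Top-level fields only; assumes
# compared values have compatible Python types.
def matches(doc: dict[str, Any], query: dict[str, Any]) -> bool:
    for field, cond in query.items():
        present = field in doc
        value = doc.get(field)
        if not isinstance(cond, dict):
            # A plain value means equality.
            if not present or value != cond:
                return False
            continue
        # Every operator on the field must hold (they are ANDed).
        for op, arg in cond.items():
            if op == "$exists":
                ok = present == bool(arg)
            elif not present:
                ok = False  # comparisons never match a missing field here
            elif op == "$gt":
                ok = value > arg
            elif op == "$gte":
                ok = value >= arg
            elif op == "$lt":
                ok = value < arg
            elif op == "$lte":
                ok = value <= arg
            elif op == "$all":
                ok = isinstance(value, list) and all(a in value for a in arg)
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


people = [
    {"name": "ann", "age": 34, "tags": ["admin", "dev"]},
    {"name": "bo", "age": 17, "tags": ["dev"]},
    {"name": "cy", "age": 52},
]
adults = [p["name"] for p in people if matches(p, {"age": {"$gte": 18}})]
devs = [p["name"] for p in people
        if matches(p, {"tags": {"$exists": True, "$all": ["dev"]}})]
```

In the shell the first query would be written as db.people.find({age: {$gte: 18}}); the sketch only mirrors the shape of that query document.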
Updates • Many other update modifiers exist: $inc, $set, $addToSet, $rename, and more
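The modifiers change a document in place instead of replacing it. The sketch below shows what each one does to a document, under stated assumptions. apply_update() is a hypothetical helper covering only top-level fields and these four modifiers, with no dot notation and no error cases such as $inc on a non-number.

```python
import copy
from typing import Any


# Hypothetical helper, NOT MongoDB's implementation: applies a subset of
# update modifiers to a copy of a document and returns the new document.
def apply_update(doc: dict[str, Any], update: dict[str, dict[str, Any]]) -> dict[str, Any]:
    new = copy.deepcopy(doc)
    for modifier, fields in update.items():
        for field, arg in fields.items():
            if modifier == "$inc":
                # A missing field is treated as 0 before incrementing.
                new[field] = new.get(field, 0) + arg
            elif modifier == "$set":
                new[field] = arg
            elif modifier == "$addToSet":
                # Append only if the value is not already in the array.
                arr = new.setdefault(field, [])
                if arg not in arr:
                    arr.append(arg)
            elif modifier == "$rename":
                if field in new:
                    new[arg] = new.pop(field)
            else:
                raise ValueError(f"unsupported modifier: {modifier}")
    return new


page = {"_id": 1, "views": 1, "tags": ["a"], "nick": "x"}
updated = apply_update(page, {
    "$inc": {"views": 2},
    "$addToSet": {"tags": "a"},   # already present, so no change
    "$rename": {"nick": "alias"},
    "$set": {"done": True},
})
```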
Example 2 • Import some data • Query • Update • Index • Query
ACID? • Atomic: Yeah well, per document. • Consistent: Yeah well, can be. • Isolated: Yeah well, per document. • Durable: Yeah well, can be – not by default, though.
Concurrency • Pushing it down the stack
Concurrency • Preserve invariants with update preconditions
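The idea is to put the invariant into the update's query, so that the check and the change happen in one atomic single-document update. An example would be update({_id: 1, balance: {$gte: 30}}, {$inc: {balance: -30}}). The sketch below simulates this in memory. FakeAccounts and its methods are hypothetical stand-ins for a collection, and a lock plays the role of MongoDB's per-document atomicity.

```python
import threading


class FakeAccounts:
    """Hypothetical in-memory stand-in for a collection. Each call to
    update_if_balance_at_least() is atomic, like a single-document update."""

    def __init__(self, docs):
        self._docs = {d["_id"]: dict(d) for d in docs}
        self._lock = threading.Lock()

    def balance(self, _id):
        with self._lock:
            return self._docs[_id]["balance"]

    def update_if_balance_at_least(self, _id, minimum, delta):
        # Models: update({_id: id, balance: {$gte: minimum}},
        #                {$inc: {balance: delta}})
        # Returns n, the number of documents modified (0 or 1).
        with self._lock:
            doc = self._docs.get(_id)
            if doc is None or doc["balance"] < minimum:
                return 0  # precondition failed, nothing changed
            doc["balance"] += delta
            return 1


def withdraw(accounts, _id, amount):
    # The invariant "balance never goes negative" is part of the query,
    # so checking and changing happen in one atomic step.
    return accounts.update_if_balance_at_least(_id, amount, -amount) == 1


accounts = FakeAccounts([{"_id": 1, "balance": 100}])
results = []
threads = [
    threading.Thread(target=lambda: results.append(withdraw(accounts, 1, 30)))
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten concurrent withdrawals of 30 from a balance of 100: only three can succeed, and the balance never drops below zero. No read-then-write race exists because nothing is read first.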
Concurrency • Use optimistic locking when replacing a document (then check whether getLastError reports n as 0 or 1)
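A minimal sketch of optimistic locking, assuming a hypothetical "version" field stored in each document. The client reads the document, changes it locally, and replaces it with a query that also matches the version it read. If n comes back as 0, another writer got there first, so the client re-reads and retries. FakeCarts, add_item() and the retry limit are illustrative assumptions, not a driver API.

```python
import threading


class FakeCarts:
    """Hypothetical in-memory stand-in for a collection.

    replace() models update({_id: id, version: expected}, newDoc) and
    returns n, the number of documents replaced (0 or 1), which is what
    getLastError reports after the update."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc):
        with self._lock:
            self._docs[doc["_id"]] = dict(doc)

    def find_one(self, _id):
        with self._lock:
            return dict(self._docs[_id])

    def replace(self, _id, expected_version, new_doc):
        with self._lock:
            current = self._docs.get(_id)
            if current is None or current["version"] != expected_version:
                return 0  # someone else replaced the document first
            self._docs[_id] = dict(new_doc)
            return 1


def add_item(carts, _id, item, max_retries=1000):
    # Read, modify locally, then replace only if the version is unchanged.
    for _ in range(max_retries):
        doc = carts.find_one(_id)
        expected = doc["version"]
        doc["items"] = doc["items"] + [item]
        doc["version"] = expected + 1
        if carts.replace(_id, expected, doc) == 1:
            return True
        # n == 0: we lost the race, so re-read and try again.
    return False


carts = FakeCarts()
carts.insert({"_id": "cart-1", "version": 0, "items": []})
threads = [
    threading.Thread(target=add_item, args=(carts, "cart-1", f"item-{i}"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = carts.find_one("cart-1")
```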
Concurrency • Use findAndModify to “check out” documents
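findAndModify finds a document and updates it in one atomic step. Several workers can therefore each claim a different pending item without two of them ever getting the same one. The sketch simulates a job queue in memory. FakeJobs, the "status"/"worker" fields and the worker loop are hypothetical; the comment shows the shell call it stands in for.

```python
import threading


class FakeJobs:
    """Hypothetical in-memory stand-in for a jobs collection.

    find_and_modify() models
        findAndModify({query: {status: "pending"},
                       update: {$set: {status: "checked_out", worker: w}},
                       new: true})
    The find and the update happen as one atomic step."""

    def __init__(self, n):
        self._docs = [{"_id": i, "status": "pending"} for i in range(n)]
        self._lock = threading.Lock()

    def find_and_modify(self, worker):
        with self._lock:
            for doc in self._docs:
                if doc["status"] == "pending":
                    doc["status"] = "checked_out"
                    doc["worker"] = worker
                    return dict(doc)  # the modified document, as with new: true
            return None  # nothing left to check out


def work(jobs, worker, done):
    # Keep checking out jobs until none are pending.
    while (job := jobs.find_and_modify(worker)) is not None:
        done.append((worker, job["_id"]))


jobs = FakeJobs(50)
done = []
workers = [threading.Thread(target=work, args=(jobs, f"w{i}", done)) for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Every job ends up checked out exactly once, however the four workers interleave.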
Aggregation • Map/reduce
Aggregation • Map/reduce – Map: for each document, emit 0 or more (key, value) pairs – Reduce: given a key and the list of values emitted for it, return 1 value of the same shape (reduce may be called again on its own output)
Example 3 • Use map/reduce to collect information on who appeared in each episode
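The map/reduce contract can be simulated in memory for the "who appeared in each episode" task, assuming hypothetical input documents with one line of dialogue each ({"episode": ..., "character": ...}). The hypothetical map_reduce() runner reduces in small batches and then reduces the partial results again. This imitates the fact that MongoDB may call reduce repeatedly, so reduce must return the same shape that map emits. The runner also skips reduce for a key with a single value.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, batch_size=2):
    """Hypothetical in-memory runner, NOT MongoDB's engine."""
    # Map phase: each document emits zero or more (key, value) pairs.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Reduce phase: reduce in batches, then re-reduce the partial results.
    result = {}
    for key, values in grouped.items():
        if len(values) == 1:
            result[key] = values[0]  # reduce is skipped for a single value
            continue
        partials = [reduce_fn(key, values[i:i + batch_size])
                    for i in range(0, len(values), batch_size)]
        result[key] = reduce_fn(key, partials) if len(partials) > 1 else partials[0]
    return result


def map_appearance(doc):
    # Emit the same shape that reduce returns: {"characters": [...]}.
    yield doc["episode"], {"characters": [doc["character"]]}


def reduce_appearances(key, values):
    names = set()
    for value in values:
        names.update(value["characters"])
    return {"characters": sorted(names)}


lines = [
    {"episode": 1, "character": "alice"},
    {"episode": 1, "character": "bob"},
    {"episode": 1, "character": "alice"},
    {"episode": 2, "character": "carol"},
    {"episode": 2, "character": "alice"},
    {"episode": 3, "character": "bob"},
]
appearances = map_reduce(lines, map_appearance, reduce_appearances)
```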
Aggregation • Aggregation framework (not available until 2.2) – declarative syntax for constructing an aggregation pipeline
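A pipeline is a list of stages, and each stage transforms the stream of documents produced by the previous one. The sketch below evaluates a tiny subset of stages ($match with equality only, $group with the $sum accumulator, $sort) in memory. It only illustrates the declarative, stage-by-stage model. run_pipeline() and the sample "lines" data are hypothetical, and the real framework supports far more.

```python
def run_pipeline(docs, pipeline):
    """Hypothetical evaluator for a tiny subset of pipeline stages."""
    for stage in pipeline:
        [(op, spec)] = stage.items()  # each stage is a one-key dict
        if op == "$match":
            # Equality matches only in this sketch.
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for d in docs:
                key = d.get(key_field)
                out = groups.setdefault(key, {"_id": key})
                for name, expr in spec.items():
                    if name == "_id":
                        continue
                    [(acc, arg)] = expr.items()
                    if acc != "$sum":
                        raise ValueError(f"unsupported accumulator: {acc}")
                    # "$field" sums that field; a number sums the constant.
                    amount = d.get(arg[1:], 0) if isinstance(arg, str) else arg
                    out[name] = out.get(name, 0) + amount
            docs = list(groups.values())
        elif op == "$sort":
            # Apply keys from least to most significant; sorted() is stable.
            for field, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


lines = [
    {"episode": 1, "character": "alice", "words": 12},
    {"episode": 1, "character": "bob", "words": 5},
    {"episode": 1, "character": "alice", "words": 7},
    {"episode": 2, "character": "bob", "words": 9},
]
pipeline = [
    {"$match": {"episode": 1}},
    {"$group": {"_id": "$character", "lines": {"$sum": 1}, "words": {"$sum": "$words"}}},
    {"$sort": {"words": -1}},
]
result = run_pipeline(lines, pipeline)
```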
Deployment • Several configurations – we’ll check out replica sets and sharding
Replica sets • Master-slave with automatic failover – Each mongod must be started with the --replSet argument – Additional nodes are added from the shell (rs.add()) – Make sure the number of voting nodes is odd, if necessary by adding an arbiter (a member that votes but holds no data)
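Why odd? A primary can only be elected by a strict majority of the voting members. The arithmetic below, which assumes one vote per member, shows that adding a member to reach an even count does not let the set survive any more failures. This is why two data-bearing nodes are usually paired with an arbiter: three voters tolerate one failure.

```python
def majority(voting_members):
    """Votes needed to elect a primary: strictly more than half."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """Members that can be down while a majority can still elect a primary."""
    return voting_members - majority(voting_members)


# An even member count tolerates no more failures than one member fewer:
# 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2.
table = {n: (majority(n), tolerated_failures(n)) for n in range(1, 8)}
```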
Replica sets • Higher availability • Scale out reads • Backup without interfering with the primary
Sharding • Auto-sharding – based on a user-defined shard key – configured per collection – requires special processes: mongos (the query router) and a mongod configured as a config server
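With range-based sharding, each chunk covers a range of shard key values and lives on one shard. mongos uses this metadata, held by the config server, to decide where an operation goes. The sketch below assumes a hypothetical collection sharded on "name" with three chunks. It shows that a query containing the shard key can target one shard, while a query without it must be sent to all of them.

```python
import bisect

# Hypothetical chunk table for a collection sharded on "name". Each chunk
# covers [its lower bound, the next lower bound) and lives on one shard;
# in a real cluster this metadata is kept by the config server.
CHUNK_LOWER_BOUNDS = ["", "g", "p"]  # "" sorts before every other string
CHUNK_SHARDS = ["shard0", "shard1", "shard2"]


def route(name):
    """Return the shard whose chunk range contains this shard key value."""
    i = bisect.bisect_right(CHUNK_LOWER_BOUNDS, name) - 1
    return CHUNK_SHARDS[i]


def target_shards(query):
    # With the shard key in the query, only one shard needs to be asked;
    # without it, the query has to go to every shard (scatter/gather).
    if "name" in query:
        return [route(query["name"])]
    return list(CHUNK_SHARDS)
```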
Sharding • Scale out writes • Limitations: – Shard key is immutable – All inserts/updates must include the shard key – Uniqueness can only be enforced across shards for the shard key, not for arbitrary fields
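The uniqueness limitation follows from each shard enforcing its unique indexes only over its own documents. The toy model below is hypothetical throughout: the Shard class, the "email" field, and a routing rule that stands in for chunk ranges. It shows that a duplicate e-mail slips through when the documents land on different shards. When the unique field is the shard key itself, equal values always route to the same shard, so the local index catches the duplicate.

```python
class Shard:
    """Hypothetical shard that enforces a unique index on "email", but only
    among the documents it stores itself."""

    def __init__(self):
        self.emails = set()

    def insert(self, doc):
        if doc["email"] in self.emails:
            raise ValueError("duplicate key on this shard")
        self.emails.add(doc["email"])


def make_cluster():
    return {"shard0": Shard(), "shard1": Shard()}


def shard_for(key_value):
    # Toy routing rule standing in for chunk ranges: values below "m" go to
    # shard0, the rest to shard1.
    return "shard0" if key_value < "m" else "shard1"


def insert(cluster, doc, shard_key):
    cluster[shard_for(doc[shard_key])].insert(doc)


# Sharded on user_id: the two documents land on different shards,
# so the duplicate email is accepted.
by_user = make_cluster()
insert(by_user, {"user_id": "alice", "email": "x@example.com"}, shard_key="user_id")
insert(by_user, {"user_id": "zoe", "email": "x@example.com"}, shard_key="user_id")
copies = sum("x@example.com" in s.emails for s in by_user.values())

# Sharded on email: both copies route to one shard, which rejects the second.
by_email = make_cluster()
insert(by_email, {"user_id": "bob", "email": "y@example.com"}, shard_key="email")
try:
    insert(by_email, {"user_id": "yan", "email": "y@example.com"}, shard_key="email")
    rejected = False
except ValueError:
    rejected = True
```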
Sharding + replica sets
MongoDB’s durability story • Memory-mapped files • fsync (data files are flushed to disk every 60 seconds by default) • Durability through replication – pre 1.8 • Durability through journaling – an option since 1.8 – default since 2.0 (64-bit builds) – replica sets still cool, though
MongoDB’s durability story • Inserts and updates are unsafe (fire-and-forget) by default!! – only purpose: get awesome benchmarks – bad: bites you in the a** • Exposed differently in each driver, but always maps to db.getLastError()
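"Unsafe by default" means the client sends a write and does not wait for any reply. A failure such as a duplicate key is only visible to a client that follows up with getLastError on the same connection, and drivers' "safe mode" does exactly that. The sketch below models this with a hypothetical FakeConnection and safe_insert(). It is not a driver API. The "E11000" prefix is the error code MongoDB uses for duplicate keys.

```python
class FakeConnection:
    """Hypothetical stand-in for one client connection to mongod.

    insert() sends no reply (fire-and-forget). The outcome is remembered on
    the connection and only seen by a client that asks for it, which is the
    role db.getLastError() plays."""

    def __init__(self):
        self._ids = set()
        self._last_error = None

    def insert(self, doc):
        if doc["_id"] in self._ids:
            self._last_error = "E11000 duplicate key error"
        else:
            self._ids.add(doc["_id"])
            self._last_error = None

    def get_last_error(self):
        # Error from the previous operation on this connection, or None.
        return self._last_error


def safe_insert(conn, doc):
    """'Safe mode': follow every write with a getLastError round trip."""
    conn.insert(doc)
    err = conn.get_last_error()
    if err is not None:
        raise RuntimeError(err)


conn = FakeConnection()
conn.insert({"_id": 1})
conn.insert({"_id": 1})  # duplicate, yet nothing is raised: the failure goes unnoticed
```

The price of safe mode is an extra round trip per write, which is exactly what the default skips.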
MongoDB’s durability story • Conclusion: It’s cool that you can tweak it per operation, but it’s uncool that it’s unsafe by default.
Things to be aware of • Safe mode off by default • 32/64 bit – 32-bit builds can only address about 2 GB of data • Memory-mapped files • Global write lock • Indexes should always fit in RAM
Thanks for listening! • email@example.com • @mookid8000 • http://mookid.dk/oncode
Image credits • The world’s most interesting man: http://i.qkme.me/3mwy.jpg • Bison: http://www.flickr.com/photos/johan-gril/5632513228/ • Tired Fry: http://cdn.memegenerator.net/instances/400x/18731987.jpg • Thanks for letting me borrow your awesome images – if you ever meet me, I’ll buy you a beer. Seriously, I will.