Introduction to MongoDB
Silicon Valley Code Camp, Oct 8th, 2011
Before we start…
- NoSQL is a movement; it's not anti-SQL
- Relational databases have their place, but they are not the only solution
- Diversify: use the best tool for the job
- The footers contain quotes from the video "MongoDB is Web Scale"
Agenda
- Relational Databases vs. NoSQL
- CAP Theorem
- MongoDB at a high level
- Collections, Documents
- Inserting, Querying and Updating
- Other MongoDB Commands
- Replication Topologies
- Using MongoDB via a driver
- A few Internals
- Administration
Relational Databases
- Have been around for years
- De facto standard for any persistence
- ACID compliant
- Rigid schema
- Usually hard to scale over a distributed network
- Normalization is almost always a requirement
- ORMs tend to limit the optimizations you can do to the queries
Footer: "Relational databases weren't built for Web Scale. They have impotence mismatch."
NoSQL: Why?
- Not everything can be modeled in a relational construct
- Cluster-aware out of the box: replication, sharding, etc. are built into the core
- Schemaless
- (Mostly) open source, community supported
- High performance by design, and not ball-and-chained to ACID
CAP Theorem: Pick Two
- Consistency: each client sees the same data
- Availability: the system is always available for any reads and writes
- Partition tolerance: the system can tolerate any communication failure across the network (except someone pulling the plug across the datacenters)
- At any given point in time, a distributed datastore can guarantee at most two of the above
Footer: "If that's what they need to do to get those kick ass benchmarks, then it's a great design."
Visual Guide to NoSQL Databases
Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
How do they make up for it?
- NoSQL databases are usually AP or CP
- Consistency
  - Eventually consistent
  - Write concerns
- Availability
  - Read-only stale data
Collections
- The closest relational comparison to a MongoDB collection is a table
- A collection is not bound by a schema
- A collection has a namespace
- A collection can be capped
- A collection contains BSON documents
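A capped collection behaves like a fixed-size ring buffer: once it is full, new documents overwrite the oldest ones. The sketch below imitates that behavior in plain Python with `collections.deque`. It is an analogy rather than MongoDB code, and the class name `CappedCollection` and its `max_docs` parameter are made up for illustration (real capped collections are capped by size in bytes, optionally also by document count).

```python
from collections import deque


class CappedCollection:
    """Toy stand-in for a capped collection (hypothetical class, not a MongoDB API)."""

    def __init__(self, max_docs):
        # deque(maxlen=...) silently drops the oldest item once the limit is reached
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(dict(doc))  # store a copy, in insertion order

    def find(self):
        return list(self._docs)


log = CappedCollection(max_docs=3)
for i in range(5):
    log.insert({"seq": i})

print(log.find())  # the two oldest documents (seq 0 and 1) have been dropped
```

Insertion order is preserved and old entries age out on their own, which is why this shape suits logs.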
Documents
- The closest comparison in the relational world is a row in a table
- Must reside within a collection
- Looks like (structured) JSON, stored as BSON within a collection
- Limited to 16 MB (as of 2.0)
- Larger sizes supported via GridFS
- Reference: http://www.bsonspec.org, which defines BSON as a binary-encoded serialization format for JSON-like documents
Querying and Updating Documents
- Query a document
- Select certain fields
- Using limit, skip, sort and count
- Using explain
- In-place updates
- Operators: $inc, $push, $pull, $pop (updates); $slice (projection); $in, $nin (queries)
- Indexing on fields
Footer: "MongoDB is a document database that does not need joins. It uses Map Reduce."
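To make the in-place update operators concrete, here is a minimal Python sketch that applies `$inc`, `$push`, `$pull` and `$pop` to an in-memory dict. It only imitates the semantics for top-level fields. `apply_update` is a hypothetical helper written for this example, not part of any MongoDB driver, and a real server applies these operators to the stored document itself.

```python
import copy


def apply_update(doc, update):
    """Return a copy of doc with a small subset of update operators applied."""
    result = copy.deepcopy(doc)
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$inc":
                # add to a number, treating a missing field as 0
                result[field] = result.get(field, 0) + arg
            elif op == "$push":
                # append one value to an array, creating it if needed
                result.setdefault(field, []).append(arg)
            elif op == "$pull":
                # remove every occurrence of a value from an array
                if field in result:
                    result[field] = [v for v in result[field] if v != arg]
            elif op == "$pop":
                # 1 removes the last element, -1 removes the first
                if field in result:
                    items = result[field]
                    result[field] = items[:-1] if arg == 1 else items[1:]
            else:
                raise ValueError("operator not covered by this sketch: " + op)
    return result


post = {"_id": 1, "title": "Intro to MongoDB", "views": 10, "tags": ["nosql", "draft"]}
updated = apply_update(post, {"$inc": {"views": 1}, "$push": {"tags": "mongodb"}})
updated = apply_update(updated, {"$pull": {"tags": "draft"}})
print(updated)
```

The update document names only the fields that change, so the client never has to read the document, modify it and write the whole thing back.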
Other console commands
- db.stats()
- db.collection.stats()
- db.isMaster()
- rs.status()
- db.currentOp()
- db.serverStatus()
Replication: Master/Slave
- Achieved by "declaring" one node as the master and "declaring" many nodes as its slaves
- The master is a single point of failure; there is no failover
- Can add any number of slaves easily
- May need to put slaves behind a load balancer
Replication: Replica Sets
- Achieved by creating a cluster, called a replSet, and adding "members" to it
- The "primary" and "secondary" roles are decided among the nodes; there is no permanent "master" or "slave"
- Automatic failover via voting
- An arbiter may be needed to break a tie if there is an even number of nodes
- Easy to add new members
- Adding load balancing will void failover
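Why an arbiter helps with an even number of nodes comes down to majority arithmetic: a primary can only be elected by a strict majority of the set's voting members. The following is a minimal sketch of that arithmetic in Python. The helper names are hypothetical and nothing here talks to MongoDB.

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of all voting members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many voting members can be lost while an election can still succeed."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, "members: majority", majority(n), "- survives", tolerated_failures(n), "failure(s)")
```

A two-member set cannot elect a primary after losing either node, while adding an arbiter as a third voter lets it survive one failure. A four-member set tolerates no more failures than a three-member one, and an even split across a network partition leaves neither half with a majority.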
Accessing MongoDB Programmatically
- Scala, using Casbah
- Code to insert a document
- Code to find/query
- Code to update
Object-Document Mappers
- Mongo drivers understand hashes, or DBObjects
- A DBObject is essentially a Map
- The class needs to be converted to a DBObject, either by the developer or by the driver
- Some mappers also provide a DAO, which makes it easy to perform CRUD operations
- MongoMapper for Ruby
- Salat for Scala
- Morphia for Java
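The conversion an object-document mapper performs can be pictured as turning a class instance into a nested map and back. The sketch below uses Python dataclasses as a stand-in for the Scala or Java classes. `User`, `Address`, `to_document` and `from_document` are hypothetical names, and no driver or mapper library is involved.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    city: str
    state: str


@dataclass
class User:
    name: str
    age: int
    address: Address


def to_document(user):
    """Flatten the object graph into nested dicts, the shape a driver can store."""
    return asdict(user)


def from_document(doc):
    """Rebuild the typed object from a stored document."""
    return User(name=doc["name"], age=doc["age"], address=Address(**doc["address"]))


user = User("Ann", 30, Address("San Francisco", "CA"))
doc = to_document(user)
print(doc)
assert from_document(doc) == user  # the round trip preserves the data
```

Mappers such as the ones listed above automate this pair of functions, typically by inspecting the class definition instead of hand-writing the field mapping.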
Internals
- Data is memory mapped, so writes can scale: no disk I/O is performed on every write
- Writes to disk are delayed, by default 60 seconds
- It is always best to keep the indices and the working set of the data in memory to avoid swapping
- Data files are pre-allocated in increments
- A smart algorithm adds padding to the storage when document sizes are inconsistent
- Durability is achieved by journaling, introduced in 1.7
Replication Internals
- The almighty oplog: a capped collection
- Acts like a transaction log, which the slaves or secondaries read from and apply
- getmore on the primary/master every 4s
- Failover and voting
- Delayed sync
- Using rs.slaveOk() to query the secondaries in a replSet
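The oplog idea can be sketched as a bounded log that the primary appends to and that each secondary replays from its last known position. The Python below is a toy simulation under that assumption. The entry format, `primary_write` and `secondary_sync` are invented for illustration and do not reflect the real oplog schema or wire protocol.

```python
from collections import deque

# Toy oplog: a bounded, ordered log of operations (hypothetical entry format).
oplog = deque(maxlen=1000)


def primary_write(data, ts, key, value):
    """Apply a write on the primary and record it in the oplog."""
    data[key] = value
    oplog.append({"ts": ts, "op": "set", "key": key, "value": value})


def secondary_sync(data, last_ts):
    """Apply every oplog entry newer than last_ts and return the new position."""
    for entry in oplog:
        if entry["ts"] > last_ts:
            data[entry["key"]] = entry["value"]
            last_ts = entry["ts"]
    return last_ts


primary, secondary = {}, {}
primary_write(primary, 1, "a", 1)
primary_write(primary, 2, "b", 2)
position = secondary_sync(secondary, last_ts=0)

primary_write(primary, 3, "a", 5)
position = secondary_sync(secondary, position)
print(secondary == primary, position)
```

Because the log is bounded, a secondary that falls too far behind would find its position already overwritten and would need a full resync, which is the trade-off of keeping the oplog in a capped collection.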
Scaling MongoDB
- Be smart with your schema design
- Know ahead of time if the system will be read-heavy or write-heavy
- Use explain(), use indices
- Do not fetch the entire document; select fields
- Keep an eye on index misses and page faults via mongostat
- Denormalize: avoid links, use embeds
- You can never replicate enough
- Horizontal scaling via sharding
Footer: "If /dev/null is faster than WebScale, I'll use it. Does /dev/null support sharding?"
Backups
- Lock the database for a cold backup
- Use filer snapshots
- Use mongodump to dump BSON, and mongorestore to restore it
- Use mongoexport to export JSON, and mongoimport to restore it
- Spare slaves always help
Monitoring
- MMS: developed by 10gen
- Munin: plugins available to monitor the MongoDB server
- Nagios: for machine health checks
Comparison of NoSQL Solutions
Source: http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart
We’re hiring! corp.ign.com/careers and @ignjobs
Scala, Java, PHP/Zend, Rails, ElasticSearch, MongoDB, MySQL, HTML5, jQuery Mobile, Sencha Touch, PhoneGap, WordPress, ActionScript/Flash, Redis/Memcached, CI/CD
About
Manish Pandit
Sr. Engineering Manager, IGN Entertainment
http://linkedin.com/in/mpandit
@lobster1234