
Silicon Valley Code Camp: 2011 Introduction to MongoDB



My talk today at Silicon Valley Code Camp 2011: Introduction to MongoDB.


  1. Introduction to MongoDB
     Silicon Valley Code Camp, Oct 8th, 2011
  2. Before We Start...
     • NoSQL is a movement; it's not anti-SQL
     • Relational databases have their place, but they are not the only solution
     • Diversify: use the best tool for the job
     • The footers contain quotes from the video “Mongo DB is Web Scale”
  3. Agenda
     • Relational Databases vs. NoSQL
     • CAP Theorem
     • MongoDB at a High Level
     • Collections, Documents
     • Inserting, Querying and Updating
     • Other MongoDB Commands
     • Replication Topologies
     • Using MongoDB via a Driver
     • A Few Internals
     • Administration
  4. Relational Databases
     • Have been around for years
     • De facto standard for persistence
     • ACID compliant
     • Rigid schema
     • Usually hard to scale over a distributed network
     • Normalization is almost always a requirement
     • ORMs tend to limit the optimizations you can make to queries
     • Footer: “Relational databases weren't built for web scale. They have impotence mismatch.”
  5. NoSQL: Why?
     • Not everything can be modeled in a relational construct
     • Cluster-aware out of the box: replication, sharding, etc. are built into the core
     • Schemaless
     • (Mostly) open source, community supported
     • High performance by design, not ball-and-chained with ACID
  6. CAP Theorem: Pick Two
     • Consistency: every client sees the same data
     • Availability: the system is always available for reads and writes
     • Partition tolerance: the system tolerates any communication failure across the network (except someone pulling the plug across the datacenters)
     • No distributed datastore can guarantee all three at once; at any given point in time, at most two hold
     • Footer: “If that's what they need to do to get those kick-ass benchmarks, then it's a great design.”
  7. Visual Guide to NoSQL Databases
     Source:
  8. How Do They Make Up for It?
     • NoSQL databases are usually AP or CP
     • Consistency: eventually consistent; write concerns
     • Availability: read-only; stale data
  9. MongoDB: High Level
     • Document-based database
     • Schemaless
     • Cluster-aware
     • Easy querying; JavaScript support
     • Memory mapped
     • Drivers in all the popular languages
     • Excellent developer velocity (supported by 10gen)
     • Durable via journaling
     • A CP system in terms of the CAP theorem
     • Footer: “MongoDB handles web scale. You turn it on and it scales right up.”
  10. Collections
     • The closest relational equivalent of a MongoDB collection is a table
     • A collection is not bound by a schema
     • A collection has a namespace
     • Can be a capped (fixed-size) collection
     • Contains BSON documents
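
A capped collection has a fixed size and keeps documents in insertion order; once it is full, the oldest documents are overwritten. The sketch below models that behaviour in memory with a bounded deque. It is an illustration under a simplifying assumption, not MongoDB's implementation: the hypothetical `CappedCollection` class caps by document count, whereas a real capped collection is sized in bytes (with an optional maximum document count).

```python
from collections import deque


class CappedCollection:
    """Hypothetical in-memory model of a capped collection.

    Simplification: capped by document count; real capped collections
    are capped by size in bytes (optionally also by document count).
    """

    def __init__(self, max_docs: int):
        # A deque with maxlen silently drops the oldest item when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find(self) -> list:
        # The natural order of a capped collection is insertion order.
        return list(self._docs)


events = CappedCollection(max_docs=3)
for i in range(5):
    events.insert({"n": i})
print([d["n"] for d in events.find()])  # [2, 3, 4]
```

The same fixed-size, oldest-first-eviction behaviour is what makes capped collections a natural fit for logs such as the replication oplog.
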
  11. Documents
     • The closest relational equivalent is a row in a table
     • Must reside within a collection
     • Looks like (structured) JSON; stored as BSON within a collection
     • Limited to 16MB (as of 2.0)
     • Larger sizes supported via GridFS
     • Reference: BSON is defined as a binary-encoded serialization format for JSON-like documents
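
Because documents are stored as BSON and limited to 16MB, it helps to see how the encoded size adds up. The sketch below follows the document layout from the BSON specification (an int32 length prefix, then one type byte, a NUL-terminated field name and a value per field, then a trailing 0x00) for a handful of value types. The function names are hypothetical; real drivers compute the size by actually encoding, and they also handle ObjectId, dates, binary data and other types this sketch rejects.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16MB document limit (MongoDB 2.0)


def _cstring_size(text: str) -> int:
    # UTF-8 bytes plus the terminating NUL byte.
    return len(text.encode("utf-8")) + 1


def _value_size(value) -> int:
    """Encoded size of one element's value (type byte and name excluded)."""
    if value is None:
        return 0                                  # null: no payload
    if isinstance(value, bool):                   # test before int: bool is an int subclass
        return 1
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return 4                              # int32
        if -2**63 <= value < 2**63:
            return 8                              # int64
        raise OverflowError("integer does not fit in 64 bits")
    if isinstance(value, float):
        return 8                                  # 64-bit IEEE 754 double
    if isinstance(value, str):
        return 4 + _cstring_size(value)           # int32 byte length, bytes, NUL
    if isinstance(value, dict):
        return bson_size(value)                   # embedded document
    if isinstance(value, (list, tuple)):
        # Arrays are encoded as documents whose keys are "0", "1", "2", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"type not handled by this sketch: {type(value).__name__}")


def bson_size(doc: dict) -> int:
    """Size in bytes of `doc` when encoded as a BSON document."""
    size = 4 + 1  # int32 total length prefix + trailing 0x00
    for key, value in doc.items():
        size += 1 + _cstring_size(key) + _value_size(value)  # type byte, name, value
    return size


def fits_in_one_document(doc: dict) -> bool:
    return bson_size(doc) <= MAX_BSON_DOCUMENT_SIZE


print(bson_size({"hello": "world"}))                  # 22
print(bson_size({"BSON": ["awesome", 5.05, 1986]}))   # 49
```

Both sample sizes match the byte strings given as examples in the BSON specification. Anything that would not pass `fits_in_one_document` is a candidate for GridFS.
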
  12. Inserting Documents
     • The console (mongo shell) connects to localhost, port 27017, by default
     • show databases
     • show collections
     • Insert a document into a collection
     • Bulk inserts via JavaScript
  13. Querying and Updating Documents
     • Query a document
     • Select certain fields
     • Using limit, skip, sort and count
     • Using explain
     • In-place updates
     • Operators: $inc, $push, $pull, $pop (updates); $slice (projection); $in, $nin (queries)
     • Indexing on fields
     • Footer: “MongoDB is a document database that does not need joins. It uses Map Reduce.”
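
In-place update operators change a document on the server without the client rewriting the whole thing. To make their semantics concrete, the sketch below applies `$inc`, `$push`, `$pull` and `$pop` to a plain Python dict. `apply_update` is a hypothetical helper, not a driver API, and it is simplified: top-level field names only (no dot notation), equality-only `$pull`, and no detection of conflicting operators on the same field, which MongoDB reports as an error.

```python
from copy import deepcopy


def apply_update(doc: dict, update: dict) -> dict:
    """Return a copy of `doc` with MongoDB-style update operators applied.

    Hypothetical sketch: top-level fields only, equality-only $pull,
    and no detection of conflicting operators on the same field.
    """
    result = deepcopy(doc)
    for op, changes in update.items():
        for field, arg in changes.items():
            if op == "$inc":
                # A missing field counts as 0, so $inc creates it.
                result[field] = result.get(field, 0) + arg
            elif op == "$push":
                # $push creates the array if needed and appends one value.
                target = result.setdefault(field, [])
                if not isinstance(target, list):
                    raise TypeError(f"{field} is not an array")
                target.append(deepcopy(arg))
            elif op == "$pull":
                # Remove every element equal to arg.
                if isinstance(result.get(field), list):
                    result[field] = [v for v in result[field] if v != arg]
            elif op == "$pop":
                # 1 removes the last element, -1 removes the first.
                if arg not in (1, -1):
                    raise ValueError("$pop expects 1 or -1")
                if isinstance(result.get(field), list) and result[field]:
                    result[field].pop(-1 if arg == 1 else 0)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return result


post = {"_id": 1, "views": 10, "tags": ["mongodb", "draft", "nosql"]}
post = apply_update(post, {"$inc": {"views": 1}, "$push": {"tags": "scala"}})
post = apply_update(post, {"$pull": {"tags": "draft"}})
post = apply_update(post, {"$pop": {"tags": -1}})
print(post)  # {'_id': 1, 'views': 11, 'tags': ['nosql', 'scala']}
```

Each operator touches only the named fields, which is why an update like `{"$inc": {"views": 1}}` is cheap compared with fetching, modifying and re-saving the whole document.
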
  14. Other Console Commands
     • db.stats()
     • db.collection.stats()
     • db.isMaster()
     • rs.status()
     • db.currentOp()
     • db.serverStatus()
  15. Replication: Master-Slave
     • Achieved by “declaring” one node as the master and “declaring” any number of nodes as its slaves
     • Single point of failure; no failover
     • Can add any number of slaves easily
     • May need to put slaves behind a load balancer
  16. Replication: Replica Sets
     • Achieved by creating a cluster, called a replica set (replSet), and adding “members” to it
     • The “primary” and “secondary” roles are decided among the nodes; there is no permanent “master” or “slave”
     • Automatic failover via voting
     • An arbiter may be needed to break ties when there is an even number of nodes
     • Easy to add new members
     • Putting the members behind a load balancer defeats failover
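
The voting rule behind automatic failover, and the reason an arbiter helps with an even number of nodes, is majority arithmetic: a member can become primary only with votes from a strict majority of the replica set's voting members. The sketch below (hypothetical function names) shows why a four-node set split two-and-two cannot elect a primary on either side, while an arbiter as a fifth vote lets the three-vote side win.

```python
from typing import List, Optional


def can_elect_primary(reachable_votes: int, total_votes: int) -> bool:
    """True if a group of members holds a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def side_with_primary(partition_votes: List[int]) -> Optional[int]:
    """Index of the network partition that can elect a primary, if any.

    At most one partition can hold a strict majority of the votes.
    """
    total = sum(partition_votes)
    for i, votes in enumerate(partition_votes):
        if can_elect_primary(votes, total):
            return i
    return None


# Four data-bearing members split 2/2: neither side has a majority.
print(side_with_primary([2, 2]))  # None
# Add an arbiter (one vote, no data) on the first side: 3 of 5 votes.
print(side_with_primary([3, 2]))  # 0
```

Requiring a strict majority is also what prevents two primaries: two disjoint partitions can never both hold more than half of the votes.
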
  17. Accessing MongoDB Programmatically
     • Scala, using Casbah
     • Code to insert a document
     • Code to find/query
     • Code to update
  18. Object-Document Mappers
     • MongoDB drivers understand hashes, or DBObjects; a DBObject is essentially a Map
     • A class needs to be converted to a DBObject, either by the developer or by the driver
     • Some mappers also provide a DAO that makes CRUD operations easy
     • MongoMapper for Ruby
     • Salat for Scala
     • Morphia for Java
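
The core job of an object-document mapper is the conversion described on this slide: turning a class instance into a map-like document and back. As a language-neutral illustration (not the API of MongoMapper, Salat or Morphia), the sketch below uses Python dataclasses; the `Article` class and the `to_document`/`from_document` helpers are hypothetical.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import List, Type, TypeVar

T = TypeVar("T")


@dataclass
class Article:
    title: str
    author: str
    tags: List[str] = field(default_factory=list)
    views: int = 0


def to_document(obj) -> dict:
    # Recursively converts the dataclass (and nested dataclasses) to dicts.
    return asdict(obj)


def from_document(cls: Type[T], doc: dict) -> T:
    # Keep only keys that map to declared fields; "_id" and any extra
    # keys stored in the (schemaless) document are ignored here.
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in doc.items() if k in names})


article = Article(title="Intro to MongoDB", author="Manish Pandit", tags=["nosql"])
doc = to_document(article)
print(doc)
# {'title': 'Intro to MongoDB', 'author': 'Manish Pandit', 'tags': ['nosql'], 'views': 0}
stored = dict(doc, _id=42)  # as if the database had assigned an _id
print(from_document(Article, stored) == article)  # True
```

Silently dropping unknown keys is one design choice among several; a mapper could instead keep them in an "extras" map so that schemaless data is not lost on a round trip.
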
  19. Internals
     • Data is memory mapped, so writes scale: no disk I/O is performed on every write
     • Writes are flushed to disk lazily, every 60 seconds by default
     • Keep the indices and the working set of the data in memory to avoid swapping
     • Data files are preallocated in increments
     • A smart algorithm adds padding to storage when document sizes are inconsistent
     • Durability is achieved by journaling, introduced in 1.7
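
The memory-mapped design on this slide can be illustrated with Python's `mmap` module: a write is a plain memory copy into a preallocated, mapped file, and dirty pages reach the disk when the OS writes them back or when the process forces a flush. The `MappedDataFile` class and its `sync_delay` parameter are hypothetical, modelled loosely on the 60-second default flush interval mentioned above; this is not MongoDB's storage engine, and the journal is not modelled.

```python
import mmap
import os
import tempfile


class MappedDataFile:
    """Hypothetical sketch of a preallocated, memory-mapped data file."""

    def __init__(self, path: str, size: int, sync_delay: float = 60.0):
        with open(path, "wb") as f:
            f.truncate(size)  # reserve the full size up front (zero-filled on most platforms)
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), size)
        self.sync_delay = sync_delay
        self._last_flush = 0.0  # times are caller-supplied seconds
        self.flush_count = 0

    def write(self, offset: int, data: bytes) -> None:
        # A memory copy into the mapping; no write() system call happens here.
        self._map[offset:offset + len(data)] = data

    def maybe_flush(self, now: float) -> None:
        # Force dirty pages to disk once sync_delay seconds have passed.
        if now - self._last_flush >= self.sync_delay:
            self._map.flush()
            self._last_flush = now
            self.flush_count += 1

    def close(self) -> None:
        self._map.flush()
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "demo.0")
data_file = MappedDataFile(path, size=4096)
data_file.write(0, b"hello")
data_file.maybe_flush(now=30.0)  # too soon: nothing is forced to disk
data_file.maybe_flush(now=61.0)  # 60 s elapsed: flush the dirty pages
print(data_file.flush_count)     # 1
data_file.close()
```

Between flushes, unflushed writes live only in memory, which is exactly the window that journaling is meant to close.
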
  20. Replication Internals
     • The almighty oplog: a capped collection
     • Acts like a transaction log that the slaves or secondaries read from and apply
     • getmore on the primary/master every 4s
     • Failover and voting
     • Delayed sync
     • Using rs.slaveOk() to query the secondaries in a replSet
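
The oplog mechanism can be sketched as a capped, append-only log that a secondary reads from its last applied position onward (the role `getmore` plays on the primary). The classes below are hypothetical and simplified: entries reuse the oplog's `i`/`u`/`d` operation codes, but an update here just merges fields, and a secondary that falls behind the log's oldest entry raises an error instead of performing a full resync.

```python
from collections import deque
from typing import Optional


class Oplog:
    """Hypothetical capped operation log kept by the primary."""

    def __init__(self, max_entries: int):
        self._entries = deque(maxlen=max_entries)  # oldest entries drop off
        self._next_ts = 1

    def append(self, op: str, doc_id, fields: Optional[dict] = None) -> None:
        self._entries.append({"ts": self._next_ts, "op": op, "id": doc_id, "fields": fields})
        self._next_ts += 1

    def entries_after(self, ts: int) -> list:
        """Roughly what a secondary's getmore returns: entries newer than ts."""
        if self._entries and ts < self._entries[0]["ts"] - 1:
            # Entries the secondary still needs were already overwritten.
            raise RuntimeError("secondary is too stale; a full resync is required")
        return [e for e in self._entries if e["ts"] > ts]


class Secondary:
    """Hypothetical secondary that replays oplog entries in order."""

    def __init__(self):
        self.data = {}
        self.last_ts = 0

    def sync_from(self, oplog: Oplog) -> None:
        for entry in oplog.entries_after(self.last_ts):
            if entry["op"] == "i":
                self.data[entry["id"]] = dict(entry["fields"])
            elif entry["op"] == "u":
                self.data[entry["id"]].update(entry["fields"])
            elif entry["op"] == "d":
                self.data.pop(entry["id"], None)
            self.last_ts = entry["ts"]


oplog = Oplog(max_entries=100)
oplog.append("i", 1, {"name": "mongo"})
oplog.append("u", 1, {"name": "mongodb"})
oplog.append("i", 2, {"name": "casbah"})
oplog.append("d", 2)
secondary = Secondary()
secondary.sync_from(oplog)
print(secondary.data, secondary.last_ts)  # {1: {'name': 'mongodb'}} 4
```

Because the oplog is capped, its size bounds how long a secondary can be offline (or deliberately delayed) and still catch up by replaying entries.
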
  21. Scaling MongoDB
     • Be smart with your schema design
     • Know ahead of time whether the system will be read-heavy or write-heavy
     • Use explain(); use indices
     • Do not fetch the entire document; select fields
     • Keep an eye on index misses and page faults via mongostat
     • Denormalize: avoid links, use embedded documents
     • You can never replicate enough
     • Horizontal scaling via sharding
     • Footer: “If /dev/null is faster than web scale, I'll use it. Does /dev/null support sharding?”
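
Sharding in MongoDB is range based on a shard key: the key space is split into chunks, each chunk lives on one shard, and the router (mongos) sends a query on the shard key only to the shards that own the relevant chunks. The sketch below models that routing with sorted split points and binary search; the `ChunkRouter` class, the split points and the shard names are hypothetical, and chunk splitting and balancing are not modelled.

```python
import bisect
from typing import List


class ChunkRouter:
    """Hypothetical range-based router from shard-key values to shards.

    Chunk i covers [split_points[i-1], split_points[i]): lower bound
    inclusive, upper bound exclusive; the first and last chunks are open-ended.
    """

    def __init__(self, split_points: List, chunk_shards: List[str]):
        if len(chunk_shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.chunk_shards = list(chunk_shards)

    def _chunk_index(self, key) -> int:
        # bisect_right sends a key equal to a split point to the chunk that starts there.
        return bisect.bisect_right(self.split_points, key)

    def shard_for(self, key) -> str:
        return self.chunk_shards[self._chunk_index(key)]

    def shards_for_range(self, low, high) -> List[str]:
        """Shards owning any chunk that overlaps [low, high], in chunk order."""
        first, last = self._chunk_index(low), self._chunk_index(high)
        owners: List[str] = []
        for shard in self.chunk_shards[first:last + 1]:
            if shard not in owners:
                owners.append(shard)
        return owners


router = ChunkRouter(["g", "n", "t"], ["shard0", "shard1", "shard2", "shard0"])
print(router.shard_for("mongodb"))        # shard1
print(router.shard_for("n"))              # shard2
print(router.shards_for_range("a", "h"))  # ['shard0', 'shard1']
```

A query that does not include the shard key cannot be narrowed this way and has to be sent to every shard, which is one reason the choice of shard key matters so much.
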
  22. Backups
     • Lock the database for a cold backup
     • Use filer snapshots
     • Use mongodump (BSON output) and mongorestore to restore
     • Use mongoexport (JSON output) and mongoimport to restore
     • Spare slaves always help
  23. Monitoring
     • MMS: developed by 10gen
     • Munin: plugins available to monitor the MongoDB server
     • Nagios: for machine health checks
  24. Comparison of NoSQL Solutions
     Source:
  25. We’re hiring!, and @ignjobs
     • Scala, Java, PHP/Zend, Rails, ElasticSearch, MongoDB, MySQL, HTML5, jQuery Mobile, Sencha Touch, PhoneGap, WordPress, ActionScript/Flash, Redis/Memcached, CI/CD
  26. About
     Manish Pandit
     Sr. Engineering Manager, IGN Entertainment
     @lobster1234