Silicon Valley Code Camp: 2011 Introduction to MongoDB
My Talk today at Silicon Valley Code Camp 2011 on Introduction to MongoDB.

Published in: Technology

Transcript

  • 1. Introduction to MongoDB
    Silicon Valley Code Camp
    Oct 8th, 2011
  • 2. Before we start...
    NoSQL is a movement; it's not anti-SQL
    Relational Databases have their place, but they are not the only solution
    Diversify - Best tool for the job
    The footers contain quotes from the video "MongoDB is Web Scale"
  • 3. Agenda
    Relational Databases vs. NoSQL
    CAP Theorem
    MongoDB at a high level
    Collections, Documents
    Inserting, Querying and Updating
    Other MongoDB Commands
    Replication Topologies
    Using MongoDB via a driver
    Few Internals
    Administration
  • 4. Relational Databases
    Have been around for years
    De-facto standard for any persistence
    ACID compliant
    Rigid Schema
    Usually hard to scale over a distributed network
    Normalization is almost always a requirement
    ORMs tend to limit the optimizations you can do to the queries.
    Relational Databases weren't built for Web Scale. They have impotence mismatch.
  • 5. NoSQL
    Why?
    Not everything can be modeled in a relational construct
    Cluster-aware out of the box. Replication, sharding, etc. are built into the core
    Schemaless
    (Mostly) Open Source, Community supported
    High performance by design and not ball-and-chained with ACID
  • 6. CAP Theorem : Pick Two
    Consistency – Each client sees the same data
    Availability – The system is always available for any reads and writes
    Partition Tolerance – The system can tolerate any communication failure across the network (except someone pulling the plug across the datacenters).
    A distributed datastore can guarantee at most two of the above at any given point in time.
    If that's what they need to do to get those kick ass benchmarks, then it's a great design.
  • 7. Visual Guide to NoSQL Databases
    Source: http://blog.nahurst.com/visual-guide-to-nosql-systems
  • 8. How do they make up for it?
    NoSQL databases are usually either AP or CP.
    Consistency: eventual consistency, write concerns
    Availability: read-only mode, stale data
  • 9. MongoDB : High Level
    Document-based Database
    Schemaless
    Cluster-aware
    Easy Querying/JavaScript Support
    Memory Mapped
    Drivers in all the popular languages
    Excellent developer velocity (Supported by 10gen)
    Durable via Journaling
    CP system in CAP theorem terms
    MongoDB handles WebScale. You turn it on and it scales right up.
  • 10. Collections
    The closest comparison to a MongoDB Collection in the relational world is a Table
    A collection is not bound by a schema
    A collection has a namespace
    Can be a capped collection
    It contains BSON documents
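
The capped-collection idea from the slide above can be sketched in a few lines of Python. This is a toy model, not MongoDB code: the `CappedCollection` class and its `max_docs` parameter are hypothetical names, and counting documents is a simplification, since real capped collections are sized in bytes.

```python
from collections import deque


class CappedCollection:
    """Toy fixed-size collection: the oldest documents are dropped first."""

    def __init__(self, max_docs):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        # Copy so later changes by the caller do not leak into the store.
        self._docs.append(dict(doc))

    def find(self):
        # Documents come back in insertion order.
        return list(self._docs)


log = CappedCollection(max_docs=3)
for i in range(5):
    # No schema is enforced: each document is just a dict.
    log.insert({"seq": i})
print(log.find())
```

After five inserts into a collection capped at three documents, only the three most recent remain.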
  • 11. Documents
    Closest comparison in the relational world is a Row in a Table.
    Must reside within a Collection
    Looks like (structured) JSON, stored as BSON within a collection
    Limited to 16MB (as of 2.0)
    Larger sizes supported via GridFS
    Reference: http://www.bsonspec.org. BSON is defined as a binary-encoded serialization format for JSON-like documents.
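
To make the document shape and the size limit concrete, here is a minimal Python sketch. It assumes JSON as a stand-in encoding (real BSON sizes differ), and `encoded_size`, `fits_in_one_document` and the sample `post` are hypothetical names used only for illustration.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # the 16MB limit quoted on the slide


def encoded_size(doc):
    # Assumption: JSON bytes approximate the stored size; BSON will differ.
    return len(json.dumps(doc).encode("utf-8"))


def fits_in_one_document(doc):
    # Anything larger would need a chunked store such as GridFS.
    return encoded_size(doc) <= MAX_DOC_BYTES


# A document can nest arrays and sub-documents, unlike a flat table row.
post = {
    "title": "Introduction to MongoDB",
    "tags": ["nosql", "mongodb"],
    "author": {"name": "hypothetical author", "talks": 1},
}
print(fits_in_one_document(post))
```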
  • 12. Inserting Documents
    Console defaults to localhost port 27017
    show databases
    show collections
    Insert a document in a collection
    Bulk inserts via JavaScript
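
The slide's insert steps were demonstrated live in the console. As a rough stand-in, the sketch below models single and bulk inserts against an in-memory list. `ToyCollection` is hypothetical, and the integer `_id` counter is an assumption made for brevity (MongoDB generates ObjectId values instead).

```python
import itertools


class ToyCollection:
    """In-memory stand-in for a collection; not a MongoDB client."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    def insert(self, doc):
        stored = dict(doc)
        # Assign an _id only when the caller did not supply one.
        stored.setdefault("_id", next(self._ids))
        self._docs.append(stored)
        return stored["_id"]

    def insert_many(self, docs):
        # Bulk insert is just a loop here; a real driver batches the writes.
        return [self.insert(d) for d in docs]

    def count(self):
        return len(self._docs)


users = ToyCollection()
first_id = users.insert({"name": "alice"})
# Documents in one collection need not share the same fields.
users.insert({"name": "bob", "age": 30})
bulk_ids = users.insert_many({"name": f"user{i}"} for i in range(100))
print(users.count())
```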
  • 13. Querying and Updating Documents
    Query a document
    Select certain fields
    Using limit, skip, sort and count
    Using explain
    In Place Updates
    $inc, $push, $pull, $pop, $slice, $in, $nin
    Indexing on fields
    MongoDB is a Document Database, that does not need joins. It uses Map Reduce.
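
The query and update steps listed above can be approximated in plain Python to show what each option does. This is a sketch under stated assumptions: `matches`, `find` and `apply_update` are hypothetical helpers, matching is equality-only, and only `$inc`, `$push` and `$pull` are modeled.

```python
def matches(doc, query):
    # Equality-only matching: every queried field must equal the given value.
    return all(doc.get(k) == v for k, v in query.items())


def find(docs, query=None, fields=None, sort_key=None, skip=0, limit=None):
    result = [d for d in docs if matches(d, query or {})]
    if sort_key is not None:
        # Assumes every matching document has the sort field.
        result.sort(key=lambda d: d[sort_key])
    end = None if limit is None else skip + limit
    result = result[skip:end]
    if fields is not None:
        # Projection: return only the selected fields.
        result = [{k: d[k] for k in fields if k in d} for d in result]
    return result


def apply_update(doc, update):
    """Mutate doc in place for a small subset of update operators."""
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount
    for field, value in update.get("$push", {}).items():
        doc.setdefault(field, []).append(value)
    for field, value in update.get("$pull", {}).items():
        if field in doc:
            doc[field] = [v for v in doc[field] if v != value]
    return doc


docs = [
    {"name": "a", "views": 3, "tags": ["x"]},
    {"name": "b", "views": 1, "tags": []},
    {"name": "c", "views": 2, "tags": ["y"]},
]
page = find(docs, sort_key="views", skip=1, limit=1, fields=["name"])
apply_update(docs[0], {"$inc": {"views": 2}, "$push": {"tags": "z"}})
apply_update(docs[0], {"$pull": {"tags": "x"}})
```

The point of the in-place operators is that the client sends only the change (`{"$inc": {"views": 2}}`) rather than reading the document, modifying it, and writing the whole thing back.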
  • 14. Other console commands
    db.stats()
    db.collection.stats()
    db.isMaster()
    rs.status()
    db.currentOp()
    db.serverStatus()
  • 15. Replication: Master Slave
    Achieved by “declaring” 1 node as the master, and “declaring” many nodes as its slaves
    Single point of failure/No failover
    Can add any number of slaves easily
    May need to put slaves behind a load balancer
  • 16. Replication : ReplicaSets
    Achieved by creating a cluster, called a replSet, and adding “members” to it.
    The “primary” and “secondary” roles are decided among the nodes. There is no permanent “master” or “slave”.
    Automatic Failover via voting
    An arbiter may be needed to break ties when there is an even number of nodes
    Easy to add new members
    Adding load-balancing will void failover
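
Why an arbiter helps with an even number of nodes comes down to majority arithmetic. The sketch below is a deliberately simplified model, and `can_elect_primary` is a hypothetical helper. It only counts voters; real elections weigh additional factors.

```python
def can_elect_primary(reachable_voters, total_voters):
    # A side of a network split needs strictly more than half of all votes.
    return reachable_voters > total_voters // 2


# Two data nodes, split 1/1: neither side has a majority.
two_node_split = can_elect_primary(1, 2)

# Two data nodes plus an arbiter: the side that still sees the arbiter
# holds 2 of 3 votes and can elect a primary.
with_arbiter = can_elect_primary(2, 3)

print(two_node_split, with_arbiter)
```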
  • 17. Accessing MongoDB Programmatically
    Scala
    Using Casbah
    Code to insert a document
    Code to find/query
    Code to update
  • 18. Object-Document Mappers
    Mongo Drivers understand Hashes, or DBObjects. A DBObject essentially is a Map
    The class needs to be converted to a DBObject, either by the developer or by the driver.
    Some such mappers also provide a DAO which makes it easy to perform CRUD operations.
    MongoMapper for Ruby
    Salat for Scala
    Morphia for Java
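
The class-to-map conversion that these mappers perform can be illustrated with a Python dataclass. This is not how MongoMapper, Salat or Morphia are implemented. `Talk`, `to_document` and `from_document` are hypothetical names for a minimal sketch of the idea.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Talk:
    title: str
    speaker: str
    tags: list = field(default_factory=list)


def to_document(obj):
    # Object -> map: the shape a driver can serialize.
    return asdict(obj)


def from_document(cls, doc):
    # Map -> object: ignore keys the class does not declare, such as _id.
    known = {k: v for k, v in doc.items() if k in cls.__dataclass_fields__}
    return cls(**known)


talk = Talk("Introduction to MongoDB", "hypothetical speaker", ["mongodb"])
doc = to_document(talk)
restored = from_document(Talk, {**doc, "_id": 1})
```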
  • 19. Internals
    Data is memory mapped, so writes can scale as no disk IO is performed with every write.
    Delayed writes to disk, every 60 seconds by default.
    Always better to keep the indices and the working set of the data in memory to avoid swapping
    Pre-allocated files in increments
    Smart algorithm to add padding to the storage when the document sizes are inconsistent
    Durability is achieved by journaling, introduced in 1.7
  • 20. Replication Internals
    The almighty Oplog – Capped Collection
    Acts like a transaction log which the slaves or secondaries read from and apply.
    getmore on the primary/master every 4s
    Failover and voting
    Delayed sync
    Using rs.slaveOk() to query the secondaries in a replSet
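
The oplog mechanism described above can be modeled as a primary that records every write and a secondary that replays the entries it has not yet seen. This is a toy sketch: `Primary` and `Secondary` are hypothetical classes, the oplog is an unbounded list rather than a capped collection, and syncing is triggered by hand instead of by polling.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # entries are (sequence_number, key, value)

    def write(self, key, value):
        self.data[key] = value
        # Every write is also recorded so that followers can replay it.
        self.oplog.append((len(self.oplog) + 1, key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.last_applied = 0

    def sync_from(self, primary):
        # Fetch only the entries newer than the last one applied.
        new_ops = [op for op in primary.oplog if op[0] > self.last_applied]
        for seq, key, value in new_ops:
            self.data[key] = value
            self.last_applied = seq
        return len(new_ops)


primary = Primary()
secondary = Secondary()
primary.write("a", 1)
primary.write("b", 2)
applied_first = secondary.sync_from(primary)

primary.write("a", 3)
stale_read = secondary.data["a"]  # secondary has not caught up yet
applied_second = secondary.sync_from(primary)
```

The `stale_read` value shows why reads from secondaries are opt-in: until the next sync, a secondary can return older data than the primary.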
  • 21. Scaling MongoDB
    Be smart with your schema design
    Know ahead of time if the system will be read-heavy or write-heavy
    Use explain(), use indices
    Do not fetch the entire document - select fields.
    Keep an eye on index misses and page faults via mongostat
    Denormalize: avoid links, use embeds.
    You can never replicate enough
    Horizontal scaling via sharding
    If /dev/null is faster than WebScale, I’ll use it. Does /dev/null support sharding?
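
Horizontal scaling via sharding means each document is routed to one shard based on a shard key. The sketch below assumes range-based routing over a hypothetical chunk layout. `SPLIT_POINTS`, `CHUNK_OWNERS` and `shard_for` are illustrative names, not part of any MongoDB API.

```python
import bisect

# Hypothetical layout: split points on the shard key, and the shard that
# owns each resulting range (one more range than there are split points).
SPLIT_POINTS = ["g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]


def shard_for(key):
    # bisect_right finds which range the key falls into:
    # key < "g" -> range 0, "g" <= key < "n" -> range 1, and so on.
    return CHUNK_OWNERS[bisect.bisect_right(SPLIT_POINTS, key)]


for name in ["alice", "mongo", "nosql", "zed"]:
    print(name, shard_for(name))
```

A query that includes the shard key can be sent to a single shard, while a query without it has to be sent to every shard, which is one reason schema design matters for scaling.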
  • 22. Backups
    Lock the database for a cold backup
    Use filer snapshots
    Use mongodump -> BSON, mongorestore to restore
    Use mongoexport -> JSON, mongoimport to restore
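
The export and import round trip can be pictured as writing one JSON document per line and reading the lines back. This sketch does not call the real tools. `export_docs` and `import_docs` are hypothetical helpers that mimic the idea with the standard library.

```python
import io
import json


def export_docs(docs, stream):
    # One JSON document per line keeps the dump easy to stream and diff.
    for doc in docs:
        stream.write(json.dumps(doc, sort_keys=True) + "\n")


def import_docs(stream):
    # Skip blank lines so a trailing newline does not break the restore.
    return [json.loads(line) for line in stream if line.strip()]


original = [{"_id": 1, "name": "a"}, {"_id": 2, "tags": ["x", "y"]}]
buffer = io.StringIO()
export_docs(original, buffer)
buffer.seek(0)
restored_docs = import_docs(buffer)
```

JSON cannot represent every BSON type faithfully, which is one reason to prefer the BSON-based dump for full backups.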
    Spare slaves always help
  • 23. Monitoring
    MMS: developed by 10gen
    Munin: plugins available to monitor MongoDB Server
    Nagios: for machine health checks
  • 24. Comparison of NoSQL Solutions
    Source: http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart
  • 25. We’re hiring! corp.ign.com/careers and @ignjobs
    Scala
    Java
    PHP/Zend
    Rails
    ElasticSearch
    MongoDB
    MySQL
    HTML5
    jQuery Mobile
    Sencha Touch
    PhoneGap
    WordPress
    ActionScript/Flash
    Redis/Memcached
    CI/CD
  • 26. About
    Manish Pandit
    Sr. Engineering Manager
    IGN Entertainment
    http://linkedin.com/in/mpandit
    @lobster1234