MongoDB: An Introduction - June 2011

Presentation to the SVForum Architecture and Platform SIG meetup

  • 1. MongoDB: An Introduction
    Chris Westin
    Software Engineer, 10gen
    © Copyright 2010 10gen Inc.
  • 2. Outline
    The Whys of Non-Relational Databases
    Vocabulary of the Non-Relational World
  • 3. Why did non-relational databases arise?
    Problems with relational databases in the web world
    The Whys of Non-Relational Databases
  • 4. Problem – Schema Evolution
    Applications are evolving all the time
    Applications need new fields
    Applications need new indexes
    Data is growing – sometimes very fast
    Users need to be able to alter their schemas without making their data unavailable
    The web world expects 24x7 service
    RDBMSs can have a hard time doing this
  • 5. Problem – Write Rates
    Replication is a solution for high read loads
    Sooner or later, writing becomes a bottleneck
    Sharding – partitioning a logical database across multiple database instances
    Joins and aggregation become a problem
    Distributed transactions are too slow for the web
    Manual management of shards
    Choosing shard partitions
    Rebalancing shards
  • 6. An introduction to terminology you’re going to be seeing a lot
    Vocabulary of the Non-Relational World
  • 7. Data Models
    A database’s data model determines the kinds of items it can contain and how they can be retrieved
    What can the system store, and what does it know about what it contains?
    The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
    What kind of queries can you do?
    SQL is a manifestation of the kinds of queries that fall out of relational algebra
  • 8. Non-Relational Data Models
    Key-value stores
    Document stores
    Column-oriented databases
    Graph databases
  • 9. Key-Value Stores
    A mapping from a key to a value
    The store doesn’t know anything about the key or value
    The store doesn’t know anything about the insides of the value
    Set, get, or delete a key-value pair
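The set/get/delete contract above can be sketched as a tiny in-memory store. This is a hypothetical illustration, not any product's API: the class name `KeyValueStore` is made up, and values are kept as opaque `Buffer`s so the store never looks inside them.

```javascript
// Hypothetical in-memory key-value store: string keys, opaque byte values.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // Store a copy of the bytes; the store never interprets them.
  set(key, valueBytes) {
    this.data.set(key, Buffer.from(valueBytes));
  }

  // Return the stored bytes, or undefined if the key is absent.
  get(key) {
    return this.data.get(key);
  }

  // Remove the pair; returns true if the key existed.
  delete(key) {
    return this.data.delete(key);
  }
}

const kv = new KeyValueStore();
kv.set("user:42", Buffer.from(JSON.stringify({ name: "Fred" })));
// Only the client knows the value is JSON; the store just hands back bytes.
const user = JSON.parse(kv.get("user:42").toString("utf8"));
console.log(user.name); // Fred
kv.delete("user:42");
console.log(kv.get("user:42")); // undefined
```

Any query on the contents of a value ("all users named Fred") has to happen in the application, which is the gap document stores fill.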
  • 10. Document Stores
    The store is a container for documents
    Documents are made up of named fields
    Fields may or may not have type definitions
    e.g. XSDs for XML stores, vs. schema-less JSON stores
    Can create “secondary indexes”
    These provide the ability to query on any document field(s)
    Insert and delete documents
    Update fields within documents
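A minimal sketch of these document-store operations, assuming a single-process, in-memory store: the `DocumentStore` class and its helpers are hypothetical. It shows schema-less documents with different fields, a secondary index on one field, and in-place field updates that keep the index current.

```javascript
// Hypothetical schema-less document store with optional single-field secondary indexes.
class DocumentStore {
  constructor() {
    this.docs = new Map(); // _id -> document
    this.indexes = new Map(); // field name -> Map(field value -> Set of _ids)
    this.nextId = 1;
  }

  // Build an index over a field that documents may or may not have.
  createIndex(field) {
    const index = new Map();
    for (const [id, doc] of this.docs) addToIndex(index, doc[field], id);
    this.indexes.set(field, index);
  }

  insert(doc) {
    const id = this.nextId++;
    const stored = { ...doc, _id: id };
    this.docs.set(id, stored);
    for (const [field, index] of this.indexes) addToIndex(index, stored[field], id);
    return id;
  }

  remove(id) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    for (const [field, index] of this.indexes) removeFromIndex(index, doc[field], id);
    return this.docs.delete(id);
  }

  // Update a single field in place, keeping any index on that field current.
  update(id, field, value) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    const index = this.indexes.get(field);
    if (index) removeFromIndex(index, doc[field], id);
    doc[field] = value;
    if (index) addToIndex(index, value, id);
    return true;
  }

  // Equality lookup: use a secondary index if one exists, otherwise scan every document.
  findByField(field, value) {
    const index = this.indexes.get(field);
    if (index) return [...(index.get(value) ?? [])].map((id) => this.docs.get(id));
    return [...this.docs.values()].filter((doc) => doc[field] === value);
  }
}

function addToIndex(index, value, id) {
  if (value === undefined) return; // documents without the field are simply not indexed
  if (!index.has(value)) index.set(value, new Set());
  index.get(value).add(id);
}

function removeFromIndex(index, value, id) {
  const ids = index.get(value);
  if (!ids) return;
  ids.delete(id);
  if (ids.size === 0) index.delete(value);
}

const store = new DocumentStore();
store.createIndex("LastName");
const fred = store.insert({ LastName: "Flintstone", FirstName: "Fred" });
store.insert({ LastName: "Rubble", FirstName: "Barney", Town: "Bedrock" }); // extra field is fine
store.update(fred, "FirstName", "Frederick");
console.log(store.findByField("LastName", "Flintstone").map((d) => d.FirstName)); // [ 'Frederick' ]
```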
  • 11. Column-Oriented Stores
    Like a relational store, but flipped around: all data for a column is kept together
    An index provides a means to get a column value for a record
    Get, insert, and delete records; update fields
    Streaming column data in and out of Hadoop
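To make the "flipped around" layout concrete, here is a sketch that pivots a few hypothetical records into one array per column. The `toColumns` and `getColumnValue` helpers and the id-to-position map are illustrative assumptions, not any product's storage format.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { id: 1, name: "Fred", city: "Bedrock", age: 35 },
  { id: 2, name: "Barney", city: "Bedrock", age: 33 },
  { id: 3, name: "Wilma", city: "Bedrock", age: 34 },
];

// Column-oriented layout: one array per column; position i in every array belongs to record i.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field][i] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);

// A small "index" from record id to position fetches one column value for one record.
const positionById = new Map(columns.id.map((id, pos) => [id, pos]));
function getColumnValue(id, field) {
  return columns[field][positionById.get(id)];
}

// Aggregating one column touches only that column's array, not whole records.
const averageAge = columns.age.reduce((sum, a) => sum + a, 0) / columns.age.length;
console.log(getColumnValue(2, "name")); // Barney
console.log(averageAge); // 34
```

The trade-off is visible even at this size: column scans are cheap, while reassembling or inserting a whole record touches every column array.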
  • 12. Graph Databases
    Stores vertex-to-vertex edges
    Getting and setting edges
    Sometimes possible to annotate vertices or edges
    Query languages support finding paths between vertices, subject to various constraints
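Path-finding with a constraint can be sketched as a breadth-first search over stored edges. This is a hypothetical in-memory `Graph` class, not any graph database's query language; the only constraint modeled is a maximum number of hops.

```javascript
// Hypothetical adjacency-list graph: each vertex maps to the set of vertices its edges point to.
class Graph {
  constructor() {
    this.edges = new Map();
  }

  addEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }

  hasEdge(from, to) {
    return this.edges.get(from)?.has(to) ?? false;
  }

  // Breadth-first search for a shortest path, subject to a maximum number of hops (edges).
  findPath(start, goal, maxHops = Infinity) {
    const previous = new Map([[start, null]]); // vertex -> vertex it was first reached from
    let frontier = [start];
    for (let hops = 0; frontier.length > 0 && !previous.has(goal) && hops < maxHops; hops++) {
      const next = [];
      for (const v of frontier) {
        for (const w of this.edges.get(v) ?? []) {
          if (!previous.has(w)) {
            previous.set(w, v);
            next.push(w);
          }
        }
      }
      frontier = next;
    }
    if (!previous.has(goal)) return null;
    // Walk back from the goal to rebuild the path.
    const path = [];
    for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
    return path;
  }
}

const g = new Graph();
g.addEdge("alice", "bob");
g.addEdge("bob", "carol");
g.addEdge("carol", "dave");
g.addEdge("alice", "erin");
console.log(g.findPath("alice", "dave")); // [ 'alice', 'bob', 'carol', 'dave' ]
console.log(g.findPath("alice", "dave", 2)); // null: dave is 3 hops away
```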
  • 13. Consistency Models
    Relational databases support transactions
    Can only see committed changes
    Commit/abort span multiple changes
    Read-only transaction flavors
    Read committed, repeatable read, etc
    Classic assumption: “I’m querying the one-and-only database”
    Scaling reads and scaling writes introduce different problems
  • 14. Replication – The 1st Breakdown of Consistency
  • 15. Limitations of a Single Master
    Replication can provide arbitrary read scalability
    Subject to coping with read-consistency issues
    Sooner or later, writing becomes a bottleneck
    Physical limitations (seek time)
    Throughput of a single I/O subsystem
  • 16. Sharding
    Partition the primary key space via hashing
    Set up a duplicate system for each shard
    The write-rate limitation now applies to each shard
    Joins or aggregation across shards are problematic
    Can the data be re-sharded on a live system?
    Can shards be re-balanced on a live system?
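Hash partitioning, and why re-sharding is hard, fit in a few lines. This sketch assumes the shard is `hash(key) % N`, with SHA-256 from Node's `crypto` module chosen arbitrarily as the hash and `user:<n>` keys invented for the example. Going from 4 to 5 shards changes the target shard for most keys, so most of the data would have to move.

```javascript
const { createHash } = require("crypto");

// Map a key to an unsigned 32-bit integer: the first 4 bytes of its SHA-256 digest.
function hashKey(key) {
  return createHash("sha256").update(String(key)).digest().readUInt32BE(0);
}

// Hash partitioning: the shard is the hash modulo the number of shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Route 10,000 hypothetical primary keys across 4 shards; each gets roughly a quarter.
const keys = Array.from({ length: 10000 }, (_, i) => `user:${i}`);
const counts = new Array(4).fill(0);
for (const key of keys) counts[shardFor(key, 4)]++;
console.log(counts);

// Adding a fifth shard changes (hash % N) for most keys, so most records would have to move.
const moved = keys.filter((key) => shardFor(key, 4) !== shardFor(key, 5)).length;
console.log(`${moved} of ${keys.length} keys change shards`); // expect roughly 80%
```

The roughly 80% figure follows from the arithmetic: a key stays put only when `h % 4 === h % 5`, which holds for 4 of the 20 residues of `h % 20`.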
  • 17. Multi-Site Operation
    Failure of a single-master system’s master
    A new master can be chosen
    But what if there’s a network partition?
    Can the application continue in read-only mode?
  • 18. Dynamo
    Now a generic term for multi-master systems
    Writes can occur to any node
    The same record can be updated on different nodes by different clients
    All writes are replicated everywhere
  • 19. Dynamo – the 2nd breakdown of consistency
    Collisions can occur
    Who wins?
    A collision resolution strategy is required
    Vector clocks
    Application access must be aware of this
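Vector clocks let replicas tell "this write supersedes that one" apart from "these two writes collided". A minimal sketch, assuming each clock is a plain object mapping a hypothetical node id to a counter. Deciding what the merged *value* should be is left to the application, which is the awareness the slide calls for.

```javascript
// A vector clock maps node id -> number of updates that node has made to this record.
function increment(clock, nodeId) {
  return { ...clock, [nodeId]: (clock[nodeId] ?? 0) + 1 };
}

// Compare two clocks: "before", "after", "equal", or "concurrent" (a collision).
function compare(a, b) {
  let aBehind = false; // some entry of a is smaller than b's
  let bBehind = false; // some entry of b is smaller than a's
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x < y) aBehind = true;
    if (y < x) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}

// After the application resolves a collision, combine the clocks by element-wise maximum.
function merge(a, b) {
  const out = { ...a };
  for (const [node, count] of Object.entries(b)) out[node] = Math.max(out[node] ?? 0, count);
  return out;
}

// Both replicas start from the same version, then two clients write to different nodes.
const base = increment({}, "nodeA"); // { nodeA: 1 }
const writeOnA = increment(base, "nodeA"); // { nodeA: 2 }
const writeOnB = increment(base, "nodeB"); // { nodeA: 1, nodeB: 1 }

console.log(compare(base, writeOnA)); // before: writeOnA supersedes base
console.log(compare(writeOnA, writeOnB)); // concurrent: neither saw the other, a collision
// The resolved value is recorded as a new version on some node.
const resolved = increment(merge(writeOnA, writeOnB), "nodeA"); // { nodeA: 3, nodeB: 1 }
console.log(compare(writeOnB, resolved)); // before
```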
  • 20. The Commercial Landscape
  • 21. Key Client Implementation Concerns
    Monotonic reads
    Can my reads go back in time?
    If I issue a query immediately after an insert or update, will I see my changes?
    Uninterrupted writes
    Am I always guaranteed the ability to write?
    Conflict Resolution
    Do I need to have a conflict resolution strategy?
  • 22. Using a Single-Master System
    What does the intermediate agent or system do for…
    Monotonic reads?
    Uninterrupted writes?
    Conflict Resolution?
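One way an intermediate agent could answer the monotonic-reads question for a single-master system with lagging secondaries is sketched below. It is a hypothetical client-side session, not how any MongoDB driver behaves. Versions are positions in a shared write log, and the session refuses to read from any replica older than the newest version it has already written or seen. It falls back to the primary when no secondary has caught up.

```javascript
// Hypothetical replica that applies writes from a shared, ordered log (version = log position).
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map();
    this.appliedVersion = 0;
  }

  // Apply log entries up to and including version `upTo`; a lagging replica calls this later.
  apply(log, upTo) {
    for (const entry of log.slice(this.appliedVersion, upTo)) this.data.set(entry.key, entry.value);
    this.appliedVersion = Math.max(this.appliedVersion, upTo);
  }
}

// Hypothetical client session: remembers the newest version it has written or observed.
class Session {
  constructor(primary, secondaries, log) {
    this.primary = primary;
    this.secondaries = secondaries;
    this.log = log;
    this.minVersion = 0;
  }

  write(key, value) {
    this.log.push({ key, value });
    this.primary.apply(this.log, this.log.length);
    this.minVersion = this.log.length; // read-your-writes: never accept anything older
  }

  read(key) {
    // Use a secondary only if it has caught up far enough; otherwise fall back to the primary.
    const replica = this.secondaries.find((r) => r.appliedVersion >= this.minVersion) ?? this.primary;
    this.minVersion = Math.max(this.minVersion, replica.appliedVersion); // monotonic reads
    return { from: replica.name, value: replica.data.get(key) };
  }
}

const log = [];
const primary = new Replica("primary");
const secondary = new Replica("secondary");
const session = new Session(primary, [secondary], log);

session.write("x", 1);
console.log(session.read("x")); // { from: 'primary', value: 1 }: the secondary is still behind
secondary.apply(log, log.length); // replication catches up
console.log(session.read("x")); // { from: 'secondary', value: 1 }
```

In a single-master system there is nothing to resolve, so the agent's only job is choosing a fresh enough replica. A multi-master system would also need something like the vector clocks from the Dynamo slides.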
  • 23. Using a Multi-Master System
    What does the intermediate agent or system do for…
    Monotonic reads?
    Uninterrupted writes?
    Conflict Resolution?
  • 24. Where MongoDB fits in the non-relational world
    MongoDB’s architecture and features
    Some real-world users
  • 25. MongoDB is a Document Store
    MongoDB stores JSON objects as BSON
    { LastName: 'Flintstone', FirstName: 'Fred', … }
    Secondary Indexes
    db.collection.ensureIndex({LastName : 1, FirstName : 1});
    Simple QBE-like query syntax
    db.collection.find({LastName : 'Flintstone'});
    db.collection.find({LastName : { $gte : 'Flintstone' }});
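MongoDB evaluates these queries on the server, using the index built by `ensureIndex` when it applies. To show only what the two `find` shapes select, here is a minimal sketch that scans an in-memory array. The `find` and `matches` helpers are hypothetical and handle only top-level fields. They also compare with JavaScript's `<` and `>` rather than MongoDB's BSON comparison order.

```javascript
// Minimal query-by-example matcher: a sketch of the query semantics, not MongoDB's engine.
// Supports top-level field equality and the $gt, $gte, $lt, $lte comparison operators.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// True when `condition` looks like { $op: operand, ... } rather than a plain value to match.
function isOperatorObject(condition) {
  return (
    condition !== null &&
    typeof condition === "object" &&
    Object.keys(condition).length > 0 &&
    Object.keys(condition).every((k) => k.startsWith("$"))
  );
}

function matches(doc, query) {
  // Every field in the query must be satisfied (an implicit AND).
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (!isOperatorObject(condition)) return value === condition;
    return Object.entries(condition).every(([op, operand]) => {
      const cmp = comparators[op];
      if (!cmp) throw new Error(`unsupported operator ${op}`);
      return value !== undefined && cmp(value, operand);
    });
  });
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const people = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Rubble", FirstName: "Barney" },
  { LastName: "Adams", FirstName: "Ann" },
];

console.log(find(people, { LastName: "Flintstone" }).map((d) => d.FirstName)); // [ 'Fred', 'Wilma' ]
console.log(find(people, { LastName: { $gte: "Flintstone" } }).length); // 3: both Flintstones and Rubble
```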
  • 26. MongoDB – Advanced Queries
    Geo-spatial queries
    Create a geo index
    Find points near a given point, sorted by radial distance
    Can be planar or spherical
    Find points within a certain radial distance, within a bounding box, or a polygon
    Built-in Map-Reduce
    The caller provides map and reduce functions written in JavaScript
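Both features can be illustrated without a server. The sketch below, with hypothetical helper names, sorts points by planar distance (spherical distance is not covered). It also emulates the map/reduce contract locally: `map` sees each document as `this` and emits key/value pairs, and `reduce` folds the values emitted for one key. Like MongoDB, it skips `reduce` for a key with a single value. In MongoDB's own map-reduce, `emit` is a global inside the map function and `reduce` may be re-run on its own partial results; here `emit` is passed in as an argument only so the sketch runs standalone.

```javascript
// Planar "near" query: points within maxDistance of center, sorted by Euclidean distance.
function nearPlanar(points, center, maxDistance = Infinity) {
  return points
    .map((p) => ({ ...p, dist: Math.hypot(p.loc[0] - center[0], p.loc[1] - center[1]) }))
    .filter((p) => p.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist);
}

// Local emulation of map/reduce: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // the document is `this` inside map
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

const places = [
  { name: "cafe", loc: [1, 1], type: "food" },
  { name: "park", loc: [5, 5], type: "outdoors" },
  { name: "diner", loc: [2, 0], type: "food" },
];

console.log(nearPlanar(places, [0, 0], 3).map((p) => p.name)); // [ 'cafe', 'diner' ]

const countsByType = mapReduce(
  places,
  function (emit) {
    emit(this.type, 1); // a regular function, so `this` is the document
  },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(countsByType); // { food: 2, outdoors: 1 }
```

Because the reduce here returns the same shape it consumes (a number), it would still be correct if partial sums were fed back into it.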
  • 27. MongoDB is a Single-Master System
    A database is served by members of a “replica set”
    The system elects a primary (master)
    Failure of the master is detected, and a new master is elected
    Application writes get an error if there is no quorum to elect a new master
    Reads continue to be fulfilled
  • 28. MongoDB Replica Set
  • 29. MongoDB Supports Sharding
    A collection can be sharded
    Each shard is served by its own replica set
    New shards (each a replica set) can be added at any time
    Shard key ranges are automatically balanced
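A sketch of range-based chunks and balancing, under simplifying assumptions. The chunk table, the `routeKey` and `balance` helpers, and the rule "move one chunk from the most-loaded to the least-loaded shard until counts differ by at most one" are all hypothetical. MongoDB's balancer has its own thresholds and actually migrates data between replica sets, which this sketch reduces to reassigning a label.

```javascript
// Hypothetical chunk table: each chunk owns shard keys in [min, max); null means unbounded.
const chunks = [
  { min: null, max: "g", shard: "shard0" },
  { min: "g", max: "n", shard: "shard0" },
  { min: "n", max: "t", shard: "shard0" },
  { min: "t", max: null, shard: "shard1" },
];

function inChunk(key, chunk) {
  return (chunk.min === null || key >= chunk.min) && (chunk.max === null || key < chunk.max);
}

// Route a shard key to the shard that owns its range.
function routeKey(chunkTable, key) {
  return chunkTable.find((c) => inChunk(key, c)).shard;
}

// Move chunks from the most-loaded to the least-loaded shard until counts differ by <= 1.
function balance(chunkTable, shards) {
  for (;;) {
    const load = new Map(shards.map((s) => [s, 0]));
    for (const c of chunkTable) load.set(c.shard, load.get(c.shard) + 1);
    const sorted = [...load.entries()].sort((a, b) => a[1] - b[1]);
    const [lightest, lightCount] = sorted[0];
    const [heaviest, heavyCount] = sorted[sorted.length - 1];
    if (heavyCount - lightCount <= 1) return chunkTable;
    chunkTable.find((c) => c.shard === heaviest).shard = lightest; // "migrate" one chunk
  }
}

console.log(routeKey(chunks, "fred")); // shard0
console.log(routeKey(chunks, "wilma")); // shard1
balance(chunks, ["shard0", "shard1", "shard2"]); // shard2 is newly added and starts empty
console.log(routeKey(chunks, "fred")); // shard2: its chunk was moved to the new shard
```

Ranges (rather than hashes) keep nearby shard keys together, so range queries on the shard key can go to a few shards, and adding a shard only moves whole chunks.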
  • 30. MongoDB – Sharded Deployment
  • 31. MongoDB Storage Management
    Data is kept in memory-mapped files
    Servers should have a lot of memory
    Files are allocated as needed
    Documents in a collection are kept on a list using a location-based (“geographical”) addressing scheme: data file number plus offset
    Indexes (B*-trees) point to documents using these file-and-offset addresses
  • 32. MongoDB Server Management
    Replica set members are aware of each other
    A majority of votes is required to elect a new primary
    Members can be assigned priorities to affect the election
    e.g., an “invisible” replica can be created with zero priority for backup purposes
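The election rules above can be illustrated with a deliberately simplified model. Everything here is hypothetical: members are plain objects, and `optime` stands in for how far each member has replicated. The rule "among the most up-to-date eligible members, highest priority wins" is a simplification chosen for the sketch, not MongoDB's actual election protocol.

```javascript
// Simplified, hypothetical election model (not MongoDB's real protocol).
function electPrimary(members) {
  // A strict majority of the whole set, not just of the reachable members, is required.
  const majority = Math.floor(members.length / 2) + 1;
  const reachable = members.filter((m) => m.up);
  if (reachable.length < majority) return null; // no quorum: no primary, so writes get errors

  // Priority-0 members (e.g. a backup-only member) count toward the majority but never win.
  const eligible = reachable.filter((m) => m.priority > 0);
  if (eligible.length === 0) return null;

  // Among the most up-to-date eligible members, the highest priority wins.
  const newest = Math.max(...eligible.map((m) => m.optime));
  const freshest = eligible.filter((m) => m.optime === newest);
  freshest.sort((a, b) => b.priority - a.priority);
  return freshest[0].name;
}

const members = [
  { name: "a", priority: 1, optime: 100, up: false }, // the old primary has failed
  { name: "b", priority: 2, optime: 100, up: true },
  { name: "c", priority: 1, optime: 100, up: true },
  { name: "backup", priority: 0, optime: 100, up: true }, // zero priority: backup only
];

console.log(electPrimary(members)); // b: 3 of 4 members reachable, and b has the highest priority
members[1].up = false;
console.log(electPrimary(members)); // null: only 2 of 4 members reachable, not a majority
```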
  • 33. MongoDB Access
    Drivers are available in many languages
    10gen supported
    C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
    Community supported
    Clojure, ColdFusion, F#, Go, Groovy, Lua, R
  • 34. MongoDB Availability
    Server license: AGPL
    Driver license: Apache
  • 35. MongoDB – Hosted Services
    MongoHQ, Mongo Machine, MongoLab
    RESTful access to collections
  • 36. MongoDB Support
    Paid Support
    10gen Hosted Monitoring
    Consulting, training
    Free Support
  • 37. MongoDB Users
  • 38.
  • 39. Mini-demo/tutorial