MongoDB:  An Introduction - june-2011

Presentation to the SVForum Architecture and Platform SIG meetup http://www.meetup.com/SVForum-SoftwareArchitecture-PlatformSIG/events/20823081/


Usage Rights

© All Rights Reserved

  • I've uploaded a newer version of this presentation here: http://www.slideshare.net/cwestin63/mongodb-introjuly2011 . Not a lot of changes, just a little bit about using nested documents to avoid the need for joins and transactions.
    Presentation Transcript

    • MongoDB: An Introduction
      Chris Westin
      Software Engineer, 10gen
      © Copyright 2010 10gen Inc.
    • Outline
      The Whys of Non-Relational Databases
      Vocabulary of the Non-Relational World
    • Why did non-relational databases arise?
      Problems with relational databases in the web world
      The Whys of Non-Relational Databases
    • Problem - Schema Evolution
      Applications are evolving all the time
      Applications need new fields
      Applications need new indexes
      Data is growing – sometimes very fast
      Users need to be able to alter their schemas without making their data unavailable
      The web world expects 24x7 service
      RDBMSs can have a hard time doing this
    • Problem – Write Rates
      Replication is a solution for high read loads
      Sooner or later, writing becomes a bottleneck
      Sharding – partitioning a logical database across multiple database instances
      Joins and aggregation become a problem
      Distributed transactions are too slow for the web
      Manual management of shards
      Choosing shard partitions
      Rebalancing shards
    • An introduction to terminology you’re going to be seeing a lot
      Vocabulary of the Non-Relational World
    • Data Models
      A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved
      What can the system store, and what does it know about what it contains?
      The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
      What kind of queries can you do?
      SQL is a manifestation of the kinds of queries that fall out of relational algebra
    • Non-Relational Data Models
      Key-value stores
      Document stores
      Column-oriented databases
      Graph databases
    • Key-Value Stores
      A mapping from a key to a value
      The store doesn’t know anything about the key or value
      The store doesn’t know anything about the insides of the value
      Set, get, or delete a key-value pair
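
The three operations on this slide can be sketched with a small in-memory class. This is only an illustration, not any particular product's API: `KeyValueStore` and the `user:1` key are hypothetical names, and the value is stored as an opaque string to show that the store cannot look inside it.

```javascript
// Hypothetical in-memory key-value store. Keys and values are opaque:
// the store can only set, get, or delete a whole pair.
class KeyValueStore {
  constructor() {
    this.pairs = new Map();
  }
  set(key, value) {
    this.pairs.set(key, value);
  }
  get(key) {
    return this.pairs.get(key); // undefined if the key is absent
  }
  delete(key) {
    return this.pairs.delete(key); // true if a pair was removed
  }
}

const kv = new KeyValueStore();
// The value is a serialized blob; only the application knows its structure.
kv.set("user:1", JSON.stringify({ LastName: "Flintstone", FirstName: "Fred" }));
const blob = kv.get("user:1");
const removed = kv.delete("user:1");
```

Finding every user named Flintstone would mean fetching and parsing every value in the application, which is the gap that document stores fill.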
    • Document Stores
      The store is a container for documents
      Documents are made up of named fields
      Fields may or may not have type definitions
      e.g. XSDs for XML stores, vs. schema-less JSON stores
      Can create “secondary indexes”
      These provide the ability to query on any document field(s)
      Insert and delete documents
      Update fields within documents
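
As a rough sketch of how a secondary index relates to insert, update, and delete, the toy store below keeps one lookup table per indexed field. `DocumentStore`, `createIndex`, and `findBy` are hypothetical names chosen for this example, and the index only supports exact-match lookups on a single top-level field.

```javascript
// Hypothetical in-memory document store with exact-match secondary indexes.
class DocumentStore {
  constructor() {
    this.docs = new Map();    // _id -> document
    this.indexes = new Map(); // field name -> Map(field value -> Set of _id)
    this.nextId = 1;
  }
  indexDoc(doc) {
    for (const [field, index] of this.indexes) {
      if (!index.has(doc[field])) index.set(doc[field], new Set());
      index.get(doc[field]).add(doc._id);
    }
  }
  unindexDoc(doc) {
    for (const [field, index] of this.indexes) {
      const ids = index.get(doc[field]);
      if (ids) ids.delete(doc._id);
    }
  }
  createIndex(field) {
    this.indexes.set(field, new Map());
    for (const doc of this.docs.values()) this.indexDoc(doc); // re-adding ids is harmless
  }
  insert(doc) {
    const stored = { ...doc, _id: this.nextId++ };
    this.docs.set(stored._id, stored);
    this.indexDoc(stored);
    return stored._id;
  }
  update(id, changes) {
    const old = this.docs.get(id);
    if (!old) return false;
    this.unindexDoc(old); // index entries must follow the field values
    const updated = { ...old, ...changes, _id: id };
    this.docs.set(id, updated);
    this.indexDoc(updated);
    return true;
  }
  remove(id) {
    const old = this.docs.get(id);
    if (!old) return false;
    this.unindexDoc(old);
    return this.docs.delete(id);
  }
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      const ids = index.get(value) || new Set();
      return [...ids].map((id) => this.docs.get(id));
    }
    // No index on this field: fall back to scanning every document.
    return [...this.docs.values()].filter((doc) => doc[field] === value);
  }
}

const store = new DocumentStore();
store.createIndex("LastName");
const fredId = store.insert({ LastName: "Flintstone", FirstName: "Fred" });
store.insert({ LastName: "Rubble", FirstName: "Barney" });
store.update(fredId, { FirstName: "Frederick" });
const flintstones = store.findBy("LastName", "Flintstone");
```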
    • Column-Oriented Stores
      Like a relational store, but flipped around: all data for a column is kept together
      An index provides a means to get a column value for a record
      Get, insert, and delete records; update fields
      Streaming column data in and out of Hadoop
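
To make the "flipped around" layout concrete, the sketch below converts a few row-shaped records into one array per column. The sample records and the helper names `toColumns` and `getRecord` are made up for illustration, and the code assumes every record has the same fields.

```javascript
// Row layout: one object per record (illustrative sample data).
const rows = [
  { id: 1, name: "Fred", age: 40 },
  { id: 2, name: "Wilma", age: 38 },
  { id: 3, name: "Barney", age: 35 },
];

// Column layout: one array per column; position i in every array is record i.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

// Reassemble one record by reading the same position from every column.
function getRecord(columns, position) {
  const record = {};
  for (const [field, values] of Object.entries(columns)) {
    record[field] = values[position];
  }
  return record;
}

const columns = toColumns(rows);
// An aggregate over one column touches only that column's array.
const totalAge = columns.age.reduce((sum, age) => sum + age, 0);
const second = getRecord(columns, 1);
```

Scanning a single column is cheap in this layout, while rebuilding a whole record requires a read from every column.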
    • Graph Databases
      Stores vertex-to-vertex edges
      Getting and setting edges
      Sometimes possible to annotate vertices or edges
      Query languages support finding paths between vertices, subject to various constraints
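
A minimal sketch of these operations, assuming directed edges and a hop limit as the only query constraint. The `Graph` class and its methods are hypothetical names, and the path search is a plain breadth-first search rather than any real graph query language.

```javascript
// Hypothetical in-memory graph: directed edges with optional annotations.
class Graph {
  constructor() {
    this.edges = new Map(); // vertex -> Map(neighbor -> annotation)
  }
  setEdge(from, to, annotation = {}) {
    if (!this.edges.has(from)) this.edges.set(from, new Map());
    this.edges.get(from).set(to, annotation);
  }
  getEdge(from, to) {
    const out = this.edges.get(from);
    return out ? out.get(to) : undefined;
  }
  // Breadth-first search: returns a path with the fewest edges from start
  // to goal, using at most maxHops edges, or null if there is none.
  findPath(start, goal, maxHops = Infinity) {
    const queue = [[start]];
    const seen = new Set([start]);
    while (queue.length > 0) {
      const path = queue.shift();
      const last = path[path.length - 1];
      if (last === goal) return path;
      if (path.length - 1 >= maxHops) continue; // constraint: hop limit
      const neighbors = this.edges.get(last) || new Map();
      for (const next of neighbors.keys()) {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push([...path, next]);
        }
      }
    }
    return null;
  }
}

const graph = new Graph();
graph.setEdge("a", "b", { kind: "follows" });
graph.setEdge("b", "c");
graph.setEdge("a", "d");
graph.setEdge("d", "e");
graph.setEdge("e", "c");
const shortest = graph.findPath("a", "c");
const tooFar = graph.findPath("a", "c", 1);
```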
    • Consistency Models
      Relational databases support transactions
      Can only see committed changes
      Commit/abort span multiple changes
      Read-only transaction flavors
      Read committed, repeatable read, etc
      Classic assumption: “I’m querying the one-and-only database”
      Scaling reads and scaling writes introduce different problems
    • Replication - The 1st Breakdown of Consistency
    • Limitations of a Single Master
      Replication can provide arbitrary read scalability
      Subject to coping with read-consistency issues
      Sooner or later, writing becomes a bottleneck
      Physical limitations (seek time)
      Throughput of a single I/O subsystem
    • Sharding
      Partition the primary key space via hashing
      Set up a duplicate system for each shard
      The write-rate limitation now applies to each shard
      Joins or aggregation across shards are problematic
      Can the data be re-sharded on a live system?
      Can shards be re-balanced on a live system?
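
The hashing step, and the reason re-sharding a live system is hard, can be sketched as follows. This assumes simple hash-modulo placement with a 32-bit FNV-1a hash; the function names and the `user:N` keys are hypothetical and not taken from any particular system.

```javascript
// 32-bit FNV-1a hash over the key's UTF-16 code units (fine for ASCII keys).
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Hash-modulo placement: each key maps to exactly one shard.
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

const keys = Array.from({ length: 1000 }, (_, i) => "user:" + i);

// Count how many keys land on each of 4 shards.
const perShard = new Array(4).fill(0);
for (const key of keys) perShard[shardFor(key, 4)]++;

// Adding a fifth shard changes the modulus, so many keys change shards.
const moved = keys.filter((key) => shardFor(key, 4) !== shardFor(key, 5)).length;
```

With plain modulo placement, changing the shard count relocates a large share of the keys, which is why the slide asks whether data can be re-sharded and re-balanced while the system is live.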
    • Multi-Site Operation
      Failure of a single-master system’s master
      A new master can be chosen
      But what if there’s a network partition?
      Can the application continue in read-only mode?
    • Dynamo
      Now a generic term for multi-master systems
      Writes can occur to any node
      The same record can be updated on different nodes by different clients
      All writes are replicated everywhere
    • Dynamo – the 2nd breakdown of consistency
      Collisions can occur
      Who wins?
      A collision resolution strategy is required
      Vector clocks
      Application access must be aware of this
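
A sketch of how vector clocks detect a collision, assuming each clock is a plain object mapping a node id to a counter. The helper names and the node ids `n1` and `n2` are hypothetical; real systems differ in how they store, prune, and resolve clocks.

```javascript
// Return a copy of the clock with this node's counter advanced by one.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Compare two vector clocks: "before", "after", "equal", or "concurrent".
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const countA = a[node] || 0;
    const countB = b[node] || 0;
    if (countA > countB) aAhead = true;
    if (countB > countA) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // a collision: neither saw the other
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Merge takes the per-node maximum; used after a conflict has been resolved.
function mergeClocks(a, b) {
  const merged = { ...a };
  for (const [node, count] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] || 0, count);
  }
  return merged;
}

const base = increment({}, "n1");    // record first written on node n1
const onN1 = increment(base, "n1");  // one client updates it on n1
const onN2 = increment(base, "n2");  // another client updates it on n2
const verdict = compareClocks(onN1, onN2);
const resolved = mergeClocks(onN1, onN2);
```

The clocks only reveal that the two updates collided. Deciding which value wins, or how to combine them, is still left to the resolution strategy.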
    • The Commercial Landscape
    • Key Client Implementation Concerns
      Monotonic reads
      Can my reads go back in time?
      If I issue a query immediately after an insert or update, will I see my changes?
      Uninterrupted writes
      Am I always guaranteed the ability to write?
      Conflict Resolution
      Do I need to have a conflict resolution strategy?
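
The first concern can be illustrated with a toy primary and one lagging secondary. This is a sketch of one possible client-side technique (remembering the newest version a session has seen), not a description of how any specific database behaves; every name in it is hypothetical.

```javascript
// A node holds data plus the version of the last write it has applied.
class Node {
  constructor() {
    this.version = 0;
    this.data = new Map();
  }
  apply(version, key, value) {
    this.data.set(key, value);
    this.version = version;
  }
}

const primary = new Node();
const secondary = new Node();
const pending = []; // writes not yet shipped to the secondary (replication lag)

function write(key, value) {
  const version = primary.version + 1;
  primary.apply(version, key, value);
  pending.push({ version, key, value });
  return version;
}

function replicate() {
  for (const w of pending.splice(0)) secondary.apply(w.version, w.key, w.value);
}

// Read from the secondary only if it has caught up with what this session
// has already seen; otherwise fall back to the primary.
function read(session, key) {
  const node = secondary.version >= session.lastSeen ? secondary : primary;
  session.lastSeen = Math.max(session.lastSeen, node.version);
  return node.data.get(key);
}

const session = { lastSeen: 0 };
session.lastSeen = write("name", "Fred");
const naiveRead = secondary.data.get("name"); // stale: the write has not arrived
const safeRead = read(session, "name");       // served by the primary
replicate();
const laterRead = read(session, "name");      // the secondary has caught up
```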
    • Using a Single-Master System
      What does the intermediate agent or system do for…
      Monotonic reads?
      Uninterrupted writes?
      Conflict Resolution?
    • Using a Multi-Master System
      What does the intermediate agent or system do for…
      Monotonic reads?
      Uninterrupted writes?
      Conflict Resolution?
    • Where MongoDB fits in the non-relational world
      MongoDB’s architecture and features
      Some real-world users
    • MongoDB is a Document Store
      MongoDB stores JSON objects as BSON
      { LastName: 'Flintstone', FirstName: 'Fred', …}
      Secondary Indexes
      db.collection.ensureIndex({LastName : 1, FirstName : 1});
      Simple QBE-like query syntax
      db.collection.find({LastName : 'Flintstone'});
      db.collection.find({LastName : { $gte : 'Flintstone' }});
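
To show what "QBE-like" means without a running server, here is a small stand-alone matcher for the two query shapes above. It is a simplification written for this example, not MongoDB's implementation: `matches` and `find` are hypothetical helpers, only four comparison operators are handled, and strings are compared with JavaScript's own ordering.

```javascript
// Does a document satisfy a query-by-example object?
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { $gte: 'Flintstone' }
      return Object.entries(condition).every(([op, operand]) => {
        switch (op) {
          case "$gte": return value >= operand;
          case "$gt": return value > operand;
          case "$lte": return value <= operand;
          case "$lt": return value < operand;
          default: throw new Error("unsupported operator: " + op);
        }
      });
    }
    return value === condition; // plain form: exact match on the field
  });
}

function find(docs, query) {
  return docs.filter((doc) => matches(doc, query));
}

// Illustrative sample documents.
const people = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Rubble", FirstName: "Barney" },
  { LastName: "Arnold", FirstName: "Paperboy" },
];

const exact = find(people, { LastName: "Flintstone" });
const fromF = find(people, { LastName: { $gte: "Flintstone" } });
```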
    • MongoDB – Advanced Queries
      Geo-spatial queries
      Create a geo index
      Find points near a given point, sorted by radial distance
      Can be planar or spherical
      Find points within a certain radial distance, within a bounding box, or a polygon
      Built-in Map-Reduce
      The caller provides map and reduce functions written in JavaScript
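
The shape of the map and reduce functions can be sketched in plain JavaScript. The `mapReduce` helper below is a hypothetical in-memory stand-in, not the server's implementation: it passes `emit` to the map function as an argument, whereas in MongoDB `emit` is supplied by the server. As in MongoDB, `this` is the current document and the reduce function is skipped for keys that have a single value.

```javascript
// Group emitted values by key, then reduce each group to one value.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Illustrative sample documents.
const family = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Rubble", FirstName: "Barney" },
];

// Count documents per last name.
const counts = mapReduce(
  family,
  function (emit) { emit(this.LastName, 1); },
  function (key, values) { return values.reduce((sum, n) => sum + n, 0); }
);
```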
    • MongoDB is a Single-Master System
      A database is served by members of a “replica set”
      The system elects a primary (master)
      Failure of the master is detected, and a new master is elected
      Application writes get an error if there is no quorum to elect a new master
      Reads continue to be fulfilled
    • MongoDB Replica Set
    • MongoDB Supports Sharding
      A collection can be sharded
      Each shard is served by its own replica set
      New shards (each a replica set) can be added at any time
      Shard key ranges are automatically balanced
    • MongoDB – Sharded Deployment
    • MongoDB Storage Management
      Data is kept in memory-mapped files
      Servers should have a lot of memory
      Files are allocated as needed
      Documents in a collection are kept on a list using a geographical addressing scheme
      Indexes (B*-trees) point to documents using geographical addresses
    • MongoDB Server Management
      Replica set members are aware of each other
      A majority of votes is required to elect a new primary
      Members can be assigned priorities to affect the election
      e.g., an “invisible” replica can be created with zero priority for backup purposes
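
The majority rule and the effect of a zero priority can be sketched as follows. This is a deliberate simplification, not MongoDB's election protocol: it assumes one vote per member, treats "reachable" as a simple flag, and picks the highest-priority eligible member. The function and member names are hypothetical.

```javascript
// A strict majority of all configured votes is needed, not just of those reachable.
function hasMajority(reachableVotes, totalVotes) {
  return reachableVotes > Math.floor(totalVotes / 2);
}

// Return the name of the member that would become primary, or null if
// no primary can be elected (writes would then fail).
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (!hasMajority(reachable.length, members.length)) return null;
  // Members with priority 0 still vote but can never become primary.
  const eligible = reachable.filter((m) => m.priority > 0);
  if (eligible.length === 0) return null;
  return eligible.reduce((best, m) => (m.priority > best.priority ? m : best)).name;
}

const members = [
  { name: "a", priority: 2, up: false },     // the old primary has failed
  { name: "b", priority: 1, up: true },
  { name: "backup", priority: 0, up: true }, // votes, but is never elected
];
const newPrimary = electPrimary(members);

// If "b" also becomes unreachable, one vote out of three is not a majority.
const minority = members.map((m) => (m.name === "b" ? { ...m, up: false } : m));
const noPrimary = electPrimary(minority);
```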
    • MongoDB Access
      Drivers are available in many languages
      10gen supported
      C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
      Community supported
      Clojure, ColdFusion, F#, Go, Groovy, Lua, R
    • MongoDB Availability
      Server license: AGPL
      Driver license: Apache
    • MongoDB – Hosted Services
      MongoHQ, Mongo Machine, MongoLab
      RESTful access to collections
    • MongoDB Support
      Paid Support
      10gen Hosted Monitoring
      Consulting, training
      Free Support
    • MongoDB Users
      craigslist: http://www.10gen.com/presentation/mongosf2011/craigslist
      bit.ly: http://blip.tv/mongodb/bit-ly-user-history-auto-sharded-3723147
      shutterfly: http://www.10gen.com/presentation/mongosv2010/shutterfly
    • Mini-demo/tutorial