MongoDB: An Introduction - July 2011

Introduction to MongoDB presented to the Dallas Big Data meetup

Published in: Technology

  1. 1. MongoDB: An Introduction<br />Chris Westin<br />Software Engineer, 10gen<br />© Copyright 2010 10gen Inc.<br />
  2. 2. Outline<br />The Whys of Non-Relational Databases<br />Vocabulary of the Non-Relational World<br />MongoDB<br />
  3. 3. Why did non-relational databases arise?<br />Problems with relational databases in the web world<br />The Whys of Non-Relational Databases<br />
  4. 4. Problem - Schema Evolution<br />Applications are evolving all the time<br />Applications need new fields<br />Applications need new indexes<br />Data is growing – sometimes very fast<br />Users need to be able to alter their schemas without making their data unavailable<br />The web world expects 24x7 service<br />RDBMSs can have a hard time doing this<br />
  5. 5. Problem – Write Rates<br />Replication is a solution for high read loads<br />Sooner or later, writing becomes a bottleneck<br />Sharding – partitioning a logical database across multiple database instances<br />Joins and aggregation become a problem<br />Distributed transactions are too slow for the web<br />Manual management of shards<br />Choosing shard partitions<br />Rebalancing shards<br />
  6. 6. An introduction to terminology you’re going to be seeing a lot<br />Vocabulary of the Non-Relational World<br />
  7. 7. Data Models<br />A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved<br />What can the system store, and what does it know about what it contains?<br />The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition<br />What kind of queries can you do?<br />SQL is a manifestation of the kinds of queries that fall out of relational algebra<br />
  8. 8. Non-Relational Data Models<br />Key-value stores<br />Document stores<br />Column-oriented databases<br />Graph databases<br />
  9. 9. Key-Value Stores<br />A mapping from a key to a value<br />The store doesn’t know anything about the key or value<br />The store doesn’t know anything about the insides of the value<br />Operations<br />Set, get, or delete a key-value pair<br />
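The set/get/delete operations on the Key-Value Stores slide can be sketched in a few lines. This is a minimal in-memory illustration, not any particular product's API: the `KeyValueStore` class, the `user:1` key, and the sample value are all hypothetical. The point is that the store treats the value as an opaque blob, and only the client knows it is JSON.

```javascript
// Hypothetical in-memory key-value store (illustration only).
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();

// The store never looks inside the value; the client serializes it.
store.set("user:1", JSON.stringify({ LastName: "Flintstone", FirstName: "Fred" }));

// The client, not the store, interprets the bytes it gets back.
const fred = JSON.parse(store.get("user:1"));
console.log(fred.FirstName);

store.delete("user:1");
const afterDelete = store.get("user:1"); // undefined once the pair is gone
```

Because the store cannot see inside the value, there is no way to ask it for "all users named Flintstone"; that limitation is what document stores address.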
  10. 10. Document Stores<br />The store is a container for documents<br />Documents are made up of named fields<br />Fields may or may not have type definitions<br />e.g. XSDs for XML stores, vs. schema-less JSON stores<br />Can create “secondary indexes”<br />These provide the ability to query on any document field(s)<br />Operations:<br />Insert and delete documents<br />Update fields within documents<br />
  11. 11. Column-Oriented Stores<br />Like a relational store, but flipped around: all data for a column is kept together<br />An index provides a means to get a column value for a record<br />Operations:<br />Get, insert, delete records; updating fields<br />Streaming column data in and out of Hadoop<br />
  12. 12. Graph Databases<br />Stores vertex-to-vertex edges<br />Operations:<br />Getting and setting edges<br />Sometimes possible to annotate vertices or edges<br />Query languages support finding paths between vertices, subject to various constraints<br />
  13. 13. Consistency Models<br />Relational databases support transactions<br />Can only see committed changes<br />Commit/abort span multiple changes<br />Read-only transaction flavors<br />Read committed, repeatable read, etc.<br />Classic assumption: “I’m querying the one-and-only database”<br />Scaling reads and scaling writes introduce different problems<br />
  14. 14. Replication - The 1st Breakdown of Consistency<br />
  15. 15. Limitations of a Single Master<br />Replication can provide arbitrary read scalability<br />Subject to coping with read-consistency issues<br />Sooner or later, writing becomes a bottleneck<br />Physical limitations (seek time)<br />Throughput of a single I/O subsystem<br />
  16. 16. Sharding<br />Partition the primary key space via hashing<br />Set up a duplicate system for each shard<br />The write-rate limitation now applies to each shard<br />Joins or aggregation across shards are problematic<br />Can the data be re-sharded on a live system?<br />Can shards be re-balanced on a live system?<br />
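To make the hashing idea on the Sharding slide concrete, here is a rough sketch of hash-based shard selection in Node.js. The `shardFor` helper, the choice of MD5, the modulo scheme, and the sample keys are assumptions made for illustration; real systems may use different hash functions or consistent hashing.

```javascript
const crypto = require("crypto");

// Map a primary key to a shard number in [0, shardCount).
// Assumption: MD5 plus modulo, chosen only because it is simple to show.
function shardFor(key, shardCount) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// Hypothetical primary keys.
const keys = ["fred", "wilma", "barney", "betty", "pebbles", "bambam"];

// Placement with 4 shards, then again after adding a fifth shard.
const before = keys.map((k) => shardFor(k, 4));
const after = keys.map((k) => shardFor(k, 5));

// Keys whose shard changed would have to be moved during re-sharding.
const moved = keys.filter((_, i) => before[i] !== after[i]);
console.log({ before, after, moved });
```

With a plain modulo scheme, changing the shard count can reassign many keys at once, which is one reason the slide asks whether data can be re-sharded on a live system.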
  17. 17. Multi-Site Operation<br />Failure of a single-master system’s master<br />A new master can be chosen<br />But what if there’s a network partition?<br />Can the application continue in read-only mode?<br />
  18. 18. Dynamo<br />Now a generic term for multi-master systems<br />Writes can occur to any node<br />The same record can be updated on different nodes by different clients<br />All writes are replicated everywhere<br />
  19. 19. Dynamo – the 2nd breakdown of consistency<br />Collisions can occur<br />Who wins?<br />A collision resolution strategy is required<br />Vector clocks<br /><br />Application access must be aware of this<br />
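The vector clocks mentioned on this slide can be illustrated with a small comparison function. This is a sketch under simple assumptions: a clock is a plain object mapping a node id to a counter, a missing entry counts as zero, and the node names `n1` and `n2` are hypothetical. It only detects a collision; choosing a winner is still up to the application.

```javascript
// Compare two vector clocks and report how version a relates to version b.
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const node of nodes) {
    const av = a[node] || 0;
    const bv = b[node] || 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // a collision: neither descends from the other
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Both clients start from the same version of a record.
const base = { n1: 1 };
// Client A updates the record on node n1; client B updates it on node n2.
const onNode1 = { n1: 2 };
const onNode2 = { n1: 1, n2: 1 };

console.log(compareClocks(onNode1, base));    // "after"
console.log(compareClocks(onNode1, onNode2)); // "concurrent"
```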
  20. 20. The Commercial OSS Landscape<br />
  21. 21. Key Client Implementation Concerns<br />Monotonic reads<br />Can my reads go back in time?<br />Read-your-own-writes<br />If I issue a query immediately after an insert or update, will I see my changes?<br />Uninterrupted writes<br />Am I always guaranteed the ability to write?<br />Conflict Resolution<br />Do I need to have a conflict resolution strategy?<br />
  22. 22. Using a Single-Master System<br />What does the intermediate agent or system do for…<br />Monotonic reads?<br />Read-your-own-writes?<br />Uninterrupted writes?<br />Conflict Resolution?<br />
  23. 23. Using a Multi-Master System<br />What does the intermediate agent or system do for…<br />Monotonic reads?<br />Read-your-own-writes?<br />Uninterrupted writes?<br />Conflict Resolution?<br />
  24. 24. Where MongoDB fits in the non-relational world<br />MongoDB’s architecture and features<br />Some real-world users<br />MongoDB<br />
  25. 25. MongoDB is a Document Store<br />MongoDB stores JSON objects as BSON<br />{ LastName: 'Flintstone', FirstName: 'Fred', …}<br />Secondary Indexes<br />db.collection.ensureIndex({LastName : 1, FirstName : 1});<br />Simple QBE-like query syntax<br />db.collection.find({LastName : 'Flintstone'});<br />db.collection.find({LastName : { $gte : 'Flintstone' }});<br />
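The query-by-example idea behind the two `find` calls on this slide can be imitated in plain JavaScript without a server. The sketch below is not MongoDB's implementation: `matches` and `find` are hypothetical helpers that support only exact equality and `$gte`, and every document other than Fred Flintstone is invented sample data.

```javascript
// Invented sample documents.
const people = [
  { LastName: "Doe", FirstName: "Jane" },
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Rubble", FirstName: "Barney" },
];

// A document matches when every field in the query example is satisfied.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object" && "$gte" in cond) {
      return doc[field] >= cond.$gte; // range condition
    }
    return doc[field] === cond; // exact-match condition
  });
}

function find(docs, query) {
  return docs.filter((doc) => matches(doc, query));
}

const exact = find(people, { LastName: "Flintstone" });
const range = find(people, { LastName: { $gte: "Flintstone" } });
console.log(exact.length, range.length); // 2 3
```

The query itself is just a document describing what a match looks like, which is why the syntax is described as QBE-like.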
  26. 26. No Joins or Transactions<br />Nested documents….<br />Can often be used to avoid joins<br />Can often be used to regain atomicity<br />
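As a small illustration of how nesting can stand in for a join, compare a relational-style layout with a single nested document. The field names and sample values below are hypothetical.

```javascript
// Relational style: two record sets that must be joined on customerId.
const customers = [{ id: 1, LastName: "Flintstone", FirstName: "Fred" }];
const addresses = [
  { customerId: 1, type: "home", city: "Bedrock" },
  { customerId: 1, type: "work", city: "Bedrock" },
];
const joined = addresses.filter((a) => a.customerId === customers[0].id);

// Document style: the addresses are nested inside the customer document,
// so one read returns everything and one write changes it all together.
const customerDoc = {
  _id: 1,
  LastName: "Flintstone",
  FirstName: "Fred",
  addresses: [
    { type: "home", city: "Bedrock" },
    { type: "work", city: "Bedrock" },
  ],
};

console.log(joined.length === customerDoc.addresses.length);
```

Because the nested data lives in one document, an update that touches the customer and its addresses is a single-document change, which is how nesting can recover atomicity without multi-record transactions.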
  27. 27. MongoDB – Advanced Queries<br />Geo-spatial queries<br />Create a geo index<br />Find points near a given point, sorted by radial distance<br />Can be planar or spherical<br />Find points within a certain radial distance, within a bounding box, or a polygon<br />Built-in Map-Reduce<br />The caller provides map and reduce functions written in JavaScript<br />
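The shape of caller-supplied map and reduce functions can be sketched with a tiny local driver. This is an illustration of the pattern only, not MongoDB's engine: the `mapReduce` helper is hypothetical, it passes `emit` as an argument rather than providing it as a global, and the documents are sample data.

```javascript
// Minimal local map-reduce driver (illustration only).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Run the map function once per document, with the document as `this`.
  for (const doc of docs) map.call(doc, emit);
  // Run the reduce function once per distinct key.
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const docs = [
  { LastName: "Flintstone", FirstName: "Fred" },
  { LastName: "Flintstone", FirstName: "Wilma" },
  { LastName: "Rubble", FirstName: "Barney" },
];

// Count documents per last name.
const counts = mapReduce(
  docs,
  function (emit) { emit(this.LastName, 1); },
  function (key, values) { return values.reduce((sum, v) => sum + v, 0); }
);
console.log(counts); // { Flintstone: 2, Rubble: 1 }
```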
  28. 28. MongoDB is a Single-Master System<br />A database is served by members of a “replica set”<br />The system elects a primary (master)<br />Failure of the master is detected, and a new master is elected<br />Application writes get an error if there is no quorum to elect a new master<br />Reads continue to be fulfilled<br />
  29. 29. MongoDB Replica Set<br />
  30. 30. MongoDB Supports Sharding<br />A collection can be sharded<br />Each shard is served by its own replica set<br />New shards (each a replica set) can be added at any time<br />Shard key ranges are automatically balanced<br />
  31. 31. MongoDB – Sharded Deployment<br />
  32. 32. MongoDB Storage Management<br />Data is kept in memory-mapped files<br />Servers should have a lot of memory<br />Files are allocated as needed<br />Documents in a collection are kept on a list using a geographical addressing scheme<br />Indexes (B*-trees) point to documents using geographical addresses<br />
  33. 33. MongoDB Server Management<br />Replica set members are aware of each other<br />A majority of votes is required to elect a new primary<br />Members can be assigned priorities to affect the election<br />e.g., an “invisible” replica can be created with zero priority for backup purposes<br />
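The majority rule for elections can be expressed as a one-line check. The sketch below assumes every member has exactly one vote and ignores priorities; `canElectPrimary` and the five-member set are hypothetical.

```javascript
// A primary can be elected only with votes from a strict majority of all
// configured members, not just of the members that are currently reachable.
function canElectPrimary(reachableVotes, totalMembers) {
  return reachableVotes > Math.floor(totalMembers / 2);
}

// Hypothetical five-member replica set split 3/2 by a network partition.
const totalMembers = 5;
const majoritySide = canElectPrimary(3, totalMembers); // true: 3 of 5 is a majority
const minoritySide = canElectPrimary(2, totalMembers); // false: 2 of 5 is not
console.log({ majoritySide, minoritySide });
```

Only one side of a partition can ever hold a majority, so at most one primary accepts writes; the minority side has no quorum and can serve reads only.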
  34. 34. MongoDB Access<br />Drivers are available in many languages<br />10gen supported<br />C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala<br />Community supported<br />Clojure, ColdFusion, F#, Go, Groovy, Lua, R<br /><br />
  35. 35. MongoDB Availability<br />Source<br /><br />Server<br />License: AGPL<br /><br />Drivers<br />License: Apache<br /><br />
  36. 36. MongoDB – Hosted Services<br /><br />MongoHQ, Mongo Machine, MongoLab<br />RESTful access to collections<br />
  37. 37. MongoDB Support<br />Paid Support<br /><br />10gen Hosted Monitoring<br />Consulting, training<br />Free Support<br /><br /><br />
  38. 38. MongoDB Users<br /><br /><br />craigslist:<br /><br />shutterfly:<br />
  39. 39.
  40. 40. Mini-demo/tutorial<br /><br />