MongoDB: An Introduction - june-2011

Presentation to the SVForum Architecture and Platform SIG meetup http://www.meetup.com/SVForum-SoftwareArchitecture-PlatformSIG/events/20823081/

Published in: Technology

  1. MongoDB: An Introduction
     - Chris Westin
     - Software Engineer, 10gen
     - © Copyright 2010 10gen Inc.
  2. Outline
     - The Whys of Non-Relational Databases
     - Vocabulary of the Non-Relational World
     - MongoDB
  3. The Whys of Non-Relational Databases
     - Why did non-relational databases arise?
     - Problems with relational databases in the web world
  4. Problem - Schema Evolution
     - Applications are evolving all the time
     - Applications need new fields
     - Applications need new indexes
     - Data is growing, sometimes very fast
     - Users need to be able to alter their schemas without making their data unavailable
     - The web world expects 24x7 service
     - RDBMSs can have a hard time doing this
  5. Problem - Write Rates
     - Replication is a solution for high read loads
     - Sooner or later, writing becomes a bottleneck
     - Sharding: partitioning a logical database across multiple database instances
     - Joins and aggregation become a problem
     - Distributed transactions are too slow for the web
     - Manual management of shards
     - Choosing shard partitions
     - Rebalancing shards
  6. Vocabulary of the Non-Relational World
     - An introduction to terminology you’re going to be seeing a lot
  7. Data Models
     - A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved
     - What can the system store, and what does it know about what it contains?
     - The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
     - What kind of queries can you do?
     - SQL is a manifestation of the kinds of queries that fall out of relational algebra
  8. Non-Relational Data Models
     - Key-value stores
     - Document stores
     - Column-oriented databases
     - Graph databases
  9. Key-Value Stores
     - A mapping from a key to a value
     - The store doesn’t know anything about the key or value
     - The store doesn’t know anything about the insides of the value
     - Operations: set, get, or delete a key-value pair
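The set, get, and delete operations on slide 9 can be sketched in a few lines. This is a minimal in-memory illustration, not any particular product's API; the `KeyValueStore` class and the `user:42` key are hypothetical names introduced here. Note that the store never looks inside the value: the caller serializes and parses it.

```javascript
// Hypothetical in-memory key-value store; values are opaque to the store.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const store = new KeyValueStore();
// The caller, not the store, decides how the value is encoded.
store.set('user:42', JSON.stringify({ LastName: 'Flintstone', FirstName: 'Fred' }));
const user = JSON.parse(store.get('user:42'));
const existed = store.delete('user:42');
```

Because the value is opaque, a lookup by anything other than the key (for example, by `LastName`) would require the caller to fetch and parse every value.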
  10. Document Stores
     - The store is a container for documents
     - Documents are made up of named fields
     - Fields may or may not have type definitions
     - e.g. XSDs for XML stores, vs. schema-less JSON stores
     - Can create “secondary indexes”
     - These provide the ability to query on any document field(s)
     - Operations:
     - Insert and delete documents
     - Update fields within documents
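To make the idea of a secondary index on slide 10 concrete, here is a rough in-memory sketch. The `Collection` class and its `findBy` method are hypothetical, and `ensureIndex` only borrows its name from the shell command shown later in the deck; real document stores use far more sophisticated structures. Updates and deletes are left out to keep the sketch short.

```javascript
// Hypothetical in-memory document collection with optional secondary indexes.
class Collection {
  constructor() {
    this.docs = new Map();    // _id -> document
    this.indexes = new Map(); // field name -> Map(field value -> Set of _id)
    this.nextId = 1;
  }
  addToIndex(index, value, id) {
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(id);
  }
  ensureIndex(field) {
    // Build the index over documents that already exist.
    const index = new Map();
    for (const [id, doc] of this.docs) this.addToIndex(index, doc[field], id);
    this.indexes.set(field, index);
  }
  insert(doc) {
    const id = this.nextId++;
    this.docs.set(id, { _id: id, ...doc });
    // Keep every existing index up to date.
    for (const [field, index] of this.indexes) this.addToIndex(index, doc[field], id);
    return id;
  }
  findBy(field, value) {
    const index = this.indexes.get(field);
    if (index) {
      // Indexed path: go straight to the matching ids.
      return [...(index.get(value) || [])].map((id) => this.docs.get(id));
    }
    // Unindexed path: scan every document.
    return [...this.docs.values()].filter((doc) => doc[field] === value);
  }
}

const people = new Collection();
people.insert({ LastName: 'Flintstone', FirstName: 'Fred' });
people.insert({ LastName: 'Rubble', FirstName: 'Barney' });
people.ensureIndex('LastName');
people.insert({ LastName: 'Flintstone', FirstName: 'Wilma' });

const flintstones = people.findBy('LastName', 'Flintstone'); // uses the index
const barneys = people.findBy('FirstName', 'Barney');        // falls back to a scan
```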
  11. Column-Oriented Stores
     - Like a relational store, but flipped around: all data for a column is kept together
     - An index provides a means to get a column value for a record
     - Operations:
     - Get, insert, delete records; updating fields
     - Streaming column data in and out of Hadoop
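The "flipped around" layout on slide 11 can be illustrated by pivoting a few rows into one array per column. This is only a toy sketch with made-up sample data and hypothetical helper names (`toColumns`, `getRecord`); it says nothing about how any real column store lays data out on disk.

```javascript
// Made-up sample records in the familiar row-oriented shape.
const rows = [
  { id: 1, city: 'Bedrock', age: 40 },
  { id: 2, city: 'Bedrock', age: 38 },
  { id: 3, city: 'Rockville', age: 35 },
];

// Pivot rows into one array per column; position i belongs to record i.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [name, value] of Object.entries(record)) {
      if (!columns[name]) columns[name] = [];
      columns[name][i] = value;
    }
  });
  return columns;
}

// Reassemble one record by reading the same position from every column.
function getRecord(columns, position) {
  const record = {};
  for (const [name, values] of Object.entries(columns)) {
    record[name] = values[position];
  }
  return record;
}

const columns = toColumns(rows);
// An aggregate over one field touches only that column's data.
const totalAge = columns.age.reduce((sum, age) => sum + age, 0);
const third = getRecord(columns, 2);
```

The trade-off shows up in the two helpers: scanning one column is cheap, while rebuilding a whole record means visiting every column.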
  12. Graph Databases
     - Stores vertex-to-vertex edges
     - Operations:
     - Getting and setting edges
     - Sometimes possible to annotate vertices or edges
     - Query languages support finding paths between vertices, subject to various constraints
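The edge operations and path queries on slide 12 can be sketched with an adjacency map and a breadth-first search. The `Graph` class below is a hypothetical illustration with directed, unannotated edges; real graph databases expose path finding through their own query languages.

```javascript
// Hypothetical directed graph stored as vertex -> Set of neighbouring vertices.
class Graph {
  constructor() {
    this.edges = new Map();
  }
  addEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    if (!this.edges.has(to)) this.edges.set(to, new Set());
    this.edges.get(from).add(to);
  }
  hasEdge(from, to) {
    return this.edges.has(from) && this.edges.get(from).has(to);
  }
  // Breadth-first search: returns a path with the fewest edges, or null.
  findPath(start, goal) {
    if (!this.edges.has(start)) return null;
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        // Walk the predecessor links back to the start.
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const next of this.edges.get(vertex)) {
        if (!previous.has(next)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const graph = new Graph();
graph.addEdge('fred', 'barney');
graph.addEdge('barney', 'betty');
graph.addEdge('fred', 'wilma');

const path = graph.findPath('fred', 'betty');
const reverse = graph.findPath('betty', 'fred'); // edges are directed
```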
  13. Consistency Models
     - Relational databases support transactions
     - Can only see committed changes
     - Commit/abort span multiple changes
     - Read-only transaction flavors
     - Read committed, repeatable read, etc.
     - Classic assumption: “I’m querying the one-and-only database”
     - Scaling reads and scaling writes introduce different problems
  14. Replication - The 1st Breakdown of Consistency
  15. Limitations of a Single Master
     - Replication can provide arbitrary read scalability
     - Subject to coping with read-consistency issues
     - Sooner or later, writing becomes a bottleneck
     - Physical limitations (seek time)
     - Throughput of a single I/O subsystem
  16. Sharding
     - Partition the primary key space via hashing
     - Set up a duplicate system for each shard
     - The write-rate limitation now applies to each shard
     - Joins or aggregation across shards are problematic
     - Can the data be re-sharded on a live system?
     - Can shards be re-balanced on a live system?
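Hash partitioning as described on slide 16 might look like the sketch below. The hash is an FNV-1a style function applied to UTF-16 code units, chosen only because it is short; any stable hash would do. The `shardFor` helper and the `user:N` keys are assumptions made for illustration. The last few lines count how many keys land on a different shard when a fifth shard is added, which is why the slide asks whether data can be re-sharded on a live system.

```javascript
// 32-bit FNV-1a style hash over the string's UTF-16 code units.
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Map a primary key to one of shardCount shards.
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// Hypothetical keys, used to see the effect of changing the shard count.
const keys = [];
for (let i = 0; i < 1000; i++) keys.push(`user:${i}`);

// With plain modulo hashing, adding a shard reassigns many existing keys.
const moved = keys.filter((key) => shardFor(key, 4) !== shardFor(key, 5)).length;
console.log(`${moved} of ${keys.length} keys change shards when going from 4 to 5`);
```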
  17. Multi-Site Operation
     - Failure of a single-master system’s master
     - A new master can be chosen
     - But what if there’s a network partition?
     - Can the application continue in read-only mode?
  18. Dynamo
     - Now a generic term for multi-master systems
     - Writes can occur to any node
     - The same record can be updated on different nodes by different clients
     - All writes are replicated everywhere
  19. Dynamo - The 2nd Breakdown of Consistency
     - Collisions can occur
     - Who wins?
     - A collision resolution strategy is required
     - Vector clocks: http://en.wikipedia.org/wiki/Vector_clock
     - Application access must be aware of this
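Vector clocks, mentioned on slide 19, let a system tell whether one version of a record descends from another or whether the two were written concurrently and collide. The sketch below is a generic textbook-style illustration, not the code of any specific Dynamo-style product; the function names and the node ids `n1` and `n2` are hypothetical.

```javascript
// A vector clock is a plain object: node id -> number of updates seen from that node.

// Return a new clock with the given node's counter advanced by one.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Classify clock a relative to clock b.
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const node of nodes) {
    const av = a[node] || 0;
    const bv = b[node] || 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return 'concurrent'; // a collision: neither descends from the other
  if (aAhead) return 'after';
  if (bAhead) return 'before';
  return 'equal';
}

// Element-wise maximum, used once a collision has been resolved.
function merge(a, b) {
  const merged = { ...a };
  for (const [node, counter] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] || 0, counter);
  }
  return merged;
}

const base = increment({}, 'n1');   // first write, accepted by node n1
const onN1 = increment(base, 'n1'); // one client updates the record via n1
const onN2 = increment(base, 'n2'); // another client updates it via n2

const relation = compareClocks(onN1, onN2);
const resolved = merge(onN1, onN2);
```

Here `relation` is `'concurrent'`, so the application (or a configured strategy) has to decide which value wins; the clocks only detect the collision, they do not resolve it.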
  20. The Commercial Landscape
  21. Key Client Implementation Concerns
     - Monotonic reads: Can my reads go back in time?
     - Read-your-own-writes: If I issue a query immediately after an insert or update, will I see my changes?
     - Uninterrupted writes: Am I always guaranteed the ability to write?
     - Conflict resolution: Do I need to have a conflict resolution strategy?
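The read-your-own-writes concern on slide 21 is easy to reproduce with a toy model of asynchronous replication. The `Primary` and `Secondary` classes below are hypothetical and deliberately simplistic: the secondary only catches up when `replicate()` is called, which stands in for replication lag.

```javascript
// Toy primary: applies writes and appends them to a replication log.
class Primary {
  constructor() {
    this.data = new Map();
    this.log = [];
  }
  write(key, value) {
    this.data.set(key, value);
    this.log.push([key, value]);
  }
  read(key) {
    return this.data.get(key);
  }
}

// Toy secondary: replays the primary's log only when asked to.
class Secondary {
  constructor(primary) {
    this.primary = primary;
    this.applied = 0; // number of log entries already replayed
    this.data = new Map();
  }
  replicate() {
    while (this.applied < this.primary.log.length) {
      const [key, value] = this.primary.log[this.applied++];
      this.data.set(key, value);
    }
  }
  read(key) {
    return this.data.get(key);
  }
}

const primary = new Primary();
const secondary = new Secondary(primary);

primary.write('status', 'updated');
const staleRead = secondary.read('status');  // the write has not arrived yet
const primaryRead = primary.read('status');  // reading from the primary sees it
secondary.replicate();
const freshRead = secondary.read('status');  // visible once replication catches up
```

A client that alternates between the primary and a lagging secondary can also see reads "go back in time", which is the monotonic-reads question on the same slide.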
  22. Using a Single-Master System
     - What does the intermediate agent or system do for…
     - Monotonic reads?
     - Read-your-own-writes?
     - Uninterrupted writes?
     - Conflict resolution?
  23. Using a Multi-Master System
     - What does the intermediate agent or system do for…
     - Monotonic reads?
     - Read-your-own-writes?
     - Uninterrupted writes?
     - Conflict resolution?
  24. MongoDB
     - Where MongoDB fits in the non-relational world
     - MongoDB’s architecture and features
     - Some real-world users
  25. MongoDB is a Document Store
     - MongoDB stores JSON objects as BSON
     - { LastName: 'Flintstone', FirstName: 'Fred', …}
     - Secondary indexes
     - db.collection.ensureIndex({LastName : 1, FirstName : 1});
     - Simple QBE-like query syntax
     - db.collection.find({LastName : 'Flintstone'});
     - db.collection.find({LastName : { $gte : 'Flintstone' }});
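To show what the query-by-example style on slide 25 means, here is a small in-memory matcher that accepts the same query shapes: plain equality and a comparison operator such as `$gte`. It is a sketch for illustration only, not MongoDB's implementation. The `matches` and `find` functions and the sample documents are hypothetical, only four comparison operators are handled, and values are compared with ordinary JavaScript operators rather than MongoDB's own type-aware ordering.

```javascript
// Hypothetical matcher: supports equality and $gte/$gt/$lte/$lt on top-level fields.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === 'object') {
      // The condition is an operator document such as { $gte: 'Flintstone' }.
      return Object.entries(condition).every(([op, operand]) => {
        switch (op) {
          case '$gte': return value >= operand;
          case '$gt': return value > operand;
          case '$lte': return value <= operand;
          case '$lt': return value < operand;
          default: throw new Error(`unsupported operator: ${op}`);
        }
      });
    }
    // A bare value means "this field equals this value".
    return value === condition;
  });
}

function find(docs, query) {
  return docs.filter((doc) => matches(doc, query));
}

// Made-up sample documents.
const docs = [
  { LastName: 'Flintstone', FirstName: 'Fred' },
  { LastName: 'Flintstone', FirstName: 'Wilma' },
  { LastName: 'Rubble', FirstName: 'Barney' },
  { LastName: 'Adams', FirstName: 'Alice' },
];

const exact = find(docs, { LastName: 'Flintstone' });
const atOrAfter = find(docs, { LastName: { $gte: 'Flintstone' } });
```

The query itself is just a document whose shape mirrors the documents being searched, which is what makes the syntax "by example".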
  26. MongoDB - Advanced Queries
     - Geo-spatial queries
     - Create a geo index
     - Find points near a given point, sorted by radial distance
     - Can be planar or spherical
     - Find points within a certain radial distance, within a bounding box, or a polygon
     - Built-in Map-Reduce
     - The caller provides map and reduce functions written in JavaScript
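The division of labour in map-reduce on slide 26, where the caller supplies a map function and a reduce function, can be sketched without a server. The `mapReduce` driver below is a hypothetical in-memory stand-in: it passes `emit` to the map function as an argument, whereas the MongoDB shell makes `emit` available to the map function directly. The sample documents are made up.

```javascript
// Hypothetical in-memory driver: groups emitted values by key, then reduces each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Each document is bound to `this` inside the caller's map function.
  for (const doc of docs) map.call(doc, emit);
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduce(key, values);
  }
  return results;
}

// Caller-supplied functions: count documents per last name.
function mapByLastName(emit) {
  emit(this.LastName, 1);
}
function sumValues(key, values) {
  return values.reduce((total, value) => total + value, 0);
}

const people = [
  { LastName: 'Flintstone', FirstName: 'Fred' },
  { LastName: 'Flintstone', FirstName: 'Wilma' },
  { LastName: 'Rubble', FirstName: 'Barney' },
];

const counts = mapReduce(people, mapByLastName, sumValues);
```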
  27. MongoDB is a Single-Master System
     - A database is served by members of a “replica set”
     - The system elects a primary (master)
     - Failure of the master is detected, and a new master is elected
     - Application writes get an error if there is no quorum to elect a new master
     - Reads continue to be fulfilled
  28. MongoDB Replica Set
  29. MongoDB Supports Sharding
     - A collection can be sharded
     - Each shard is served by its own replica set
     - New shards (each a replica set) can be added at any time
     - Shard key ranges are automatically balanced
  30. MongoDB - Sharded Deployment
  31. MongoDB Storage Management
     - Data is kept in memory-mapped files
     - Servers should have a lot of memory
     - Files are allocated as needed
     - Documents in a collection are kept on a list using a geographical addressing scheme
     - Indexes (B*-trees) point to documents using geographical addresses
  32. MongoDB Server Management
     - Replica set members are aware of each other
     - A majority of votes is required to elect a new primary
     - Members can be assigned priorities to affect the election
     - e.g., an “invisible” replica can be created with zero priority for backup purposes
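The majority and priority rules on slide 32 can be illustrated with a deliberately simplified election function. This is not MongoDB's actual election protocol: the `electPrimary` function, the member names, and the assumption that every member has exactly one vote are all hypothetical. It only captures two ideas from the slide, that a strict majority of members must be reachable and that a zero-priority member is never chosen.

```javascript
// Simplified election: returns the name of the new primary, or null if none can be elected.
function electPrimary(members) {
  const reachable = members.filter((member) => member.up);
  // A strict majority of all members must be reachable (one vote each, by assumption).
  if (reachable.length * 2 <= members.length) return null;
  // Zero-priority members vote but can never become primary.
  const candidates = reachable.filter((member) => member.priority > 0);
  if (candidates.length === 0) return null;
  const winner = candidates.reduce((best, member) =>
    member.priority > best.priority ? member : best
  );
  return winner.name;
}

// Hypothetical three-member set with a zero-priority backup member.
const healthy = [
  { name: 'a', up: true, priority: 2 },
  { name: 'b', up: true, priority: 1 },
  { name: 'backup', up: true, priority: 0 },
];
const primaryDown = healthy.map((m) => (m.name === 'a' ? { ...m, up: false } : m));
const onlyBackupUp = healthy.map((m) => (m.name === 'backup' ? m : { ...m, up: false }));

const first = electPrimary(healthy);
const afterFailure = electPrimary(primaryDown);
const noQuorum = electPrimary(onlyBackupUp);
```

When `electPrimary` returns `null` there is no primary, which corresponds to the case on slide 27 where application writes get an error while reads continue to be served.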
  33. MongoDB Access
     - Drivers are available in many languages
     - 10gen supported: C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
     - Community supported: Clojure, ColdFusion, F#, Go, Groovy, Lua, R
     - http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools
  34. MongoDB Availability
     - Source: https://github.com/mongodb/mongo
     - Server
     - License: AGPL
     - http://www.mongodb.org/downloads
     - Drivers
     - License: Apache
     - http://www.mongodb.org/display/DOCS/Drivers
  35. MongoDB - Hosted Services
     - http://www.mongodb.org/display/DOCS/Hosting+Center
     - MongoHQ, Mongo Machine, MongoLab
     - RESTful access to collections
  36. MongoDB Support
     - Paid Support
     - http://www.10gen.com/client-portal
     - 10gen Hosted Monitoring
     - Consulting, training
     - Free Support
     - http://groups.google.com/group/mongodb-user
     - http://stackoverflow.com/questions/tagged/mongodb
  37. MongoDB Users
     - http://www.10gen.com/customers
     - http://www.10gen.com/presentations
     - craigslist: http://www.10gen.com/presentation/mongosf2011/craigslist
     - bit.ly: http://blip.tv/mongodb/bit-ly-user-history-auto-sharded-3723147
     - shutterfly: http://www.10gen.com/presentation/mongosv2010/shutterfly
  39. Mini-demo/tutorial
     - http://try.mongodb.org/
