MongoDB: An Introduction
Chris Westin
Software Engineer, 10gen
© Copyright 2010 10gen Inc.
Outline
  • The Whys of Non-Relational Databases
  • Vocabulary of the Non-Relational World
  • MongoDB
The Whys of Non-Relational Databases
  • Why did non-relational databases arise?
  • Problems with relational databases in the web world
Problem - Schema Evolution
  • Applications are evolving all the time
      • Applications need new fields
      • Applications need new indexes
      • Data is growing – sometimes very fast
  • Users need to be able to alter their schemas without making their data unavailable
      • The web world expects 24x7 service
  • RDBMSs can have a hard time doing this
Problem – Write Rates
  • Replication is a solution for high read loads
  • Sooner or later, writing becomes a bottleneck
  • Sharding – partitioning a logical database across multiple database instances
      • Joins and aggregation become a problem
      • Distributed transactions are too slow for the web
  • Manual management of shards
      • Choosing shard partitions
      • Rebalancing shards
Vocabulary of the Non-Relational World
  • An introduction to terminology you’re going to be seeing a lot
Data Models
  • A non-relational database’s data model determines the kinds of items it can contain and how they can be retrieved
  • What can the system store, and what does it know about what it contains?
      • The relational data model is about storing records made up of named, scalar-valued fields, as specified by a schema, or type definition
  • What kind of queries can you do?
      • SQL is a manifestation of the kinds of queries that fall out of relational algebra
Non-Relational Data Models
  • Key-value stores
  • Document stores
  • Column-oriented databases
  • Graph databases
Key-Value Stores
  • A mapping from a key to a value
  • The store doesn’t know anything about the key or value
  • The store doesn’t know anything about the insides of the value
  • Operations
      • Set, get, or delete a key-value pair
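
The three operations on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not any particular product; the class name KeyValueStore and the key "user:1" are made up for the example. The value is stored as an opaque string, which is why the store cannot query inside it.

```javascript
// Hypothetical in-memory key-value store; values are opaque to the store.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const kv = new KeyValueStore();
// The store only sees a string; it knows nothing about LastName or FirstName.
kv.set("user:1", JSON.stringify({ LastName: "Flintstone", FirstName: "Fred" }));
```
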
Document Stores
  • The store is a container for documents
      • Documents are made up of named fields
      • Fields may or may not have type definitions
          • e.g. XSDs for XML stores, vs. schema-less JSON stores
  • Can create “secondary indexes”
      • These provide the ability to query on any document field(s)
  • Operations:
      • Insert and delete documents
      • Update fields within documents
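
To make "secondary index" concrete, here is a rough sketch of a collection that keeps a per-field lookup table next to its documents. The Collection class and its methods are hypothetical and only mimic the idea; real document stores use ordered structures such as B-trees and support far richer queries.

```javascript
// Hypothetical in-memory document collection with optional secondary indexes.
class Collection {
  constructor() {
    this.docs = [];           // documents in insertion order
    this.indexes = new Map(); // field name -> Map(value -> array of documents)
  }
  addToIndex(index, field, doc) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  // Build an index over documents already stored and register it for future inserts.
  ensureIndex(field) {
    const index = new Map();
    for (const doc of this.docs) this.addToIndex(index, field, doc);
    this.indexes.set(field, index);
  }
  insert(doc) {
    this.docs.push(doc);
    for (const [field, index] of this.indexes) this.addToIndex(index, field, doc);
  }
  // Equality lookup on one field: use the index if present, otherwise scan.
  find(field, value) {
    const index = this.indexes.get(field);
    if (index) return index.get(value) || [];
    return this.docs.filter((doc) => doc[field] === value);
  }
}
```

Because the store understands field names, the same find call works with or without the index; the index only changes how much data has to be examined.
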
Column-Oriented Stores
  • Like a relational store, but flipped around: all data for a column is kept together
  • An index provides a means to get a column value for a record
  • Operations:
      • Get, insert, delete records; updating fields
      • Streaming column data in and out of Hadoop
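
The "flipped around" layout can be shown with plain arrays. The sample records and the toColumns helper below are invented for illustration; the point is only that a column-wise scan reads one array instead of every record.

```javascript
// Row-oriented: one object per record (sample data is made up).
const rows = [
  { id: 1, name: "Fred", age: 40 },
  { id: 2, name: "Wilma", age: 38 },
  { id: 3, name: "Barney", age: 39 },
];

// Column-oriented: one array per column; position i belongs to record i.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field][i] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);
// Aggregating one column touches only that column's array.
const totalAge = columns.age.reduce((sum, age) => sum + age, 0);
```
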
Graph Databases
  • Stores vertex-to-vertex edges
  • Operations:
      • Getting and setting edges
      • Sometimes possible to annotate vertices or edges
  • Query languages support finding paths between vertices, subject to various constraints
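
As a rough illustration of "getting and setting edges" and path finding, the sketch below stores directed edges in an adjacency map and uses breadth-first search to find a shortest path. The Graph class is hypothetical; graph databases expose this through their own query languages and support constraints this sketch ignores.

```javascript
// Hypothetical edge store: vertex -> Set of adjacent vertices (directed edges).
class Graph {
  constructor() {
    this.edges = new Map();
  }
  addEdge(from, to) {
    if (!this.edges.has(from)) this.edges.set(from, new Set());
    this.edges.get(from).add(to);
  }
  // Breadth-first search; returns a shortest path as an array, or null if none exists.
  findPath(start, goal) {
    const previous = new Map([[start, null]]); // vertex -> vertex we reached it from
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const next of this.edges.get(vertex) || []) {
        if (!previous.has(next)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null;
  }
}
```
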
Consistency Models
  • Relational databases support transactions
      • Can only see committed changes
      • Commit/abort span multiple changes
      • Read-only transaction flavors
          • Read committed, repeatable read, etc.
  • Classic assumption: “I’m querying the one-and-only database”
  • Scaling reads and scaling writes introduce different problems
Replication - The 1st Breakdown of Consistency
Limitations of a Single Master
  • Replication can provide arbitrary read scalability
      • Subject to coping with read-consistency issues
  • Sooner or later, writing becomes a bottleneck
      • Physical limitations (seek time)
      • Throughput of a single I/O subsystem
Sharding
  • Partition the primary key space via hashing
  • Set up a duplicate system for each shard
      • The write-rate limitation now applies to each shard
  • Joins or aggregation across shards are problematic
  • Can the data be re-sharded on a live system?
  • Can shards be re-balanced on a live system?
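
Partitioning a key space by hashing can be sketched as follows. The hash function (a djb2-style string hash), the modulo placement, and the ShardedStore class are assumptions chosen for brevity, not how any specific system does it.

```javascript
// Simple string hash (djb2 variant), for illustration only.
function hashKey(key) {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 33 + key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Map a key to a shard number in the range [0, shardCount).
function shardFor(key, shardCount) {
  return hashKey(String(key)) % shardCount;
}

// Each shard is an independent store; a write goes only to the owning shard.
class ShardedStore {
  constructor(shardCount) {
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }
  set(key, value) {
    this.shards[shardFor(key, this.shards.length)].set(key, value);
  }
  get(key) {
    return this.shards[shardFor(key, this.shards.length)].get(key);
  }
}
```

With modulo placement, changing the shard count moves most keys to a different shard, which is one reason the slide asks whether data can be re-sharded on a live system.
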
Multi-Site Operation
  • Failure of a single-master system’s master
      • A new master can be chosen
      • But what if there’s a network partition?
      • Can the application continue in read-only mode?
Dynamo
  • Now a generic term for multi-master systems
  • Writes can occur to any node
  • The same record can be updated on different nodes by different clients
  • All writes are replicated everywhere
Dynamo – the 2nd breakdown of consistency
  • Collisions can occur
      • Who wins?
  • A collision resolution strategy is required
      • Vector clocks
          • http://en.wikipedia.org/wiki/Vector_clock
  • Application access must be aware of this
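
A vector clock lets a system tell whether one version of a record descends from another or whether the two were written concurrently (a collision). The sketch below is a generic textbook version; the function names and the node ids "n1" and "n2" are illustrative.

```javascript
// A vector clock is an object mapping node id -> counter.
// increment returns a new clock with nodeId's counter advanced by one.
function increment(clock, nodeId) {
  return { ...clock, [nodeId]: (clock[nodeId] || 0) + 1 };
}

// Returns "before", "after", "equal", or "concurrent" for a relative to b.
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x > y) aAhead = true;
    if (x < y) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // neither descends from the other
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Merge takes the per-node maximum; used once a collision has been resolved.
function merge(a, b) {
  const merged = { ...a };
  for (const [node, count] of Object.entries(b)) {
    merged[node] = Math.max(merged[node] || 0, count);
  }
  return merged;
}
```

Two clients that both start from the same version and write on different nodes produce clocks that compare as "concurrent"; deciding which value wins is still up to the application's resolution strategy.
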
The Commercial OSS Landscape
Key Client Implementation Concerns
  • Monotonic reads
      • Can my reads go back in time?
  • Read-your-own-writes
      • If I issue a query immediately after an insert or update, will I see my changes?
  • Uninterrupted writes
      • Am I always guaranteed the ability to write?
  • Conflict Resolution
      • Do I need to have a conflict resolution strategy?
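
One way a client could obtain monotonic reads and read-your-own-writes against a lagging replica is to remember the newest version it has seen and refuse to read anything older. The Replica and Session classes below are hypothetical and reduce the database to a single versioned value; they sketch the idea only.

```javascript
// Hypothetical replica: holds one value and the version it has applied so far.
class Replica {
  constructor() {
    this.version = 0;
    this.value = null;
  }
  apply(version, value) {
    this.version = version;
    this.value = value;
  }
}

// Client session that remembers the highest version it has read or written.
class Session {
  constructor() {
    this.lastSeen = 0;
  }
  write(primary, value) {
    primary.apply(primary.version + 1, value);
    this.lastSeen = primary.version; // remember our own write
  }
  // Read from the replica only if it is at least as new as what we have seen;
  // otherwise fall back to the primary.
  read(replica, primary) {
    const source = replica.version >= this.lastSeen ? replica : primary;
    this.lastSeen = Math.max(this.lastSeen, source.version);
    return source.value;
  }
}
```
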
Using a Single-Master System
  • What does the intermediate agent or system do for…
      • Monotonic reads?
      • Read-your-own-writes?
      • Uninterrupted writes?
      • Conflict Resolution?
Using a Multi-Master System
  • What does the intermediate agent or system do for…
      • Monotonic reads?
      • Read-your-own-writes?
      • Uninterrupted writes?
      • Conflict Resolution?
MongoDB
  • Where MongoDB fits in the non-relational world
  • MongoDB’s architecture and features
  • Some real-world users
MongoDB is a Document Store
  • MongoDB stores JSON objects as BSON
      • { LastName: 'Flintstone', FirstName: 'Fred', …}
  • Secondary Indexes
      • db.collection.ensureIndex({LastName : 1, FirstName : 1});
  • Simple QBE-like query syntax
      • db.collection.find({LastName : 'Flintstone'});
      • db.collection.find({LastName : { $gte : 'Flintstone' }});
No Joins or Transactions
  • Nested documents…
      • Can often be used to avoid joins
      • Can often be used to regain atomicity
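
The trade-off can be seen by modelling the same data two ways. The people and addresses arrays, the joinAddresses helper, and the address fields are invented for this example; only the Flintstone name fields come from the deck.

```javascript
// Relational style: two "tables" linked by a foreign key, joined at read time.
const people = [{ id: 1, LastName: "Flintstone", FirstName: "Fred" }];
const addresses = [
  { personId: 1, kind: "home", city: "Bedrock" },
  { personId: 1, kind: "work", city: "Bedrock" },
];

function joinAddresses(person) {
  return { ...person, addresses: addresses.filter((a) => a.personId === person.id) };
}

// Document style: the addresses are nested inside the person document,
// so one read returns everything and one write changes it as a unit.
const personDoc = {
  LastName: "Flintstone",
  FirstName: "Fred",
  addresses: [
    { kind: "home", city: "Bedrock" },
    { kind: "work", city: "Bedrock" },
  ],
};
```

Nesting works well when the child data is owned by one parent and read together with it; data shared across many parents is harder to nest without duplication.
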
MongoDB – Advanced Queries
  • Geo-spatial queries
      • Create a geo index
      • Find points near a given point, sorted by radial distance
          • Can be planar or spherical
      • Find points within a certain radial distance, within a bounding box, or a polygon
  • Built-in Map-Reduce
      • The caller provides map and reduce functions written in JavaScript
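
To show what caller-provided map and reduce functions look like, here is a tiny in-memory stand-in for a map-reduce run. It is not MongoDB's implementation: the mapReduce harness is hypothetical, and it passes the document and an emit callback as arguments, whereas in the MongoDB shell the map function sees the document as this and calls a global emit.

```javascript
// In-memory stand-in for a map-reduce run (illustrative only).
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const results = {};
  for (const [key, values] of groups) results[key] = reduce(key, values);
  return results;
}

// Caller-provided functions: count documents per LastName.
const countMap = (doc, emit) => emit(doc.LastName, 1);
const countReduce = (key, values) => values.reduce((sum, n) => sum + n, 0);

const counts = mapReduce(
  [
    { LastName: "Flintstone", FirstName: "Fred" },
    { LastName: "Flintstone", FirstName: "Wilma" },
    { LastName: "Rubble", FirstName: "Barney" },
  ],
  countMap,
  countReduce
);
// counts is { Flintstone: 2, Rubble: 1 }
```
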
MongoDB is a Single-Master System
  • A database is served by members of a “replica set”
  • The system elects a primary (master)
  • Failure of the master is detected, and a new master is elected
  • Application writes get an error if there is no quorum to elect a new master
      • Reads continue to be fulfilled
MongoDB Replica Set
MongoDB Supports Sharding
  • A collection can be sharded
  • Each shard is served by its own replica set
  • New shards (each a replica set) can be added at any time
  • Shard key ranges are automatically balanced
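
Range-based placement can be pictured as a routing table from shard key ranges to shards. The table layout, the shard names, and the routeKey and moveChunk helpers below are hypothetical simplifications; they only illustrate why a range can be moved to another shard without touching the keys in other ranges.

```javascript
// Hypothetical routing table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: "", max: "H", shard: "shardA" },
  { min: "H", max: "P", shard: "shardB" },
  { min: "P", max: null, shard: "shardC" }, // null = unbounded upper end
];

// Find the shard whose key range contains the given key.
function routeKey(table, key) {
  const chunk = table.find((c) => key >= c.min && (c.max === null || key < c.max));
  return chunk ? chunk.shard : null;
}

// "Balancing" in this sketch is just reassigning one range to another shard.
function moveChunk(table, min, toShard) {
  return table.map((c) => (c.min === min ? { ...c, shard: toShard } : c));
}
```

For example, routeKey(chunks, "Flintstone") selects shardA because "Flintstone" sorts before "H"; after moveChunk(chunks, "P", "shardA"), keys from "P" upward route to shardA while the other ranges are unaffected.
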
MongoDB – Sharded Deployment
MongoDB Storage Management
  • Data is kept in memory-mapped files
      • Servers should have a lot of memory
      • Files are allocated as needed
  • Documents in a collection are kept on a list using a geographical addressing scheme
  • Indexes (B*-trees) point to documents using geographical addresses
MongoDB Server Management
  • Replica set members are aware of each other
  • A majority of votes is required to elect a new primary
  • Members can be assigned priorities to affect the election
      • e.g., an “invisible” replica can be created with zero priority for backup purposes
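
The majority and priority rules can be sketched as a single function. This is a deliberately simplified model and not MongoDB's actual election protocol, which also considers factors such as how up to date each member is; the member fields name, up, and priority are assumptions made for the example.

```javascript
// Simplified election sketch: NOT MongoDB's actual protocol.
// members: array of { name, up, priority }.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  // A strict majority of all configured members must be able to vote.
  if (reachable.length * 2 <= members.length) return null;
  // Zero-priority members may vote but can never become primary.
  const candidates = reachable.filter((m) => m.priority > 0);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, m) => (m.priority > best.priority ? m : best)).name;
}
```

With three members where one has priority zero, losing the current primary still leaves a majority, so the remaining eligible member is chosen; losing two members leaves no majority and no primary, which corresponds to the case where writes receive an error.
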
MongoDB Access
  • Drivers are available in many languages
  • 10gen supported
      • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
  • Community supported
      • Clojure, ColdFusion, F#, Go, Groovy, Lua, R
  • http://www.mongodb.org/display/DOCS/Overview+-+Writing+Drivers+and+Tools
MongoDB Availability
  • Source
      • https://github.com/mongodb/mongo
  • Server
      • License: AGPL
      • http://www.mongodb.org/downloads
  • Drivers
      • License: Apache
      • http://www.mongodb.org/display/DOCS/Drivers
MongoDB – Hosted Services
  • http://www.mongodb.org/display/DOCS/Hosting+Center
  • MongoHQ, Mongo Machine, MongoLab
  • RESTful access to collections
MongoDB Support
  • Paid Support
      • http://www.10gen.com/client-portal
      • 10gen Hosted Monitoring
      • Consulting, training
  • Free Support
      • http://groups.google.com/group/mongodb-user
      • http://stackoverflow.com/questions/tagged/mongodb
MongoDB Users
  • http://www.10gen.com/customers
  • http://www.10gen.com/presentations
      • craigslist: http://www.10gen.com/presentation/mongosf2011/craigslist
      • bit.ly: http://blip.tv/mongodb/bit-ly-user-history-auto-sharded-3723147
      • shutterfly: http://www.10gen.com/presentation/mongosv2010/shutterfly
Mini-demo/tutorial
  • http://try.mongodb.org/

MongoDB: An Introduction - July 2011
