Quick Overview on MongoDB
Eman Abdel Ghaffar
Agenda
1. Introduction
2. CRUD
3. Cursors
4. Indexing
5. Schema Design principles
6. Aggregation
7. Map-Reduce
Introduction - ACID
● Relational databases usually guarantee ACID properties related to how reliably
transactions (both reads and writes) are processed.
● Many NoSQL systems trade off strict ACID compliance for other properties, such as
high availability and horizontal scalability; MongoDB is one of the most widely used of them.
● https://dzone.com/articles/how-acid-mongodb
Introduction - ACID
● Atomicity requires that each transaction is executed in its entirety, or fails
without any change being applied.
● Consistency requires that the database only passes from a valid state to the
next one, without intermediate points. Any data written to the database must
be valid according to all defined rules, including constraints, cascades, triggers.
● Isolation requires that if transactions are executed concurrently, the result is
equivalent to their serial execution.
● Durability means that the result of a committed transaction is permanent,
even if the database crashes immediately afterwards or power is lost.
Introduction - CAP
● Consistency Every read receives the most recent write or an error.
● Availability Every request receives a (non-error) response – without
guarantee that it contains the most recent write.
● Partition tolerance The system continues to operate despite an arbitrary
number of messages being dropped (or delayed) by the network between
nodes.
“It is impossible for a distributed data store to simultaneously
provide more than two out of the following three guarantees”
Introduction - MongoDB
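● MongoDB is written in C++. Its source code is publicly available; it was licensed under the
GNU AGPL until late 2018 and under the Server Side Public License (SSPL) since then.
● The core database server runs via an executable called mongod (
mongod.exe on Windows)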
● The MongoDB command shell is a JavaScript-based tool for
administering the database and manipulating data.
manual/reference/mongo-shell/
CRUD - Create
● Databases and collections are created only when documents are first inserted.
● Every MongoDB document requires an _id; if an inserted document has none, one is generated for it.
db.collection.insertOne()
db.collection.insertMany()
db.collection.insert()
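The two rules above (lazy creation, mandatory _id) can be modelled in a few lines of plain JavaScript. This is only a sketch that runs without a server: FakeDb, hasCollection and the counter-based id are hypothetical stand-ins, and a real ObjectId is generated differently.

```javascript
// Toy in-memory model of lazy collection creation and automatic _id
// assignment. Not MongoDB code: FakeDb and its methods are hypothetical.
class FakeDb {
  constructor() {
    this.collections = new Map(); // collection name -> array of documents
    this.nextId = 1;              // stand-in for ObjectId generation
  }

  // True once the collection has been created.
  hasCollection(name) {
    return this.collections.has(name);
  }

  // The collection is created on the first insert. A document that arrives
  // without an _id gets a generated one (a 24-character hex string here).
  insertOne(name, doc) {
    if (!this.collections.has(name)) {
      this.collections.set(name, []);
    }
    const generatedId = (this.nextId++).toString(16).padStart(24, "0");
    const stored = { _id: generatedId, ...doc }; // a caller-supplied _id wins
    this.collections.get(name).push(stored);
    return { acknowledged: true, insertedId: stored._id };
  }
}

const db = new FakeDb();
const existedBefore = db.hasCollection("inventory");
const insertResult = db.insertOne("inventory", { item: "paper", qty: 20 });
const existsAfter = db.hasCollection("inventory");
console.log(existedBefore, existsAfter, insertResult.insertedId);
```

The collection does not exist until the first insertOne call, and the returned insertedId is the generated value unless the caller supplied its own _id.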
CRUD - Read
db.collection.find(query, projection)
db.inventory.find( {} )
  SQL: SELECT * FROM inventory
db.inventory.find( { status: "D" } )
  SQL: SELECT * FROM inventory WHERE status = "D"
db.inventory.find( { status: { $in: [ "A", "D" ] } } )
  SQL: SELECT * FROM inventory WHERE status in ("A", "D")
db.inventory.find( { status: "A", qty: { $lt: 30 } } )
  SQL: SELECT * FROM inventory WHERE status = "A" AND qty < 30
db.inventory.find( { status: "A", $or: [ { qty: { $lt: 30 } }, { item: /^p/ } ] } )
  SQL: SELECT * FROM inventory WHERE status = "A" AND ( qty < 30 OR item LIKE "p%")
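To see how the filter documents above are evaluated, here is a minimal in-memory matcher in plain JavaScript. It is a sketch under stated assumptions: matches() is a hypothetical helper, the sample documents are made up, and only equality, regular expressions, $in, $lt and $or are handled.

```javascript
// Minimal in-memory matcher for the filter shapes shown above.
// Hypothetical helper; supports only equality, regex, $in, $lt and $or.
function matches(doc, filter) {
  // Top-level keys of a filter are combined with an implicit AND.
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      // $or holds an array of sub-filters; at least one must match.
      return cond.some((sub) => matches(doc, sub));
    }
    const value = doc[key];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object") {
      // Operator document such as { $lt: 30 } or { $in: ["A", "D"] }.
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$in") return arg.includes(value);
        if (op === "$lt") return value < arg;
        throw new Error(`unsupported operator: ${op}`);
      });
    }
    return value === cond; // plain equality
  });
}

const inventory = [
  { item: "paper", status: "A", qty: 20 },
  { item: "notebook", status: "A", qty: 50 },
  { item: "pen", status: "A", qty: 75 },
  { item: "journal", status: "D", qty: 25 },
];

// status = "A" AND (qty < 30 OR item LIKE "p%")
const filter = { status: "A", $or: [{ qty: { $lt: 30 } }, { item: /^p/ }] };
const found = inventory.filter((doc) => matches(doc, filter)).map((d) => d.item);
console.log(found);
```

With this sample data the last query keeps "paper" (qty below 30) and "pen" (item starts with "p").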
CRUD - Update
● Some Update Operators
○ $currentDate
○ $inc
○ $min
○ $max
○ $mul
○ $rename
○ $set
db.collection.update()
db.collection.findAndModify()
db.collection.updateOne()
db.collection.updateMany()
db.collection.replaceOne()
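The effect of the operators listed above can be sketched on a single in-memory document. This is plain JavaScript, not driver code: applyUpdate is a hypothetical helper, it handles top-level fields only, and the sample values are made up.

```javascript
// In-memory sketch of a few update operators applied to one document.
// applyUpdate is a hypothetical helper, not part of any MongoDB driver.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy; nested paths are not handled
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set": // assign the value
          result[field] = arg;
          break;
        case "$inc": // add to the value (a missing field starts at 0)
          result[field] = (result[field] ?? 0) + arg;
          break;
        case "$mul": // multiply the value (a missing field becomes 0)
          result[field] = (result[field] ?? 0) * arg;
          break;
        case "$min": // keep the smaller of the current and given value
          if (result[field] === undefined || arg < result[field]) result[field] = arg;
          break;
        case "$max": // keep the larger of the current and given value
          if (result[field] === undefined || arg > result[field]) result[field] = arg;
          break;
        case "$rename": // move the value to a new field name
          if (field in result) {
            result[arg] = result[field];
            delete result[field];
          }
          break;
        case "$currentDate": // set the field to the current date
          result[field] = new Date();
          break;
        default:
          throw new Error(`unsupported operator: ${op}`);
      }
    }
  }
  return result;
}

const before = { item: "paper", qty: 20, price: 5, lowest: 10 };
const after = applyUpdate(before, {
  $inc: { qty: 5 },
  $mul: { price: 2 },
  $min: { lowest: 7 },
  $rename: { item: "name" },
  $set: { status: "A" },
});
console.log(after);
```

Here qty becomes 25, price becomes 10, lowest drops to 7, item is renamed to name, and status is added; the original object is left unchanged.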
CRUD - Delete
● Indexes
○ Delete operations do not drop indexes, even if deleting all documents from
a collection.
● Atomicity
○ All write operations in MongoDB are atomic on the level of a single
document.
db.collection.remove()
db.collection.deleteOne()
db.collection.deleteMany()
Cursors
● Cursors, found in many database systems, return query result sets iteratively,
in batches, for efficiency.
● A query instantiates a cursor, which is then used to retrieve the result set in
manageable chunks; successive calls to MongoDB occur as needed to fill the
driver’s cursor buffer.
● Returning a huge result right away would mean:
○ Copying all that data into memory.
○ Transferring it over the wire.
○ Deserializing it on the client side.
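The batching behaviour described above can be illustrated with a toy cursor in plain JavaScript. It is a sketch, not driver code: makeCursor and fetchBatch are hypothetical names, and fetchBatch stands in for one network round trip to the server.

```javascript
// Toy cursor that pulls results in fixed-size batches, the way a driver
// refills its buffer. fetchBatch stands in for a network round trip.
function makeCursor(allResults, batchSize) {
  let position = 0;   // next result the "server" will send
  let roundTrips = 0; // how many batches have been requested
  let buffer = [];    // results already transferred but not yet consumed

  function fetchBatch() {
    roundTrips += 1;
    buffer = allResults.slice(position, position + batchSize);
    position += buffer.length;
  }

  return {
    // Refill the buffer only when it is empty and more results remain.
    hasNext() {
      if (buffer.length === 0 && position < allResults.length) fetchBatch();
      return buffer.length > 0;
    },
    next() {
      if (!this.hasNext()) throw new Error("cursor exhausted");
      return buffer.shift();
    },
    get roundTrips() {
      return roundTrips;
    },
  };
}

const cursor = makeCursor([1, 2, 3, 4, 5, 6, 7], 3);
const seen = [];
while (cursor.hasNext()) {
  seen.push(cursor.next());
}
console.log(seen.length, cursor.roundTrips);
```

Seven results with a batch size of three take three round trips, and at no point does the client hold more than three unconsumed results.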
Indexing
● Introduction
● Indexing Types
● Indexing Properties
Indexing - Introduction
● Index keys are typically smaller than the documents they catalog, and indexes
are typically available in RAM or located sequentially on disk.
● Covered Queries
○ A query is covered when its criteria and its projection include only the indexed fields.
○ Results are then returned directly from the index, without scanning any documents or bringing
documents into memory.
● Ensure Indexes Fit in RAM
○ Use the db.collection.totalIndexSize() helper, which returns index size in bytes.
Indexing - Index Types
● Single Field
● Compound Index
● Multikey Index
● Geospatial Index
● Text Indexes
● Hashed Indexes
Indexing - Index Properties
● TTL Indexes
○ A TTL index makes MongoDB automatically remove documents a set period after the time stored in the indexed date field.
● Unique Indexes
○ A unique index causes MongoDB to reject all documents that contain a duplicate value for the
indexed field.
● Partial Indexes
○ A partial index indexes only documents that meet specified filter criteria.
● Case Insensitive Indexes
○ A case insensitive index disregards the case of the index key values.
● Sparse Indexes
○ A sparse index does not index documents that do not have the indexed field.
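As an illustration of two of these properties, the sketch below models a single-field index with the unique and sparse options in plain JavaScript. FieldIndex is a hypothetical class and says nothing about how MongoDB stores indexes internally; the sample documents are made up.

```javascript
// Toy single-field index showing the "unique" and "sparse" properties.
// FieldIndex is a hypothetical class, not how MongoDB stores indexes.
class FieldIndex {
  constructor(field, { unique = false, sparse = false } = {}) {
    this.field = field;
    this.unique = unique;
    this.sparse = sparse;
    this.entries = new Map(); // indexed value -> array of document _ids
  }

  // Returns true if the document was indexed, false if it was skipped.
  add(doc) {
    const hasField = this.field in doc;
    if (!hasField && this.sparse) return false; // sparse: skip the document
    const key = hasField ? doc[this.field] : null; // missing field indexed as null
    if (this.unique && this.entries.has(key)) {
      throw new Error(`duplicate key for ${this.field}: ${key}`);
    }
    if (!this.entries.has(key)) this.entries.set(key, []);
    this.entries.get(key).push(doc._id);
    return true;
  }
}

const emailIndex = new FieldIndex("email", { unique: true, sparse: true });
emailIndex.add({ _id: 1, email: "a@example.com" });

// No email field: a sparse index leaves this document out.
const indexedWithoutEmail = emailIndex.add({ _id: 2 });

// Same email again: a unique index rejects the document.
let rejected = false;
try {
  emailIndex.add({ _id: 3, email: "a@example.com" });
} catch (err) {
  rejected = true;
}
console.log(indexedWithoutEmail, rejected, emailIndex.entries.size);
```

Combining the two options is a common pattern: documents without the field are ignored, while documents that do have it must carry distinct values.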
Schema Design principles
● Introduction
● Embedding Vs. Referencing
● Model One-to-One Relationships
● Model One-to-Many Relationships
Schema Design principles - Introduction
● The application’s data access patterns should govern schema design,
with specific understanding of
○ The read/write ratio of database operations.
○ The types of queries and updates performed by the database.
○ The life-cycle of the data and growth rate of documents.
● When designing a data model, consider how applications will use your database.
○ If your application only uses recently inserted documents, consider using Capped Collections.
data-modeling
Embedding Vs. Referencing
● Embedding provides better performance for read operations, as well as the
ability to request and retrieve related data in a single database operation.
● Not all 1:1 or 1:Many relationships should be embedded in a single document.
Embedding Vs. Referencing
● References store the relationships between data by including links or
references from one document to another. Use references:
○ When embedding would not provide sufficient read performance advantages
○ Where the object is referenced from many different sources.
○ To represent complex many-to-many relationships.
○ To model large, hierarchical data sets.
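The two modelling styles can be compared side by side with plain JavaScript objects. This is only a sketch: every name and value is made up, and publisherOf is a hypothetical helper standing in for the second query a real application would issue.

```javascript
// Two ways to model related data. All names and values are made up.

// Embedded: the addresses live inside the patron document, so a single
// read returns everything.
const patronEmbedded = {
  _id: "joe",
  name: "Joe Bookreader",
  addresses: [{ street: "123 Fake Street", city: "Faketon" }],
};

// Referenced: books point at their publisher by _id, so the publisher is
// stored once even when many books share it.
const publishers = [{ _id: "example-press", name: "Example Press" }];
const books = [
  { _id: 1, title: "Book One", publisher_id: "example-press" },
  { _id: 2, title: "Book Two", publisher_id: "example-press" },
];

// Resolving a reference takes a second lookup (a second query in a real app).
function publisherOf(book) {
  return publishers.find((p) => p._id === book.publisher_id);
}

const cities = patronEmbedded.addresses.map((a) => a.city);
const publisherNames = books.map((b) => publisherOf(b).name);
console.log(cities, publisherNames);
```

Embedding answers the read in one step; referencing avoids copying the publisher into every book at the cost of the extra lookup.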
One-to-One Relationships - Embedding
One-to-Many Relationships
One-to-Many, One-to-Few
One-to-Many Relationships
One-to-Squillions
Aggregation
● Aggregation operations group values from multiple documents together, and
can perform a variety of operations on the grouped data to return a single
result.
● The aggregate command operates on a single collection, logically passing the
entire collection into the aggregation pipeline.
● The $match and $sort pipeline operators can take advantage of an index when
they occur at the beginning of the pipeline.
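The idea of a pipeline, where each stage consumes the previous stage's output, can be sketched in plain JavaScript. The stage functions below are hypothetical stand-ins for $match, $group and $sort, and the sample orders are made up; this is not how the server executes a pipeline.

```javascript
// Plain-JavaScript model of a three-stage pipeline: $match -> $group -> $sort.
// The stage functions are hypothetical stand-ins for the real operators.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// $match: keep only documents whose status equals the given value.
const matchStatus = (status) => (docs) => docs.filter((d) => d.status === status);

// $group: one output document per cust_id, with the summed amount.
const groupTotalByCustomer = (docs) => {
  const sums = new Map();
  for (const d of docs) {
    sums.set(d.cust_id, (sums.get(d.cust_id) ?? 0) + d.amount);
  }
  return [...sums].map(([_id, total]) => ({ _id, total }));
};

// $sort: order the groups by total, largest first.
const sortByTotalDesc = (docs) => [...docs].sort((a, b) => b.total - a.total);

// Each stage receives the previous stage's output.
const pipeline = [matchStatus("A"), groupTotalByCustomer, sortByTotalDesc];
const customerTotals = pipeline.reduce((docs, stage) => stage(docs), orders);
console.log(customerTotals);
```

Placing the match stage first matters for the same reason as in the bullet above: it shrinks the input before grouping, and in MongoDB it is the position where an index can be used.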
Aggregation
https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
Aggregation - Limitations
● If any single document in the result set exceeds the BSON Document Size limit (16 megabytes), the
command will produce an error.
● The $group stage has a limit of 100 megabytes of RAM. By default, if the stage
exceeds this limit, $group will produce an error; the allowDiskUse option lets it
write to temporary files instead.
Map-Reduce
● Map-reduce is a data processing paradigm for condensing large volumes of data
into useful aggregated results.
● Map-reduce is less efficient and more complex than the aggregation pipeline, and it has been deprecated since MongoDB 5.0.
● All map-reduce functions in MongoDB are JavaScript and run within the
mongod process.
● Map-reduce operations take the documents of a single collection as input.
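The paradigm itself fits in a few lines of plain JavaScript. This is a sketch under stated assumptions: runMapReduce is a hypothetical driver loop and the sample data is made up. In MongoDB the map function instead calls a global emit() and reads the current document from `this`, and reduce may be skipped for keys with a single value.

```javascript
// Plain-JavaScript model of map-reduce over the documents of one collection.
// runMapReduce is a hypothetical driver loop, not the MongoDB implementation.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: every document may emit any number of (key, value) pairs.
  for (const doc of docs) map(doc, emit);
  // Reduce phase: condense each key's values into a single value.
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const sales = [
  { cust_id: "A123", amount: 500 },
  { cust_id: "A123", amount: 250 },
  { cust_id: "B212", amount: 200 },
];

// Emit the amount under the customer id, then sum the amounts per customer.
const mapFn = (doc, emit) => emit(doc.cust_id, doc.amount);
const reduceFn = (key, values) => values.reduce((sum, v) => sum + v, 0);

const salesPerCustomer = runMapReduce(sales, mapFn, reduceFn);
console.log(salesPerCustomer);
```

The same totals can be produced by a single grouping stage in the aggregation pipeline, which is why the pipeline is usually the simpler choice.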
Questions
