MongoDB
https://www.mongodb.com/
Prutha Date (dprutha1@umbc.edu)
Siraj Memon (siraj1@umbc.edu)
Outline
• Introduction to MongoDB
• Storage Layout
• Data Management Features
• Performance Analysis
• Limitations
• Conclusion
• Demo
• References
What is MongoDB?
• MongoDB is a NoSQL, document-oriented database.
• It provides a flexible, semi-structured schema.
• It provides high performance, high availability, and easy scalability.
• MongoDB is free and open source software.
• License: GNU Affero General Public License (AGPL) for the server and Apache License 2.0 for the drivers
• MongoDB is a server process that runs on Linux, Windows, and OS X. It can run as either a 32-bit or a 64-bit application.
When to use MongoDB?
“Knowing when to use a hammer, and when to use a screwdriver.”
• Account and user profiles: can store arrays of addresses with ease (MetLife)
• Content Management Systems (CMS): the flexible schema of MongoDB is great for heterogeneous
collections of content types (MongoPress)
• Form data: MongoDB makes it easy to evolve the structure of form data over time (ADP)
• Blogs / user-generated content: can keep data with complex relationships together in one object (Forbes,
AOL)
• Messaging: vary message metadata easily per message or message type without needing to maintain separate collections or schemas (Viber)
• System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
(Cisco)
• Log data of any kind: structured log data is the future (eBay)
• Location based systems: makes use of Geospatial indices (Foursquare, City government of Chicago)
Terminology – RDBMS vs MongoDB
*JSON – JavaScript Object Notation
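In relational terms, a table maps to a collection, a row maps to a document, and a column maps to a field. The sketch below shows a parent row and its child rows folded into one embedded document. The users and addresses rows and the toDocument helper are hypothetical and serve only as illustration.

```javascript
// Hypothetical relational data: one users row and its child addresses rows.
const usersRow = { id: 1, name: "Ada" };
const addressRows = [
  { user_id: 1, city: "Baltimore", zip: "21250" },
  { user_id: 1, city: "Columbia", zip: "21044" },
];

// Fold a parent row and its child rows into a single document:
// the primary key becomes _id and the child rows become an embedded array.
function toDocument(row, children) {
  return {
    _id: row.id,
    name: row.name,
    addresses: children
      .filter((c) => c.user_id === row.id) // the "join" happens once, at write time
      .map(({ city, zip }) => ({ city, zip })), // foreign key no longer needed
  };
}

const doc = toDocument(usersRow, addressRows);
console.log(JSON.stringify(doc));
```

In the mongo shell, the resulting object could be stored as-is with db.users.insertOne(doc) (the collection name is hypothetical). Because the addresses now travel with the user, reads no longer need a join.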
Storage Internals - Directory Layout
The default data directory is /data/db (configurable with the --dbpath option).
Internal File Format
Extent Structure
Extents and Records
To Sum Up: Internal File Format
• Files on disk are broken into extents which contain the documents.
• A collection has one or more extents.
• Extents grow exponentially, up to 2 GB.
• Namespace entries in the .ns (namespace) file point to the first extent of each collection.
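The following is a minimal sketch of exponential extent growth. For illustration only, it assumes the first extent is 4 KB and each new extent doubles the previous one, up to the 2 GB cap. MMAPv1's real sizing heuristics differ, and extentSizes and extentsNeeded are hypothetical helpers.

```javascript
const KB = 1024;
const MB = 1024 * KB;
const GB = 1024 * MB;
const MAX_EXTENT = 2 * GB; // cap stated on the slide

// Hypothetical sizing rule: start at `first` bytes and double each time, capped at 2 GB.
function extentSizes(count, first = 4 * KB) {
  const sizes = [];
  let size = first;
  for (let i = 0; i < count; i++) {
    sizes.push(size);
    size = Math.min(size * 2, MAX_EXTENT);
  }
  return sizes;
}

// How many extents would a collection need to hold `bytes` of records?
function extentsNeeded(bytes, first = 4 * KB) {
  let total = 0;
  let n = 0;
  let size = first;
  while (total < bytes) {
    total += size;
    n++;
    size = Math.min(size * 2, MAX_EXTENT);
  }
  return n;
}

console.log(extentSizes(5).join(", ")); // 4096, 8192, 16384, 32768, 65536
console.log(extentsNeeded(1 * MB)); // 9
```

Under doubling, the number of extents grows only logarithmically with collection size. This is the point of exponential allocation: there are few extents to chain together from the namespace entry.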
Virtual Address Space
Storage Engine - MMAP (Memory Mapped)
• All data files are memory-mapped into virtual memory with the mmap() system call.
• MongoDB simply reads from and writes to RAM (the filesystem cache).
• The OS takes care of the rest, paging data between RAM and disk.
• Virtual process size = total size of the data files + overhead (connections, heap).
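The virtual-size formula can be checked with quick arithmetic. This sketch assumes MMAPv1's default data-file sizing: the first file is 64 MB and each later file doubles, up to 2 GB per file. It also assumes the default 16 MB .ns file. The 100 MB overhead for connections and heap is a made-up placeholder.

```javascript
const MB = 1024 * 1024;

// Assumed MMAPv1 data-file sizes: 64 MB, 128 MB, ... capped at 2 GB per file.
function dataFileSizes(fileCount) {
  const sizes = [];
  for (let i = 0; i < fileCount; i++) {
    sizes.push(Math.min(64 * MB * 2 ** i, 2048 * MB));
  }
  return sizes;
}

// Virtual process size = total mapped file size + overhead (formula from the slide).
// overheadMB is a hypothetical placeholder for connections and heap.
function virtualSizeMB(fileCount, nsFileMB = 16, overheadMB = 100) {
  const dataMB = dataFileSizes(fileCount).reduce((sum, s) => sum + s, 0) / MB;
  return dataMB + nsFileMB + overheadMB;
}

console.log(virtualSizeMB(4)); // 64 + 128 + 256 + 512 + 16 + 100 = 1076
```

Every mapped file counts toward virtual size, so mongod's reported virtual memory can be far larger than its resident memory. Only the pages actually touched occupy RAM.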
Storage Engine - WiredTiger
• Designed especially for write-intensive applications
• Document-level (record-level) locking
• Compression
• Multi-version concurrency control (MVCC)
• Multi-document transactions
• Support for Log Structured Merge (LSM) trees for very high
insert workloads
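This toy store illustrates the idea behind MVCC; it is not WiredTiger's implementation, which is far more involved. Each write appends a new version stamped with a logical timestamp, and a reader sees only the versions that existed when its snapshot was taken. As a result, readers and writers do not block each other. MvccStore is a hypothetical class.

```javascript
// Toy multi-version key-value store (illustration of MVCC, not WiredTiger).
class MvccStore {
  constructor() {
    this.clock = 0; // logical timestamp, bumped on every committed write
    this.versions = new Map(); // key -> array of { ts, value }, oldest first
  }

  // Commit a write by appending a new version instead of overwriting in place.
  write(key, value) {
    this.clock += 1;
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ ts: this.clock, value });
    return this.clock;
  }

  // A snapshot is just the timestamp at which a reader starts.
  snapshot() {
    return this.clock;
  }

  // Return the newest version committed at or before the snapshot.
  read(key, snap) {
    const list = this.versions.get(key) ?? [];
    for (let i = list.length - 1; i >= 0; i--) {
      if (list[i].ts <= snap) return list[i].value;
    }
    return undefined;
  }
}

const store = new MvccStore();
store.write("balance", 100);
const snap = store.snapshot(); // a reader begins here
store.write("balance", 50); // a later writer does not disturb that reader
console.log(store.read("balance", snap)); // 100
console.log(store.read("balance", store.snapshot())); // 50
```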
What makes MongoDB cool?
• Sharding
• Aggregation Framework and Map-Reduce
• Capped Collection
• GridFS
• Geo-Spatial Indexing
Sharding
• Horizontal scaling - divides the data set and distributes the data over
multiple servers, or shards.
• Used to support deployments with very large data sets and high
throughput operations.
• Sharded Cluster Components –
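• Shards – each shard is a mongod instance or a replica set holding a subset of the data
• Config Servers – a set of mongod instances that store the cluster's metadata
• Routing Instances – one or more mongos instances that route client operations to the right shards
• The data is partitioned into chunks, each covering a contiguous range of shard key values. Chunks are split when they grow past a maximum size (64 MB by default) and migrated between shards to keep the cluster balanced.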
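The following is a minimal sketch of range-based chunk routing, similar in spirit to how mongos uses the chunk metadata kept on the config servers. The chunk table, shard names, and numeric keys are hypothetical. Real chunk ranges run from MinKey to MaxKey over BSON values.

```javascript
// Hypothetical chunk table, sorted by min; each chunk owns the half-open range [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Binary-search the chunk whose range contains the shard key value.
function routeToShard(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c.shard;
  }
  throw new Error(`no chunk covers key ${key}`);
}

console.log(routeToShard(42)); // shardA
console.log(routeToShard(1000)); // shardB
console.log(routeToShard(99999)); // shardC
```

Keeping the chunk list sorted makes lookup a binary search. Splitting a chunk simply replaces one entry with two adjacent ranges.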
Sharding Internals
Choosing a Shard key
The choice of shard key affects:
• Distribution of reads and writes
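• Problem: a poorly chosen key (e.g., a monotonically increasing _id or timestamp) concentrates reads and writes on a few shards.
• Solution – hashed ids
• Size of chunks
• Problem: when too many documents share one shard key value, their chunk cannot be split and becomes a jumbo chunk, which unbalances the data and is difficult to move between shards.
• Solution – a compound shard key, as in multi-tenant designs (e.g., the tenant id plus a higher-cardinality field)
• The number of shards each query hits: queries that include the shard key can be routed to a single shard, while others must be broadcast to all shards.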
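Hashed ids help because, with a monotonically increasing key, every new insert falls into the highest range, so a single shard absorbs all writes. The sketch below compares range placement with hash placement over three shards. FNV-1a is a stand-in chosen for brevity; MongoDB's hashed indexes use their own hash function. Hash-modulo placement is also a simplification of MongoDB's hashed chunk ranges.

```javascript
// 32-bit FNV-1a hash of a string (stand-in for MongoDB's internal hash).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply modulo 2^32, keep unsigned
  }
  return h;
}

const SHARDS = 3;

// Range placement: ids below 1000 on shard 0, below 2000 on shard 1, the rest on shard 2.
function rangeShard(id) {
  return id < 1000 ? 0 : id < 2000 ? 1 : 2;
}

// Hashed placement: spread keys by their hash value.
function hashShard(id) {
  return fnv1a(String(id)) % SHARDS;
}

// Simulate a burst of inserts with increasing ids (like ObjectIds or timestamps).
function countInserts(placeFn, firstId, n) {
  const counts = new Array(SHARDS).fill(0);
  for (let id = firstId; id < firstId + n; id++) counts[placeFn(id)]++;
  return counts;
}

console.log("range :", countInserts(rangeShard, 5000, 3000)); // every insert lands on shard 2
console.log("hashed:", countInserts(hashShard, 5000, 3000)); // inserts spread across all shards
```

The trade-off is that range queries on a hashed key can no longer target a single shard; they are broadcast to all of them.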
Aggregation Framework
• Aggregation Pipeline
• Map-Reduce
• Single Purpose Aggregation Operations (e.g., count and distinct; the group command is deprecated as of MongoDB 3.4)
Aggregation Pipeline
• The aggregation pipeline is a framework for performing aggregation
tasks, modeled on the concept of data processing pipelines.
• Using this framework, MongoDB passes the documents of a single
collection through a pipeline.
• The pipeline transforms the documents into aggregated results, and is
accessed through the aggregate database command.
• Operators (pipeline stages) include $match, $project, $unwind, $sort, and $limit, along with others such as $group.
• The user chooses which stages to apply and in what order.
Aggregation Pipeline - Example
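To show how documents flow from stage to stage, here is a plain-JavaScript imitation of a $match → $unwind → $project → $sort → $limit pipeline over hypothetical orders data. Each stage supports only the small subset of syntax used here: equality matches, a single sort field, and inclusion-only projections. It is a teaching sketch, not MongoDB's implementation.

```javascript
// Hypothetical input collection.
const orders = [
  { _id: 1, customer: "ann", status: "A", items: ["pen", "ink"] },
  { _id: 2, customer: "bob", status: "B", items: ["pad"] },
  { _id: 3, customer: "cid", status: "A", items: ["cap"] },
];

// Minimal stage implementations covering only the syntax used below.
const stages = {
  // Equality-only filter.
  $match: (docs, q) => docs.filter((d) => Object.entries(q).every(([k, v]) => d[k] === v)),
  // One output document per array element; documents without the array are dropped.
  $unwind: (docs, path) => {
    const field = path.slice(1); // "$items" -> "items"
    return docs.flatMap((d) => (d[field] ?? []).map((x) => ({ ...d, [field]: x })));
  },
  // Inclusion-only projection; _id is always kept, as MongoDB does by default.
  $project: (docs, spec) =>
    docs.map((d) => {
      const out = { _id: d._id };
      for (const [k, on] of Object.entries(spec)) if (on) out[k] = d[k];
      return out;
    }),
  // Single-field sort: 1 ascending, -1 descending.
  $sort: (docs, spec) => {
    const [[k, dir]] = Object.entries(spec);
    return [...docs].sort((a, b) => (a[k] < b[k] ? -dir : a[k] > b[k] ? dir : 0));
  },
  $limit: (docs, n) => docs.slice(0, n),
};

// Feed each stage the output of the previous one.
function aggregate(docs, pipeline) {
  return pipeline.reduce((acc, stage) => {
    const [[op, arg]] = Object.entries(stage);
    return stages[op](acc, arg);
  }, docs);
}

const pipeline = [
  { $match: { status: "A" } },
  { $unwind: "$items" },
  { $project: { customer: 1, items: 1 } },
  { $sort: { items: 1 } },
  { $limit: 2 },
];
console.log(aggregate(orders, pipeline));
// first result: { _id: 3, customer: "cid", items: "cap" }, then { _id: 1, customer: "ann", items: "ink" }
```

Against a real server, the same pipeline array would be passed to db.orders.aggregate(pipeline) (the collection name is hypothetical). Putting $match first matters there too, because it shrinks the input of every later stage and can use an index.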
Map-Reduce
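MongoDB's mapReduce command takes two JavaScript functions. The map function calls emit(key, value) for each document, and the reduce function folds the values emitted for one key. The driver below imitates those semantics in a single process over hypothetical orders data. Its mapReduce function is a hypothetical stand-in, not the server command.

```javascript
// Hypothetical input documents.
const orders = [
  { cust_id: "A1", amount: 500 },
  { cust_id: "A1", amount: 250 },
  { cust_id: "B2", amount: 200 },
];

// Map: `this` is bound to the current document; emits (key, value) pairs.
function map() {
  emit(this.cust_id, this.amount);
}

// Reduce: folds all values emitted for one key into a single value of the same type.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Single-process driver: collect emitted values per key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  globalThis.emit = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  for (const doc of docs) mapFn.call(doc); // expose each document as `this`
  delete globalThis.emit;

  const results = [];
  for (const [k, vals] of groups) {
    // Keys with a single emitted value skip reduce.
    results.push({ _id: k, value: vals.length === 1 ? vals[0] : reduceFn(k, vals) });
  }
  return results;
}

console.log(mapReduce(orders, map, reduce)); // A1 -> 750, B2 -> 200
```

As in MongoDB, reduce is skipped for keys with a single value. The server may also call reduce again on partial results, so reduce must return the same type as the emitted values.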
Capped Collection
• A fixed-size collection is called a capped collection.
• Create it with the db.createCollection command, setting the capped option
• e.g., db.createCollection('logs', {capped: true, size: 2097152}) (size is in bytes, here 2 MB)
• When it reaches its size limit, the oldest documents are automatically removed to make room for new ones.
• Guarantees preservation of insertion order
• Keeps insertion order identical to the on-disk order by prohibiting updates that increase document size
• Allows the use of a tailable cursor to retrieve documents
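A capped collection behaves like a size-bounded ring buffer. This in-memory sketch mimics that behaviour: inserts append in order, and once the byte budget is exceeded the oldest documents are evicted first. Sizing documents by their JSON length is a simplifying assumption, since MongoDB counts BSON bytes. The CappedCollection class is hypothetical.

```javascript
class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.docs = []; // kept in insertion order
    this.bytes = 0;
  }

  // Assumption: approximate document size by JSON length instead of BSON size.
  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const size = CappedCollection.sizeOf(doc);
    if (size > this.maxBytes) throw new Error("document larger than the cap");
    this.docs.push(doc);
    this.bytes += size;
    // Evict the oldest documents until the collection fits again (FIFO).
    while (this.bytes > this.maxBytes) {
      const oldest = this.docs.shift();
      this.bytes -= CappedCollection.sizeOf(oldest);
    }
  }

  // Natural order equals insertion order, which is what a tailable cursor follows.
  find() {
    return [...this.docs];
  }
}

const logs = new CappedCollection(60); // each {"msg":"event N"} is 17 bytes here
for (let i = 1; i <= 5; i++) logs.insert({ msg: `event ${i}` });
console.log(logs.find().map((d) => d.msg).join(", ")); // event 3, event 4, event 5
```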
GridFS
• GridFS is a specification for storing and retrieving files that exceed
the BSON (binary JSON) document size limit of 16MB.
• Instead of storing a file in a single document, GridFS divides a file into
parts, or chunks, and stores each of those chunks as a separate
document.
• By default, GridFS uses a chunk size of 255 KB.
• GridFS uses two collections to store files: one (fs.chunks by default) stores the file chunks, and the other (fs.files) stores file metadata.
• GridFS is useful not only for storing files that exceed 16MB but also
for storing any files for which you want access without having to load
the entire file into memory.
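As a quick check of how GridFS splits a file, assume the default 255 KB chunk size (255 × 1024 = 261,120 bytes). The planChunks and chunksForRange helpers are hypothetical. Here n mirrors the zero-based sequence number stored in each fs.chunks document.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

// Describe the chunk documents GridFS would create for a file of `length` bytes.
// n is the zero-based chunk number; [start, end) is the byte range held in that chunk.
function planChunks(length, chunkSize = DEFAULT_CHUNK_SIZE) {
  const count = Math.ceil(length / chunkSize);
  const chunks = [];
  for (let n = 0; n < count; n++) {
    const start = n * chunkSize;
    chunks.push({ n, start, end: Math.min(start + chunkSize, length) });
  }
  return chunks;
}

// To read bytes [from, to), only the overlapping chunks need to be fetched.
function chunksForRange(from, to, chunkSize = DEFAULT_CHUNK_SIZE) {
  return { firstN: Math.floor(from / chunkSize), lastN: Math.floor((to - 1) / chunkSize) };
}

// A 20 MB file exceeds the 16 MB document limit, so it is split into chunks.
const plan = planChunks(20 * 1024 * 1024);
console.log(plan.length); // 81 (the last chunk holds only 81920 bytes)
console.log(chunksForRange(1000000, 1100000)); // chunks 3 through 4
```

The second helper is what makes partial reads cheap. To serve a byte range, only the overlapping chunk documents need to be fetched, not the whole file.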
GeoSpatial Indexing
• To support efficient queries of geospatial coordinate data, MongoDB
provides two special indexes:
• 2d indexes, which use planar geometry when returning results.
• 2dsphere indexes, which use spherical geometry to return results.
• Store location data as GeoJSON objects with this coordinate-axis order: longitude, latitude.
• Supported GeoJSON objects: Point, LineString, Polygon, etc.
• Query Operations: Inclusion, Intersection, Proximity.
• You cannot use a geospatial index as the shard key index.
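The difference between planar (2d) and spherical (2dsphere) geometry shows up in distance calculations. This sketch compares a flat Euclidean distance on raw degrees with the haversine great-circle distance, using GeoJSON's [longitude, latitude] order. The 6,371 km mean Earth radius and the two sample points are assumptions for illustration.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius (assumption)

// GeoJSON Points: coordinates are [longitude, latitude].
const baltimore = { type: "Point", coordinates: [-76.6122, 39.2904] };
const london = { type: "Point", coordinates: [-0.1276, 51.5072] };

const toRad = (deg) => (deg * Math.PI) / 180;

// Spherical geometry (the model behind 2dsphere): haversine distance in km.
function haversineKm(a, b) {
  const [lon1, lat1] = a.coordinates.map(toRad);
  const [lon2, lat2] = b.coordinates.map(toRad);
  const h =
    Math.sin((lat2 - lat1) / 2) ** 2 +
    Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Planar geometry (the model behind 2d): straight-line distance in degree units.
function planarDegrees(a, b) {
  const [x1, y1] = a.coordinates;
  const [x2, y2] = b.coordinates;
  return Math.hypot(x2 - x1, y2 - y1);
}

console.log(haversineKm(baltimore, london).toFixed(0), "km");
console.log(planarDegrees(baltimore, london).toFixed(2), "degrees");
```

A planar distance in degrees has no fixed length on the ground, because a degree of longitude shrinks toward the poles. This is why 2dsphere suits Earth-scale data. In the mongo shell, such an index could be created with db.places.createIndex({ location: "2dsphere" }) (the collection and field names are hypothetical).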
Performance Analysis
• Yahoo! Cloud Serving Benchmark (YCSB)
• Throughput (ops/second)

Workload                                       Cassandra  Couchbase    MongoDB
50% read, 50% update                             134,839    106,638    160,719
95% read, 5% update                              144,455    187,798    196,498
50% read, 50% update (durability optimized)        6,289      1,236     31,864
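To put the table in relative terms, this sketch divides MongoDB's throughput by each competitor's for every workload. It uses only the numbers from the table.

```javascript
// Throughput in ops/second, copied from the YCSB table.
const results = {
  "50% read, 50% update": { Cassandra: 134839, Couchbase: 106638, MongoDB: 160719 },
  "95% read, 5% update": { Cassandra: 144455, Couchbase: 187798, MongoDB: 196498 },
  "50% read, 50% update (durability optimized)": { Cassandra: 6289, Couchbase: 1236, MongoDB: 31864 },
};

// Ratio of MongoDB's throughput to each other database, rounded to 2 decimals.
function speedups(table) {
  const out = {};
  for (const [workload, row] of Object.entries(table)) {
    out[workload] = {};
    for (const [db, opsPerSec] of Object.entries(row)) {
      if (db !== "MongoDB") out[workload][db] = Number((row.MongoDB / opsPerSec).toFixed(2));
    }
  }
  return out;
}

console.log(speedups(results));
// 50/50:             Cassandra 1.19, Couchbase 1.51
// 95/5:              Cassandra 1.36, Couchbase 1.05
// durability 50/50:  Cassandra 5.07, Couchbase 25.78
```

The gap is narrowest on the read-heavy workload and widest in the durability-optimized configuration.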
Limitations
• Your working set needs to fit in memory; otherwise, performance may suffer.
• MapReduce and aggregation are single-threaded; more specifically, each job runs on a single thread within each mongod.
• No joins across collections (MongoDB 3.2 added a limited left outer join, the $lookup aggregation stage, for unsharded collections).
• On 32-bit builds, data is limited to about 2.5 GB, because all data files must be memory-mapped into the 32-bit address space.
• Sharding has some unique restrictions. If you plan to shard your data, shard early, because some operations that are feasible on a single server are not supported on a sharded collection.
Conclusion
• MongoDB is a semi-structured document-oriented NoSQL Database.
• It has two storage engines: MMAP (MMAPv1) and WiredTiger.
• Two aggregation mechanisms: the Aggregation Pipeline and Map-Reduce.
• Support for GridFS, geospatial indexing, and capped collections.
• Higher YCSB throughput than Cassandra and Couchbase in the benchmark results presented above.
• Ongoing work – in-memory and HDFS support
DEMO
References
• https://www.mongodb.com/presentations/storage-engine-internals
• http://docs.mongodb.org/manual/core/data-modeling-introduction/
• http://docs.mongodb.org/manual/core/aggregation-introduction/
• https://2013.nosql-matters.org/bcn/wp-content/uploads/2013/12/storage-talk-mongodb.pdf
• http://info-mongodb-com.s3.amazonaws.com/High Performance Benchmark White Paper final.pdf
• https://www.mongodb.com/collateral/mongodb-architecture-guide
• Book - MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf
Questions?
Thank you!

The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 

MongoDB Internals

2. Outline
• Introduction to MongoDB
• Storage Layout
• Data Management Features
• Performance Analysis
• Limitations
• Conclusion
• Demo
• References
3. What is MongoDB?
• MongoDB is a NoSQL, document-oriented database.
• It provides a flexible schema for semi-structured data.
• It provides high performance, high availability, and easy scalability.
• MongoDB is free and open-source software.
• License: the server is licensed under the GNU Affero General Public License (AGPL) v3.0 and the official drivers under the Apache License 2.0. (Server releases since October 2018 use the Server Side Public License instead.)
• MongoDB is a server process that runs on Linux, Windows, and OS X. It can run as either a 32-bit or a 64-bit application.
4. When to use MongoDB?
“Knowing when to use a hammer, and when to use a screwdriver.”
• Account and user profiles: arrays of addresses are easy to store (MetLife)
• Content management systems (CMS): MongoDB's flexible schema suits heterogeneous collections of content types (MongoPress)
• Form data: MongoDB makes it easy to evolve the structure of form data over time (ADP)
• Blogs and user-generated content: data with complex relationships can be kept together in one object (Forbes, AOL)
• Messaging: message metadata can vary per message or per message type without maintaining separate collections or schemas (Viber)
• System configuration: a configuration is an object graph of values, which maps naturally onto a MongoDB document (Cisco)
• Log data of any kind: structured log data is the future (eBay)
• Location-based systems: make use of geospatial indexes (Foursquare, City government of Chicago)
5. Terminologies: RDBMS vs MongoDB
*JSON: JavaScript Object Notation
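The central idea behind the RDBMS-to-MongoDB terminology, a table row becoming a document in which related rows can be embedded, can be illustrated with a minimal Python sketch that does not touch MongoDB at all. It stores the same hypothetical user profile two ways: as rows in two relational "tables" linked by a foreign key, and as one JSON-style document with an embedded array of addresses. All field names and values are made up for illustration.

```python
import json

# Relational layout (hypothetical): two tables of row tuples linked by user_id.
users = [(1, "Ada")]                                   # (user_id, name)
addresses = [(10, 1, "Home", "London"),                # (address_id, user_id, label, city)
             (11, 1, "Work", "Cambridge")]


def user_with_addresses(user_id):
    """Rebuild one profile by joining the two tables in application code."""
    name = next(n for uid, n in users if uid == user_id)
    addrs = [{"label": label, "city": city}
             for _, uid, label, city in addresses if uid == user_id]
    return {"_id": user_id, "name": name, "addresses": addrs}


# Document layout: the row becomes a document and the addresses table
# becomes an embedded array field.
profile = {
    "_id": 1,
    "name": "Ada",
    "addresses": [
        {"label": "Home", "city": "London"},
        {"label": "Work", "city": "Cambridge"},
    ],
}

print(json.dumps(profile))
```

Reading the profile back from the document layout is a single lookup, while the relational layout needs the join in user_with_addresses.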
6. Storage Internals - Directory Layout
• The default data directory is /data/db; it can be changed with the mongod --dbpath option.
10. To Sum Up: Internal File Format
• Files on disk are broken into extents, which contain the documents.
• A collection has one or more extents.
• Extent sizes grow exponentially, up to 2 GB.
• Namespace entries in the .ns (namespace) file point to the first extent of each collection.
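Exponential growth with a ceiling can be sketched in a few lines of Python. This illustrates the doubling rule only, not MongoDB's actual allocator: the 64 MB starting size and the function name extent_sizes are assumptions, and only the 2 GB ceiling comes from the slide.

```python
MB = 1024 * 1024
CAP = 2 * 1024 * MB   # 2 GB ceiling (from the slide)


def extent_sizes(first_size, count, cap=CAP):
    """Return `count` successive allocation sizes, doubling each time up to `cap`.

    Hypothetical helper: the starting size passed in is an assumption.
    """
    sizes = []
    size = first_size
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap)   # double, but never exceed the ceiling
    return sizes


print([s // MB for s in extent_sizes(64 * MB, 8)])
# → [64, 128, 256, 512, 1024, 2048, 2048, 2048]
```

Doubling keeps the number of allocations small for a growing collection, while the cap stops a single allocation from wasting an unbounded amount of preallocated space.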
12. Storage Engine - MMAP (Memory Mapped)
• All data files are memory-mapped into virtual memory by the OS.
• MongoDB simply reads from and writes to RAM in the filesystem cache.
• The OS takes care of the rest: paging data in from disk and writing dirty pages back.
• Virtual process size = total size of data files + overhead (connections, heap).
• Files are memory-mapped with the mmap() system call.
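The effect of memory mapping can be seen with Python's standard mmap module, which wraps the same mmap() system call on POSIX systems. This is a generic illustration of memory-mapped file I/O, not MongoDB code; the temporary file and its contents are made up.

```python
import mmap
import os
import tempfile

# Create a small file to stand in for a data file.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, mongo")          # 12 bytes on disk
os.close(fd)

with open(path, "r+b") as f:
    # Map the whole file (length 0) into this process's virtual address space.
    with mmap.mmap(f.fileno(), 0) as mm:
        first_five = mm[:5]            # a read is just a memory access
        mm[0:5] = b"HELLO"             # a write changes the mapped page in memory
        mm.flush()                     # ask the OS to write dirty pages back (msync)

with open(path, "rb") as f:
    result = f.read()                  # the change is visible through normal file I/O
os.remove(path)

print(first_five, result)              # b'hello' b'HELLO, mongo'
```

No explicit read() or write() calls touch the data while it is mapped; the OS moves pages between disk and RAM, which is the property the MMAP engine relies on.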
13. Storage Engine - WiredTiger
• Designed especially for write-intensive applications
• Document-level (record-level) locking
• Compression
• Multi-version concurrency control (MVCC)
• Multi-document transactions
• Support for log-structured merge (LSM) trees for very high insert workloads
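MVCC lets readers work from a consistent snapshot while writers add new versions instead of overwriting data in place. The sketch below is a deliberately simplified in-memory model of that idea; it is not how WiredTiger stores versions internally, and the class and method names are hypothetical.

```python
class MVCCStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical timestamp, bumped on every commit

    def snapshot(self):
        """A reader's snapshot is the timestamp at which it started."""
        return self.clock

    def write(self, key, value):
        """Commit a new version instead of overwriting the old one."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("doc1", {"qty": 1})
reader = store.snapshot()              # reader starts at ts = 1
store.write("doc1", {"qty": 2})        # a concurrent writer commits ts = 2

print(store.read("doc1", reader))              # {'qty': 1}: old snapshot stays stable
print(store.read("doc1", store.snapshot()))    # {'qty': 2}: new readers see the update
```

Because the reader never sees a half-applied write and never blocks the writer, concurrency comes from versioning rather than from holding locks for the duration of a read.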
14. What makes MongoDB cool?
• Sharding
• Aggregation Framework and Map-Reduce
• Capped Collections
• GridFS
• Geospatial Indexing
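15. Sharding
• Horizontal scaling: divides the data set and distributes the data over multiple servers, or shards.
• Used to support deployments with very large data sets and high-throughput operations.
• Sharded cluster components:
  • Shards: each shard is a mongod instance or a replica set
  • Config servers: multiple mongod instances that store the cluster's metadata
  • Routing instances: multiple mongos instances that route client operations to the shards
• The data of a sharded collection is divided into chunks, each covering a contiguous range of shard key values; a chunk is split when it exceeds a configured maximum size, and chunks are distributed across the shards.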
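Range-based chunk routing, the job mongos performs using the chunk metadata held by the config servers, can be modeled as a binary search over sorted chunk boundaries. The boundaries, shard names, and the route function below are hypothetical, and real chunk bounds use MinKey/MaxKey rather than open-ended Python ranges.

```python
import bisect

# Hypothetical chunk map for a collection sharded on a numeric key.
# Chunk i covers [bounds[i - 1], bounds[i]); the first and last chunks are open-ended.
bounds = [1000, 5000, 9000]                          # split points between chunks
owners = ["shardA", "shardB", "shardA", "shardC"]    # owning shard of each chunk


def route(shard_key_value):
    """Return the shard that owns the chunk containing this shard key value."""
    chunk_index = bisect.bisect_right(bounds, shard_key_value)
    return owners[chunk_index]


print(route(42), route(1000), route(7500), route(12000))   # shardA shardB shardA shardC
```

bisect_right makes each lower bound inclusive, matching the convention that a chunk includes its minimum key and excludes its maximum.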
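17. Choosing a Shard Key
The choice of shard key affects:
• Distribution of reads and writes
  • Problem: reads and writes can be distributed unevenly across shards (for example, a monotonically increasing key sends every new insert to the same shard).
  • Solution: hashed ids (a hashed shard key)
• Size of chunks
  • Problem: jumbo chunks (chunks that exceed the maximum size but cannot be split) cause uneven distribution of data, and moving data between shards becomes difficult.
  • Solution: a multi-tenant compound index used as the shard key
• The number of shards each query hits
  • Queries that include the shard key can be routed to specific shards; queries without it must be sent to every shard.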
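The hot-spot problem with a monotonically increasing key, and why hashing helps, can be shown with a small simulation. Everything here is an assumption of the sketch: four shards, range chunks built from 10,000 existing ids, and MD5 used only as a convenient stable hash. Real hashed sharding partitions ranges of the hashed value space into chunks rather than taking a modulus.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def ranged_shard(doc_id):
    """Hypothetical range chunks: [0, 2500) -> 0, [2500, 5000) -> 1,
    [5000, 7500) -> 2, [7500, +inf) -> 3 (the top chunk is open-ended)."""
    return min(doc_id // 2500, NUM_SHARDS - 1)


def hashed_shard(doc_id):
    """Hash the id first, then place it; MD5 is a stand-in stable hash."""
    digest = hashlib.md5(str(doc_id).encode()).digest()
    return int.from_bytes(digest[:8], "little") % NUM_SHARDS


new_ids = range(10_000, 11_000)    # monotonically increasing ids, e.g. a counter
ranged = Counter(ranged_shard(i) for i in new_ids)
hashed = Counter(hashed_shard(i) for i in new_ids)
print("ranged:", dict(ranged))     # every new insert lands on shard 3
print("hashed:", dict(hashed))     # inserts spread across all four shards
```

The trade-off is that hashing destroys key order, so range queries on the shard key can no longer be routed to a single shard.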
18. Aggregation Framework
• Aggregation Pipeline
• Map-Reduce
• Single-Purpose Aggregation Operations (deprecated in the latest version)
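Map-Reduce splits an aggregation into a map function that emits key/value pairs and a reduce function that combines the values emitted for each key. In MongoDB those functions are written in JavaScript and run on the server; the pure-Python model below only mirrors the data flow, and the orders collection and its fields are hypothetical. The reduce function is a sum, which is associative and commutative, because MongoDB may call reduce more than once for the same key and feed earlier results back in.

```python
from collections import defaultdict

# Hypothetical "orders" collection.
orders = [
    {"cust_id": "A", "amount": 50},
    {"cust_id": "B", "amount": 20},
    {"cust_id": "A", "amount": 30},
]


def map_fn(doc):
    """Emit one (key, value) pair per document."""
    yield doc["cust_id"], doc["amount"]


def reduce_fn(key, values):
    """Combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


print(map_reduce(orders, map_fn, reduce_fn))   # {'A': 80, 'B': 20}
```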
19. Aggregation Pipeline
• The aggregation pipeline is a framework for performing aggregation tasks, modeled on the concept of data processing pipelines.
• Using this framework, MongoDB passes the documents of a single collection through a pipeline.
• The pipeline transforms the documents into aggregated results and is accessed through the aggregate database command.
• Stage operators include $match, $project, $unwind, $sort, and $limit.
• The user chooses which stages to apply and in what order.
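The stage-by-stage data flow can be modeled in Python, with each stage a function from a list of documents to a list of documents. This is a simplified model, not the server's implementation: the helper names only mirror $match, $unwind, $sort, $limit, and $project, the posts collection is hypothetical, and options such as preserveNullAndEmptyArrays are ignored.

```python
def match(pred):
    """Keep only documents for which pred(doc) is true."""
    return lambda docs: [d for d in docs if pred(d)]


def unwind(field):
    """Emit one document per element of an array field."""
    return lambda docs: [{**d, field: item} for d in docs for item in d.get(field, [])]


def sort(key, descending=False):
    return lambda docs: sorted(docs, key=lambda d: d[key], reverse=descending)


def limit(n):
    return lambda docs: docs[:n]


def project(*fields):
    return lambda docs: [{f: d[f] for f in fields if f in d} for d in docs]


def aggregate(docs, pipeline):
    """Feed the output of each stage into the next one."""
    for stage in pipeline:
        docs = stage(docs)
    return docs


posts = [
    {"title": "a", "tags": ["db", "nosql"], "views": 10},
    {"title": "b", "tags": ["db"], "views": 30},
    {"title": "c", "tags": ["misc"], "views": 5},
]

result = aggregate(posts, [
    match(lambda d: d["views"] >= 10),
    unwind("tags"),
    sort("views", descending=True),
    limit(2),
    project("title", "tags"),
])
print(result)   # [{'title': 'b', 'tags': 'db'}, {'title': 'a', 'tags': 'db'}]
```

In the mongo shell, roughly the same pipeline would be db.posts.aggregate([{ $match: { views: { $gte: 10 } } }, { $unwind: "$tags" }, { $sort: { views: -1 } }, { $limit: 2 }, { $project: { _id: 0, title: 1, tags: 1 } }]), although the server does not guarantee the order of ties in $sort.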
23. Capped Collection
• A capped collection is a fixed-size collection.
• Create it with the db.createCollection command and mark it as capped, e.g. db.createCollection('logs', {capped: true, size: 2097152}) (size is in bytes, here 2 MB).
• When it reaches the size limit, the oldest documents are automatically removed to make room for new ones.
• Guarantees preservation of insertion order.
• Keeps insertion order identical to the order on disk by prohibiting updates that increase document size.
• Allows the use of tailable cursors to retrieve documents.
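The eviction rule can be modeled with a bounded queue. This toy is not MongoDB's storage code: it measures documents by their JSON length as a stand-in for BSON size, and the class name and 60-byte limit are hypothetical, chosen only so that eviction is visible with small documents.

```python
import json
from collections import deque


class CappedCollection:
    """Toy capped collection: bounded by total byte size, oldest documents evicted first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = deque()   # (document, size) pairs in insertion order
        self.used = 0            # total bytes currently stored

    def insert(self, doc):
        # JSON length is only a stand-in for the real BSON document size.
        size = len(json.dumps(doc).encode("utf-8"))
        if size > self.max_bytes:
            raise ValueError("document is larger than the whole collection")
        # Evict the oldest documents until the new one fits.
        while self.used + size > self.max_bytes:
            _, old_size = self.entries.popleft()
            self.used -= old_size
        self.entries.append((doc, size))
        self.used += size

    def find(self):
        """Return documents in insertion (natural) order."""
        return [doc for doc, _ in self.entries]


logs = CappedCollection(max_bytes=60)     # hypothetical 60-byte limit
for i in range(5):
    logs.insert({"seq": i, "msg": "ok"})  # each document is 23 bytes as JSON
print([d["seq"] for d in logs.find()])    # [3, 4]
```

Because eviction always takes from the front and insertion always appends at the back, insertion order is preserved without any index, which is what makes tailable cursors cheap.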
24. GridFS
• GridFS is a specification for storing and retrieving files that exceed the BSON (Binary JSON) document size limit of 16 MB.
• Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.
• By default, GridFS uses a chunk size of 255 kB.
• GridFS uses two collections to store files: one stores the file chunks (fs.chunks by default) and the other stores file metadata (fs.files).
• GridFS is useful not only for storing files larger than 16 MB but also for any file you want to access without loading the entire file into memory.
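A minimal Python sketch of how a file maps onto the two GridFS collections, and why a byte range can be read without loading the whole file. The field names files_id, n, data, length, and chunkSize follow the GridFS specification, but the function names are hypothetical, real files documents carry more fields (such as uploadDate), and everything stays in memory instead of being written to fs.files and fs.chunks.

```python
CHUNK_SIZE = 255 * 1024   # GridFS default chunk size (255 kB)


def to_gridfs_docs(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Split `data` into one metadata document and numbered chunk documents."""
    files_doc = {"_id": file_id, "filename": filename,
                 "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def read_range(chunk_docs, chunk_size, start, end):
    """Read bytes [start, end) by touching only the chunks that overlap it."""
    first, last = start // chunk_size, (end - 1) // chunk_size
    needed = sorted((c for c in chunk_docs if first <= c["n"] <= last),
                    key=lambda c: c["n"])
    blob = b"".join(c["data"] for c in needed)
    offset = first * chunk_size
    return blob[start - offset:end - offset]


data = bytes(range(256)) * 2400            # 600 kB of sample bytes
files_doc, chunks = to_gridfs_docs("file1", "video.bin", data)
print(len(chunks), [len(c["data"]) for c in chunks])   # 3 [261120, 261120, 92160]
```

A reader seeking into the middle of a large file only needs the chunks whose n falls in the requested range, which is how GridFS supports partial reads.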
25. Geospatial Indexing
• To support efficient queries of geospatial coordinate data, MongoDB provides two special indexes:
  • 2d indexes, which use planar geometry when returning results.
  • 2dsphere indexes, which use spherical geometry to return results.
• Store location data as GeoJSON objects with this coordinate-axis order: longitude, latitude.
• Supported GeoJSON objects: Point, LineString, Polygon, etc.
• Query operations: inclusion, intersection, proximity.
• You cannot use a geospatial index as the shard key index.
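The difference between the two index types comes down to how distance is measured. The sketch below contrasts flat (2d-style) distance with great-circle (2dsphere-style) distance and uses the latter for a brute-force proximity query over GeoJSON points. It is a model rather than an index: the haversine formula, the mean Earth radius, the function names, and the approximate city coordinates are all assumptions of the sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000   # mean Earth radius (assumption of this sketch)


def planar_distance(p, q):
    """2d-style distance: treat [x, y] as points on a flat plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """2dsphere-style great-circle distance (haversine) between [lng, lat] points."""
    lng1, lat1 = map(math.radians, p)
    lng2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# GeoJSON points: coordinates are [longitude, latitude].
places = {
    "Chicago": {"type": "Point", "coordinates": [-87.6298, 41.8781]},
    "Milwaukee": {"type": "Point", "coordinates": [-87.9065, 43.0389]},
    "Indianapolis": {"type": "Point", "coordinates": [-86.1581, 39.7684]},
}


def near(origin, places):
    """Proximity query: place names sorted by great-circle distance from origin."""
    return sorted(places,
                  key=lambda name: spherical_distance_m(origin, places[name]["coordinates"]))


print(near([-87.6298, 41.8781], places))   # ['Chicago', 'Milwaukee', 'Indianapolis']
```

Planar distance on raw longitude/latitude values is in degrees and distorts more the farther you go from the equator, which is why 2dsphere is the better fit for Earth-like data.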
26. Performance Analysis
• Yahoo! Cloud Serving Benchmark (YCSB)
• Throughput (ops/second):

Workload                                       Cassandra  Couchbase    MongoDB
50% read, 50% update                             134,839    106,638    160,719
95% read, 5% update                              144,455    187,798    196,498
50% read, 50% update (durability optimized)        6,289      1,236     31,864
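The headline comparison is easier to judge as ratios. This short calculation divides MongoDB's throughput by each competitor's for every workload, using only the numbers reported on the slide; it says nothing about the benchmark's hardware or configuration, and the variable names are illustrative.

```python
# Throughput in ops/second, as reported in the YCSB table on the slide.
throughput = {
    "50% read, 50% update": {"Cassandra": 134_839, "Couchbase": 106_638, "MongoDB": 160_719},
    "95% read, 5% update": {"Cassandra": 144_455, "Couchbase": 187_798, "MongoDB": 196_498},
    "50% read, 50% update (durability optimized)": {"Cassandra": 6_289, "Couchbase": 1_236, "MongoDB": 31_864},
}

# MongoDB throughput divided by each competitor's, per workload.
ratios_by_workload = {
    workload: {db: row["MongoDB"] / ops for db, ops in row.items() if db != "MongoDB"}
    for workload, row in throughput.items()
}

for workload, ratios in ratios_by_workload.items():
    print(f"{workload}: " + ", ".join(f"{r:.2f}x {db}" for db, r in ratios.items()))
# 50% read, 50% update: 1.19x Cassandra, 1.51x Couchbase
# 95% read, 5% update: 1.36x Cassandra, 1.05x Couchbase
# 50% read, 50% update (durability optimized): 5.07x Cassandra, 25.78x Couchbase
```

The gap is modest for the plain read/update mixes and large only in the durability-optimized run, so that workload drives most of the "better performance" conclusion.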
27. Limitations
• You need enough memory to fit your working set in RAM; otherwise performance may suffer.
• MapReduce and aggregation are single-threaded; more specifically, each runs in one thread per mongod.
• No joins across collections (MongoDB 3.2 later added the $lookup aggregation stage, which performs a left outer join against an unsharded collection in the same database).
• On 32-bit builds, total data size is limited to about 2.5 GB.
• Sharding has some unique restrictions. If you plan to shard your data, shard early, because some operations that are feasible on a single server are not feasible on a sharded collection.
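Without server-side joins, related data is either embedded in one document or combined by the application after querying each collection. The sketch below shows the second pattern with plain Python lists standing in for two hypothetical collections; with a driver, each list would come from a query on its collection, and the function name is illustrative.

```python
# Two hypothetical collections, represented as lists of documents.
authors = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
posts = [
    {"_id": 10, "author_id": 1, "title": "Engines"},
    {"_id": 11, "author_id": 2, "title": "Compilers"},
    {"_id": 12, "author_id": 1, "title": "Notes"},
]


def join_posts_with_authors(posts, authors):
    """Application-side join: fetch each collection once, then merge in memory."""
    by_id = {a["_id"]: a for a in authors}   # index the "one" side by _id
    return [{**p, "author": by_id.get(p["author_id"])} for p in posts]


for post in join_posts_with_authors(posts, authors):
    print(post["title"], "by", post["author"]["name"])
```

Building a dictionary keyed by _id keeps the merge linear in the number of documents, but the application now owns consistency between the two collections.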
28. Conclusion
• MongoDB is a semi-structured, document-oriented NoSQL database.
• It has two storage engines: MMAP (MMAPv1) and WiredTiger.
• It offers multiple aggregation frameworks: the aggregation pipeline and Map-Reduce.
• It supports GridFS, geospatial indexing, and capped collections.
• It showed better throughput than Cassandra and Couchbase in the YCSB workloads above.
• Ongoing work: in-memory and HDFS support.
29. Demo
30. References
• https://www.mongodb.com/presentations/storage-engine-internals
• http://docs.mongodb.org/manual/core/data-modeling-introduction/
• http://docs.mongodb.org/manual/core/aggregation-introduction/
• https://2013.nosql-matters.org/bcn/wp-content/uploads/2013/12/storage-talk-mongodb.pdf
• http://info-mongodb-com.s3.amazonaws.com/High Performance Benchmark White Paper final.pdf
• https://www.mongodb.com/collateral/mongodb-architecture-guide
• Book: MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf