Document databases
The mystery revealed
Contents
 noSQL
 Culture shock
 Document databases
 Concepts
 Benefits
 Schema design
 MongoDB
 Internals
 Use in .NET
noSQL
 Collective term for a range of databases
 Non-relational
 Key/value pairs
 key = field name
Document Databases
Comparison
 Relational
Article
- id
- authorid
- title
- content
Comment
- id
- articleid
- message
Author
- id
- name
- email
 Document db
Article
- _id
- title
- content
- author
  - _id
  - name
  - email
- comments[]
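As a rough illustration of the comparison above, the sketch below models both layouts with plain JavaScript objects. The field names follow the slide; the sample values and the helper `loadArticleRelational` are invented for this example and are not part of any driver API.

```javascript
// Hypothetical sample data: field names follow the slide, values are invented.
const relational = {
  articles: [{ id: 1, authorid: 10, title: "Hello", content: "..." }],
  authors: [{ id: 10, name: "Ann", email: "ann@example.com" }],
  comments: [
    { id: 100, articleid: 1, message: "Nice" },
    { id: 101, articleid: 1, message: "Thanks" },
  ],
};

// Relational style: the article is assembled at query time (the "join").
function loadArticleRelational(db, id) {
  const article = db.articles.find((a) => a.id === id);
  return {
    ...article,
    author: db.authors.find((u) => u.id === article.authorid),
    comments: db.comments.filter((c) => c.articleid === id),
  };
}

// Document style: the same data stored as one self-contained document.
const articleDoc = {
  _id: 1,
  title: "Hello",
  content: "...",
  author: { _id: 10, name: "Ann", email: "ann@example.com" },
  comments: [{ message: "Nice" }, { message: "Thanks" }],
};

const joined = loadArticleRelational(relational, 1);
console.log(joined.author.name, joined.comments.length);
console.log(articleDoc.author.name, articleDoc.comments.length);
```

Both reads end up with the same shape; the document version simply stores that shape instead of rebuilding it on every query.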
Terminology
 In parallel with SQL:
Relational | Document db
Table | Collection
Row | Document
Column | Field
Index | Index
Join | Embedding & linking
Schema | N/A
Data integrity
 Shift of responsibilities to the app
 Manage data integrity and validity yourself
 Database more efficient and more scalable
Diagram: data integrity & validity checks move from the DB to the APPLICATION
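A minimal sketch of what "manage data integrity and validity yourself" could look like in application code. The rules and the function name `validateArticle` are assumptions made for illustration; the slides do not prescribe specific checks.

```javascript
// Hypothetical validation rules: the slides do not prescribe specific checks.
function validateArticle(doc, knownAuthorIds) {
  const errors = [];
  if (typeof doc.title !== "string" || doc.title.trim() === "") {
    errors.push("title is required");
  }
  // Stand-in for a foreign key constraint: the app checks the reference itself.
  if (!doc.author || !knownAuthorIds.has(doc.author._id)) {
    errors.push("author must reference an existing author");
  }
  if (!Array.isArray(doc.comments)) {
    errors.push("comments must be an array");
  }
  return errors;
}

const knownAuthorIds = new Set([10]);
const good = { title: "Hello", author: { _id: 10 }, comments: [] };
const bad = { title: "", author: { _id: 99 } };

console.log(validateArticle(good, knownAuthorIds)); // no errors expected
console.log(validateArticle(bad, knownAuthorIds)); // lists the failed checks
```

Running such a check before every write is one way to replace the constraints a relational database would enforce.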
Concepts
 Joins
 No joins
 Joins at "design time", not at "query time"
 Thanks to embedded docs and arrays, fewer joins are needed
 Constraints
 No foreign key constraints
 Unique indexes
 Transactions
 No commit/rollback across documents (MongoDB added multi-document transactions in version 4.0)
 Atomic operations
 Multiple actions inside the same document
 Incl. embedded documents
Dynamic schema
 No schema
 Implied: definition in the app, not the db
 A field can exist in certain docs and not in others
 When indexed, a missing field is treated as a null value
 Sparse index: excludes docs without that field
 Writing to a non-existent collection or database
 Lazy creation
 Reading from a non-existent collection
 Empty value returned
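The difference between a regular and a sparse index can be sketched with a toy index built over plain objects. This is an illustration of the idea only, not how MongoDB stores indexes; `buildIndex` and the sample users are invented.

```javascript
// Hypothetical collection: only some documents carry a "nickname" field.
const users = [
  { _id: 1, name: "Ann", nickname: "annie" },
  { _id: 2, name: "Bob" },
  { _id: 3, name: "Cy", nickname: "c" },
];

// Build a toy index: field value -> list of document ids.
function buildIndex(docs, field, { sparse = false } = {}) {
  const index = new Map();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: skip docs without the field
    const key = hasField ? doc[field] : null; // otherwise index the gap as null
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const dense = buildIndex(users, "nickname");
const sparse = buildIndex(users, "nickname", { sparse: true });

console.log(dense.get(null)); // ids of docs without a nickname
console.log(sparse.has(null)); // those docs are absent from the sparse index
```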
Relations
 Embedded fields
 Can be queried, the parent doc is returned
 Can be indexed
 Can’t be used for ordering
 Linking
 Get the 2nd doc yourself in the app via a reference
 Avoid where possible
 Use for:
 Many-to-many relations
 Subdoc often needs to be modified
Benefits
 Scalable: good for a lot of data / traffic
 Horizontal scaling: to more nodes
 Good for web-apps
 Performance
 No joins and constraints
 Dev/user friendly
 Data is modeled after how the app is going to use it
 No conversion from object-oriented to relational
 No static schema = agile
Drawbacks
 More mistake-prone
 No data integrity checks
 Database is app-specific
 Less flexibility for shared usage
 Data aggregation is harder
 Less suitable for reporting
Schema Design
Schema design
 Start from application-specific queries
 “What questions do I have?” vs “What answers do I have?”
 “Data like the application wants it”
 Base parent documents on:
 The most common usage
 What do I want returned?
Schema design
 Hybrid embed / link
 The author name seldom changes
 First update author.name
 Then update the articles async
Article
- _id
- author
  - _id
  - name
  - email
- content
Author
- _id
- name
- email
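The two-step update for the hybrid layout might look roughly like this. The in-memory arrays stand in for collections, `renameAuthor` is a hypothetical helper, and `setTimeout` is only a placeholder for whatever background mechanism the application uses.

```javascript
// Hypothetical in-memory collections: names and values are invented.
const authors = [{ _id: 10, name: "Ann", email: "ann@example.com" }];
const articles = [
  { _id: 1, content: "...", author: { _id: 10, name: "Ann", email: "ann@example.com" } },
  { _id: 2, content: "...", author: { _id: 10, name: "Ann", email: "ann@example.com" } },
];

function renameAuthor(authorId, newName) {
  // Step 1: update the authoritative Author document right away.
  const author = authors.find((a) => a._id === authorId);
  author.name = newName;

  // Step 2: refresh the embedded copies later, off the request path.
  return new Promise((resolve) => {
    setTimeout(() => {
      for (const article of articles) {
        if (article.author._id === authorId) article.author.name = newName;
      }
      resolve();
    }, 0);
  });
}

const pending = renameAuthor(10, "Anna");
console.log(articles[0].author.name); // embedded copy is still stale here
pending.then(() => console.log(articles[0].author.name)); // refreshed afterwards
```

Between the two steps, readers of an article see the old name; the design accepts that short window because renames are rare.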
Schema design
 Data duplication & denormalisation
 Pro
 simplicity
 optimisation (fewer I/O operations)
 query processing
 Con
 more disk usage
 data integrity
 Embedded docs
 Recommended < 250 kB
Single collection inheritance
 Relational
Product
- _id
- price
Book
- author
- title
Album
- artist
- title
Jeans
- size
- color
 Document db (single Product collection)
Book
- _id
- price
- author
- title
Jeans
- _id
- price
- size
- color
Single collection inheritance
 Relational
Product
- _id
- price
Book
- author
- title
Album
- artist
- title
Jeans
- size
- color
 Document db (single Product collection, with a type discriminator)
_type: Book
- _id
- price
- author
- title
_type: Jeans
- _id
- price
- size
- color
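A short sketch of how a single collection with a `_type` discriminator could be queried. The documents are invented samples, and plain array filtering stands in for real database queries.

```javascript
// Hypothetical product documents sharing one collection.
const products = [
  { _id: 1, _type: "Book", price: 12, author: "A. Writer", title: "Docs" },
  { _id: 2, _type: "Jeans", price: 40, size: 32, color: "blue" },
  { _id: 3, _type: "Album", price: 9, artist: "Band", title: "Songs" },
];

// Shared fields (price) can be queried across all product types...
const cheap = products.filter((p) => p.price < 20).map((p) => p._id);

// ...while _type selects the documents that carry type-specific fields.
const books = products.filter((p) => p._type === "Book");

console.log(cheap);
console.log(books.map((b) => b.title));
```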
One-to-many
 Embedded array / array keys
 Some queries get harder
 You can index arrays!
 Normalized approach
 More flexibility
 Much lower performance
Article
- _id
- content
- tags: ["foo", "bar"]
- comments: ["id1", "id2"]
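Indexing an array field can be pictured as adding one index entry per array element. The toy `buildArrayIndex` below is an assumption-laden illustration of that idea, not the database's actual implementation.

```javascript
// Hypothetical articles with embedded tag arrays.
const articles = [
  { _id: 1, content: "...", tags: ["foo", "bar"] },
  { _id: 2, content: "...", tags: ["bar"] },
];

// One index entry per array element, each pointing back to the parent doc.
function buildArrayIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const byTag = buildArrayIndex(articles, "tags");
console.log(byTag.get("bar")); // ids of all articles tagged "bar"
```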
Many-to-many
 Using array keys
 No join table
 References on both sides
 Advantage: simple queries
// Articles in a category
articles.Where(p => p.CategoryIds.Contains(categoryId))
// Categories of an article
categories.Where(c => c.ArticleIds.Contains(articleId))
 Disadvantage: duplication, update two docs
Article
- _id
- content
- category_ids: ["id1", "id2"]
Category
- _id
- name
- article_ids: ["id7", "id8"]
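With references on both sides, every link means two writes. The sketch below uses in-memory arrays and a hypothetical `link` helper to show both the simple queries and the double update.

```javascript
// Hypothetical in-memory collections with references on both sides.
const articles = [{ _id: "a1", content: "...", category_ids: [] }];
const categories = [{ _id: "c1", name: "News", article_ids: [] }];

// Linking touches two documents: the app must keep both sides in sync.
function link(articleId, categoryId) {
  const article = articles.find((a) => a._id === articleId);
  const category = categories.find((c) => c._id === categoryId);
  if (!article.category_ids.includes(categoryId)) article.category_ids.push(categoryId);
  if (!category.article_ids.includes(articleId)) category.article_ids.push(articleId);
}

link("a1", "c1");

// Either direction is a single query.
const articlesInCategory = articles.filter((a) => a.category_ids.includes("c1"));
const categoriesOfArticle = categories.filter((c) => c.article_ids.includes("a1"));

console.log(articlesInCategory.map((a) => a._id));
console.log(categoriesOfArticle.map((c) => c.name));
```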
Many-to-many
 References on one side
 Advantage: data in one place
 Disadvantage: 2 queries
// Articles in a category: one query
articles.Where(p => p.CategoryIds.Contains(categoryId))
// Categories of an article: two queries
var article = articles.Single(p => p.Id == articleId);
categories.Where(c => c.Id.In(article.CategoryIds))
Article
- _id
- content
- category_ids: ["id1", "id2"]
Category
- _id
- name
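The one-sided variant can be sketched the same way. Sample data is invented; array filtering stands in for database queries.

```javascript
// Hypothetical collections: only articles hold references.
const articles = [
  { _id: "a1", content: "...", category_ids: ["c1", "c2"] },
  { _id: "a2", content: "...", category_ids: ["c2"] },
];
const categories = [
  { _id: "c1", name: "News" },
  { _id: "c2", name: "Tech" },
];

// Articles in a category: one query on the side that holds the references.
const inTech = articles.filter((a) => a.category_ids.includes("c2"));

// Categories of an article: first fetch the article, then its categories.
const article = articles.find((a) => a._id === "a1");
const itsCategories = categories.filter((c) => article.category_ids.includes(c._id));

console.log(inTech.map((a) => a._id));
console.log(itsCategories.map((c) => c.name));
```

A link now lives in exactly one document, at the price of an extra round trip in one direction.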
To sum up
 A new mindset
 Serialize complex .NET objects directly to the db
 Data duplication and denormalisation are key
 Big shift of responsibilities to the app
 No built-in data integrity checks
 Database has a single responsibility: storing data
 Quicker and easier to scale
MongoDB
 Why MongoDB?
 Largest user base, mature
 Platform independent
 Open source, free
Source: Google Trends
MongoDB: internals
 Durability
 By default through replication
 Single server durability: less performance
 Eventual consistency
 Configure fsync: sync between memory and disk
 by default every 60 sec.
 Configure replication before returning (wait until the write has been replicated)
MongoDB: internals
 Safe mode
 Turn off eventual consistency
 sync directly to the disk
 replicate the data sufficiently, in replica sets
 Calls GetLastError to determine whether the action was successful
 Applies to actions without a return value
 On connection or action level
MongoDB: internals
 Replica sets
 Nodes that are copies of each other
 Set-up of master (primary) and slave (secondary) nodes
 If the master goes down, a slave automatically takes over and is promoted to master
MongoDB: internals
 Sharding
 Scale out
 Clusters of replica sets
 Connected to:
 a central proxy, used by clients
 config servers, which contain meta-data
 Write to multiple nodes
MongoDB: internals
 Sharding
 Based on a shard key (= field)
 Commands are sent to the shard that includes the relevant range of the data
 Data is evenly distributed across the shards
 Automatic reallocation of data when adding or removing servers
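Range-based routing on a shard key can be sketched as a lookup in a table of key ranges. The shard names, the ranges and both routing functions are hypothetical; a real cluster keeps this mapping on the config servers and rebalances it automatically.

```javascript
// Hypothetical chunk map: each shard owns a half-open range [min, max) of the shard key.
const chunks = [
  { shard: "shard-a", min: -Infinity, max: 1000 },
  { shard: "shard-b", min: 1000, max: 2000 },
  { shard: "shard-c", min: 2000, max: Infinity },
];

// Route one document (or an exact-match query) to the shard owning its key.
function routeByKey(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Route a range query [lo, hi) to every shard whose range overlaps it.
function routeRange(lo, hi) {
  return chunks.filter((c) => lo < c.max && hi > c.min).map((c) => c.shard);
}

console.log(routeByKey(1500)); // single shard
console.log(routeRange(900, 1100)); // spans two neighbouring shards
```

Queries that include the shard key touch only the relevant shards; queries without it would have to be sent to all of them.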
MongoDB: internals
 BSON
 Data storage and network transfer format
 Binary serialized JSON
 System collections
 db.system.namespaces (lists the collections)
 db.system.indexes
 Geospatial indexing
 Find results closest to coordinate
 db.places.find({ loc: {$near: [50, 4], $maxDistance: 5} })
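What the `$near` query above asks for can be approximated in plain JavaScript: keep the places within the maximum distance and sort them nearest first. This sketch assumes flat (planar) distance measured in coordinate units; the sample places and the `near` helper are invented.

```javascript
// Hypothetical places with [x, y] coordinate pairs.
const places = [
  { name: "A", loc: [50, 4] },
  { name: "B", loc: [53, 7] },
  { name: "C", loc: [51, 4] },
  { name: "D", loc: [60, 10] },
];

// Keep places within maxDistance of the point and sort them nearest first.
// Assumption: planar distance in coordinate units.
function near(docs, [x, y], maxDistance) {
  return docs
    .map((doc) => ({ doc, dist: Math.hypot(doc.loc[0] - x, doc.loc[1] - y) }))
    .filter((r) => r.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist)
    .map((r) => r.doc);
}

const closest = near(places, [50, 4], 5);
console.log(closest.map((p) => p.name));
```

A geospatial index lets the database answer this without scanning and sorting every document.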
DEMO
MongoDB in .NET
Links
 http://www.mongodb.org/display/DOCS/CSharp+Language+Center
 Quick-start
 Documentation
 LINQ
 Serialization
 http://mongly.com/
 Free eBook
 Interactive tutorial

Editor's Notes

  1. Documents contain: values, arrays, embedded docs
  2. 16 MB size limit in MongoDB