Directory Layout
•  Separate files per database
•  Aggressive preallocation: each new data file is twice the size of the previous one, up to 2GB
•  Files contain one or more extents

  -rw-------  1 ben ben   64M May  1 19:14 test.0
  -rw-------  1 ben ben  128M May  1 19:14 test.1
  -rw-------  1 ben ben  256M May  1 18:25 test.2
  -rw-------  1 ben ben  512M May  1 19:14 test.3
  -rw-------  1 ben ben  1.0G May  1 19:14 test.4
  -rw-------  1 ben ben  2.0G May  1 18:58 test.5
  -rw-------  1 ben ben   16M May  1 19:14 test.ns
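
The file sizes in the listing double from 64MB to 2GB. A minimal sketch of that sequence, assuming the doubling simply stops at a 2GB cap (the function name and its parameters are illustrative, not MongoDB identifiers):

```python
MB = 1024 * 1024


def data_file_size(file_no, first=64 * MB, cap=2048 * MB):
    """Size in bytes of <db>.<file_no>, assuming doubling up to a cap."""
    return min(first << file_no, cap)


# Print the assumed size of the first seven data files of a database "test".
for n in range(7):
    print(f"test.{n}: {data_file_size(n) // MB}M")
```

Under this assumption, any file after test.5 would also be 2GB.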




Memory Mapping
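Process virtual memory, from the highest address down:

  0x7fffffffffff
      STACK
      …
      LIBS
      …
      test.ns     (mapped from disk)
      test.0      (mapped from disk)
      test.1      (mapped from disk)
      …
      HEAP
      MONGOD
      NULL
  0x0

Each mapped region corresponds to a file on disk; a document { … } is a range of bytes inside a mapped data file.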
  
Data Structures
•  DiskLoc
  •  Stores file number and offset of data on disk
  •  Record *r = mmap base + DiskLoc.offset
  •  Max offset is 2^31 (2GB)
•  NamespaceDetails
  •  Stores collection metadata
•  Extent
  •  Stores contiguous blocks within a namespace
  •  Max extent size is 2GB
•  Record
  •  Holds a BSON document or B-tree bucket
  •  DeletedRecord overwrites a Record
  •  Includes padding
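
The DiskLoc bullets can be sketched in a few lines. This is only an illustration: plain bytearrays stand in for memory-mapped data files, and the names `file_no`, `resolve` and `MAX_OFFSET` are hypothetical, not taken from the MongoDB source.

```python
from dataclasses import dataclass

MAX_OFFSET = 2**31  # 2GB offset limit quoted on the slide


@dataclass(frozen=True)
class DiskLoc:
    file_no: int  # which data file (test.0, test.1, ...)
    offset: int   # byte offset inside that file


def resolve(mapped_files, loc, length):
    """Return the bytes a DiskLoc points at: mapped base + offset."""
    if not 0 <= loc.offset < MAX_OFFSET:
        raise ValueError("offset out of range")
    base = mapped_files[loc.file_no]  # stands in for the mmap base address
    return bytes(base[loc.offset:loc.offset + length])


# Two small buffers stand in for the mapped files test.0 and test.1.
files = [bytearray(64), bytearray(64)]
files[1][16:21] = b"hello"
print(resolve(files, DiskLoc(file_no=1, offset=16), 5))
```

Because a location is just a file number plus an offset, following it costs no more than pointer arithmetic once the file is mapped.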
Namespace Details
•  Holds metadata about a collection or index
•  Stored in 1KB buckets in <dbname>.ns file
•  .ns file fixed size of 16MB
•  Maintains document count
•  Contains heads of linked lists

NamespaceDetails fields: firstExtent, lastExtent, _indexes[], stats, freeList[]
  
Extent Structure

[Figure: two Extent structures side by side, each with the fields length, xNext, xPrev, firstRecord, lastRecord.]
  
Extents
> db.foo.validate({full: true}).extents.forEach(
      function(z){ print(z.loc + "\t\t" + z.size); })
0:3000        20480
0:12000       81920
0:26000       327680
0:76000       1310720
0:1da000      5242880
0:76a000      6291456
0:d6a000      7553024
0:16de000     9064448
0:1f83000     10878976
0:29e3000     13058048
1:2000        15671296
1:ef4000      18808832
1:29e4000     22573056
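
The extent sizes printed for db.foo follow a visible pattern. The sketch below parses that output (copied from the slide) and prints the ratio between consecutive extent sizes; it assumes the part of each location after the colon is a hexadecimal offset, which the digits a-f suggest. The helper name `parse` is illustrative.

```python
output = """\
0:3000 20480
0:12000 81920
0:26000 327680
0:76000 1310720
0:1da000 5242880
0:76a000 6291456
0:d6a000 7553024
0:16de000 9064448
0:1f83000 10878976
0:29e3000 13058048
1:2000 15671296
1:ef4000 18808832
1:29e4000 22573056
"""


def parse(text):
    """Turn 'file:hexoffset size' lines into (file_no, offset, size) tuples."""
    extents = []
    for line in text.splitlines():
        if not line.strip():
            continue
        loc, size = line.split()
        file_no, offset = loc.split(":")
        extents.append((int(file_no), int(offset, 16), int(size)))
    return extents


extents = parse(output)
sizes = [size for _, _, size in extents]
# Ratio of each extent size to the one before it, rounded to two decimals.
ratios = [round(b / a, 2) for a, b in zip(sizes, sizes[1:])]
print(ratios)
```

For this collection the first extents grow fourfold and the later ones by roughly 20% each.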
  
Index Extents

> db.system.namespaces.find()
{ "name" : "test.foo" }
{ "name" : "test.system.indexes" }
{ "name" : "test.foo.$_id_" }

> db["foo.$_id_"].validate({full: true}).extents.forEach(
      function(z){ print(z.loc + "\t\t" + z.size); })
0:9000        36864
0:1b6000      147456
0:6da000      589824
0:149e000     2359296
1:20e4000     9437184
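
Combining the extent listing for test.foo with the one for its _id index shows how the two share the data files. The sketch below, which assumes the offsets are hexadecimal and uses illustrative names (`span`, `layout`, `gaps`), sorts all extents by position and reports every place where one extent does not start exactly where the previous one in the same file ends.

```python
data_extents = [
    ("0:3000", 20480), ("0:12000", 81920), ("0:26000", 327680),
    ("0:76000", 1310720), ("0:1da000", 5242880), ("0:76a000", 6291456),
    ("0:d6a000", 7553024), ("0:16de000", 9064448), ("0:1f83000", 10878976),
    ("0:29e3000", 13058048), ("1:2000", 15671296), ("1:ef4000", 18808832),
    ("1:29e4000", 22573056),
]
index_extents = [
    ("0:9000", 36864), ("0:1b6000", 147456), ("0:6da000", 589824),
    ("0:149e000", 2359296), ("1:20e4000", 9437184),
]


def span(loc, size):
    """Convert 'file:hexoffset' plus size into (file_no, start, end)."""
    file_no, offset = loc.split(":")
    start = int(offset, 16)
    return int(file_no), start, start + size


# One sorted list of (file_no, start, end, namespace) for both namespaces.
layout = sorted(
    [span(loc, size) + ("foo",) for loc, size in data_extents]
    + [span(loc, size) + ("foo.$_id_",) for loc, size in index_extents]
)

gaps = []
for (f1, _, end1, _), (f2, start2, _, _) in zip(layout, layout[1:]):
    if f1 == f2 and start2 != end1:
        gaps.append((f1, hex(end1), hex(start2)))
print(gaps)
```

Apart from one 4KB gap near the start of file 0, the data and index extents sit back to back, interleaved in the same files.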
  
Extents and Records

[Figure, built up over three slides: an Extent holding first one, then two Data Records.]
Extent fields: length, xNext, xPrev, firstRecord, lastRecord
Data Record fields: length, rNext, rPrev, Document
Document: { _id: "foo", ... }
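
The field names on these slides suggest two levels of doubly linked lists: extents chained by xNext/xPrev, and records inside each extent chained by rNext/rPrev. A minimal sketch of a full collection scan over such a structure follows. It uses Python object references where the real structures would hold on-disk locations, and the class and function names are illustrative only.

```python
class Record:
    def __init__(self, document):
        self.document = document
        self.r_next = None
        self.r_prev = None


class Extent:
    def __init__(self):
        self.first_record = None
        self.last_record = None
        self.x_next = None
        self.x_prev = None

    def append(self, document):
        """Link a new record at the tail of this extent's record list."""
        rec = Record(document)
        rec.r_prev = self.last_record
        if self.last_record is None:
            self.first_record = rec
        else:
            self.last_record.r_next = rec
        self.last_record = rec


def scan(first_extent):
    """Yield every document, extent by extent, record by record."""
    extent = first_extent
    while extent is not None:
        rec = extent.first_record
        while rec is not None:
            yield rec.document
            rec = rec.r_next
        extent = extent.x_next


e1, e2 = Extent(), Extent()
e1.x_next, e2.x_prev = e2, e1
e1.append({"_id": "foo"})
e1.append({"_id": "bar"})
e2.append({"_id": "baz"})
print([doc["_id"] for doc in scan(e1)])
```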
  
BSON Format

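{ hello: "world" }

\x16\x00\x00\x00 \x02hello\x00
\x06\x00\x00\x00 world\x00\x00

Doc Length:   \x16\x00\x00\x00 (22 bytes, little-endian)
Value Type:   \x02 (string)
Value Length: \x06\x00\x00\x00 (6 bytes, including the trailing \x00)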
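
The byte layout above can be reproduced by hand. The sketch below encodes a document whose values are all strings, using only the standard library; it is not a general BSON encoder, and the function names are illustrative.

```python
import struct


def encode_string_field(name, value):
    """Type byte 0x02, NUL-terminated key, int32 length, NUL-terminated value."""
    key = name.encode("utf-8") + b"\x00"
    data = value.encode("utf-8") + b"\x00"
    return b"\x02" + key + struct.pack("<i", len(data)) + data


def encode_document(fields):
    """int32 total length, the encoded fields, then a terminating NUL."""
    body = b"".join(encode_string_field(k, v) for k, v in fields.items())
    total = 4 + len(body) + 1  # length prefix + fields + terminator
    return struct.pack("<i", total) + body + b"\x00"


doc = encode_document({"hello": "world"})
print(doc)
print(len(doc))
```

The two trailing NUL bytes are distinct: the first ends the string "world", the second ends the document.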
  
Index Extents

[Figure, built up over two slides: an Extent holding two Index Records.]
Extent fields: length, xNext, xPrev, firstRecord, lastRecord
Index Record fields: length, rNext, rPrev, Bucket
Bucket fields: parent, numKeys, K (keys); a key refers to a { Document }
[The second slide adds a B-tree drawn above the extent: keys 4 and 9 on top, keys 1, 3, 5, 6, 8, A, B below.]
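
A lookup in a tree of buckets like the one on the slide can be sketched as follows. The grouping of the lower keys into three child buckets (1 3, 5 6 8, A B) is an assumption made for illustration, since the flattened diagram does not preserve it; the class and function names are hypothetical as well, and keys are compared as single-character strings.

```python
import bisect


class Bucket:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys held in this bucket
        self.children = children or []  # empty for a leaf bucket
        self.parent = None
        for child in self.children:
            child.parent = self

    @property
    def num_keys(self):
        return len(self.keys)


def search(bucket, key):
    """Return the bucket holding key, or None if the key is absent."""
    while bucket is not None:
        i = bisect.bisect_left(bucket.keys, key)
        if i < bucket.num_keys and bucket.keys[i] == key:
            return bucket
        # Descend into the child between keys[i-1] and keys[i], if any.
        bucket = bucket.children[i] if bucket.children else None
    return None


root = Bucket(["4", "9"], [
    Bucket(["1", "3"]),
    Bucket(["5", "6", "8"]),
    Bucket(["A", "B"]),
])
print(search(root, "6").keys)
print(search(root, "7"))
```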
  
Journaling
•  Write-ahead logging
•  Operations are written to the journal before the memory-mapped regions
  •  Private view
  •  Shared view
•  Once the journal is written, data is safe barring a hardware problem
•  By default, the journal is flushed every 100ms, after 100MB of writes, or on a write concern of j=true
  •  User configurable with --journalCommitInterval
Journal Format
•  Section contains single group commit
•  Applied all-or-nothing

JHeader
JSectHeader [LSN 3]
    DurOp
    DurOp
    DurOp
JSectFooter
JSectHeader [LSN 7]
    DurOp
    DurOp
    DurOp
JSectFooter
…

DurOp layouts:
    Op_DbContext: set database context for subsequent operations
    Write Operation: length, offset, fileNo, data[length]
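
The all-or-nothing rule can be illustrated with a toy replay loop. This is a sketch under loud assumptions: sections are plain dictionaries, a boolean `footer` stands in for whatever the real JSectFooter records to mark a section as completely written, and bytearrays stand in for data files. None of these names come from the MongoDB source except the field names shown on the slide.

```python
def replay(journal, files):
    """Apply complete sections in order; stop at the first incomplete one."""
    last_lsn = None
    for section in journal:
        if not section.get("footer"):
            break  # torn section: none of its operations are applied
        db = None
        for op in section["ops"]:
            if op["type"] == "Op_DbContext":
                db = op["db"]  # context for the write operations that follow
            else:
                buf = files[(db, op["fileNo"])]
                start = op["offset"]
                buf[start:start + op["length"]] = op["data"]
        last_lsn = section["lsn"]
    return last_lsn


files = {("test", 0): bytearray(16)}
journal = [
    {"lsn": 3, "footer": True, "ops": [
        {"type": "Op_DbContext", "db": "test"},
        {"type": "write", "fileNo": 0, "offset": 4, "length": 3, "data": b"abc"},
    ]},
    {"lsn": 7, "footer": False, "ops": [  # written only partially before a crash
        {"type": "Op_DbContext", "db": "test"},
        {"type": "write", "fileNo": 0, "offset": 8, "length": 3, "data": b"xyz"},
    ]},
]
print(replay(journal, files))
print(bytes(files[("test", 0)]))
```

The second section is skipped as a whole, so the data file never contains half of a group commit.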
  
Journal Performance
•  No impact on systems where 99.9% of operations are reads
•  Write performance degrades 5-30% when the journal is on the same drive as the data
•  With the journal on a separate drive, degradation can be as low as 3%
Journal Admin
•  Journal stored in the /dbpath/journal folder
•  Three 1GB files may be preallocated if that is faster
•  Can symlink to a different spindle
•  --journalCommitInterval* (2ms - 300ms)
•  When to journal
   •  Single node: required for data integrity
   •  Replica set: at least 1 node
   •  All nodes: removes possible need to resync
Fragmentation
•  Files may become fragmented over time if
   documents change size
•  Free lists also contribute to fragmentation
  •  2.0 reduced free-list scanning to reasonable amounts
  •  2.2 will change the allocation strategy
  •  Online compaction requires rewriting the free list
Compaction
•  1.8 and earlier: repairDatabase
•  2.0+: compact command
  •  Currently resets paddingFactor, but can be
     changed.
  •  Index (re)generation is now concurrent, so
     compaction can be N times faster
•  Generally causes some extra allocation
  •  Does not delete or truncate files
Planned Changes
•  Split data and indexes into different files
•  Indexes could be symlinked to a different drive (SSD)
•  Improved allocation strategy
Download MongoDB
http://www.mongodb.org/downloads

Ben Becker
ben.becker@10gen.com
  

MongoDB Journaling and the Storage Engine
