MongoDB Journaling and the Storage Engine
 

Presentation Transcript

  • 2. Directory Layout
      • Separate files per database
      • Aggressive preallocation
      • Files contain one or more extents

          -rw------- 1 ben ben  64M May 1 19:14 test.0
          -rw------- 1 ben ben 128M May 1 19:14 test.1
          -rw------- 1 ben ben 256M May 1 18:25 test.2
          -rw------- 1 ben ben 512M May 1 19:14 test.3
          -rw------- 1 ben ben 1.0G May 1 19:14 test.4
          -rw------- 1 ben ben 2.0G May 1 18:58 test.5
          -rw------- 1 ben ben  16M May 1 19:14 test.ns
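The listing shows each new data file doubling in size, from 64MB for test.0 to 2GB for test.5. A minimal sketch of that sequence follows; `dataFileSize` is a hypothetical helper, and the 2GB ceiling is an assumption read off the rounded `ls` output, so the real limit may sit slightly below a round 2^31 bytes.

```javascript
const MB = 1024 * 1024;
const FIRST_FILE = 64 * MB;   // test.0 in the listing
const MAX_FILE = 2048 * MB;   // assumed ceiling, read off test.5 (2.0G)

// Hypothetical helper: preallocated size of the n-th data file (test.<n>),
// doubling from 64MB until it reaches the assumed ceiling.
function dataFileSize(n) {
  return Math.min(FIRST_FILE * 2 ** n, MAX_FILE);
}

for (let n = 0; n <= 6; n++) {
  console.log(`test.${n}: ${dataFileSize(n) / MB}MB`);
}
```

Summing test.0 through test.5 from the listing gives 64 + 128 + 256 + 512 + 1024 + 2048 = 4032MB reserved for the test database's data files alone.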
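  • 3. Memory Mapping
      [Diagram: the mongod process's virtual memory, from NULL at 0x0 up to 0x7fffffffffff, containing MONGOD, HEAP, LIBS, and STACK, plus mappings of the on-disk files test.ns, test.0, test.1, …; a document stored on disk is reached through its address in the mapped region.]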
  • 4. Data Structures
      • DiskLoc
          • Stores the file number and offset of data on disk
          • Record *r = mmap base + DiskLoc.offset
          • Max offset is 2^31 (2GB)
      • NamespaceDetails
          • Stores collection metadata
      • Extent
          • Stores contiguous blocks within a namespace
          • Max extent size is 2GB
      • Record
          • Holds a BSON document or B-tree bucket
          • A DeletedRecord overwrites a Record
          • Includes padding
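To make the `mmap base + DiskLoc.offset` arithmetic concrete, here is a minimal sketch in which each Node.js Buffer stands in for one memory-mapped data file. `DiskLoc` as a JavaScript class and `recordAt` are hypothetical names, and treating the offset as a signed 32-bit value (hence offsets below 2^31) is an assumption drawn from the slide's 2GB limit.

```javascript
// Offsets are assumed to be signed 32-bit values, so they must stay below 2^31.
const MAX_OFFSET = 2 ** 31;

// Hypothetical JavaScript model of a DiskLoc: (file number, byte offset).
class DiskLoc {
  constructor(fileNo, offset) {
    if (!Number.isInteger(offset) || offset < 0 || offset >= MAX_OFFSET) {
      throw new RangeError(`offset ${offset} is outside [0, 2^31)`);
    }
    this.fileNo = fileNo;
    this.offset = offset;
  }
}

// "mmap base + DiskLoc.offset": each Buffer stands in for one mapped data
// file (test.0, test.1, ...); subarray returns a view, not a copy.
function recordAt(mappedFiles, loc) {
  const base = mappedFiles[loc.fileNo];
  if (base === undefined) throw new RangeError(`file ${loc.fileNo} is not mapped`);
  return base.subarray(loc.offset);
}

const mapped = [Buffer.from("test.0 bytes"), Buffer.from("test.1 bytes")];
console.log(recordAt(mapped, new DiskLoc(1, 7)).toString()); // bytes
```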
  • 5. Namespace Details
      • Holds metadata about a collection or index
      • Stored in 1KB buckets in the <dbname>.ns file
      • The .ns file has a fixed size of 16MB
      • Maintains the document count
      • Contains the heads of linked lists
      [Diagram: NamespaceDetails with the fields firstExtent, lastExtent, _indexes[], stats, and freeList[].]
  • 6. Extent Structure
      [Diagram: two Extents side by side, each with the fields length, xNext, xPrev, firstRecord, and lastRecord.]
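The xNext and xPrev fields chain a namespace's extents into a doubly linked list whose ends are NamespaceDetails.firstExtent and lastExtent. Below is a minimal sketch in which object references stand in for the on-disk DiskLocs (an assumption made for readability); `makeExtent`, `appendExtent`, and `extentLengths` are hypothetical helpers, and the lengths are the first three sizes from the db.foo listing on slide 7.

```javascript
// Hypothetical in-memory model: object references stand in for the DiskLocs
// that xNext, xPrev, firstExtent, and lastExtent hold on disk.
function makeExtent(length) {
  return { length, xNext: null, xPrev: null, firstRecord: null, lastRecord: null };
}

// Append an extent to the end of a namespace's doubly linked extent list.
function appendExtent(ns, extent) {
  if (ns.lastExtent === null) {
    ns.firstExtent = extent;        // first extent of the namespace
  } else {
    ns.lastExtent.xNext = extent;   // old tail points forward to the new extent
    extent.xPrev = ns.lastExtent;   // new extent points back to the old tail
  }
  ns.lastExtent = extent;
}

// Walk the list forward from firstExtent via xNext.
function extentLengths(ns) {
  const lengths = [];
  for (let e = ns.firstExtent; e !== null; e = e.xNext) lengths.push(e.length);
  return lengths;
}

const ns = { firstExtent: null, lastExtent: null };
[20480, 81920, 327680].forEach((len) => appendExtent(ns, makeExtent(len)));
console.log(extentLengths(ns)); // [ 20480, 81920, 327680 ]
```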
  • 7. Extents

        > db.foo.validate( { full : true } ).extents.forEach(
              function(z){ print( z.loc + "\t\t" + z.size ); } )
        0:3000       20480
        0:12000      81920
        0:26000      327680
        0:76000      1310720
        0:1da000     5242880
        0:76a000     6291456
        0:d6a000     7553024
        0:16de000    9064448
        0:1f83000    10878976
        0:29e3000    13058048
        1:2000       15671296
        1:ef4000     18808832
        1:29e4000    22573056
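Dividing each extent size in the listing by the one before it shows the growth pattern: the first four steps are exactly 4x, and every later extent is about 1.2x its predecessor. The snippet only does that arithmetic on the listed numbers; it is not MongoDB's allocation rule.

```javascript
// Extent sizes in bytes, copied from the db.foo listing above (allocation order).
const sizes = [20480, 81920, 327680, 1310720, 5242880, 6291456, 7553024,
  9064448, 10878976, 13058048, 15671296, 18808832, 22573056];

// Growth factor of each extent relative to the previous one.
const ratios = sizes.slice(1).map((size, i) => size / sizes[i]);
console.log(ratios.map((r) => r.toFixed(2)).join(" "));
// 4.00 4.00 4.00 4.00 1.20 1.20 1.20 1.20 1.20 1.20 1.20 1.20
```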
  • 8. Index Extents

        > db.system.namespaces.find()
        { "name" : "test.foo" }
        { "name" : "test.system.indexes" }
        { "name" : "test.foo.$_id_" }
        > db["foo.$_id_"].validate( { full : true } ).extents.forEach(
              function(z){ print( z.loc + "\t\t" + z.size ); } )
        0:9000       36864
        0:1b6000     147456
        0:6da000     589824
        0:149e000    2359296
        1:20e4000    9437184
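Combining the data and index listings shows how the two kinds of extent share the same files. This sketch parses each `file:hexOffset` location, sorts all extents by physical position, and reports unlisted space between neighbours; `parseLoc` and the printed format are illustrative, not part of the shell. On these numbers every extent starts exactly where the previous one ends, except for a 4096-byte gap in test.0 before 0:9000, and the _id index extents are interleaved with the data extents.

```javascript
// Extents copied from the two listings above, as [loc, size in bytes].
const listing = {
  data: [["0:3000", 20480], ["0:12000", 81920], ["0:26000", 327680],
    ["0:76000", 1310720], ["0:1da000", 5242880], ["0:76a000", 6291456],
    ["0:d6a000", 7553024], ["0:16de000", 9064448], ["0:1f83000", 10878976],
    ["0:29e3000", 13058048], ["1:2000", 15671296], ["1:ef4000", 18808832],
    ["1:29e4000", 22573056]],
  index: [["0:9000", 36864], ["0:1b6000", 147456], ["0:6da000", 589824],
    ["0:149e000", 2359296], ["1:20e4000", 9437184]],
};

// Hypothetical parser for the shell's loc strings: "1:ef4000" -> { fileNo: 1, offset: 0xef4000 }.
function parseLoc(loc) {
  const [file, hex] = loc.split(":");
  return { fileNo: Number(file), offset: parseInt(hex, 16) };
}

// Merge data and index extents, then sort by physical position (file, then offset).
const extents = Object.entries(listing)
  .flatMap(([kind, rows]) => rows.map(([loc, size]) => ({ kind, size, ...parseLoc(loc) })))
  .sort((a, b) => a.fileNo - b.fileNo || a.offset - b.offset);

// Print the layout, recording any unlisted space before each extent in the same file.
const gaps = [];
let prev = null;
for (const e of extents) {
  const loc = `${e.fileNo}:${e.offset.toString(16)}`;
  const gap = prev && prev.fileNo === e.fileNo ? e.offset - (prev.offset + prev.size) : 0;
  gaps.push({ before: loc, bytes: gap });
  console.log(`${loc.padEnd(10)} ${e.kind.padEnd(5)} ${e.size}${gap ? `  (${gap}-byte gap before)` : ""}`);
  prev = e;
}
```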
  • 9. Extents and Records
      [Diagram: an Extent (length, xNext, xPrev, firstRecord, lastRecord) whose data area holds one Record: length, rNext, rPrev, and the Document { _id: "foo", ... }.]
  • 11. Extents and Records
      [Diagram: the same Extent holding two Records, each with length, rNext, rPrev, and a Document { _id: "foo", ... }.]
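Within an extent, records form their own chain through rNext and rPrev, starting at firstRecord. A minimal byte-level sketch, assuming a simplified 12-byte header holding just the three fields drawn on the slide (length, rNext, rPrev as little-endian int32 offsets inside the extent, with -1 for "none"); `appendRecord` and `records` are hypothetical helpers, and plain strings stand in for BSON documents.

```javascript
// Simplified record header: length, rNext, rPrev as little-endian int32s.
// rNext/rPrev are offsets inside the extent; -1 means "no record".
const HEADER = 12;
const NONE = -1;

// Write a record at `ofs`, link it after `prev`, and return the next free offset.
function appendRecord(extent, ofs, payload, prev) {
  extent.writeInt32LE(HEADER + payload.length, ofs); // length, header included
  extent.writeInt32LE(NONE, ofs + 4);                 // rNext: end of chain for now
  extent.writeInt32LE(prev, ofs + 8);                 // rPrev
  payload.copy(extent, ofs + HEADER);
  if (prev !== NONE) extent.writeInt32LE(ofs, prev + 4); // prev.rNext -> this record
  return ofs + HEADER + payload.length;
}

// Follow rNext from firstRecord, yielding each record's payload.
function* records(extent, firstRecord) {
  for (let ofs = firstRecord; ofs !== NONE; ofs = extent.readInt32LE(ofs + 4)) {
    yield extent.subarray(ofs + HEADER, ofs + extent.readInt32LE(ofs));
  }
}

const extent = Buffer.alloc(256);
const first = 0;
const second = appendRecord(extent, first, Buffer.from('{ _id: "foo" }'), NONE);
appendRecord(extent, second, Buffer.from('{ _id: "bar" }'), first);
for (const doc of records(extent, first)) console.log(doc.toString());
```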
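  • 12. BSON Format
      { hello: "world" }

        \x16\x00\x00\x00             Doc length (0x16 = 22 bytes)
        \x02 hello\x00               Value type (0x02 = string), then the field name
        \x06\x00\x00\x00 world\x00   Value length (6 bytes, including the trailing \x00), then the value
        \x00                         End of document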
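The byte layout above can be reproduced with a few Buffer operations. This is a minimal sketch that only handles flat documents whose values are all strings (BSON type 0x02); `encodeStringDoc` is a hypothetical name, not a driver API.

```javascript
// Minimal sketch: encodes only flat documents whose values are all strings
// (BSON element type 0x02). Other types and nesting are not handled.
function encodeStringDoc(doc) {
  const elements = [];
  for (const [name, value] of Object.entries(doc)) {
    const str = Buffer.from(value + "\0", "utf8"); // value bytes plus terminating \x00
    const strLen = Buffer.alloc(4);
    strLen.writeInt32LE(str.length);                // value length counts the \x00
    elements.push(Buffer.from([0x02]), Buffer.from(name + "\0", "utf8"), strLen, str);
  }
  const body = Buffer.concat(elements);
  const docLen = Buffer.alloc(4);
  docLen.writeInt32LE(4 + body.length + 1);        // counts itself and the final \x00
  return Buffer.concat([docLen, body, Buffer.from([0x00])]);
}

console.log(encodeStringDoc({ hello: "world" }).toString("hex"));
// 160000000268656c6c6f0006000000776f726c640000
```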
  • 13. Index Extents
      [Diagram: an Extent (length, xNext, xPrev, firstRecord, lastRecord) holding Index Records, each with length, rNext, rPrev, and a Bucket (parent, numKeys, keys K); a key refers to a { Document }.]
  • 14. Index Extents
      [Diagram: the slide 13 layout with a B-tree drawn above it, with keys 4 and 9 in the top bucket and 1 3, 5 6 8, and A B in the buckets below.]
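A lookup descends the B-tree one bucket at a time. The sketch below uses our reading of the flattened figure (a root bucket holding 4 and 9 above buckets holding 1 3, 5 6 8, and A B), with keys compared as strings so "A" and "B" sort after "9". `bucket` and `find` are hypothetical helpers; a real index key also carries the location of its document's record, which is omitted here.

```javascript
// Hypothetical bucket: sorted keys plus child buckets; an interior bucket has
// keys.length + 1 children, a leaf has none.
function bucket(keys, children = []) {
  return { keys, children };
}

// The tree as read from the slide, keys compared as strings ("A" > "9" in ASCII).
const root = bucket(["4", "9"], [
  bucket(["1", "3"]),
  bucket(["5", "6", "8"]),
  bucket(["A", "B"]),
]);

// Descend from the root; returns the depth at which key was found, or -1.
function find(b, key, depth = 1) {
  let i = 0;
  while (i < b.keys.length && key > b.keys[i]) i++; // first key >= search key
  if (i < b.keys.length && key === b.keys[i]) return depth;
  if (b.children.length === 0) return -1;          // leaf: key not present
  return find(b.children[i], key, depth + 1);      // child between keys[i-1] and keys[i]
}

console.log(find(root, "6"), find(root, "7")); // 2 -1
```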
  • 15. Journaling
      • Write-ahead logging
      • Operations are written to the journal before they are applied to the memory-mapped regions
          • Private view
          • Shared view
      • Once the journal is written, the data is safe unless there is a hardware problem
      • By default, the journal is flushed every 100ms, after 100MB of writes, or on a write concern of j=true
          • The interval is user-configurable with --journalCommitInterval
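The ordering rule (journal first, data files second) and the three flush triggers can be modelled in a few lines. This is a hypothetical sketch, not mongod's implementation: `GroupCommitJournal` is an invented class, time is passed in explicitly so the behaviour is deterministic, string length stands in for bytes, and the 100ms interval is only checked when a write arrives rather than by a background timer.

```javascript
// Defaults from the slide; the byte threshold is interpreted as 100 * 2^20.
const COMMIT_INTERVAL_MS = 100;
const COMMIT_BYTES = 100 * 1024 * 1024;

// Hypothetical model of write-ahead logging with group commit.
class GroupCommitJournal {
  constructor() {
    this.journal = [];       // committed groups, oldest first (stands in for disk)
    this.shared = new Map(); // stands in for the shared view of the data files
    this.pending = [];       // ops in the private view, not yet journaled
    this.pendingBytes = 0;
    this.lastCommit = 0;
  }

  write(op, now, { j = false } = {}) {
    this.pending.push(op);
    this.pendingBytes += op.data.length; // string length stands in for bytes
    if (j || now - this.lastCommit >= COMMIT_INTERVAL_MS || this.pendingBytes >= COMMIT_BYTES) {
      this.commit(now);
    }
  }

  commit(now) {
    if (this.pending.length === 0) return;
    this.journal.push(this.pending);                                 // 1. group goes to the journal
    for (const op of this.pending) this.shared.set(op.key, op.data); // 2. then to the data files
    this.pending = [];
    this.pendingBytes = 0;
    this.lastCommit = now;
  }
}

const jr = new GroupCommitJournal();
jr.write({ key: "a", data: "1" }, 10);              // buffered
jr.write({ key: "b", data: "2" }, 20);              // buffered
jr.write({ key: "c", data: "3" }, 30, { j: true }); // j=true forces a group commit
console.log(jr.journal.length, jr.shared.size);     // 1 3
```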
  • 16. Journal Format
      • Each section contains a single group commit
      • Sections are applied all-or-nothing

        JHeader
        JSectHeader [LSN 3]
            DurOp
            DurOp
            DurOp
        JSectFooter
        JSectHeader [LSN 7]
            DurOp
            DurOp
            DurOp
        JSectFooter
        …

      • DurOp types
          • Op_DbContext: sets the database context for subsequent operations
          • Write operation: length, offset, fileNo, data[length]
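Recovery replays the journal section by section, and the all-or-nothing rule means a section is checked in full before any of its writes touch a data file. In this hypothetical sketch the sections are plain objects rather than the binary format, the footer check is an invented byte-sum checksum (the slide does not say how JSectFooter validates a section), and stopping at the first bad section is an assumption that it marks the end of what was written before a crash.

```javascript
// Hypothetical footer check: a simple sum of the write ops' data bytes.
function checksum(ops) {
  let sum = 0;
  for (const op of ops) {
    if (op.type !== "write") continue;
    for (const byte of Buffer.from(op.data)) sum = (sum + byte) >>> 0;
  }
  return sum;
}

// files: { dbName: [Buffer for <db>.0, Buffer for <db>.1, ...] }.
// Replays sections in the order given (assumed LSN order); returns how many applied.
function recover(files, sections) {
  let db = null; // set by dbContext ops, used by the writes that follow
  let applied = 0;
  for (const sect of sections) {
    // Validate the whole section before touching any file: all-or-nothing.
    if (!sect.footer || sect.footer.checksum !== checksum(sect.ops)) break;
    for (const op of sect.ops) {
      if (op.type === "dbContext") db = op.db;
      else Buffer.from(op.data).copy(files[db][op.fileNo], op.offset); // data[length] at fileNo:offset
    }
    applied++;
  }
  return applied;
}

const files = { test: [Buffer.alloc(32), Buffer.alloc(32)] };
const lsn3 = [{ type: "dbContext", db: "test" }, { type: "write", fileNo: 0, offset: 4, data: "abc" }];
const lsn7 = [{ type: "dbContext", db: "test" }, { type: "write", fileNo: 1, offset: 0, data: "xyz" }];
const sections = [
  { lsn: 3, ops: lsn3, footer: { checksum: checksum(lsn3) } },
  { lsn: 7, ops: lsn7 }, // no footer: the crash came before this section was complete
];
console.log(recover(files, sections)); // 1
```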
  • 17. Journal Performance
      • On 99.9%-read systems, no impact
      • Write performance degrades 5-30% when the journal is on the same drive as the data
      • With the journal on a separate drive, as low as 3%
  • 18. Journal Admin
      • The journal is stored in the /dbpath/journal folder
      • If it is faster, three 1GB journal files may be preallocated
      • The journal folder can be symlinked to a different spindle
      • --journalCommitInterval* (2ms - 300ms)
      • When to journal
          • Single node: required for data integrity
          • Replica set: on at least 1 node
          • All nodes: removes the possible need to resync
  • 19. Fragmentation
      • Files may become fragmented over time if documents change size
      • Free lists also contribute to fragmentation
          • 2.0 reduced free-list scanning to reasonable amounts
          • 2.2 will change the allocation strategy
          • Online compaction requires rewriting the free list
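Why growing documents fragment files, and why records include padding, can be seen in a toy simulation. This hypothetical allocator gives each document docSize * paddingFactor bytes and moves a document that outgrows its record to the end of the file, abandoning the old record. It never reuses those holes, so it shows the worst case; a real allocator would try to refill them through the free list, which is where the free-list concerns above come in. `simulate` and the update sequence are invented for illustration.

```javascript
// Hypothetical append-only allocator; holes left by moved documents are never reused.
function simulate(updates, paddingFactor) {
  const records = new Map(); // id -> { offset, size }
  const freeList = [];       // records abandoned by moved documents
  let fileEnd = 0;
  let moves = 0;
  for (const { id, docSize } of updates) {
    const rec = records.get(id);
    if (rec && docSize <= rec.size) continue;  // padding absorbs the growth
    if (rec) { freeList.push(rec); moves++; }  // too big: abandon the old record
    const size = Math.ceil(docSize * paddingFactor);
    records.set(id, { offset: fileEnd, size });
    fileEnd += size;
  }
  const freeBytes = freeList.reduce((total, r) => total + r.size, 0);
  return { moves, fileEnd, freeBytes };
}

// One document inserted at 100 bytes, then grown twice.
const updates = [{ id: 1, docSize: 100 }, { id: 1, docSize: 110 }, { id: 1, docSize: 120 }];
console.log(simulate(updates, 1.0)); // { moves: 2, fileEnd: 330, freeBytes: 210 }
console.log(simulate(updates, 1.5)); // { moves: 0, fileEnd: 150, freeBytes: 0 }
```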
  • 20. Compaction
      • 1.8 and earlier: repairDatabase
      • 2.0+: compact command
          • Currently resets paddingFactor, but this can be changed
      • Index (re)generation is now concurrent, so compaction can be N times faster
      • Generally causes some extra allocation
          • Does not delete or truncate files
  • 21. Planned Changes
      • Split data and indexes into different files
      • Indexes could be symlinked to a different drive (SSD)
      • Improved allocation strategy
  • 22. Download MongoDB: http://www.mongodb.org/downloads
      Ben Becker, ben.becker@10gen.com