Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

MongoDB WiredTiger Internals

10,340 views

Published on

This presentation drills down into the architectural design of MongoDB's storage layer api and detailed view of WiredTiger.

Published in: Software
  • Writing a good research paper isn't easy and it's the fruit of hard work. For help you can check writing expert. Check out, please ⇒ www.HelpWriting.net ⇐ I think they are the best
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Writing a good research paper isn't easy and it's the fruit of hard work. For help you can check writing expert. Check out, please ⇒ www.WritePaper.info ⇐ I think they are the best
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • You should find best essay writing services with great knowledge at reasonable price.because you are a student so you have to find a writer with academic knowledge. I recommend HelpWriting.net for best services.try this.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Sex in your area is here: ♥♥♥ http://bit.ly/369VOVb ♥♥♥
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Dating direct: ♥♥♥ http://bit.ly/369VOVb ♥♥♥
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

MongoDB WiredTiger Internals

  1. 1. WiredTiger Internals
  2. 2. 2 I'm Excited
  3. 3. 3 Benefits
  4. 4. 4 Performance!
  5. 5. 5 Compression in Action (Flights database, ht Asya)
  6. 6. 6 Agenda Storage Engine API WiredTiger Architecture Overall Benefits
  7. 7. http://www.livingincebuforums.com/ipb/uploads/monthly_10_2011/post-198-0-67871200-1318223706.jpg Pluggable Storage Engine API
  8. 8. 8 MongoDB Architecture Content Repo IoT Sensor Backend Ad Service Customer Analytics Archive MongoDB Query Language (MQL) + Native Drivers MongoDB Document Data Model MMAP V1 WT In-Memory ? ? Supported in MongoDB 3.0 Future Possible Storage Engines Management Security Example Future State Beta
  9. 9. 9 Storage Engine API •  Allows to "plug-in" different storage engines –  Different use cases require different performance characteristics –  mmapv1 is not ideal for all workloads –  More flexibility •  Can mix storage engines on same replica set/sharded cluster •  Opportunity to integrate further ( HDFS, native encrypted, hardware optimized …)
  10. 10. Storage Engines
  11. 11. 2 Storage Engines Available … and more in the making! MMAPv1 WiredTiger
  12. 12. 12 MongoDB Architecture Content Repo IoT Sensor Backend Ad Service Customer Analytics Archive MongoDB Query Language (MQL) + Native Drivers MongoDB Document Data Model MMAP V1 WT In-Memory ? ? Supported in MongoDB 3.0 Future Possible Storage Engines Management Security Example Future State Beta
  13. 13. MMAPv1 https://angrytechnician.files.wordpress.com/2009/05/memory.jpg
  14. 14. 14 MMAPv1
  15. 15. 15 MMAPv1 •  Improved concurrency control •  Great performance on read-heavy workloads •  Data & Indexes memory mapped into virtual address space •  Data access is paged into RAM •  OS evicts using LRU •  More frequently used pages stay in RAM 3.0 Default
  16. 16. http://cdn.theatlantic.com/static/infocus/ngt051713/n10_00203194.jpg WiredTiger
  17. 17. mongod --storageEngine wiredTiger
  18. 18. 18 What is WiredTiger? •  Storage engine company founded by BerkeleyDB alums •  Recently acquired by MongoDB •  Available as a storage engine option in MongoDB 3.0
  19. 19. 19 Motivation for WiredTiger •  Take advantage of modern hardware: –  many CPU cores –  lots of RAM •  Minimize contention between threads –  lock-free algorithms, e.g., hazard pointers –  eliminate blocking due to concurrency control •  Hotter cache and more work per I/O –  compact file formats –  compression
  20. 20. WiredTiger Architecture
  21. 21. WiredTiger Architecture WiredTiger Engine Schema & Cursors Python API C API Java API Database Files Transactions Page read/write Logging Column storage Block management Row storage Snapshots Log Files Cache
  22. 22. WiredTiger Sessions Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal Your App Driver connection (mongod) Session Cursor Session Session Cursor Cursor
  23. 23. WiredTiger Sessions Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal Your App Driver •  3 main purposes •  Schema Operations •  Transaction Management •  Cursor Creation
  24. 24. WiredTiger Schema & Cursors Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal db.createCollection("somecollection") db.somecollection.drop() db.somecollection.find({...}) db.somecollection.remove({_id:111}) db.somecollection.stats() db.somecollection.explain().find({_id:111})
  25. 25. WiredTiger Schema & Cursors Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write Logging Cache Schema & Cursors Row Storage Block Management DB Files Log Files •  Schema Operations •  create, drop, truncate, verify … •  Cursors •  CRUD •  Data iteration •  Statistics
  26. 26. Collection WiredTiger Transactions Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal COMMAND INSERT { "_id": 111, "name": "Spock", "message": "Live Long and Prosper!" } Index {_id: 1} Index {name: 1} Index {message: "text"}
  27. 27. WiredTiger Transactions Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal •  Transaction per Session close, open, commit, rollback •  Ensures the atomicity/consistency of MongoDB operations •  Updates / Writes documents •  Updates indexes This is not multi-document transaction support!
  28. 28. In-memory Performance
  29. 29. Traditional B+tree (ht wikipedia)
  30. 30. Trees in cache non-resident child ordinary pointer root page internal page internal page root page leaf page leaf page leaf page leaf page
  31. 31. Cache Page in Disk root page leaf page Disk Page not in memory File Read
  32. 32. Cache Page in Memory root page Internal Page Internal Page Internal Page Internal Page Leaf Page Leaf Page Leaf Page Leaf Page
  33. 33. Traditional B-tree root page Internal Page Internal Page Internal Page Internal Page Leaf Page Leaf Page Leaf Page Leaf Page
  34. 34. Deadlock Avoidance root page Internal Page Internal Page Internal Page Internal Page Leaf Page Leaf Page Leaf Page Leaf Page
  35. 35. Cache Page in Memory root page Internal Page Internal Page Internal Page Internal Page Leaf Page Leaf Page Leaf Page Leaf Page Doc1Doc0 Doc5Doc2 Doc3 Doc4 Writer Reader Standard pointers between tree levels instead of traditional files system offset pointers
  36. 36. Page in Cache Disk Page images Page images Page images on-disk Page Image index Update on-disk Page Image index Updates db.collection.insert( {"_id": 123}) Clean Page Dirty Page Eviction Thread
  37. 37. Page in Cache Disk Page images Page images Page images on-disk Page Image index Update on-disk Page Image index Updates db.collection.insert( {"_id": 123}) Clean Page Dirty Page Index is build during read
  38. 38. Page in Cache Disk Page images Page images Page images on-disk Page Image index Updates' db.collection.insert( {"_id": 123}) Dirty Page Updates are held in a skip list Updates'' db.collection.insert( {"_id": 125}) Updates''' db.collection.insert( {"_id": 124}) Time
  39. 39. Reconciliation Disk Page images Page images Page images on-disk Page Image index Updates' Dirty Page Eviction Thread Updates'' Updates''' Reconciliation Cache full After X operations On a checkpoint
  40. 40. 40 In-memory performance •  Trees in cache are optimized for in-memory access •  Follow pointers to traverse a tree – no locking to access pages in cache •  Keep updates separate from clean data •  Do structural changes (eviction, splits) in background threads
  41. 41. 41 Takeaways •  Page and tree access is optimized •  Allows concurrent operations – updates skip lists •  Tuning options that you should be aware of: –  storage.wiredTiger.engineConfig.cacheSizeGB •  You can determine the space allocated for your cache •  Makes your deployment and sizing more predictable https://docs.mongodb.org/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB Wiredtiger cache! MongoDB uses more memory than just the Storage Engine needs!
  42. 42. Document-level Concurrency
  43. 43. 43 What is Concurrency Control? •  Computers have – multiple CPU cores – multiple I/O paths •  To make the most of the hardware, software has to execute multiple operations in parallel •  Concurrency control has to keep data consistent •  Common approaches: – locking – keeping multiple versions of data (MVCC)
  44. 44. MVCC on-disk Page Image index WRITE B txn(1) Updates'''Updates' WRITE A txn(2) Updates'' Time WRITE A txn(3) Conflict Detection!
  45. 45. 45 Multiversion Concurrency Control (MVCC) •  Multiple versions of records kept in cache •  Readers see the committed version before the transaction started – MongoDB “yields” turn large operations into small transactions •  Writers can create new versions concurrent with readers •  Concurrent updates to a single record cause write conflicts – MongoDB retries with back-off
  46. 46. MVCC on-disk Page Image index Updates' WRITE B txn(1) Updates'' Updates''' WRITE A txn(2) Time WRITE A txn(3) READ A
  47. 47. Compression
  48. 48. 48 WiredTiger Page IO Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal on-disk Page Image index Updates''' Disk Page Reconciliation Page Allocation Splitting
  49. 49. 49 WiredTiger Page IO Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal on-disk Page Image index Updates''' Disk Page' Reconciliation Page Allocation Splitting Page''
  50. 50. 50 WiredTiger Page IO Document Data Model WiredTiger Engine Transactions Snapshosts Page read/write WAL Cache Schema & Cursors Row Storage Block Management DB Files Journal on-disk Page Image index Updates''' Disk Reconciliation Page Allocation Splitting Page Compression -  snappy (default) -  zlib -  none
  51. 51. 51 Compression •  WiredTiger uses snappy compression by default in MongoDB •  Supported compression algorithms: –  snappy [default]: good compression, low overhead –  zlib: better compression, more CPU –  none •  Indexes also use prefix compression –  stays compressed in memory
  52. 52. 52 Checksums •  A checksum is stored with every uncompressed page •  Checksums are validated during page read –  detects filesystem corruption, random bitflips •  WiredTiger stores the checksum with the page address (typically in a parent page) –  extra safety against reading a stale page image
  53. 53. 53 Compression in Action (Flights database, ht Asya)
  54. 54. 54 Takeaway •  Main feature that impacts the vast majority of MongoDB projects / use cases •  CPU bound •  Different algorithms for different workloads –  Collection level compression tuning •  Smaller Data Faster IO –  Not only improves the disk space footprint –  also IO –  and index traversing
  55. 55. Other Gems of WiredTiger
  56. 56. 56 Index per Directory •  Use different drives for collections and indexes –  Parallelization of write operations –  MMAPv1 only allows directoryPerDatabase mongod Collection A Index "name " Collection B Index "_id"
  57. 57. 57 Consistency without Journaling •  MMAPv1 uses write-ahead log (journal) to guarantee consistency •  WT doesn't have this need: no in-place updates –  Write-ahead log committed at checkpoints and with j:true –  Better for insert-heavy workloads –  By default journaling is enabled! •  Replication guarantees the durability
  58. 58. What's Next
  59. 59. 59
  60. 60. 60 What’s next for WiredTiger? •  Tune for (many) more workloads – avoid stalls during checkpoints with 100GB+ caches – make capped collections (including oplog) more efficient •  Adding encryption •  More advanced transactional semantics in the storage engine API
  61. 61. 61 Updates and Upgrades •  Can not –  Can't copy database files –  Can't just restart w/ same dbpath •  Yes we can! –  Initial sync from replica set works perfectly! –  mongodump/restore •  Rolling upgrade of replica set to WT: –  Shutdown secondary –  Delete dbpath –  Relaunch w/ --storageEngine=wiredTiger –  Wait for resync –  Rollover
  62. 62. https://tingbudongchine.files.wordpress.com/2012/08/lemonde1.jpeg Summary
  63. 63. 63 MongoDB 3.0 •  Pluggable Storage Engine API •  Storage Engines •  Large Replica Sets •  Big Polygon •  Security Enhancements – SCRAM •  Audit Trail •  Simplified Operations – Ops Manager •  Tools Rewrite
  64. 64. 64 3.2 is coming!
  65. 65. 65 Performance! http://www.mongodb.com/lp/white-paper/benchmark-report
  66. 66. http://cl.jroo.me/z3/v/D/C/e/a.baa-Too-many-bicycles-on-the-van.jpg Norberto Leite Technical Evangelist norberto@mongodb.com @nleite Obrigado!

×