
Optimizing ForestDB for Flash-based SSD: Couchbase Connect 2015


As Solid-State Drives (SSDs) support orders-of-magnitude faster I/O operations than spinning hard disks, they have been replacing spinning disks and are being deployed rapidly in data centers. Consequently, it becomes crucial to optimize database storage engines for the unique characteristics and performance behaviors of SSDs. In this session, we explain how these optimizations work in ForestDB. Learn how ForestDB reduces write amplification and exploits parallel I/O capabilities, and hear about other optimizations on the roadmap for the next-generation storage engine.


  1. OPTIMIZING FORESTDB FOR FLASH-BASED SSD Sang-Won Lee Professor, Sungkyunkwan University Sundar Sridharan Senior Software Engineer, Couchbase Inc.
  2. ©2015 Couchbase Inc. Contents ▪ Introduction ▪ SHARE Interface in Flash-Based SSD for ForestDB ▪ ForestDB Optimizations at File System Layer ▪ Evaluation Results ▪ Future Work ▪ Summary
  3. Introduction ▪ It is the all-flash storage era! ▪ Legacy of the hard-disk era lingers in system software ▪ Suboptimal on top of flash storage ▪ ForestDB: the next-generation KV engine of Couchbase ▪ Opportunities ▪ Exploit flash storage characteristics (SHARE interface) ▪ Leverage modern CoW-based file systems
  4. SHARE Interface in Flash-Based SSD for ForestDB
  5. Characteristics of Flash Storage (vs. Hard Disk) ▪ No-overwrite and FTL layer ▪ Overwrite is not allowed ▪ Another layer of address mapping inside flash storage ▪ Limited lifetime ▪ Write time in flash storage ~ write amount ▪ Write time on a hard disk ~ mechanical disk-head movement
  6. Copy-on-Write in ForestDB ▪ Document update ▪ Copy-on-write, instead of in-place update
  7. Copy-on-Write in ForestDB (2) ▪ Why CoW? ▪ 1) Write atomicity and 2) multi-version concurrency control ▪ A reasonable solution on HDD ▪ Problems with CoW in flash storage ▪ Tree wandering → write amplification → low performance ▪ Flash storage lifetime
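The tree-wandering cost above can be put in numbers. This is an illustrative sketch, not ForestDB code: updating one document in an append-only copy-on-write index rewrites every node on the root-to-leaf path, so the bytes appended far exceed the bytes logically changed. The node/document sizes are assumptions for the example.

```python
BLOCK_SIZE = 4096  # assumed index-node/block size

def cow_update_cost(tree_height, doc_size=1024, block_size=BLOCK_SIZE):
    """Bytes appended for one document update under CoW tree wandering:
    a new copy of the document plus a new copy of each index node on the
    root-to-leaf path."""
    bytes_written = doc_size + tree_height * block_size
    write_amplification = bytes_written / doc_size
    return bytes_written, write_amplification

# A height-3 index turns a 1 KiB update into 13 KiB of appended data,
# i.e., 13x write amplification.
written, amp = cow_update_cost(tree_height=3)
```

This is why CoW, while reasonable on hard disks, directly hurts both throughput and device lifetime on flash, where total write volume is what matters.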
  8. Opportunities in Flash Storage ▪ Address mapping inside flash storage (by the FTL)
  9. Opportunities in Flash Storage (2) ▪ SHARE interface: explicit address remapping
  10. Opportunities in Flash Storage (3) ▪ ForestDB Compaction with SHARE ▪ No write of valid documents to the new file
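A toy model makes the SHARE idea concrete. The FTL already maps logical pages to physical pages, so compaction can remap valid pages into the new file instead of rereading and rewriting them. All names and data structures below are illustrative, not the real firmware interface.

```python
PAGE = 4096  # assumed flash page size

def compact_copy(valid_pages):
    """Baseline compaction: read and rewrite every valid page
    into the new file. Returns bytes physically written."""
    return len(valid_pages) * PAGE

def compact_share(ftl, new_file, valid_pages):
    """SHARE-style compaction: point the new file's logical pages at
    the existing physical pages by updating the FTL mapping only."""
    for logical, physical in valid_pages.items():
        ftl[(new_file, logical)] = physical  # remap, no data movement
    return 0  # no document bytes rewritten

# 100 valid pages: copying writes 400 KiB; remapping writes nothing.
valid = {i: 1000 + i for i in range(100)}  # logical -> physical
ftl = {}
copied = compact_copy(valid)
shared = compact_share(ftl, "file.1", valid)
```

This is exactly the effect the evaluation slide shows: compaction drops from rewriting the whole valid data set to writing only metadata.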
  11. SHARE Implementation ▪ Firmware extension for SHARE ▪ OpenSSD Board (http://www.openssd-project.org/) ▪ Atomic and recoverable
  12. Performance Evaluation ▪ Normal-time performance: YCSB workload F
  13. Performance Evaluation (2) ▪ Compaction performance:
                             Elapsed Time (sec)   Written Bytes (MB)
      Original ForestDB            227.5               1126.4
      ForestDB with SHARE           88.4                150.6
  14. ForestDB Optimizations at File System Layer
  15. Overview ▪ Motivation – the catch-22 ▪ Why B-tree file system (Btrfs) ▪ How ForestDB solves the catch-22 using Btrfs ▪ Optimizing with the Linux asynchronous I/O library (libaio) ▪ Performance Results
  16. Append-Only Key-Value Stores are Great! ▪ Consistency ▪ Stable access to multiple point-in-time snapshots of data ▪ Performance with Isolation ▪ Multi-Version Concurrency Control (MVCC) means readers and writers do not block each other ▪ Recoverability ▪ Can easily roll back the entire database to a stable past state ▪ SSD Friendly ▪ Avoids in-place updates and Flash Translation Layer overhead
  17. Append-Only KV Stores are Great!
  18. MVCC: Readers & Writer Run Unblocked!
  19. But... ▪ Disk can fill up with stale data ▪ Need to do garbage collection – compaction
  20. Compactions Do Garbage Collection...
  21. Compactions for Garbage Collection
  22. A Fundamental Problem with Disk Space What if the size of active data exceeds the free space available? Writer appends too much data
  23. A Fundamental Problem: Catch-22 "My disk is getting full... I want to free up space but don't have enough free space to free up space!" The size of active data must be strictly less than the free space available on disk!
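The catch-22 follows from how classic compaction works: it copies all active data into a new file before deleting the old one, so it needs at least as much free space as there is live data. A minimal feasibility check (illustrative, not ForestDB code):

```python
def can_compact(active_bytes, free_bytes):
    """Classic copy-based compaction is only possible while the active
    data set strictly fits in the remaining free space, because old and
    new files must coexist until the old one is deleted."""
    return active_bytes < free_bytes

# 15 GiB of live data with 4 GiB free: compaction cannot run,
# yet compaction is the only way to reclaim space.
stuck = can_compact(15 * 2**30, 4 * 2**30)
```

Block cloning breaks this deadlock, as the next slides show: shared extents mean the new file needs almost no extra space up front.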
  24. B-Tree File System (Btrfs) ▪ Btrfs is a copy-on-write filesystem for Linux ▪ Development began at Oracle in 2007; marked stable since August 2014 (http://goo.gl/upukn4) ▪ Industry support from Facebook, Fujitsu, Fusion-io, Intel, Netgear, Novell/SUSE, Oracle, Red Hat, etc. ▪ Available as an option in all major Linux distributions
  25. Btrfs Features (Short List) ▪ Max file size up to 16 exbibytes (1 exbibyte in ext4) ▪ Self-healing due to copy-on-write nature ▪ Online defragmentation ▪ Online volume growth and shrinking ▪ Online block device addition and removal ▪ Block discards for improved wear leveling on SSDs using TRIM ▪ Transparent compression configurable per file or volume ▪ Online data scrubbing ▪ Send/receive of diffs ▪ Snapshots and subvolumes ▪ File cloning!
  26. Btrfs Basics - Representation File P with reference-counted extents
  27. Btrfs Feature - Copy File Range The copy-file-range API lets a new file Q share physical disk extents with file P
  28. Btrfs Feature - Blocks Shared Across Files Copy-on-write lets new updates happen on file Q
  29. Btrfs Basics - Deleting File Deleting file Q
  30. Btrfs Basics - Freeing Up Space Freeing up space
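The extent diagrams above can be sketched as a toy model: extents are reference counted, cloning a range shares extents instead of copying bytes, and deleting a file frees only the extents whose count drops to zero. This is a simplification of real Btrfs behavior; all class and extent names are made up for illustration.

```python
from collections import defaultdict

class ExtentStore:
    """Toy reference-counted extent store mimicking Btrfs file cloning."""

    def __init__(self):
        self.refcount = defaultdict(int)  # extent id -> reference count
        self.files = {}                   # file name -> list of extent ids

    def create(self, name, extents):
        self.files[name] = list(extents)
        for e in extents:
            self.refcount[e] += 1

    def clone(self, src, dst):
        """Share src's extents with dst (like copy_file_range on Btrfs):
        no data is copied, only references are added."""
        self.create(dst, self.files[src])

    def delete(self, name):
        for e in self.files.pop(name):
            self.refcount[e] -= 1
            if self.refcount[e] == 0:
                del self.refcount[e]  # physical space actually reclaimed

store = ExtentStore()
store.create("P", ["e1", "e2", "e3"])
store.clone("P", "Q")   # Q shares all of P's extents; nothing is copied
store.delete("Q")       # P's data survives: refcounts fall back to 1
```

Deleting P afterwards would drop every refcount to zero and free the space, which is exactly the mechanism ForestDB's compaction exploits next.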
  31. ForestDB Compaction Using Btrfs Cloning Compaction works by using Btrfs to copy-on-write (clone) valid block ranges from the old file into the new file...
  32. ForestDB Compaction Using Btrfs Cloning Deleting old file.fdb.0 frees only the space belonging to stale blocks. Valid blocks shared with file.fdb.1 stay intact!
  33. Performance Results Ubuntu 14.04, Btrfs v3.12, 4 CPU cores, 20GB SSD drive, 8GB DRAM
  34. Performance (1) – ForestDB on Btrfs ▪ ~1.25-2x faster! ▪ ½ the write amplification!
  35. Performance (2) – ForestDB on Btrfs ▪ ~1.5-4x faster! ▪ ½ the write amplification!
  36. Performance (3) – ForestDB on Btrfs ▪ ~2x faster! ▪ ½ the write amplification!
  37. Speeding up Reads with libaio ▪ Modern SSDs have multiple I/O channels ▪ Asynchronous I/O maximizes throughput ▪ Well suited for ForestDB compaction tasks
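libaio itself is a C interface (io_submit/io_getevents); as a rough stand-in, this sketch overlaps several reads with a thread pool to show why keeping multiple requests in flight helps on SSDs with independent channels. The file layout and block size are invented for the example.

```python
import concurrent.futures
import os
import tempfile

BLOCK = 4096

def read_range(path, offset, length):
    """One independent read request; each call opens its own handle
    so requests do not serialize on a shared file position."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Build a small 4-block file, then fetch all blocks concurrently.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(bytes([i]) * BLOCK for i in range(4)))
    path = tmp.name

offsets = [i * BLOCK for i in range(4)]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(lambda o: read_range(path, o, BLOCK), offsets))

os.unlink(path)
```

With true asynchronous I/O the kernel can queue all four requests to the device at once, which is what makes compaction's bulk reads so much faster in the numbers on the next slide.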
  38. Performance (4) – ForestDB on Btrfs with libaio ▪ 13x faster! ▪ 7x faster! ▪ 4x faster!
  39. Advantages of Btrfs with libaio ▪ Efficiently uses disk space, avoiding the catch-22 ▪ Reduces write amplification by 2x ▪ Longer SSD lifespan due to reduced wear ▪ Over 13x faster compaction speeds ▪ A generic file-system-layer solution that applies to SSDs as well as spinning disks
  40. Future Work
  41. Future Work ▪ Optimize the Btrfs clone feature for better performance ▪ Working with the Linux Btrfs community ▪ Optimize ForestDB to skip reading when cloning on compaction ▪ Adapt the ext4 file system to add a new system call that shares physical blocks among multiple files
  42. Summary
  43. Summary ▪ ForestDB with the SHARE interface in the SSD ▪ Speeds up compaction by 3x with 10x lower write amplification ▪ ForestDB with the Btrfs clone feature at the file system layer ▪ Speeds up compaction by 2x with 2x lower write amplification ▪ ForestDB with the Btrfs clone feature plus Linux libaio ▪ Speeds up compaction by 13x with 2x lower write amplification
  44. Questions? Sang-Won Lee, swlee@skku.edu Sundar Sridharan, sundar@couchbase.com
  45. Initial Load Performance [Chart: 3x-6x less load time]
  46. Initial Load Performance [Chart: 4x less write overhead]
  47. Read-Only Performance [Chart: throughput (ops/sec) vs. number of reader threads for ForestDB, LevelDB, and RocksDB; ForestDB is 2x-5x faster]
  48. Write-Only Performance [Chart: throughput (ops/sec) vs. write batch size (# documents) for ForestDB, LevelDB, and RocksDB; 3x-5x faster. A small batch size (e.g., < 10) is not usually common]
  49. Write-Only Performance [Chart: write amplification (normalized to a single doc size) vs. write batch size; ForestDB shows 4x-20x less write amplification]
  50. Mixed Workload Performance [Chart: mixed (unrestricted) throughput (ops/sec) vs. number of reader threads for ForestDB, LevelDB, and RocksDB; 2x-5x faster]
