
Memory-optimized indexes: how they work – Couchbase Connect 2016


Memory-optimized indexes add fast and scalable secondary indexing capability in Couchbase 4.5. The performance of Global Secondary Indexes can now be accelerated by providing more DRAM and CPU cores. To achieve high-performance in-memory indexing, we designed a new storage engine called Nitro from the ground up, taking advantage of many CPU cores and large DRAM capacity available in modern commodity servers. Couchbase Multi-Dimensional Scaling (MDS) also plays a key role in achieving high scalability for secondary indexing. In this presentation, we cover the key architectural innovations and design of the Nitro storage engine used in memory-optimized indexes. We showcase the performance of memory-optimized indexes in terms of indexing latency, throughput, and scalability compared to regular indexes.


  1. 1. ©2016 Couchbase Inc. 1 The Couchbase Connect16 mobile app Take our in-app survey!
  2. 2. ©2016 Couchbase Inc. Memory-optimized indexes: how they work Sarath Lakshman, Senior Software Engineer, Couchbase 2
  3. 3. ©2016 Couchbase Inc.©2016 Couchbase Inc. Agenda • Architecture of Global Secondary Index • What exactly is a Memory-Optimized Index? • Architecture of the Nitro Storage Engine • Scalability and Performance • Operational aspects of Memory-Optimized Index 3
  4. 4. ©2016 Couchbase Inc. 4 Global Secondary Index (GSI) An architecture overview
  5. 5. ©2016 Couchbase Inc.©2016 Couchbase Inc. GSI Overview • Speed up your N1QL queries using fast indexes ordered by secondary JSON fields • Workload isolation and independent scaling for document access/modifications and index operations • Ensure read availability by creating replica indexes • Global Indexes offer scalable performance, while local indexes degrade query performance as more nodes are added due to scatter/gather • Asynchronously updated indexes with high throughput and low latency 5
  6. 6. ©2016 Couchbase Inc.©2016 Couchbase Inc. Multi Dimensional Scaling (MDS) • Indexes can scale independently from document data • Workloads for different services are isolated 6 [Diagram: a six-node cluster in which the Data, Query, and Index services run on separate Couchbase Server nodes, each node with its own cluster manager, managed cache, and storage]
  7. 7. ©2016 Couchbase Inc. 7 Components of GSI • Projector • Transforms document mutations into secondary index items and routes them to index nodes based on index definitions • Indexer • Updates indexes corresponding to document changes • Provides point-in-time index scan snapshots • Handles index DDL • GSI Client • Smart client aware of the global index topology • Helps N1QL interact with GSI indexes • Facilitates index scan operations • Manages scan connection pooling
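The projector's role can be sketched roughly as follows. This is a minimal Go illustration, not the actual Couchbase projector code: the IndexDefn and Mutation types, the field names, and the send callback are all hypothetical simplifications.

```go
package main

import "fmt"

// IndexDefn is a simplified, hypothetical index definition: which JSON field
// the index covers and which indexer node hosts it.
type IndexDefn struct {
	Name      string
	Field     string
	IndexNode string
}

// Mutation is a simplified document change as received over DCP.
type Mutation struct {
	DocID  string
	Fields map[string]interface{}
}

// project extracts one secondary-index item per matching index definition and
// routes it to the node that owns that index.
func project(m Mutation, defns []IndexDefn, send func(node string, key interface{}, docID string)) {
	for _, d := range defns {
		if v, ok := m.Fields[d.Field]; ok {
			send(d.IndexNode, v, m.DocID)
		}
	}
}

func main() {
	defns := []IndexDefn{{Name: "idx_last_name", Field: "LastName", IndexNode: "indexer-1"}}
	m := Mutation{DocID: "emp_005", Fields: map[string]interface{}{"LastName": "Adams"}}
	project(m, defns, func(node string, key interface{}, docID string) {
		fmt.Printf("route to %s: (%v, %s)\n", node, key, docID)
	})
}
```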
  8. 8. ©2016 Couchbase Inc.©2016 Couchbase Inc. Indexer update pipeline 8 [Diagram: Index Port → Mutation Queue → Extraction Worker → Index Queue → Storage Updater Worker → ForestDB/Nitro; the index is updated only if the key has changed, e.g. the document {“LastName” : “Adams”, “Phone” : “323-180-9978”} yields the secondary keys {“LastName” : “Adams”} and {“Phone” : “323-180-9978”}]
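The "update index only if the key has changed" check at the end of the pipeline can be expressed as a small predicate. This is a minimal sketch, assuming the extraction worker already has the old and new secondary keys in hand; the names are illustrative, not indexer internals.

```go
package indexer

import "bytes"

// shouldUpdate mirrors the "update index ONLY IF key has changed" rule from
// the pipeline above: a document mutation that leaves the indexed field
// unchanged produces no index write.
func shouldUpdate(oldKey, newKey []byte) bool {
	if oldKey == nil { // document was not indexed before
		return newKey != nil // index only if it now produces a key
	}
	if newKey == nil { // indexed field was removed from the document
		return true // the stale entry must be deleted
	}
	return !bytes.Equal(oldKey, newKey)
}
```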
  9. 9. ©2016 Couchbase Inc. 9 Memory Optimized Index An introduction
  10. 10. ©2016 Couchbase Inc.©2016 Couchbase Inc. Why Memory-Optimized Index? • Performance! • Server hardware is constantly evolving, with many CPU cores and large amounts of DRAM • Single index write performance matters, as it has to keep up with the rate of document mutations sent by many data service nodes • The data service offers very high write performance, demanding fast index updates • Indexes hold a small subset of the document data (e.g., a secondary field), so it is possible to hold indexes completely in memory • Disk-oriented storage engines, such as the one behind the standard index, are optimized for fast disk access with a paging mechanism, assuming the entire dataset cannot fit in memory • Providing more DRAM and CPU cores will not speed up single index performance with standard indexes 10
  11. 11. ©2016 Couchbase Inc.©2016 Couchbase Inc. What exactly is a Memory-Optimized Index? • A memory-resident index • Throw more DRAM and CPU cores at it – can single index performance scale linearly? Yes • Designed for high performance and multicore scalability • Fast writes and low-latency index scans • Architecturally very different from disk-oriented storage engines • Fast backup and recovery on disk/SSD • Supports index snapshots at 20 ms latency (200 ms for the standard index) • Avoids the need to partition an index to scale throughput • Written in Golang/C • Every component of the index storage engine can scale seamlessly with many CPU cores 11
  12. 12. ©2016 Couchbase Inc. 12 Nitro Storage Engine The storage engine that powers Memory Optimized Indexes (MOI) A VLDB 2016 paper (Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index)
  13. 13. ©2016 Couchbase Inc.©2016 Couchbase Inc. Design considerations  Multiple writers for high performance  Utilize the inherent parallelism in the Database Change Protocol (DCP)  Scalable single index write performance by using available CPU cores  Lock-free data structures for high concurrency  Writers and readers never block  Maximize utilization of multicore CPUs  Fast snapshots  Minimize latency for index queries / reduce staleness of the index  Create read snapshots at a rate of 100/second  Leverage optimizations for memory-resident data structures 13
  14. 14. ©2016 Couchbase Inc. 14 Nitro Architecture • Create backups from snapshots and recover Nitro after a restart/crash • Free items once they are garbage collected and no longer referenced • Remove items that belong to unused snapshots from the skiplist • Create point-in-time immutable snapshots for index scans • Avoid phantoms and provide scan stability • Manage the index snapshot versions in use • Implements Insert, Delete, Lookup, and Range Iteration • Concurrent partitioned visitors • Concurrent bottom-up skiplist build
  15. 15. ©2016 Couchbase Inc.©2016 Couchbase Inc. Skiplist • Probabilistic, balanced, ordered search data structure • Search is similar to binary search over linked lists (O(log n)) • Item-granular operations, unlike a B+Tree (page oriented) • The lock-free skiplist is implemented using atomic compare-and-swap and atomic add-and-fetch primitives (a sketch of the insert path follows) 15
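To make the lock-free technique concrete, here is a minimal Go sketch of a CAS-based insert into the sorted bottom level of such a list. It is not Nitro's implementation (that is described in the VLDB paper); it assumes Go 1.19's atomic.Pointer and ignores concurrent deletion, which the mark-then-unlink protocol on the next slides addresses.

```go
package main

import "sync/atomic"

// node is a level-0 skiplist node; the upper index levels are omitted so the
// sketch can focus on the lock-free insert itself.
type node struct {
	key  int
	next atomic.Pointer[node]
}

// list is a sorted singly linked list with a sentinel head.
type list struct{ head node }

// insert adds key without taking any lock: it finds the insertion point and
// publishes the new node with a single compare-and-swap. If another writer
// changed the predecessor's next pointer in the meantime, the CAS fails and
// the operation retries from the head.
func (l *list) insert(key int) {
	n := &node{key: key}
	for {
		prev := &l.head
		curr := prev.next.Load()
		for curr != nil && curr.key < key {
			prev = curr
			curr = curr.next.Load()
		}
		n.next.Store(curr)
		if prev.next.CompareAndSwap(curr, n) {
			return // published atomically; readers see either the old or new list
		}
		// Lost the race to another writer: retry.
	}
}

func main() {
	var l list
	for _, k := range []int{30, 10, 20} {
		l.insert(k)
	}
}
```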
  16. 16. ©2016 Couchbase Inc.©2016 Couchbase Inc. Lock-free data structure fundamentals 16
  17. 17. ©2016 Couchbase Inc.©2016 Couchbase Inc. Lock-free data structure fundamentals 17 Step 1: Mark as deleted Step 2: Removal
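The two-step removal shown above can be sketched as follows. This is a simplified illustration, not Nitro's code: a separate atomic flag stands in for the pointer-mark used by production lock-free lists such as Harris's algorithm, and the physical unlink is best-effort.

```go
package lockfree

import "sync/atomic"

// mnode adds a logical-deletion flag to a list node. Real lock-free lists
// fold this mark into the next pointer so marking and traversal compose
// atomically; a separate atomic.Bool keeps the sketch readable.
type mnode struct {
	key     int
	deleted atomic.Bool
	next    atomic.Pointer[mnode]
}

type mlist struct{ head mnode } // head is a sentinel

// remove performs the two steps from the slide: (1) mark the node as deleted
// so every reader and writer treats it as gone, then (2) physically unlink it
// with a CAS. If the unlink CAS fails, the node stays marked and a later
// traversal or the garbage collector can finish the unlink.
func (l *mlist) remove(key int) bool {
	prev := &l.head
	curr := prev.next.Load()
	for curr != nil && curr.key < key {
		prev = curr
		curr = curr.next.Load()
	}
	if curr == nil || curr.key != key {
		return false // key not present
	}
	// Step 1: logical deletion.
	if !curr.deleted.CompareAndSwap(false, true) {
		return false // another thread already deleted this node
	}
	// Step 2: physical removal (best effort; safe to retry later).
	prev.next.CompareAndSwap(curr, curr.next.Load())
	return true
}
```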
  18. 18. ©2016 Couchbase Inc.©2016 Couchbase Inc. MultiVersions Management (MVCC) • Define lifetime metadata in each skiplist node (i.e., bornSn and deadSn) • Create Snapshot 1 • Create Snapshot 2 18 [Diagram: after Snapshot 1 the list holds V=10, V=20, V=30, each with bornSn=1, deadSn=0; after Snapshot 2, V=15 and V=32 are added with bornSn=2, deadSn=0]
  19. 19. ©2016 Couchbase Inc.©2016 Couchbase Inc. MultiVersions Management (MVCC) • Create Snapshot 3 19 [Diagram: V=20 and the older version of V=32 are now marked with deadSn=3, and a new version of V=32 is added with bornSn=3, deadSn=0; V=10, V=15 and V=30 remain alive]
  20. 20. ©2016 Couchbase Inc.©2016 Couchbase Inc. MultiVersions Management (MVCC) • Index scan for Snapshot 1 20 [Diagram: an iterator opened at Sn=1 sees only items with bornSn ≤ 1 that are not yet dead at Sn=1, i.e., V=10, V=20, V=30]
  21. 21. ©2016 Couchbase Inc.©2016 Couchbase Inc. MultiVersions Management (MVCC) • Index scan for Snapshot 2 21 [Diagram: the Sn=2 iterator additionally sees V=15 and the older V=32 (bornSn=2); V=20 and that older V=32 are still visible because their deadSn=3 > 2]
  22. 22. ©2016 Couchbase Inc.©2016 Couchbase Inc. MultiVersions Management (MVCC) • Index scan for Snapshot 3 22 [Diagram: the Sn=3 iterator skips the items with deadSn=3 (V=20 and the older V=32) and sees V=10, V=15, V=30 and the new V=32]
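The scans above follow one visibility rule over the (bornSn, deadSn) metadata. Below is a minimal sketch of that check as I read it from the slides and the Nitro paper; the function name is mine, not the engine's.

```go
package mvcc

// visible encodes the lifetime rule illustrated above: an iterator opened on
// snapshot sn sees an item if the item was born at or before sn and was still
// alive at sn (deadSn == 0 means "never deleted"). With the example items,
// visible(1, 3, 3) is false, so the Sn=3 scan skips V=20.
func visible(bornSn, deadSn, sn uint64) bool {
	return bornSn <= sn && (deadSn == 0 || deadSn > sn)
}
```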
  23. 23. ©2016 Couchbase Inc. 23 Nitro MVCC vs Copy-On-Write B+Tree MVCC • A single item update to a leaf node performs copy-on-write of the entire block (e.g., 4 KB) • Since a B+Tree is hierarchical, it also copy-on-writes all parent blocks recursively up to the root block, causing significant storage overhead (the wandering tree problem); for example, a three-level tree with 4 KB blocks rewrites roughly 12 KB for one small update • Write-optimized storage engines try to amortize this cost by batching updates • Large batch sizes cause a larger snapshot interval • Nitro has a fixed storage overhead per item • Snapshotting is a lightweight operation
  24. 24. ©2016 Couchbase Inc.©2016 Couchbase Inc. Garbage Collection 24 [Diagram: skiplist items labelled with (bornSn, deadSn) lifetimes alongside a snapshot list Sn=1…4 carrying reference counts (rfcnt); once a snapshot's rfcnt drops to 0, the items that died with it are removed from the skiplist by the concurrent GC and freed via concurrent SMR]
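A simplified statement of when a dead item becomes garbage, assuming the collector can see which snapshot sequence numbers are still referenced; the real Nitro collector works incrementally off the snapshot list rather than scanning a slice.

```go
package mvcc

// reclaimable reports whether a dead item (deadSn != 0) can be garbage
// collected: it is safe to drop once no snapshot that could still see it
// (any live snapshot with seqno < deadSn) remains referenced.
// liveSnapshots holds the seqnos of snapshots whose reference count is > 0.
func reclaimable(deadSn uint64, liveSnapshots []uint64) bool {
	if deadSn == 0 {
		return false // item is still alive in the latest version
	}
	for _, sn := range liveSnapshots {
		if sn < deadSn {
			return false // this snapshot can still see the item
		}
	}
	return true
}
```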
  25. 25. ©2016 Couchbase Inc.©2016 Couchbase Inc. Safe Memory Reclamation • Accessors that started earlier and are still alive can potentially hold references to GCed items • Freeing GCed items/nodes can cause dangling references • The memory reclaimer has to make sure that no accessor is holding a reference to a GCed item • This problem does not occur in garbage-collected languages • A lock-free SMR algorithm takes care of safely freeing resources • Details of the SMR algorithm are available in the Nitro VLDB'16 paper 25
  26. 26. ©2016 Couchbase Inc.©2016 Couchbase Inc. Nitro Backup 26 [Diagram: Backup worker-1, worker-2 and worker-3 write File-1, File-2 and File-3 in parallel, while the GC emits delta files; the backup is non-intrusive]
  27. 27. ©2016 Couchbase Inc. 27 Nitro Recovery • Concurrent bottom-up skiplist build from the backup files (File-1, File-2, File-3) • Avoids unnecessary CAS conflicts during concurrent insert • Snapshot numbering starts again from Sn=1 • Once the build is complete, the remaining items are inserted by replaying inserts from the delta files concurrently (see the sketch below)
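A minimal sketch of the delta-replay phase of recovery, assuming a Store with a concurrency-safe Insert and a hypothetical decode helper for reading backup records; this is not Nitro's actual file format or recovery code.

```go
package recovery

import "sync"

// Store stands in for the freshly rebuilt Nitro instance; Insert is assumed
// to be safe for concurrent use, as Nitro's lock-free writes are.
type Store interface {
	Insert(item []byte)
}

// replayDeltas replays the per-worker delta files produced during backup,
// inserting their records into the rebuilt store in parallel.
func replayDeltas(s Store, files []string, decode func(path string) [][]byte) {
	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			for _, item := range decode(path) {
				s.Insert(item) // concurrent inserts into the lock-free skiplist
			}
		}(f)
	}
	wg.Wait()
}
```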
  28. 28. ©2016 Couchbase Inc.©2016 Couchbase Inc. Benefits of Nitro • Lock-free operations allow the storage engine to scale seamlessly with multicore CPUs • Single index performance can be scaled by assigning more update workers • The Nitro MVCC model has a fixed storage overhead per update/insert operation • The fast snapshotting capability allows very low indexing latency between the Data service and the Index service • Nitro provides a scalable lock-free garbage collector and safe memory reclaimer • Nitro features a scalable, online, concurrent backup and a fast recovery mechanism 28
  29. 29. ©2016 Couchbase Inc. 29 Nitro GSI Integration
  30. 30. ©2016 Couchbase Inc. 30 GSI Data Structures  The storage engine needs to maintain two storage structures:  Reverse map  Index  The reverse map is used to look up and remove the previous index entry for a docid during an update  The index store maintains the ordered index entries used by index scans (a sketch of the update flow follows)
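Here is a minimal sketch of how the two structures cooperate on an update, with a hypothetical OrderedIndex interface and a plain map standing in for the real GSI storage interfaces.

```go
package gsi

// OrderedIndex stands in for the Nitro-backed index store that keeps index
// entries sorted for range scans.
type OrderedIndex interface {
	Insert(entry string)
	Delete(entry string)
}

// applyUpdate sketches the update flow: the reverse map locates the previous
// entry for the docid so it can be removed from the ordered index before the
// new entry is inserted.
func applyUpdate(idx OrderedIndex, reverseMap map[string]string, docID, newEntry string) {
	if oldEntry, ok := reverseMap[docID]; ok {
		if oldEntry == newEntry {
			return // indexed value unchanged: no index write needed
		}
		idx.Delete(oldEntry)
	}
	if newEntry == "" { // indexed field removed from the document
		delete(reverseMap, docID)
		return
	}
	idx.Insert(newEntry)
	reverseMap[docID] = newEntry
}
```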
  31. 31. ©2016 Couchbase Inc. 31 Memory Optimized Index update pipeline  Scalable write performance using multiple writers  A simple hash table is used for the reverse map instead of Nitro (avoids concurrency overheads)  Periodic backup persists only (indexItem, docid)  The reverse map can be reconstructed on the fly during recovery  End-to-end indexing latency ~20 ms [Diagram: mutations are routed by hash(docid) % n to writer-1 … writer-n, each with its own hash table (HT); all writers update the shared Nitro index, which serves index scans]
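The hash(docid) % n routing can be as simple as the sketch below. CRC32 is an assumption borrowed from the next slide, and the function name is mine. Keeping every mutation for a given docid on one writer means each writer's private reverse-map hash table stays consistent without any cross-writer locking, while all writers insert into the shared Nitro index concurrently.

```go
package gsi

import "hash/crc32"

// route picks the writer responsible for a document: all mutations for the
// same docid hash to the same writer and therefore the same private
// reverse-map hash table.
func route(docID string, numWriters int) int {
	return int(crc32.ChecksumIEEE([]byte(docID)) % uint32(numWriters))
}
```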
  32. 32. ©2016 Couchbase Inc. 32 Storage Optimizations [Diagram: the reverse-map hash table keys a CRC32 hash of the docid (hash1, hash2) to node pointers for the index entries, e.g. emp_005/MountainView → “MountainView:emp_005” and emp_008/Sunnyvale → “Sunnyvale:emp_008”]  Direct pointers from the hash table to the index entry  Storage needed for index maintenance reduced by ~50%  Index item delete cost reduced from O(log n) to O(1)  Optimized multi-entry indexing from a single document
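A rough sketch of the direct-pointer optimization: the reverse map stores a pointer to the skiplist node holding the entry, keyed by a CRC32 of the docid, so deleting an entry is a hash lookup plus an in-place mark. Types are hypothetical simplifications and hash collisions are ignored here.

```go
package gsi

import "hash/crc32"

// entryNode stands in for a Nitro skiplist node holding one index entry; in
// the real engine "deleting" it means setting its MVCC deadSn.
type entryNode struct {
	entry string
	dead  bool
}

// revMap keys a CRC32 of the docid directly to the node holding its current
// index entry, rather than storing the entry bytes a second time (the ~50%
// storage saving noted above).
type revMap map[uint32]*entryNode

// deleteEntry is O(1): a hash lookup plus an in-place mark, with no O(log n)
// descent through the skiplist to find the entry.
func (m revMap) deleteEntry(docID string) {
	h := crc32.ChecksumIEEE([]byte(docID))
	if n, ok := m[h]; ok {
		n.dead = true // logical delete; the GC unlinks the node later
		delete(m, h)
	}
}
```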
  33. 33. ©2016 Couchbase Inc. 33 Performance & Scalability Let's see the numbers!
  34. 34. ©2016 Couchbase Inc.©2016 Couchbase Inc. Nitro performance • Almost linear scaling of throughput with the number of cores 34 [Charts: insert benchmark and lookup benchmark]
  35. 35. ©2016 Couchbase Inc.©2016 Couchbase Inc. Nitro performance • Partitioning is not required to scale single index performance 35 [Charts: Get with background inserts; throughput scalability with partitions]
  36. 36. ©2016 Couchbase Inc.©2016 Couchbase Inc. Memory Optimized Index vs Standard Index – End-to-End • 4 Data nodes, 1 Index node, 32-core CPU (Intel(R) Xeon(R) E5-2630 v3 @ 2.40GHz) • The Index service node keeps up with mutations from 4 Data service nodes 36 GSI index server throughput (items/sec): Insert 1,658,031; Update 822,680; Delete 1,578,316 [Chart: single index benchmark – write throughput 1.6M/s (MOI) vs 800k/s]
  37. 37. ©2016 Couchbase Inc.©2016 Couchbase Inc. Nitro recovery performance 37
  38. 38. ©2016 Couchbase Inc. 38 Memory-Optimized Index Operational perspective
  39. 39. ©2016 Couchbase Inc.©2016 Couchbase Inc. Operational Aspects • Memory-Optimized Index can be configured using a cluster-wide setting • What happens when an index node runs out of memory? • What happens to the indexes once Couchbase Server is restarted? • What is the recommended DRAM/CPU configuration for using MOI? 39
  40. 40. ©2016 Couchbase Inc.©2016 Couchbase Inc. Summary • Couchbase GSI allows data services and index services to scale independently, with workload isolation • Couchbase 4.5 features Memory-Optimized Indexes, which can provide superior index performance by scaling seamlessly with many CPU cores and large amounts of DRAM • Introduced the Nitro storage engine with the following features: • Multiple writers and lock-free operations • Fast snapshotting with lightweight MVCC and a concurrent garbage collector • Concurrent, non-intrusive fast backup and restore • Memory-Optimized Index leverages storage optimizations to reduce memory consumption for the index as well as to generate compact file backups • Showcased Nitro and Memory-Optimized Index end-to-end performance • It takes only a few minutes to build large indexes! • For more details on Nitro, refer to the Nitro VLDB'16 paper (http://www.vldb.org/pvldb/vol9/p1413-lakshman.pdf) 40
  41. 41. ©2016 Couchbase Inc. Thank You! 41
  42. 42. ©2016 Couchbase Inc. 42 Share your opinion on Couchbase 1. Go here: http://gtnr.it/2eRxYWn 2. Create a profile 3. Provide feedback (~15 minutes)
