
Indexes: Batman of the data world – Connect Silicon Valley 2017


Speaker: Venkat Subramanian

Global Secondary Indexes (GSI) in Couchbase Server have had a phenomenal journey since their inception in 4.0. The indexing team has been at the forefront of innovation at Couchbase, steadily delivering features that solve some of the most important technical challenges so that business decisions can be made faster and more easily. GSIs do the heavy lifting of data so that data access patterns can be streamlined. In this session, we will dive deep into GSI in Couchbase Server and look at its capabilities, architecture, and management. We will also learn how GSI is an integral part of the Couchbase Data Platform and how it works with N1QL for efficient, performant queries.

Published in: Technology


  1. INDEXES: BATMAN OF THE DATA WORLD. Venkat Subramanian, Product Manager, @venkasub. Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved.
  2. Library of Congress, Card Division. Washington DC (1919)
  4. WHAT PROBLEMS DO INDEXES SOLVE? • Lower Query Cost • Lower Data Access Cost • Lowest Query Latencies • Steroids for Queries
  5. Impact. Sample Bucket: travel-sample (#rows: 31,591). Query: select * from `travel-sample` where type = "airline". Index: create index type_idx on `travel-sample`(type); Without Index: 3980 ms. With Index: 35.61 ms. 112x faster!
  6. Explain Plans. Without Index: ..... { "#operator": "PrimaryScan", "index": "#primary", "keyspace": "travel-sample", "namespace": "default", "using": "gsi" }, ....... With Index: ....... { "#operator": "IndexScan2", "index": "type_idx", "index_id": "d8037b4e61d5a6b6", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "\"airline\"", "inclusion": 3, "low": "\"airline\"" } ] } ], "using": "gsi" }, .......
  7. select * from `travel-sample` where type = "airline"
  8. select count(1) from `travel-sample` where type = "airline"
  9. Varieties • Primary • Named Primary • Secondary • Composite • Functional • Array • Covered • Partial • Adaptive
  10. Impact of Covered Index. Sample Bucket: travel-sample (#rows: 31,591). Query: select type, iata from `travel-sample` where type = "airline". Index: create index type_idx on `travel-sample`(type, iata); Without Index vs. With Index: 6x faster!
  11. MDS • Indexes can scale independently from document data • Workloads for different services are isolated • Independent Scalability for best Computational Capacity per Service
  12. Decoupled / Non-Blocking. Data Nodes: Data-1, Data-2, Data-3. Index Nodes: Idx-1, Idx-2.
  13. “<<INSERT QUOTE>>”
  14. Genesis. CREATE INDEX idx_field ON bucket_name(field_name); CREATE INDEX `def_city` ON `travel-sample`(`city`)
  15. Impermanence. DROP INDEX bucket_name.idx_field; DROP INDEX `travel-sample`.def_city
  16. WHAT’S NEW IN COUCHBASE SERVER 5.0? • Performance: Unparalleled performance at any scale • Agility: Unmatched agility & flexibility • Manageability: The easiest platform to manage
  17. The Journey • 4.0: Launch! • 4.1: Covered Indexes • 4.5: MOI • v5.0 (Oct-2017): Replicas, Plasma, Push down
  18. Plasma – Performant & Efficient Storage Engine for GSI • Memory First • SSD/DRAM Optimized • Lock Free Data Structures • Persistent Snapshots • Data Greater Than Memory (DGM)
  19. Benefits • Lower Latencies • Higher Throughput • Lower Memory and Disk Usage • Lower Write Amplification (longer SSD life) • Lower Initial & Incremental Load Times
  20. Plasma – Performance (ForestDB vs. Plasma) • Latency: 3x lower • Throughput: 8x better • With Stale=False: Latency: 50x, Throughput: 120x
  21. Plasma – Resource Usage (ForestDB vs. Plasma) • Memory: 60% lower • Disk: 90% lower • Automatic Upgrade from FDB to Plasma For EE Customers
  22. Index Types
      Index type | Released In Version | Performance | DGM | Replicas | EE | CE
      Plasma     | 5.0                 | ✔           | ✔   | ✔        | ✔  | ✗
      MOI        | 4.5                 | ✔           | ✗   | ✔        | ✔  | ✗
      ForestDB   | 4.0                 | ✗           | ✗   | ✗        | ✔  | ✔
  23. Copies of an Index • High Availability • Better Query Throughput
  24. Equivalent Indexes. Node1: create index idx1 on bucket(field1) (idx1 on field1). Node2: create index idx2 on bucket(field1) (idx2 on field1). Node3: create index idx3 on bucket(field1) (idx3 on field1). Query: select * from bucket where field1 is not missing
  25. Equivalent Indexes • Had to manually create index with different names • No placement rules • Index is unusable when nodes go down • USE INDEX • PREPARED STATEMENT • No Rebalance
  26. From Equivalent Indexes to Index Replicas. Before (equivalent indexes): Node1: create index idx1 on bucket(field1); Node2: create index idx2 on bucket(field1); Node3: create index idx3 on bucket(field1). After (index replicas): create index idx on bucket(field1) with {"num_replica": 2} (or) create index idx on bucket(field1) with {"nodes":["127.2.33.101:8091", "127.2.33.102:8091", "127.2.33.103:8091"]}, giving idx on field1 on each of Node1, Node2, and Node3.
  27. Advantages of Index Replicas • Swap Rebalance • Active Replicas, Rack/Zone Aware • Load Balance Queries • Easier Manageability • Continue operations under failure conditions • Lower TCO
  28. Pagination Optimization – N1QL Throughput (queries/sec), MOI 4.6 vs. MOI 5.0. Scenarios: Range Scan with Limit; Range Scan with Offset & Limit; Range Scan with Offset & Limit (covered). Improvement labels on the chart: 130x, 133x, 20x.
  29. Index Pushdowns – N1QL Throughput (queries/sec), MOI 4.5 vs. MOI 5.0. Scenarios: Composite Filters (covered); Composite Filters; Index Projection; Count Distinct. Improvement labels on the chart: 11.5x, 30x, 2x, 1.5x.
  30. DEMO
  31. Operations • Scale Up: Move to Larger Nodes • Scale In: Move to Smaller Nodes • Scale Out: Add Capacity/Nodes • Scale Down: Remove Capacity/Nodes
  32. Business Value of Indexes or How awesome is Batman! • Operates in the Dark • Location aware - Direct access • Does Heavy Lifting; then hands it over to Gordon … err N1QL • Agile and Agility • Keep Gotham clean
  33. THANK YOU. venkat@couchbase.com @venkasub
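
The index varieties named on slide 9 are listed there without syntax. As a rough illustration only, the N1QL sketch below shows one statement per variety against the `travel-sample` bucket used in the slides. All index names (`idx_...`) and the choice of fields are hypothetical and do not come from the talk. The statements assume Couchbase Server 5.0 syntax and the standard `travel-sample` fields (`city`, `country`, `name`, `iata`, `schedule`); the unnamed primary index will fail if one already exists on the bucket.

```sql
/* Hypothetical sketch: index names and field choices are illustrative only. */

/* Primary: indexes the document keys of the whole bucket. */
CREATE PRIMARY INDEX ON `travel-sample`;

/* Named primary: same as above, but with an explicit name. */
CREATE PRIMARY INDEX `idx_ts_primary` ON `travel-sample`;

/* Secondary: indexes a single field. */
CREATE INDEX `idx_city` ON `travel-sample`(city);

/* Composite: indexes several fields in one index. */
CREATE INDEX `idx_type_country` ON `travel-sample`(type, country);

/* Functional: indexes the result of an expression. */
CREATE INDEX `idx_lower_name` ON `travel-sample`(LOWER(name));

/* Array: indexes individual elements of an array field. */
CREATE INDEX `idx_route_flights` ON `travel-sample`
    (DISTINCT ARRAY v.flight FOR v IN schedule END)
    WHERE type = "route";

/* Partial: indexes only the documents matching the WHERE clause. */
CREATE INDEX `idx_airline_name` ON `travel-sample`(name)
    WHERE type = "airline";

/* Adaptive: indexes field/value pairs of the matching documents. */
CREATE INDEX `idx_adaptive_airport` ON `travel-sample`(DISTINCT PAIRS(self))
    WHERE type = "airport";

/* Covered: not a separate kind of index. The query below is covered
   because every field it references is stored in the index. */
CREATE INDEX `idx_type_iata` ON `travel-sample`(type, iata);
SELECT type, iata FROM `travel-sample` WHERE type = "airline";
```

Prefixing the final SELECT with EXPLAIN, as on slide 6, is one way to check whether the plan uses the intended index and reports the scan as covering.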
