TUNING N1QL QUERY PERFORMANCE & SCALE
IN COUCHBASE SERVER 4.0
Cihan Biyikoglu
Dir. Product Management
1
©2015 Couchbase Inc. 2
Goals
 Deeper look at query performance and scale
 Look at Query and Index Service Scale Characteristics
 Understand Query Execution Flow
 Understand Index Usage
 Tune queries with a few techniques
©2015 Couchbase Inc. 3
Agenda
 Part I - Architectural Overview
 New Cluster Architecture with Couchbase Server 4.0
 Query Processing & Indexing
 Part II - Optimizing Queries
 Execution Plans and Operators
 Optimizing Queries - Filtering, Index Selection and Joins
 OptimizingApps - Consistency Dials
 QA
Demos & More Demos…
©2015 Couchbase Inc. 4
Disclaimer
Couchbase Server 4.0 and ForestDB are still
in development and the final version of the
products may not be identical in details
discussed on this session.
Architecture Overview
Part I
©2015 Couchbase Inc. 6
Couchbase Server Cluster Architecture
STORAGE
Couchbase Server 1
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Managed Cache
Storage
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 2
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 3
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 4
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 5
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 6
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
Managed Cache
Storage
Managed Cache
Storage
Managed Cache
Storage
Managed Cache
Storage
Managed Cache
Storage
©2014 Couchbase Inc.
Couchbase Server Cluster Architecture
STORAGE
Couchbase Server 1
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Managed Cache
Storage
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 2
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 3
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 4
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 5
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
STORAGE
Couchbase Server 6
SHARD
7
SHARD
9
SHARD
5
SHARDSHARDSHARD
Managed
Cache
Cluster
ManagerCluster
Manager
Data Service
Index Service
Query Service
Managed Cache
Storage
Managed Cache
Storage
Managed Cache
Storage
Managed Cache
Storage
Managed Cache
Storage
Query Processing Overview
©2015 Couchbase Inc. 9
Query Execution
 Submitting Queries in N1QL
 Stateless Connectivity through REST
 Load-Balance across Query Service nodes
 Prepared vs Ad-hoc Query Execution
 Consistency Dials – more on this later…
©2015 Couchbase Inc. 10
Query Execution
 Parallelization factor is #cores on Query Service Node
Execution Flow
©2015 Couchbase Inc. 11
Query Service - Capacity Management
Scaling the Query Service
 Pro: Load Balance Queries across all nodes
 Con: Compete with Index and DataWorkloads
Index Service
Couchbase Cluster
Query Service
Data Service
node1 node8
©2015 Couchbase Inc. 12
Query Service - Capacity Management
Scaling the Query Service
 Added CPU: higher intra-query parallelization
 Added RAM: improved caching with larger result sets
 Added Node: better availability and load balancing
Couchbase Cluster
node1 node8
Data ServiceIndex Service
Query Service
Indexing Overview
©2015 Couchbase Inc. 14
Indexing in Couchbase Server 4.0
 Multiple Indexers
 GSI – Index Service
New indexing for N1QL for low latency queries without compromising on mutation performance
(insert/update/delete)
Independently partitioned and independently scalable indexes in Indexing Service
 Map/ReduceViews – Data Service
Powerful programmable indexer for complex reporting and indexing logic.
Full partition alignment and paired scalability with Data Service.
 SpatialView – Data Service
Incremental R-tree indexing for powerful bounding-box queries
Full partition alignment and paired scalability with Data Service
New
Index Scan
©2015 Couchbase Inc. 15
Which to choose – GSI vsViews
Workloads New GSI in v4.0 Map/ReduceViews
Complex
Reporting
Just InTime Pre-aggregated
Workload
Optimization
Optimized for Scan Latency &
Throughput
Optimized for Insertion
Flexible
Index Logic
N1QL Functions Javascript
Secondary
Lookups
Single Node Lookup Scatter-Gather
Tunable
Consistency
Staleness false or ok or
everything in between
Staleness false or ok
©2015 Couchbase Inc. 16
Which to choose – GSI vsViews
Capabilities New GSI in v4.0 Map/ReduceViews
Partitioning Model Independent – Indexing Service Aligned to Data – Data Service
Scale Model Independently Scale Index Service Scale with Data Service
Fetch with Index Key Single Node Scatter-Gather
Range Scan Single Node Scatter-Gather
Grouping,Aggregates With N1QL Built-in withViews API
Caching Managed Not Managed
Storage ForestDB Couchstore
Availability Multiple Identical Indexes load
balanced
Replica Based
©2015 Couchbase Inc. 17
Query Service - Capacity Management
Scaling the Index Service
 Pro: Load balance scans across all nodes
 Con: Compete with Query and DataWorkloads
Index Service
Couchbase Cluster
Query Service
Data Service
node1 node8
©2015 Couchbase Inc. 18
Index Service Capacity Management
Scaling the Index Service
 Added RAM: better caching of indexes
 Added CPU: faster index maintenance & parallelized index scans
 Add Faster IO Path: faster index persistence
 Added Node: better availability and load balancing
Couchbase Cluster
node1 node8
Data Service
Index Service
Query Service
Optimizing Queries
Part II
©2015 Couchbase Inc. 20
Execution Plans & Explain
 EXPLAIN query
 Plan is assembled into an execution flow expressed through the
operators
 Operators stream results up and down the stream
Sequence Parallel
Primary
Scan
Initial
Project
Fetch
Initial
ProjectFetch
Initial
ProjectFetch
…
Limit
©2015 Couchbase Inc. 21
Operators
 Main Operations
 Scans
PrimaryScan: Scan of the Primary Index based on document keys
IndexScan: Scan of the Secondary Index based on a predicate
 Fetch
Fetch: Reach into the Data service with a document key
 Projection Operations
InitialProject: reducing the stream size to the fields involved in query.
FinalProject: final shaping of the result to the requested JSON shape
©2015 Couchbase Inc. 22
Operators cont.
 Operator Assembly
Parallel: execute all child operations in parallel
Sequence: execute child items in a sequence
 Filtering Operators
Filter:Apply a filter expression (ex.WHERE field = “value”)
Limit: limit the number of items returned to N
Offset: start returning items from a specified item count
©2015 Couchbase Inc. 23
Operators cont.
 Join Operators
Join: Join left and right keyspaces on attributes and document key
Unnest: Join operation between a parent and a child with a nested
array where parent is repeated for each child array item.
Nest:Grouping operation between a parent and a child array where child
array is embedded into the parent.
DEMO
Execution Plans
Demo #1
CommonTechniques forTuning Queries
©2015 Couchbase Inc. 26
Minimize Items Scanned
 Primary Index Scan vs. Index Scan
 Primary Index can only filter on document keys thus typically means
“full-scan” of the bucket
 Secondary Index is typically done with predicates and are smaller in
size thus better to scan
Index Selection: Based on matching expressions matching in Index andWHERE clause
DEMO #2
SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
Vs.
CREATE INDEX i_type on `beer-sample`(type) USING GSI;
SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
©2015 Couchbase Inc. 27
Minimize Items Scanned
 HINT index usage to queries
 There can be multiple indexes with to choose from and you can hint
index choice to us.
SELECT name,updated FROM `beer-sample` USE INDEX(i_type using gsi)
WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
©2015 Couchbase Inc. 28
Minimize Items Scanned
 Limit & Filters help eliminate rows early in the execution plan
 With Limit, Upstream operators are signaled to stop by limit when enough
rows accumulate
 Ex: Remember to Filter on Document type with buckets that contain
multiple types.
DEMO #3
SELECT b1.name as beer_name, b2.name as brewery_name, b2.country
FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 on KEYS b1.brewery_id
WHERE abv>0;
vs
SELECT b1.name as beer_name, b2.name as brewery_name, b2.country
FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 on KEYS b1.brewery_id
WHERE b1.type="beer” and abv>0;
©2015 Couchbase Inc. 29
Joins
 Joins are efficient by nature
 Left hand value is joined to the right hand document key with nested
loop.
Query: Get brewery location for each beer:
SELECT …
FROM `beer-sample` AS b1
JOIN `beer-sample` AS b2 on KEYS b1.brewery_id
WHERE b1.type="beer”;
For each document with type=“beer” take b1.brewery_id and look for and
equal document key in b2.
Optimizing Applications
©2015 Couchbase Inc. 31
New Consistency Settings!
 View Stale-ness
 Ok: unbounded – query what’s available in the index/view now
 False: query after all changes up to the request timestamp (and
maybe more) has been indexed for a given index or view.
 New Indexes with Couchbase Server 4.0
 Improves granularity of the consistency logical-timestamp.
 New: ScanConsistency can be set to any logical timestamp
Indicate stale=false to stale=ok and everything in between
©2015 Couchbase Inc. 32
Flexible Consistency Settings
 Time
t1 insert (k1, v1)
…
t2 do other business logic computation
…
t3 issue query/read on (k1,v1) with t3 vs t1
Catch up all the indexes
to t3 and then issue query
Identical to “stale=false”
Catch up all the indexes
to t1 and then issue query
Improved efficiency over
“stale=false”
Recap
©2015 Couchbase Inc. 34
Recap
 New Unique Query and Indexing Architecture
 Workload isolation with MDS gives you a great performance and scale
advancement.
 Familiar Concepts from your past life will help tune queries
 Understand Execution Plans
 Understand Indexes and Index Selection
 Filter & Limit aggressively
 Understand JOINs
 Use powerful new Consistency Dials for best efficiency
Couchbase.com/beta
Q&A
Cihan Biyikoglu
cihan@couchbase.com
@cihangirb
Thank you.

Tuning N1QL Query Performance with Couchbase Server 4.0

  • 1.
    TUNING N1QL QUERYPERFORMANCE & SCALE IN COUCHBASE SERVER 4.0 Cihan Biyikoglu Dir. Product Management 1
  • 2.
    ©2015 Couchbase Inc.2 Goals  Deeper look at query performance and scale  Look at Query and Index Service Scale Characteristics  Understand Query Execution Flow  Understand Index Usage  Tune queries with a few techniques
  • 3.
    ©2015 Couchbase Inc.3 Agenda  Part I - Architectural Overview  New Cluster Architecture with Couchbase Server 4.0  Query Processing & Indexing  Part II - Optimizing Queries  Execution Plans and Operators  Optimizing Queries - Filtering, Index Selection and Joins  OptimizingApps - Consistency Dials  QA Demos & More Demos…
  • 4.
    ©2015 Couchbase Inc.4 Disclaimer Couchbase Server 4.0 and ForestDB are still in development and the final version of the products may not be identical in details discussed on this session.
  • 5.
  • 6.
    ©2015 Couchbase Inc.6 Couchbase Server Cluster Architecture STORAGE Couchbase Server 1 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Managed Cache Storage Data Service Index Service Query Service STORAGE Couchbase Server 2 Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 3 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 4 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 5 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 6 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service Managed Cache Storage Managed Cache Storage Managed Cache Storage Managed Cache Storage Managed Cache Storage
  • 7.
    ©2014 Couchbase Inc. CouchbaseServer Cluster Architecture STORAGE Couchbase Server 1 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Managed Cache Storage Data Service Index Service Query Service STORAGE Couchbase Server 2 Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 3 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 4 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 5 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service STORAGE Couchbase Server 6 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service Index Service Query Service Managed Cache Storage Managed Cache Storage Managed Cache Storage Managed Cache Storage Managed Cache Storage
  • 8.
  • 9.
    ©2015 Couchbase Inc.9 Query Execution  Submitting Queries in N1QL  Stateless Connectivity through REST  Load-Balance across Query Service nodes  Prepared vs Ad-hoc Query Execution  Consistency Dials – more on this later…
  • 10.
    ©2015 Couchbase Inc.10 Query Execution  Parallelization factor is #cores on Query Service Node Execution Flow
  • 11.
    ©2015 Couchbase Inc.11 Query Service - Capacity Management Scaling the Query Service  Pro: Load Balance Queries across all nodes  Con: Compete with Index and DataWorkloads Index Service Couchbase Cluster Query Service Data Service node1 node8
  • 12.
    ©2015 Couchbase Inc.12 Query Service - Capacity Management Scaling the Query Service  Added CPU: higher intra-query parallelization  Added RAM: improved caching with larger result sets  Added Node: better availability and load balancing Couchbase Cluster node1 node8 Data ServiceIndex Service Query Service
  • 13.
  • 14.
    ©2015 Couchbase Inc.14 Indexing in Couchbase Server 4.0  Multiple Indexers  GSI – Index Service New indexing for N1QL for low latency queries without compromising on mutation performance (insert/update/delete) Independently partitioned and independently scalable indexes in Indexing Service  Map/ReduceViews – Data Service Powerful programmable indexer for complex reporting and indexing logic. Full partition alignment and paired scalability with Data Service.  SpatialView – Data Service Incremental R-tree indexing for powerful bounding-box queries Full partition alignment and paired scalability with Data Service New Index Scan
  • 15.
    ©2015 Couchbase Inc.15 Which to choose – GSI vsViews Workloads New GSI in v4.0 Map/ReduceViews Complex Reporting Just InTime Pre-aggregated Workload Optimization Optimized for Scan Latency & Throughput Optimized for Insertion Flexible Index Logic N1QL Functions Javascript Secondary Lookups Single Node Lookup Scatter-Gather Tunable Consistency Staleness false or ok or everything in between Staleness false or ok
  • 16.
    ©2015 Couchbase Inc.16 Which to choose – GSI vsViews Capabilities New GSI in v4.0 Map/ReduceViews Partitioning Model Independent – Indexing Service Aligned to Data – Data Service Scale Model Independently Scale Index Service Scale with Data Service Fetch with Index Key Single Node Scatter-Gather Range Scan Single Node Scatter-Gather Grouping,Aggregates With N1QL Built-in withViews API Caching Managed Not Managed Storage ForestDB Couchstore Availability Multiple Identical Indexes load balanced Replica Based
  • 17.
    ©2015 Couchbase Inc.17 Query Service - Capacity Management Scaling the Index Service  Pro: Load balance scans across all nodes  Con: Compete with Query and DataWorkloads Index Service Couchbase Cluster Query Service Data Service node1 node8
  • 18.
    ©2015 Couchbase Inc.18 Index Service Capacity Management Scaling the Index Service  Added RAM: better caching of indexes  Added CPU: faster index maintenance & parallelized index scans  Add Faster IO Path: faster index persistence  Added Node: better availability and load balancing Couchbase Cluster node1 node8 Data Service Index Service Query Service
  • 19.
  • 20.
    ©2015 Couchbase Inc.20 Execution Plans & Explain  EXPLAIN query  Plan is assembled into an execution flow expressed through the operators  Operators stream results up and down the stream Sequence Parallel Primary Scan Initial Project Fetch Initial ProjectFetch Initial ProjectFetch … Limit
  • 21.
    ©2015 Couchbase Inc.21 Operators  Main Operations  Scans PrimaryScan: Scan of the Primary Index based on document keys IndexScan: Scan of the Secondary Index based on a predicate  Fetch Fetch: Reach into the Data service with a document key  Projection Operations InitialProject: reducing the stream size to the fields involved in query. FinalProject: final shaping of the result to the requested JSON shape
  • 22.
    ©2015 Couchbase Inc.22 Operators cont.  Operator Assembly Parallel: execute all child operations in parallel Sequence: execute child items in a sequence  Filtering Operators Filter:Apply a filter expression (ex.WHERE field = “value”) Limit: limit the number of items returned to N Offset: start returning items from a specified item count
  • 23.
    ©2015 Couchbase Inc.23 Operators cont.  Join Operators Join: Join left and right keyspaces on attributes and document key Unnest: Join operation between a parent and a child with a nested array where parent is repeated for each child array item. Nest:Grouping operation between a parent and a child array where child array is embedded into the parent.
  • 24.
  • 25.
  • 26.
    ©2015 Couchbase Inc.26 Minimize Items Scanned  Primary Index Scan vs. Index Scan  Primary Index can only filter on document keys thus typically means “full-scan” of the bucket  Secondary Index is typically done with predicates and are smaller in size thus better to scan Index Selection: Based on matching expressions matching in Index andWHERE clause DEMO #2 SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10; Vs. CREATE INDEX i_type on `beer-sample`(type) USING GSI; SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
  • 27.
    ©2015 Couchbase Inc.27 Minimize Items Scanned  HINT index usage to queries  There can be multiple indexes with to choose from and you can hint index choice to us. SELECT name,updated FROM `beer-sample` USE INDEX(i_type using gsi) WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
  • 28.
    ©2015 Couchbase Inc.28 Minimize Items Scanned  Limit & Filters help eliminate rows early in the execution plan  With Limit, Upstream operators are signaled to stop by limit when enough rows accumulate  Ex: Remember to Filter on Document type with buckets that contain multiple types. DEMO #3 SELECT b1.name as beer_name, b2.name as brewery_name, b2.country FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 on KEYS b1.brewery_id WHERE abv>0; vs SELECT b1.name as beer_name, b2.name as brewery_name, b2.country FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 on KEYS b1.brewery_id WHERE b1.type="beer” and abv>0;
  • 29.
    ©2015 Couchbase Inc.29 Joins  Joins are efficient by nature  Left hand value is joined to the right hand document key with nested loop. Query: Get brewery location for each beer: SELECT … FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 on KEYS b1.brewery_id WHERE b1.type="beer”; For each document with type=“beer” take b1.brewery_id and look for and equal document key in b2.
  • 30.
  • 31.
    ©2015 Couchbase Inc.31 New Consistency Settings!  View Stale-ness  Ok: unbounded – query what’s available in the index/view now  False: query after all changes up to the request timestamp (and maybe more) has been indexed for a given index or view.  New Indexes with Couchbase Server 4.0  Improves granularity of the consistency logical-timestamp.  New: ScanConsistency can be set to any logical timestamp Indicate stale=false to stale=ok and everything in between
  • 32.
    ©2015 Couchbase Inc.32 Flexible Consistency Settings  Time t1 insert (k1, v1) … t2 do other business logic computation … t3 issue query/read on (k1,v1) with t3 vs t1 Catch up all the indexes to t3 and then issue query Identical to “stale=false” Catch up all the indexes to t1 and then issue query Improved efficiency over “stale=false”
  • 33.
  • 34.
    ©2015 Couchbase Inc.34 Recap  New Unique Query and Indexing Architecture  Workload isolation with MDS gives you a great performance and scale advancement.  Familiar Concepts from your past life will help tune queries  Understand Execution Plans  Understand Indexes and Index Selection  Filter & Limit aggressively  Understand JOINs  Use powerful new Consistency Dials for best efficiency
  • 35.
  • 36.
  • 37.

Editor's Notes

  • #10 Application has single logical connection to cluster (client object) Data is automatically sharded resulting in even document data distribution across cluster Each vbucket replicated 1, 2 or 3 times (“peer-to-peer” replication) Docs are automatically hashed by the client to a shard’ Cluster map provides location of which server a shard is on Every read/write/update/delete goes to same node for a given key Strongly consistent data access (“read your own writes”) A single Couchbase node can achieve 100k’s ops/sec so no need to scale reads
  • #16 Flip m/r index vs gsi on the graph
  • #17 Flip m/r index vs gsi on the graph
  • #21  ParentScan KeyScan ValueScan DummyScan CountScan IntersectScan
  • #23 Discard Stream Collect Channel