Couchbase N1QL: Index Advisor

INDEX ADVISOR: RULES FOR CREATING INDEXES
Keshav Murthy
Senior Director, Couchbase R&D

AGENDA
01 N1QL: Indexing
02 Index categories
03 Index types
04 Rules for Creating Indexes

Indexing for N1QL
[Figure: labels RPM, Speed, Gear, Accelerator, Indexes]

Index Categories
Standard Secondary: Release 4.0 – 4.6
• Based on ForestDB
• Released with Couchbase 4.0
Memory Optimized Index: 4.5 and above
• 100% of the index is in memory
• Index is written to disk for recovery only
• Predictable Performance
• Better mutation rate
Standard Secondary: Release 5.0
• Based on the lockless skiplist
• Released with Couchbase 5.0

Indexes
Primary Index: Index on the document key on the whole bucket
CREATE PRIMARY INDEX ON `travel-sample`
CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample`
Secondary Index: Index on the key-value or document-key
CREATE INDEX idx_cx_name ON `travel-sample`(name);
Composite Index: Index on more than one key-value
CREATE INDEX idx_cx2 ON `travel-sample`(state, city, geo.lat, geo.lon)
Functional or Expression Index: Index on function or expression on key-values
CREATE INDEX idx_cxupper ON `travel-sample`(UPPER(state), UPPER(city),
geo.lat, geo.lon)
Partial Index: Index subset of items in the bucket
CREATE INDEX idx_cx3 ON `travel-sample` (state, city)
WHERE type = 'hotel';
CREATE INDEX idx_cx4 ON `travel-sample` (state, city, name.lastname)
WHERE type = 'hotel' and country = 'United Kingdom'
Array Index: Index individual elements of the arrays
CREATE INDEX idx_cx5 ON `travel-sample` (ALL public_likes)
CREATE INDEX idx_cx6 ON `travel-sample` (DISTINCT public_likes)
Array Index on an expression:
CREATE INDEX idx_cx7 ON `travel-sample` (ALL TOKENS(public_likes))
WHERE type = 'comments';

Setup
drop index `travel-sample`.def_schedule_utc;
drop index `travel-sample`.def_city;
drop index `travel-sample`.def_name_type;
drop index `travel-sample`.def_route_src_dst_day;
drop index `travel-sample`.def_icao;
drop index `travel-sample`.def_primary;
drop index `travel-sample`.def_type;
drop index `travel-sample`.def_sourceairport;

Primary Index
• Index on the document key on the whole bucket
• CREATE PRIMARY INDEX ON `travel-sample`
• CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample`
• In Couchbase you typically have multiple keyspaces in a single bucket
• type = 'hotel', type = 'reviews', ….
• It’s fine to have the primary index when the predicates are only on META().id
• E.g. YCSB Benchmark WHERE meta().id > "value"
select * from system:indexes where name = '#primary';
"indexes": {
"datastore_id": "http://127.0.0.1:8091",
"id": "f6e3c75d6f396e7d",
"index_key": [],
"is_primary": true,
"keyspace_id": "travel-sample",
"name": "#primary",
"namespace_id": "default",
"state": "online",
"using": "gsi"
}
Dockey     Docs
"h:123"    {"type":"hotel", ...}
"h:123"    {"type":"hotel", ...}
"h:123"    {"type":"hotel", ...}
"r:123"    {"type":"reviews", ...}
"r:123"    {"type":"reviews", ...}
"a:123"    {"type":"airport", ...}
"a:123"    {"type":"airport", ...}
"a:123"    {"type":"airport", ...}
"l:123"    {"type":"landmark", ...}

Secondary Index
• Index on the key-value or document-key
CREATE INDEX idx_cx_name ON `travel-sample`(name);
CREATE INDEX idx_cx_name ON `travel-sample` (name, META().id);
SELECT * FROM `travel-sample` WHERE name = 'Westin Bonaventure';
SELECT * FROM `travel-sample` WHERE name = 'Westin Bonaventure'
AND META().id LIKE 'hotel%';

Composite Index
• Index on more than one key-value
• CREATE INDEX idx_cx2 ON `travel-sample` (state, city, name.lastname)
• Query needs to have predicate on leading keys to use the index
1. SELECT * FROM `travel-sample` WHERE state = 'CA';
2. SELECT * FROM `travel-sample` WHERE state = 'CA' AND city = 'Windsor';
3. SELECT * FROM `travel-sample` WHERE state = 'CA' AND city = 'Windsor' AND name.lastname =
'smith';
4. SELECT * FROM `travel-sample` WHERE city = 'Windsor' AND name.lastname = 'smith';
5. SELECT * FROM `travel-sample` WHERE name.lastname = 'smith';
6. SELECT * FROM `travel-sample` WHERE state = 'CA' AND name.lastname = 'smith';
7. SELECT * FROM `travel-sample` WHERE state IS NOT MISSING AND city = 'Windsor' AND
name.lastname = 'smith';
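The leading-key rule can be sketched with a small model. This is a simplified illustration, not the N1QL optimizer: the function name `usable_leading_keys` and the representation of each query as a set of predicate fields are assumptions made for the example.

```python
# Simplified model of the "leading keys" rule for a composite index.
# Hypothetical helper; the real optimizer does considerably more.
INDEX_KEYS = ["state", "city", "name.lastname"]  # keys of idx_cx2 on this slide

def usable_leading_keys(index_keys, predicate_fields):
    """Return the consecutive leading index keys that have a predicate."""
    usable = []
    for key in index_keys:
        if key not in predicate_fields:
            break  # the first key without a predicate ends the usable prefix
        usable.append(key)
    return usable

# Fields referenced in the WHERE clause of queries 1-7 above.
QUERIES = {
    1: {"state"},
    2: {"state", "city"},
    3: {"state", "city", "name.lastname"},
    4: {"city", "name.lastname"},
    5: {"name.lastname"},
    6: {"state", "name.lastname"},
    7: {"state", "city", "name.lastname"},  # state IS NOT MISSING counts as a predicate
}

for number, fields in QUERIES.items():
    prefix = usable_leading_keys(INDEX_KEYS, fields)
    print(number, prefix if prefix else "index not eligible")
```

In this model queries 4 and 5 have no predicate on `state`, so the index is not eligible; query 6 uses only `state` for the scan, and query 7 shows how `IS NOT MISSING` on the leading key makes the remaining keys usable.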

Functional or Expression Index
Index on function or expression on key-values
• CREATE INDEX idx_cxupper on `travel-sample` (UPPER(state), UPPER(city), UPPER(name));
• SELECT * FROM `travel-sample` WHERE UPPER(state) = 'CA';
• CREATE INDEX idx_cxupper on `travel-sample` (UPPER(state) || UPPER(city));
• SELECT * FROM `travel-sample` WHERE UPPER(state) || UPPER(city) = 'CAMOUNTAINVIEW';

Partial Index
Index subset of items in the bucket
CREATE INDEX idx_cx3 ON `travel-sample` (state, city, name.lastname)
WHERE type = 'hotel';
CREATE INDEX idx_cx4 ON CUSTOMER (state, city, name.lastname)
WHERE type = 'hotel' and country = 'United States' AND ratings > 2;
• WHERE clause helps you index your keyspaces
SELECT * FROM `travel-sample` WHERE state = 'CA' AND status = 'premium';
SELECT * FROM `travel-sample` WHERE state = 'CA' AND
type = 'hotel' and country = 'United States' AND ratings > 2 ;
SELECT * FROM `travel-sample` WHERE type = 'hotel' AND
country = 'United Kingdom' AND ratings >= 4;
• Helps create index on subset of the data
• Manually divide your keyspace indexes into distinct ranges
• Manual work around for range partitioned index
• The query predicate must select a SUBSET of the documents covered by the WHERE clause of one index.
• Each query block will use ONE index.
• Cross index query and automatic partitioning is in the roadmap
• EVERY COUCHBASE INDEX SHOULD BE A PARTIAL INDEX. Except PRIMARY.
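A rough way to picture the subset rule is sketched below. It handles equality conditions only and is an assumption-laden simplification: the function `index_qualifies` and the dictionary representation of WHERE clauses are hypothetical, and range conditions such as `ratings > 2` would need real implication checking that is left out here.

```python
# Simplified model of partial-index eligibility (equality conditions only).
# Hypothetical helper, not the N1QL optimizer.

def index_qualifies(index_where, query_where):
    """True when every equality condition of the index also appears in the query."""
    return all(query_where.get(field) == value
               for field, value in index_where.items())

IDX_CX3_WHERE = {"type": "hotel"}  # WHERE clause of idx_cx3 above

# Equality predicates taken from two of the queries on this slide.
query_premium = {"state": "CA", "status": "premium"}
query_hotel = {"state": "CA", "type": "hotel", "country": "United States"}

print(index_qualifies(IDX_CX3_WHERE, query_premium))  # query has no type = 'hotel'
print(index_qualifies(IDX_CX3_WHERE, query_hotel))    # query repeats the index condition
```

The first query never restricts `type`, so it could match documents that the partial index does not contain; the second repeats the index condition and therefore stays within the indexed subset.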

Array Index
Index individual elements of the arrays
"public_likes": [ "Julius Tromp I", "Corrine Hilll",
"Jaeden McKenzie", "Vallie Ryan", "Brian Kilback",
"Lilian McLaughlin", "Ms. Moses Feeney", "Elnora Trantow"
]
CREATE INDEX idx_cx5 ON `travel-sample`(ALL public_likes)
CREATE INDEX idx_cx6 ON `travel-sample`(DISTINCT public_likes)
"cards": [
{
"type": "visa",
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03"
},
{
"type": "master",
"cardnum": "6274-2542-5847-3949",
"expiry": "2018-12"
}
]
SELECT *
FROM `travel-sample`
WHERE ANY x in public_likes SATISFIES x = "Vallie Ryan" END;
SELECT *
FROM `travel-sample`
WHERE EVERY x in public_likes SATISFIES x = "Vallie Ryan" END;
SELECT *
FROM `travel-sample`
WHERE ANY AND EVERY x in public_likes SATISFIES x = "Vallie Ryan" END;
"#operator": "DistinctScan",
"scan": {
"#operator": "IndexScan2",
"index": "iz1",
"index_id": "78b05b69dffa2d1f",
"index_projection": {
"primary_key": true
},
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"Vallie Ryan\"",
"inclusion": 3,
"low": "\"Vallie Ryan\""
}
]
}
],

Array Index
CREATE INDEX idx_cx6 ON `travel-sample`(ALL TOKENS(public_likes)) WHERE type = 'hotel';
• Array indexing can be created on expressions
• Array indexing can be created on nested arrays (arrays of arrays)
• The query predicate has to match the array expressions.
• Sizing
• For scalar values, you’d have one index entry per document.
• For ARRAYs, you’ll have N index entries, N = number of elements in an array
• When you have three hobbies, there will be three index entries.
select t.name, t.country, t.public_likes
FROM `travel-sample` t
WHERE t.type = 'hotel'
AND ANY p IN TOKENS(public_likes) SATISFIES p = 'Vallie' END;
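The sizing bullets above can be made concrete with a quick count. This is an illustrative sketch only: the `hobbies` documents are made up, the function `array_index_entries` is hypothetical, and it assumes a document without the array contributes no entries.

```python
# Rough sizing model for an array index: one entry per array element (ALL)
# or one entry per unique element within a document (DISTINCT).
# Hypothetical helper and sample data, for illustration only.

def array_index_entries(docs, field, distinct=False):
    """Count the index entries produced by indexing `field` of each document."""
    total = 0
    for doc in docs:
        elements = doc.get(field, [])  # assumption: missing array, no entries
        total += len(set(elements)) if distinct else len(elements)
    return total

docs = [
    {"name": "A", "hobbies": ["golf", "chess", "golf"]},
    {"name": "B", "hobbies": ["tennis"]},
    {"name": "C"},
]

print(array_index_entries(docs, "hobbies"))                 # ALL hobbies
print(array_index_entries(docs, "hobbies", distinct=True))  # DISTINCT hobbies
```

Multiplying the average array length by the document count gives a first estimate of how much larger an array index will be than a scalar index on the same keyspace.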

Array Index
"schedule" : [
{
"day" : 0,
"special_flights" : [
{
"flight" : "AI111",
"utc" : "1:11:11"
},
{
"flight" : "AI222",
"utc" : "2:22:22"
}
]
},
...
]
CREATE INDEX inested ON `travel-sample`
(DISTINCT ARRAY
(DISTINCT ARRAY y.flight
FOR y IN x.special_flights END)
FOR x IN schedule END) WHERE type = "route" ;
• Array indexing can be created on nested arrays of arrays
• Any of the array references can be expressions
CREATE INDEX inested1 ON `travel-sample`
(DISTINCT ARRAY
(DISTINCT ARRAY y
FOR y IN OBJECT_PAIRS(x.special_flights) END)
FOR x IN schedule END) WHERE type = "route" ;

Flexible Index
• When you really need the flexibility!
CREATE INDEX idx_cx6 ON `travel-sample`
(ALL PAIRS(SELF)) where type = 'hotel';
select count(1) from `travel-sample` use index (idx_cx6)
where free_breakfast = true and type = 'hotel' ;
select * from `travel-sample` use index (idx_cx6)
where free_breakfast = true
and free_internet = true
and free_parking = true
and type = 'hotel' ;
• We do the intersection within the SAME index
• Gives you flexibility
• To exploit the intersect scan, you can also create an index on
specific fields, or create multiple single-key indexes.
"#operator": "IntersectScan",
"scans": [
{
"#operator": "DistinctScan",
"scan": {
"index": "idx_cx6",
"index_id": "6ad923710d0f6d4b",
"primary_key": true
},
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "[\"free_breakfast\", true]",
"inclusion": 3,
"low": "[\"free_breakfast\", true]"
}
]
}
],
...
"range": [
{
"high": "[\"free_internet\", true]",
"inclusion": 3,
"low": "[\"free_internet\", true]"
}
]
}
...
"range": [
{
"high": "[\"free_parking\", true]",
"inclusion": 3,
"low": "[\"free_parking\", true]"
}
]
}
]
...
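The plan above scans the same index once per predicate and intersects the results. A toy model of that idea follows; it is a sketch under stated assumptions, not how the index service is implemented. The functions `build_pairs_index` and `intersect_scan` and the sample hotels are hypothetical, and only top-level scalar fields are indexed, whereas PAIRS(SELF) also descends into nested values.

```python
from collections import defaultdict

# Toy model of a flexible index built from [field, value] pairs, plus an
# intersection of one scan per predicate. Hypothetical helpers and data.

def build_pairs_index(docs):
    """Map each (field, value) pair to the set of document keys containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            index[(field, value)].add(doc_id)
    return index

def intersect_scan(index, predicates):
    """Scan the same index once per predicate and intersect the key sets."""
    scans = [index.get(pair, set()) for pair in predicates]
    return set.intersection(*scans) if scans else set()

hotels = {
    "hotel_1": {"free_breakfast": True, "free_internet": True, "free_parking": True},
    "hotel_2": {"free_breakfast": True, "free_internet": True, "free_parking": False},
    "hotel_3": {"free_breakfast": False, "free_internet": True, "free_parking": True},
}

pairs_index = build_pairs_index(hotels)
wanted = [("free_breakfast", True), ("free_internet", True), ("free_parking", True)]
print(intersect_scan(pairs_index, wanted))
```

The trade-off is index size: every field of every qualifying document produces an entry, which is why the slide reserves this approach for cases that really need the flexibility.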

Rule #1: USE KEYs
• If you have the document key, you have options
• Use the SDK to fetch the documents directly
• SELECT * FROM `travel-sample` USE KEYS ["landmark_37588"];
• SELECT * FROM `travel-sample` USE KEYS ["landmark_37588", "landmark_37603" ];
• JOINs are also done using the document keys.
• SELECT * FROM ORDERS o INNER JOIN CUSTOMER c ON KEYS o.id;
• SELECT * FROM ORDERS o USE KEYS ["ord::382"] INNER JOIN CUSTOMER c ON KEYS o.id;

Rule #2: USE COVERING INDEX
• Design your indexes so queries can be answered just with
index scans.
• Index scans support projection, reducing the amount of
data transferred from index to query.
• Avoiding document fetch has additional savings on
memory and cpu.
• Additional keys on the index need not be in the
“leading N keys” order.
CREATE INDEX idx_cx3 ON CUSTOMER(state, city,
name.lastname) WHERE status = 'premium';
SELECT * FROM CUSTOMER
WHERE state = 'CA' AND status = 'premium';
SELECT status, state, city FROM CUSTOMER
WHERE state = 'CA' AND status = 'premium';
{
"covers": [
"cover ((`CUSTOMER`.`state`))",
"cover ((`CUSTOMER`.`city`))",
"cover (((`CUSTOMER`.`name`).`lastname`))",
"cover ((meta(`CUSTOMER`).`id`))"
],
"filter_covers": {
"cover ((`CUSTOMER`.`status`))": "premium"
},
"index": "idx_cx3",
"index_id": "18f8209144215971",
"entry_keys": [
0,
1
]
},
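Whether a query is covered can be reasoned about as a set check: every field the query references must be available from the index. The sketch below is a simplification with hypothetical names (`is_covered`, `IDX_KEYS`, `IDX_WHERE`); it treats fields fixed by an equality condition in the index WHERE clause as available, mirroring the "filter_covers" entry in the plan above, and treats SELECT * as never covered.

```python
# Simplified covering check. Hypothetical helper, not the N1QL optimizer.

def is_covered(index_keys, index_where_fields, query_fields):
    """True when all fields used by the query can be served from the index.

    query_fields is None for SELECT *, which needs the whole document.
    """
    if query_fields is None:
        return False
    available = set(index_keys) | set(index_where_fields) | {"meta().id"}
    return set(query_fields) <= available

IDX_KEYS = ["state", "city", "name.lastname"]  # keys of idx_cx3 on this slide
IDX_WHERE = ["status"]                          # from WHERE status = 'premium'

print(is_covered(IDX_KEYS, IDX_WHERE, None))                         # SELECT * ...
print(is_covered(IDX_KEYS, IDX_WHERE, ["status", "state", "city"]))  # second query
print(is_covered(IDX_KEYS, IDX_WHERE, ["state", "zipcode"]))         # zipcode not indexed
```

Adding a projected field to the end of the index key list is usually enough to turn an uncovered query into a covered one, since covering keys do not need to be leading keys.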

Rule #2: USE COVERING INDEX
[Figure: query flow with a covering index, across Clients, Query Service, Index Service, and Data Service]
1. Query result
2. Parse, Analyze, create Plan
3. Scan Request; index filters
4. Get qualified doc keys
No Fetch
6. Evaluate: Documents to results
7. Query result

Rule #3: USE THE INDEX REPLICATION
1. Same index definition, but multiple instances
2. Prior to 5.0, use equivalent indexes: same definition, but a different name.
3. With 5.0, simply specify the num_replica value
4. Used for load balancing
5. Used for high availability (failover)
CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium'
WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]};
CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium'
WITH {"num_replica":3};
curl -u Administrator:password node5:9102/settings -d '{"indexer.settings.num_replica": 2}'

Rule #4: INDEX BY WORKLOAD, NOT BY BUCKET/KEYSPACE
• Not every keyspace needs indexes.
• Analyze the queries to find the common predicates and access patterns on the keyspace.
• Find out the relative frequency and the latency and throughput SLAs of the queries.
• Indexing isn’t a substitute for best practices, e.g. prepared statements.
• In Couchbase, every index should have a WHERE clause (except the primary index)
• If you have a PRIMARY index in production, you may be asking for trouble.

Rule #5: INDEX BY PREDICATE, NOT BY PROJECTION
• When it comes to index selection, predicates rule.
• The first rule is to ensure the query predicate is a subset of the WHERE clause of the index (every document the query can match also satisfies the index's WHERE clause)
• Then, find the matching index for rest of the predicates
• Match the leading N keys of the index
• In a composite key index, index scans only exploit consecutive leading N keys
• Other keys can be exploited for post-scan filtering and projection.
• Query should match and pushdown (spans in the explain) as much as possible
• Index the keys used in the ON KEYS clause of JOINs
• Index keys can be any scalar or array expression

Rule #6: ADD INDEXES TO MEET THE SLAs
• Performance at scale matters most
• While an index usually has to serve multiple queries, adding specialized indexes to speed up specific queries is fine
• SPECIFIC index with complex WHERE clauses to make the index smaller
• USE INDEX directive will ensure index selection
• USE INDEX can take multiple indexes for HA
• Query performance optimization at scale depends not only on index selection but also on
efficient use of CPU on the query
• Stay tuned for PROFILING

Rule #7: INDEX TO AVOID SORTING
• Matching of the keys in the ORDER BY and leading N keys will avoid sorting.
• Optimizer automatically takes care of equality predicates on leading keys
• Optimizer approach: Predicate matching first and avoiding ORDER BY later
• Exploiting index ordering is even more beneficial with PAGINATION
• Push both OFFSET and LIMIT to index scan when all the predicates can be pushed
• We push down COUNT, MIN, MAX to the index scan
• We exploit both ASC and DESC keys
• Optimizing pagination queries is usually the most critical part of tuning
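Why index order helps pagination can be seen with a sorted list standing in for the index. This is a minimal sketch with made-up entries and hypothetical function names (`page_by_offset`, `page_by_keyset`); it models only the idea that an ordered scan can stop early and never needs a sort step.

```python
from bisect import bisect_right
from itertools import islice

# A sorted list of (city, doc_key) tuples stands in for an index ordered on city.
# Sample data and helper names are hypothetical.
INDEX_ENTRIES = [
    ("Alameda", "h1"),
    ("Fresno", "h2"),
    ("Oakland", "h3"),
    ("San Diego", "h4"),
    ("Windsor", "h5"),
]

def page_by_offset(entries, offset, limit):
    """OFFSET/LIMIT pushed to the scan: skip `offset` entries, read `limit`, stop."""
    return list(islice(entries, offset, offset + limit))

def page_by_keyset(entries, last_seen, limit):
    """Keyset pagination: seek directly past the last entry of the previous page."""
    start = bisect_right(entries, last_seen)
    return entries[start:start + limit]

print(page_by_offset(INDEX_ENTRIES, 2, 2))
print(page_by_keyset(INDEX_ENTRIES, ("Fresno", "h2"), 2))
```

Both calls return the same page, but the keyset version seeks straight to its starting point instead of walking past the skipped entries, which matters more as the offset grows.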

Rule #8: Number of indexes
• When you initially create the indexes, use defer_build so they share a DCP stream
• There is no real limit to the number of indexes
• Number of replicas can go up to (number of index nodes – 1)

Rule #9: Index during INSERT, DELETE, UPDATE
• INSERTs are done directly to KV, Indexes are maintained asynchronously
• DELETE, UPDATE uses indexes to filter the data.
• Index updates are still asynchronous.

Rule #10: Indexing on objects, arrays, etc
• JSON is nested with objects and arrays.
• There can ONLY be one array key in an index definition.
• This array key can be on
• An array of scalars : ALL hobbies
• An expression returning array of scalars: ALL TOKENS(comments)
• An array of scalars within arrays of arrays
• An array constructed from an expression: ALL ARRAY v FOR v IN [c1, c2, c3] END
• Consider the sizing for array indexes

Rule #11: INDEXKEY ORDER AND PREDICATE TYPES
CREATE INDEX idx_order ON
CUSTOMER ( state, zipcode, salary, age, address, cid) WHERE type = 'premium';
SELECT cid, address
FROM CUSTOMER
WHERE state = 'CA'
AND zipcode IN [29482, 29284, 29482, 28472]
AND salary < 50000
AND age > 45 ;

Rule #11: INDEXKEY ORDER AND PREDICATE TYPES
SELECT cid, address FROM CUSTOMER
WHERE state = 'CA' and type = 'premium'
AND zipcode IN [29482, 29284, 29482, 28472] AND salary < 50000 AND age > 45 ;
CREATE INDEX idx_order ON
CUSTOMER ( state, zipcode, salary, age, address, cid) WHERE type = 'premium';
1. EQUALITY
2. IN
3. LESS THAN
4. BETWEEN
5. GREATER THAN
6. Array predicates
7. Look at expressions to move to WHERE clause
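The ordering above can be applied mechanically to a query's predicates. The sketch below is illustrative only: the `PRECEDENCE` table encodes the list on this slide, while the function `order_index_keys` and the short predicate-type labels are hypothetical.

```python
# Order candidate index keys by the predicate-type precedence listed above.
# Hypothetical helper; labels are shorthand for the predicate types on the slide.
PRECEDENCE = {"eq": 1, "in": 2, "lt": 3, "between": 4, "gt": 5, "array": 6}

def order_index_keys(predicates):
    """predicates: list of (field, predicate_type); returns fields in index-key order."""
    ranked = sorted(predicates, key=lambda p: PRECEDENCE[p[1]])  # stable sort
    return [field for field, _ in ranked]

# Predicates of the CUSTOMER query above, deliberately shuffled.
# type = 'premium' is left out because it moves to the index WHERE clause (item 7).
query_predicates = [
    ("age", "gt"),
    ("salary", "lt"),
    ("zipcode", "in"),
    ("state", "eq"),
]

leading_keys = order_index_keys(query_predicates)
covering_keys = ["address", "cid"]  # projected only, appended at the end
print(leading_keys + covering_keys)
```

The result matches the key order of idx_order: state, zipcode, salary, age, followed by the projected fields address and cid.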

Rule #12
Understand how to read
EXPLAIN and PROFILING
Full working examples with explanations in the book & online article
by Sitaram Vemulapalli & Marco Greco
https://dzone.com/articles/understanding-index-scans-in-couchbase-50-n1ql-que
https://blog.couchbase.com

AND MORE THINGS TO REMEMBER
• USE INFER to understand your dataset
• Use the index sizing spreadsheet
• Understand the index join to exploit join from parent to child
• Study more on pagination. E.g. KEYSET PAGINATION
• Consider SPLIT, TOKENS, FTS instead of LIKE predicate '%joe%'
• LIKE 'joe%' can be optimized well
• Consider intersection when the predicate usage is non-deterministic
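The difference between the two LIKE patterns comes down to whether the pattern pins a contiguous range of a sorted index. The sketch below models that with a sorted Python list; the names `prefix_range` and `like_prefix_scan` and the sample data are hypothetical, the prefix is assumed to be non-empty, and Python's code-point string ordering is used as a stand-in for N1QL collation.

```python
from bisect import bisect_left

# A prefix pattern such as LIKE 'joe%' maps to one contiguous range of a sorted
# index; a pattern such as '%joe%' does not. Hypothetical helpers and data.

def prefix_range(prefix):
    """Return [low, high) bounds covering every string that starts with prefix."""
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

def like_prefix_scan(sorted_keys, prefix):
    """Range scan over sorted keys, equivalent to LIKE 'prefix%' in this model."""
    low, high = prefix_range(prefix)
    return sorted_keys[bisect_left(sorted_keys, low):bisect_left(sorted_keys, high)]

names = sorted(["joan", "joe", "joel", "joey", "jof", "john", "mojoe"])

print(prefix_range("joe"))
print(like_prefix_scan(names, "joe"))
```

"mojoe" contains "joe" but sits in a different part of the sorted order, so no single range can find it; that is the case where SPLIT, TOKENS, or FTS is worth considering.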

END: RULES FOR INDEX CREATION
