Queries need indexes to run fast and to use resources efficiently. Which indexes should you create, and what rules should you follow to create the right indexes for your workload? This presentation gives those rules.
Index Categories
Standard Secondary: Release 4.0 – 4.6
• Based on ForestDB
• Released with Couchbase 4.0
Memory Optimized Index: 4.5 and above
• 100% of the index is in memory
• Index is written to disk for recovery only
• Predictable performance
• Better mutation rate
Standard Secondary: Release 5.0
• Based on the lockless skiplist
• Released with Couchbase 5.0
Indexes
Primary Index: index on the document key over the whole bucket
CREATE PRIMARY INDEX ON `travel-sample`
CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample`
Secondary Index: index on a key-value or on the document key
CREATE INDEX idx_cx_name ON `travel-sample`(name);
Composite Index: index on more than one key-value
CREATE INDEX idx_cx2 ON `travel-sample`(state, city, geo.lat, geo.lon)
Functional or Expression Index: index on a function or expression over key-values
CREATE INDEX idx_cxupper ON `travel-sample`(UPPER(state), UPPER(city), geo.lat, geo.lon)
Partial Index: index a subset of the items in the bucket
CREATE INDEX idx_cx3 ON `travel-sample` (state, city)
WHERE type = 'hotel';
CREATE INDEX idx_cx4 ON `travel-sample` (state, city, name.lastname)
WHERE type = 'hotel' and country = 'United Kingdom'
Array Index: index the individual elements of arrays
CREATE INDEX idx_cx5 ON `travel-sample` (ALL public_likes)
CREATE INDEX idx_cx6 ON `travel-sample` (DISTINCT public_likes)
Array Index on an expression
CREATE INDEX idx_cx7 ON `travel-sample` (ALL TOKENS(public_likes))
WHERE type = 'comments';
Setup: drop the default travel-sample indexes
drop index `travel-sample`.def_schedule_utc;
drop index `travel-sample`.def_city;
drop index `travel-sample`.def_name_type;
drop index `travel-sample`.def_route_src_dst_day;
drop index `travel-sample`.def_icao;
drop index `travel-sample`.def_primary;
drop index `travel-sample`.def_type;
drop index `travel-sample`.def_sourceairport;
Primary Index
• Index on the document key on the whole bucket
• CREATE PRIMARY INDEX ON `travel-sample`
• CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample`
• In Couchbase you typically have multiple logical keyspaces (document types) in a single bucket
• e.g., type = 'hotel', type = 'reviews', …
• A primary index is fine when the predicates are only on META().id
• e.g., the YCSB benchmark: WHERE META().id > "value"
SELECT * FROM system:indexes WHERE name = '#primary';
"indexes": {
"datastore_id": "http://127.0.0.1:8091",
"id": "f6e3c75d6f396e7d",
"index_key": [],
"is_primary": true,
"keyspace_id": "travel-sample",
"name": "#primary",
"namespace_id": "default",
"state": "online",
"using": "gsi"
}
Doc key    Document
"h:123"    {"type":"hotel", ...}
"h:123"    {"type":"hotel", ...}
"h:123"    {"type":"hotel", ...}
"r:123"    {"type":"reviews", ...}
"r:123"    {"type":"reviews", ...}
"a:123"    {"type":"airport", ...}
"a:123"    {"type":"airport", ...}
"a:123"    {"type":"airport", ...}
"l:123"    {"type":"landmark", ...}
"l:123"    {"type":"landmark", ...}
"l:123"    {"type":"landmark", ...}
Secondary Index
• Index on the key-value or document-key
CREATE INDEX idx_cx_name ON `travel-sample`(name);
CREATE INDEX idx_cx_name_id ON `travel-sample` (name, META().id);
SELECT * FROM `travel-sample` WHERE name = 'Westin Bonaventure';
SELECT * FROM `travel-sample` WHERE name = 'Westin Bonaventure'
AND META().id LIKE 'hotel%';
Composite Index
• Index on more than one key-value
• CREATE INDEX idx_cx2 ON `travel-sample` (state, city, name.lastname)
• A query needs a predicate on the leading key(s) to use the index
1. SELECT * FROM `travel-sample` WHERE state = 'CA';
2. SELECT * FROM `travel-sample` WHERE state = 'CA' AND city = 'Windsor';
3. SELECT * FROM `travel-sample` WHERE state = 'CA' AND city = 'Windsor' AND name.lastname = 'smith';
4. SELECT * FROM `travel-sample` WHERE city = 'Windsor' AND name.lastname = 'smith';
5. SELECT * FROM `travel-sample` WHERE name.lastname = 'smith';
6. SELECT * FROM `travel-sample` WHERE state = 'CA' AND name.lastname = 'smith';
7. SELECT * FROM `travel-sample` WHERE state IS NOT MISSING AND city = 'Windsor' AND name.lastname = 'smith';
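The leading-key rule can be modeled in a few lines. This is a simplified sketch, not Couchbase's planner: it treats any predicate on a key (including `state IS NOT MISSING`) as usable for the scan and, following Rule #5 later in this deck, counts only consecutive leading keys as scan keys. The function name `leading_span_keys` is hypothetical.

```python
# Simplified model of leading-key matching for a composite index.
# Illustrates the "leading N keys" rule; it is not Couchbase's optimizer.

INDEX_KEYS = ["state", "city", "name.lastname"]  # idx_cx2 on this slide

# Fields that carry a predicate in each numbered query above.
# Query 7 counts `state IS NOT MISSING` as a predicate on state.
QUERIES = {
    1: {"state"},
    2: {"state", "city"},
    3: {"state", "city", "name.lastname"},
    4: {"city", "name.lastname"},
    5: {"name.lastname"},
    6: {"state", "name.lastname"},
    7: {"state", "city", "name.lastname"},
}


def leading_span_keys(index_keys, predicate_fields):
    """Return how many consecutive leading index keys have a predicate.

    0 means the index cannot be used. Predicates on later, non-consecutive
    keys can only filter entries after the scan.
    """
    count = 0
    for key in index_keys:
        if key not in predicate_fields:
            break
        count += 1
    return count


if __name__ == "__main__":
    for qid, fields in QUERIES.items():
        n = leading_span_keys(INDEX_KEYS, fields)
        verdict = f"index scan on {n} leading key(s)" if n else "cannot use idx_cx2"
        print(f"Query {qid}: {verdict}")
```

Queries 4 and 5 have no predicate on `state`, so they cannot use the index; query 7 adds `state IS NOT MISSING` precisely to make the leading key usable.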
Functional or Expression Index
Index on function or expression on key-values
• CREATE INDEX idx_cxupper on `travel-sample` (UPPER(state), UPPER(city), UPPER(name));
• SELECT * FROM `travel-sample` WHERE UPPER(state) = 'CA';
• CREATE INDEX idx_cxupper2 on `travel-sample` (UPPER(state) || UPPER(city));
• SELECT * FROM `travel-sample` WHERE UPPER(state) || UPPER(city) = 'CAMOUNTAINVIEW';
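An expression index stores the computed value, not the raw field, which is why the query predicate has to repeat the same expression. A toy model, assuming made-up documents and a plain dict standing in for the index (both hypothetical):

```python
# Toy model of an expression index on UPPER(state) || UPPER(city).
# The sample documents and the dict used as the "index" are hypothetical.

docs = {
    "hotel_1": {"state": "ca", "city": "MountainView"},
    "hotel_2": {"state": "Ca", "city": "mountainview"},
    "hotel_3": {"state": "NY", "city": "New York"},
}


def index_key(doc):
    """Compute the indexed value, mirroring UPPER(state) || UPPER(city)."""
    return doc["state"].upper() + doc["city"].upper()


def build_expression_index(documents):
    """Map each computed key to the document keys that produce it."""
    index = {}
    for doc_id, doc in documents.items():
        index.setdefault(index_key(doc), []).append(doc_id)
    return index


expr_index = build_expression_index(docs)

# A predicate written with the same expression is a direct lookup.
print(expr_index.get("CAMOUNTAINVIEW", []))  # ['hotel_1', 'hotel_2']

# A predicate on the raw value (e.g. city = 'MountainView') finds nothing
# here, because only the computed expression is stored as a key.
print("MountainView" in expr_index)  # False
```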
Partial Index
Index subset of items in the bucket
CREATE INDEX idx_cx3 ON `travel-sample` (state, city, name.lastname)
WHERE type = 'hotel';
CREATE INDEX idx_cx4 ON `travel-sample` (state, city, name.lastname)
WHERE type = 'hotel' AND country = 'United States' AND ratings > 2;
• The index WHERE clause lets you index just one logical keyspace (e.g., one document type)
SELECT * FROM `travel-sample` WHERE state = 'CA' AND status = 'premium';
SELECT * FROM `travel-sample` WHERE state = 'CA' AND
type = 'hotel' and country = 'United States' AND ratings > 2 ;
SELECT * FROM `travel-sample` WHERE type = 'hotel' AND
country = 'United Kingdom' AND ratings >= 4;
• Lets you create an index on a subset of the data
• Lets you manually divide a keyspace's index into distinct ranges
• A manual workaround for range-partitioned indexes
• The query's predicate must be a SUBSET of one index's WHERE clause.
• Each query block uses ONE index.
• Cross-index queries and automatic partitioning are on the roadmap.
• EVERY COUCHBASE INDEX SHOULD BE A PARTIAL INDEX, except the PRIMARY index.
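The subset rule can be checked mechanically: a partial index is a candidate only when the query's predicates imply every condition in the index WHERE clause. The sketch below is a deliberately small model, assuming only `=` and `>` appear in index WHERE clauses and treating idx_cx3 and idx_cx4 as defined on the same keyspace as the three queries above; `implies` is a hypothetical helper, not a Couchbase API.

```python
# Simplified check of whether a query can use a partial index: the query's
# predicates must imply every condition in the index WHERE clause.
# Supports only "=" and ">" in the index WHERE clause (enough for this slide).


def implies(query_preds, index_where):
    """query_preds and index_where map field -> (operator, value)."""
    for field, (op, value) in index_where.items():
        if field not in query_preds:
            return False  # the query does not restrict this field at all
        q_op, q_value = query_preds[field]
        if op == "=":
            ok = q_op == "=" and q_value == value
        elif op == ">":
            # The query must only admit values strictly greater than `value`.
            ok = (
                (q_op == "=" and q_value > value)
                or (q_op == ">" and q_value >= value)
                or (q_op == ">=" and q_value > value)
            )
        else:
            raise ValueError(f"unsupported operator in index WHERE: {op}")
        if not ok:
            return False
    return True


IDX_CX3 = {"type": ("=", "hotel")}
IDX_CX4 = {
    "type": ("=", "hotel"),
    "country": ("=", "United States"),
    "ratings": (">", 2),
}

Q1 = {"state": ("=", "CA"), "status": ("=", "premium")}
Q2 = {"state": ("=", "CA"), "type": ("=", "hotel"),
      "country": ("=", "United States"), "ratings": (">", 2)}
Q3 = {"type": ("=", "hotel"), "country": ("=", "United Kingdom"),
      "ratings": (">=", 4)}

for name, q in [("Q1", Q1), ("Q2", Q2), ("Q3", Q3)]:
    print(name, "idx_cx3:", implies(q, IDX_CX3), "idx_cx4:", implies(q, IDX_CX4))
```

The first query matches neither index (no `type` predicate), the second can use either, and the third can use only idx_cx3 because its country differs from idx_cx4's.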
Array Index
Index individual elements of the arrays
"public_likes": [ "Julius Tromp I", "Corrine Hilll",
"Jaeden McKenzie", "Vallie Ryan", "Brian Kilback",
"Lilian McLaughlin", "Ms. Moses Feeney", "Elnora Trantow"
]
CREATE INDEX idx_cx5 ON `travel-sample`(ALL public_likes)
CREATE INDEX idx_cx6 ON `travel-sample`(DISTINCT public_likes)
"cards": [
{
"type": "visa",
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03"
},
{
"type": "master",
"cardnum": "6274-2542-5847-3949",
"expiry": "2018-12"
}
]
SELECT *
FROM `travel-sample`
WHERE ANY x in public_likes SATISFIES x = "Vallie Ryan" END;
SELECT *
FROM `travel-sample`
WHERE EVERY x in public_likes SATISFIES x = "Vallie Ryan" END;
SELECT *
FROM `travel-sample`
WHERE ANY AND EVERY x IN public_likes SATISFIES x = "Vallie Ryan" END;
"#operator": "DistinctScan",
"scan": {
"#operator": "IndexScan2",
"index": "iz1",
"index_id": "78b05b69dffa2d1f",
"index_projection": {
"primary_key": true
},
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"Vallie Ryan\"",
"inclusion": 3,
"low": "\"Vallie Ryan\""
}
]
}
],
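To make the three collection predicates above concrete, here is a plain-Python analogue (a sketch, not N1QL itself). It assumes scalar elements and ignores NULL and MISSING handling; the function names are hypothetical. The case worth noticing is the empty array: EVERY is vacuously true, while ANY AND EVERY also requires at least one element.

```python
# Python analogues of ANY, EVERY, and ANY AND EVERY over an array.
# Scalar elements only; N1QL's NULL/MISSING semantics are not modeled.


def any_satisfies(arr, pred):
    return any(pred(x) for x in arr)


def every_satisfies(arr, pred):
    return all(pred(x) for x in arr)  # True for an empty array


def any_and_every_satisfies(arr, pred):
    return len(arr) > 0 and all(pred(x) for x in arr)


def is_vallie(x):
    return x == "Vallie Ryan"


samples = {
    "mixed": ["Julius Tromp I", "Vallie Ryan"],
    "only_vallie": ["Vallie Ryan", "Vallie Ryan"],
    "empty": [],
}

for name, likes in samples.items():
    print(name,
          any_satisfies(likes, is_vallie),
          every_satisfies(likes, is_vallie),
          any_and_every_satisfies(likes, is_vallie))
```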
Array Index
CREATE INDEX idx_cx6 ON `travel-sample`(ALL TOKENS(public_likes)) WHERE type = 'hotel';
• Array indexing can be created on expressions
• Array indexing can be created on nested arrays (arrays of arrays)
• The query predicate has to match the array index expression.
• Sizing
• For scalar values, you’d have one index entry per document.
• For ARRAYs, you’ll have N index entries, N = number of elements in an array
• When you have three hobbies, there will be three index entries.
select t.name, t.country, t.public_likes
FROM `travel-sample` t
WHERE t.type = 'hotel'
AND ANY p IN TOKENS(public_likes) SATISFIES p = 'Vallie' END;
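The sizing bullets translate into simple arithmetic. The sketch below counts index entries per document: one for a scalar key, one per element for ALL, one per distinct element for DISTINCT. It assumes scalar (hashable) array elements and uses hypothetical sample data; real index size also depends on key and document-key lengths, which is what the index sizing spreadsheet is for.

```python
# Rough per-document index-entry counts, following the sizing bullets above.
# Assumes scalar array elements; sample documents are hypothetical.


def entries_for(value, mode):
    """Number of index entries one document contributes for one index key."""
    if mode == "scalar":
        return 1                # one entry per document
    if mode == "ALL":
        return len(value)       # one entry per array element
    if mode == "DISTINCT":
        return len(set(value))  # duplicates within one array collapse
    raise ValueError(f"unknown mode: {mode}")


def total_entries(documents, field, mode):
    """Sum entries over the documents that have the field."""
    return sum(entries_for(d[field], mode) for d in documents if field in d)


people = [
    {"name": "a", "hobbies": ["chess", "hiking", "chess"]},
    {"name": "b", "hobbies": ["golf"]},
    {"name": "c"},  # no hobbies: contributes no array entries in this model
]

print(total_entries(people, "name", "scalar"))       # 3
print(total_entries(people, "hobbies", "ALL"))       # 4
print(total_entries(people, "hobbies", "DISTINCT"))  # 3
```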
Array Index
"schedule" : [
{
"day" : 0,
"special_flights" : [
{
"flight" : "AI111",
"utc" : "1:11:11"
},
{
"flight" : "AI222",
"utc" : "2:22:22"
}
]
},
...
]
CREATE INDEX inested ON `travel-sample`
(DISTINCT ARRAY
(DISTINCT ARRAY y.flight
FOR y IN x.special_flights END)
FOR x IN schedule END) WHERE type = "route" ;
• Array indexing can be created on nested arrays of arrays
• The arrays being iterated can themselves be expressions
CREATE INDEX inested1 ON `travel-sample`
(DISTINCT ARRAY
(DISTINCT ARRAY y
FOR y IN OBJECT_PAIRS(x.special_flights) END)
FOR x IN schedule END) WHERE type = "route" ;
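One way to picture the `inested` index is as the flattened set of `flight` values across every schedule entry of a route document. The sketch below models it that way; the second schedule entry (day 1) is hypothetical data added to show de-duplication, and the function name is made up for illustration.

```python
# Model of the keys produced by the nested DISTINCT ARRAY index for one
# document: schedule[*].special_flights[*].flight, flattened and de-duplicated.
# Illustrative only; the day-1 entry below is hypothetical sample data.

route = {
    "type": "route",
    "schedule": [
        {"day": 0, "special_flights": [
            {"flight": "AI111", "utc": "1:11:11"},
            {"flight": "AI222", "utc": "2:22:22"},
        ]},
        {"day": 1, "special_flights": [
            {"flight": "AI111", "utc": "3:33:33"},
        ]},
    ],
}


def nested_index_keys(doc):
    """Return the distinct flight values the index would hold for `doc`."""
    if doc.get("type") != "route":  # partial index: WHERE type = "route"
        return set()
    return {
        y["flight"]
        for x in doc.get("schedule", [])
        for y in x.get("special_flights", [])
    }


print(sorted(nested_index_keys(route)))  # ['AI111', 'AI222']
```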
Flexible Index
• When you really need the flexibility!
CREATE INDEX idx_cx6 ON `travel-sample`
(ALL PAIRS(SELF)) where type = 'hotel';
select count(1) from `travel-sample` use index (idx_cx6)
where free_breakfast = true and type = 'hotel' ;
select * from `travel-sample` use index (idx_cx6)
where free_breakfast = true
and free_internet = true
and free_parking = true
and type = 'hotel' ;
• The intersection is done within the SAME index
• This gives you flexibility
• To exploit the intersect scan, you can also create indexes on specific fields, or create multiple single-column indexes.
"#operator": "IntersectScan",
"scans": [
{
"#operator": "DistinctScan",
"scan": {
"#operator": "IndexScan2",
"index": "idx_cx6",
"index_id": "6ad923710d0f6d4b",
"index_projection": {
"primary_key": true
},
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "[\"free_breakfast\", true]",
"inclusion": 3,
"low": "[\"free_breakfast\", true]"
}
]
}
],
...
"range": [
{
"high": "[\"free_internet\", true]",
"inclusion": 3,
"low": "[\"free_internet\", true]"
}
]
}
...
"range": [
{
"high": "[\"free_parking\", true]",
"inclusion": 3,
"low": "[\"free_parking\", true]"
}
]
...
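The flexible index and the IntersectScan above can be sketched as a map from `[field, value]` pairs to document keys, with a multi-predicate query answered by intersecting the per-predicate key sets. This is a toy model under stated assumptions: the hotel documents are hypothetical, only top-level scalar fields are paired (the real PAIRS() also descends into nested objects), and the helper names are made up.

```python
# Toy model of an ALL PAIRS(SELF) index plus an intersect scan.
# Hypothetical documents; only top-level scalar fields are modeled.
from collections import defaultdict

hotels = {
    "hotel_1": {"type": "hotel", "free_breakfast": True,
                "free_internet": True, "free_parking": True},
    "hotel_2": {"type": "hotel", "free_breakfast": True,
                "free_internet": False, "free_parking": True},
    "hotel_3": {"type": "hotel", "free_breakfast": False,
                "free_internet": True, "free_parking": True},
}


def build_pairs_index(docs):
    """Map each (field, value) pair to the set of document keys holding it."""
    index = defaultdict(set)
    for key, doc in docs.items():
        if doc.get("type") != "hotel":  # partial index: WHERE type = 'hotel'
            continue
        for field, value in doc.items():
            index[(field, value)].add(key)
    return index


def intersect_scan(index, predicates):
    """Scan once per (field, value) predicate and intersect the key sets."""
    result = None
    for field, value in predicates:
        keys = index.get((field, value), set())
        result = set(keys) if result is None else result & keys
    return result or set()


idx = build_pairs_index(hotels)
print(sorted(intersect_scan(idx, [("free_breakfast", True),
                                  ("free_internet", True),
                                  ("free_parking", True)])))  # ['hotel_1']
```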
Rule #1: USE KEYs
• If you have the document key, you have options
• Use the SDK to fetch the documents directly
• SELECT * FROM `travel-sample` USE KEYS ["landmark_37588"];
• SELECT * FROM `travel-sample` USE KEYS ["landmark_37588", "landmark_37603" ];
• JOINs are also done using the document keys.
• SELECT * FROM ORDERS o INNER JOIN CUSTOMER c ON KEYS o.id;
• SELECT * FROM ORDERS o USE KEYS ["ord::382"] INNER JOIN CUSTOMER c ON KEYS o.id;
Rule #2: USE COVERING INDEX
• Design your indexes so queries can be answered with index scans alone.
• Index scans support projection, which reduces the amount of data transferred from the index to the query service.
• Avoiding the document fetch saves additional memory and CPU.
• Additional keys on the index do not need to follow the "leading N keys" order.
CREATE INDEX idx_cx3 ON CUSTOMER(state, city,
name.lastname) WHERE status = 'premium';
SELECT * FROM CUSTOMER
WHERE state = 'CA' AND status = 'premium';
SELECT status, state, city FROM CUSTOMER
WHERE state = 'CA' AND status = 'premium';
{
"#operator": "IndexScan2",
"covers": [
"cover ((`CUSTOMER`.`state`))",
"cover ((`CUSTOMER`.`city`))",
"cover (((`CUSTOMER`.`name`).`lastname`))",
"cover ((meta(`CUSTOMER`).`id`))"
],
"filter_covers": {
"cover ((`CUSTOMER`.`status`))": "premium"
},
"index": "idx_cx3",
"index_id": "18f8209144215971",
"index_projection": {
"entry_keys": [
0,
1
]
},
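Whether a query is covered can be approximated by a set check: every field it references must come from the index keys, META().id, or an equality condition in the index WHERE clause (which is what `filter_covers` reports for `status` above). A simplified sketch with a hypothetical helper name; it does not model expressions or range conditions in the WHERE clause.

```python
# Simplified coverage check for the idx_cx3 example above.
# SELECT * needs the whole document, so it is never covered here.


def is_covered(index_keys, index_where_eq, referenced_fields):
    """True if all referenced fields are available without a document fetch."""
    if "*" in referenced_fields:
        return False
    available = set(index_keys) | {"META().id"} | set(index_where_eq)
    return set(referenced_fields) <= available


IDX_CX3_KEYS = ["state", "city", "name.lastname"]
IDX_CX3_WHERE_EQ = {"status": "premium"}  # equality in the index WHERE clause

select_star = {"*", "state", "status"}       # SELECT * ... WHERE state, status
select_fields = {"status", "state", "city"}  # SELECT status, state, city ...

print(is_covered(IDX_CX3_KEYS, IDX_CX3_WHERE_EQ, select_star))    # False
print(is_covered(IDX_CX3_KEYS, IDX_CX3_WHERE_EQ, select_fields))  # True
```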
Rule #2: USE COVERING INDEX
[Figure: query flow with a covering index. Clients → Query Service (1. query; 2. parse, analyze, create plan) → Index Service (3. scan request with index filters; 4. get qualified doc keys) → no fetch from the Data Service → Query Service (6. evaluate: documents to results) → Clients (7. query result)]
Rule #3: USE THE INDEX REPLICATION
1. Same index definition, but multiple instances
2. Prior to 5.0, use equivalent indexes: same definition, but a different name.
3. With 5.0, simply specify the num_replica value
4. Used for load balancing
5. Used for high availability (failover)
CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium'
WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]};
CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium'
WITH {"num_replica":3};
curl -u Administrator:password node5:9102/settings -d '{"indexer.settings.num_replica": 2}'
Rule #4: INDEX BY WORKLOAD, NOT BY BUCKET/KEYSPACE
• Not every keyspace needs indexes.
• Analyze the queries to find the common predicates and access patterns on the keyspace.
• Find out the relative frequency of the queries and their latency and throughput SLAs.
• Indexing isn't an alternative to other best practices, e.g., prepared statements.
• In Couchbase, every index should have a WHERE clause (except the primary index).
• If you have a PRIMARY index in production, you may be asking for trouble.
Rule #5: INDEX BY PREDICATE, NOT BY PROJECTION
• When it comes to index selection, predicates rule.
• First, ensure the query predicate is a subset of the index's WHERE clause
• Then, find the matching index for the rest of the predicates
• Match the leading N keys of the index
• In a composite key index, index scans only exploit consecutive leading N keys
• Other keys can be exploited for post-scan filtering and projection.
• The query should match and push down as many predicates as possible (see the spans in EXPLAIN)
• Index the keys used in the ON KEYS clause of JOINs
• Index keys can be any scalar or array expression
Rule #6: ADD INDEXES TO MEET THE SLAs
• Performance at scale matters most
• While an index should serve multiple queries, it is fine to add specialized indexes to speed up specific queries
• A SPECIFIC index with a complex WHERE clause keeps the index smaller
• USE INDEX directive will ensure index selection
• USE INDEX can take multiple indexes for HA
• Query performance optimization at scale depends not only on index selection but also on efficient use of CPU in the query service
• Stay tuned for PROFILING
Rule #7: INDEX TO AVOID SORTING
• Matching the ORDER BY keys to the leading N index keys avoids sorting.
• Optimizer automatically takes care of equality predicates on leading keys
• Optimizer approach: Predicate matching first and avoiding ORDER BY later
• Exploiting index ordering is even more beneficial with PAGINATION
• Push both OFFSET and LIMIT to index scan when all the predicates can be pushed
• We push COUNT, MIN, and MAX down to the index scan
• We exploit both ASC and DESC keys
• Optimizing pagination queries is usually the most critical part of tuning
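Why index order helps pagination can be shown with a sorted list of `(key, doc_id)` entries standing in for an index: a range scan returns rows already in key order, so OFFSET and LIMIT are applied during the scan with no sort step. The second function sketches keyset pagination, which resumes after the last entry seen instead of skipping OFFSET rows. Entries and function names are hypothetical; this models the idea, not the indexer's code.

```python
# Model of an ordered index scan with OFFSET/LIMIT pushdown, plus keyset
# pagination. ENTRIES is a hypothetical single-key index of (key, doc_id).
import bisect

ENTRIES = sorted([(34, "c1"), (45, "c2"), (29, "c3"),
                  (45, "c4"), (51, "c5"), (38, "c6")])


def ordered_scan(index, low, high, offset=0, limit=None):
    """Doc ids with low <= key <= high, in index order, with OFFSET and
    LIMIT applied while scanning rather than after a sort."""
    result = []
    skipped = 0
    i = bisect.bisect_left(index, (low,))  # first entry with key >= low
    while i < len(index) and index[i][0] <= high:
        if skipped < offset:
            skipped += 1
        elif limit is not None and len(result) >= limit:
            break
        else:
            result.append(index[i][1])
        i += 1
    return result


def keyset_page(index, after, high, limit):
    """Resume strictly after the last (key, doc_id) returned previously."""
    result = []
    i = bisect.bisect_right(index, after)
    while i < len(index) and index[i][0] <= high and len(result) < limit:
        result.append(index[i][1])
        i += 1
    return result


print(ordered_scan(ENTRIES, 30, 50, offset=1, limit=2))  # ['c6', 'c2']
print(keyset_page(ENTRIES, (38, "c6"), 50, 2))           # ['c2', 'c4']
```

With OFFSET the scan still walks past the skipped entries; the keyset version jumps straight to the resume point, which is why it scales better for deep pages.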
Rule #8: Number of indexes
• When you initially create the indexes, use defer_build and then build them together with BUILD INDEX so they share the DCP stream
• There is no hard limit on the number of indexes themselves
• Number of replicas can go up to (number of index nodes – 1)
Rule #9: Index during INSERT, DELETE, UPDATE
• INSERTs go directly to KV; indexes are maintained asynchronously.
• DELETE and UPDATE use indexes to filter the data.
• Index updates are still asynchronous.
Rule #10: Indexing on objects, arrays, etc
• JSON is nested with objects and arrays.
• There can ONLY be one array key in an index definition.
• This array key can be on
• An array of scalars: ALL hobbies
• An expression returning array of scalars: ALL TOKENS(comments)
• An array of scalars within arrays of arrays
• An array constructed from an expression: ALL ARRAY v FOR v IN [c1, c2, c3] END
• Consider the sizing for array indexes
Rule #11: INDEXKEY ORDER AND PREDICATE TYPES
CREATE INDEX idx_order ON
CUSTOMER ( state, zipcode, salary, age, address, cid) WHERE type = 'premium';
SELECT cid, address
FROM CUSTOMER
WHERE state = 'CA'
AND zipcode IN [29482, 29284, 29482, 28472]
AND salary < 50000
AND age > 45 ;
Rule #11: INDEXKEY ORDER AND PREDICATE TYPES
SELECT cid, address FROM CUSTOMER
WHERE state = 'CA' and type = 'premium'
AND zipcode IN [29482, 29284, 29482, 28472] AND salary < 50000 AND age > 45 ;
CREATE INDEX idx_order ON
CUSTOMER ( state, zipcode, salary, age, address, cid) WHERE type = 'premium';
1. EQUALITY
2. IN
3. LESS THAN
4. BETWEEN
5. GREATER THAN
6. Array predicates
7. Look at expressions to move to WHERE clause
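The ordering list above can be applied as a sort key. This sketch ranks each predicate by its type (equality first, then IN, less than, BETWEEN, greater than, array predicates) and appends fields that are only projected, so the index also covers the query. `type = 'premium'` is left out of the keys because, per item 7, it belongs in the index WHERE clause. The helper name and the `"ANY"` label for array predicates are assumptions for illustration.

```python
# Suggest an index key order from predicate types, following the ranking
# on this slide. Python's sort is stable, so ties keep query order.

RANK = {"=": 1, "IN": 2, "<": 3, "BETWEEN": 4, ">": 5, "ANY": 6}


def suggest_index_keys(predicates, projected):
    """predicates: list of (field, operator); projected: fields only selected."""
    ordered = [field for field, _ in sorted(predicates, key=lambda p: RANK[p[1]])]
    extras = [f for f in projected if f not in ordered]  # for covering
    return ordered + extras


query_predicates = [("state", "="), ("zipcode", "IN"),
                    ("salary", "<"), ("age", ">")]
print(suggest_index_keys(query_predicates, ["address", "cid"]))
# ['state', 'zipcode', 'salary', 'age', 'address', 'cid']
```

The result matches the key order of idx_order on this slide.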
Rule #12
Understand how to read
EXPLAIN and PROFILING
Full working examples with explanations in the book & online article
by Sitaram Vemulapalli & Marco Greco
https://dzone.com/articles/understanding-index-scans-in-couchbase-50-n1ql-que
https://blog.couchbase.com
AND MORE THINGS TO REMEMBER
• USE INFER to understand your dataset
• Use the index sizing spreadsheet
• Understand the index join to exploit join from parent to child
• Study more on pagination. E.g. KEYSET PAGINATION
• Consider SPLIT, TOKENS, or FTS instead of a LIKE '%joe%' predicate
• LIKE 'joe%' can be optimized well
• Consider intersection when the predicate usage is non-deterministic
Primary index
Secondary Index
Array Index
Functional index
Partial Index
select t.name, t.country, t.public_likes FROM `travel-sample` t UNNEST t.public_likes pl WHERE t.type = 'hotel' AND pl = 'Vallie Ryan';
select t.name, t.country, t.public_likes FROM `travel-sample` t WHERE t.type = 'hotel' AND ANY p in TOKENS(public_likes) SATISFIES p = 'Vallie' end;