INDEX ADVISOR: RULES FOR CREATING INDEXES
Keshav Murthy
Senior Director, Couchbase R&D

Couchbase N1QL: Index Advisor


Queries need indexes to run faster and use resources efficiently. Which indexes should you create, and which rules should you follow to create the right indexes for your workload? This presentation gives those rules.

Published in: Software


  1. INDEX ADVISOR: RULES FOR CREATING INDEXES
     Keshav Murthy, Senior Director, Couchbase R&D
  2. AGENDA
     01 N1QL: Indexing
     02 Index categories
     03 Index types
     04 Rules for Creating Indexes
  3. Section 1: N1QL INDEXING
  4. Indexing for N1QL
     [figure labels: RPM, Speed, Gear, Accelerator, Indexes]
  5. Section 2: INDEXING CATEGORIES
  6. Index Categories
     Standard Secondary: Release 4.0 – 4.6
     • Based on ForestDB
     • Released with Couchbase 4.0
     Memory Optimized Index: 4.5 and above
     • 100% of the index is in memory
     • Index is written to disk for recovery only
     • Predictable performance
     • Better mutation rate
     Standard Secondary: Release 5.0
     • Based on the lockless skiplist
     • Released with Couchbase 5.0
  7. Section 3: INDEXING TYPES
  8. Indexes
     Primary Index: index on the document key on the whole bucket
       CREATE PRIMARY INDEX ON `travel-sample`
       CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample`
     Secondary Index: index on the key-value or document-key
       CREATE INDEX idx_cx_name ON `travel-sample`(name);
     Composite Index: index on more than one key-value
       CREATE INDEX idx_cx2 ON `travel-sample`(state, city, geo.lat, geo.lon)
     Functional or Expression Index: index on a function or expression on key-values
       CREATE INDEX idx_cxupper ON `travel-sample`(UPPER(state), UPPER(city), geo.lat, geo.lon)
     Partial Index: index a subset of the items in the bucket
       CREATE INDEX idx_cx3 ON `travel-sample` (state, city) WHERE type = 'hotel';
       CREATE INDEX idx_cx4 ON `travel-sample` (state, city, name.lastname) WHERE type = 'hotel' AND country = 'United Kingdom'
     Array Index: index individual elements of the arrays
       CREATE INDEX idx_cx5 ON `travel-sample` (ALL public_likes)
       CREATE INDEX idx_cx6 ON `travel-sample` (DISTINCT public_likes)
     Array Index on an expression
       CREATE INDEX idx_cx7 ON `travel-sample` (ALL TOKENS(public_likes)) WHERE type = 'comments';
  9. Setup
       DROP INDEX `travel-sample`.def_schedule_utc;
       DROP INDEX `travel-sample`.def_city;
       DROP INDEX `travel-sample`.def_name_type;
       DROP INDEX `travel-sample`.def_route_src_dst_day;
       DROP INDEX `travel-sample`.def_icao;
       DROP INDEX `travel-sample`.def_primary;
       DROP INDEX `travel-sample`.def_type;
       DROP INDEX `travel-sample`.def_sourceairport;
  10. Primary Index
      • Index on the document key on the whole bucket
      • CREATE PRIMARY INDEX ON `travel-sample`
      • CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample`
      • In Couchbase you typically have multiple keyspaces in a single bucket
        • type = 'hotel', type = 'reviews', ...
      • It's fine to have the primary index when the predicates are only on META().id
        • E.g. YCSB Benchmark: WHERE meta().id > "value"
      SELECT * FROM system:indexes WHERE name = '#primary';
        "indexes": {
          "datastore_id": "http://127.0.0.1:8091",
          "id": "f6e3c75d6f396e7d",
          "index_key": [],
          "is_primary": true,
          "keyspace_id": "travel-sample",
          "name": "#primary",
          "namespace_id": "default",
          "state": "online",
          "using": "gsi"
        }
      Dockey    Docs
      "h:123"   {"type":"hotel", ...}
      "h:123"   {"type":"hotel", ...}
      "h:123"   {"type":"hotel", ...}
      "r:123"   {"type":"reviews", ...}
      "r:123"   {"type":"reviews", ...}
      "a:123"   {"type":"airport", ...}
      "a:123"   {"type":"airport", ...}
      "a:123"   {"type":"airport", ...}
      "l:123"   {"type":"landmark", ...}
      "l:123"   {"type":"landmark", ...}
      "l:123"   {"type":"landmark", ...}
  11. Secondary Index
      • Index on the key-value or document-key
      CREATE INDEX idx_cx_name ON `travel-sample`(name);
      CREATE INDEX idx_cx_name ON `travel-sample` (name, META().id);
      SELECT * FROM `travel-sample` WHERE name = 'Westin Bonaventure';
      SELECT * FROM `travel-sample` WHERE name = 'Westin Bonaventure' AND META().id LIKE 'hotel%';
  12. Composite Index
      • Index on more than one key-value
      • CREATE INDEX idx_cx2 ON `travel-sample` (state, city, name.lastname)
      • A query needs a predicate on the leading keys to use the index
      1. SELECT * FROM `travel-sample` WHERE state = 'CA';
      2. SELECT * FROM `travel-sample` WHERE state = 'CA' AND city = 'Windsor';
      3. SELECT * FROM `travel-sample` WHERE state = 'CA' AND city = 'Windsor' AND name.lastname = 'smith';
      4. SELECT * FROM `travel-sample` WHERE city = 'Windsor' AND name.lastname = 'smith';
      5. SELECT * FROM `travel-sample` WHERE name.lastname = 'smith';
      6. SELECT * FROM `travel-sample` WHERE state = 'CA' AND name.lastname = 'smith';
      7. SELECT * FROM `travel-sample` WHERE state IS NOT MISSING AND city = 'Windsor' AND name.lastname = 'smith';
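The leading-key rule on the Composite Index slide can be sketched in a few lines of Python. This is a simplified model, not the N1QL optimizer: it only counts how many consecutive leading keys of idx_cx2 carry a predicate, and it treats `state IS NOT MISSING` in query 7 as a predicate like any other. The function and variable names are illustrative assumptions.

```python
def usable_leading_keys(index_keys, predicate_fields):
    """Count the consecutive leading index keys that have a query predicate."""
    count = 0
    for key in index_keys:
        if key not in predicate_fields:
            break  # a gap ends the usable prefix
        count += 1
    return count


# Key order of idx_cx2 from the slide.
IDX_CX2 = ["state", "city", "name.lastname"]

# Fields with predicates in the seven queries from the slide.
QUERIES = {
    1: {"state"},
    2: {"state", "city"},
    3: {"state", "city", "name.lastname"},
    4: {"city", "name.lastname"},
    5: {"name.lastname"},
    6: {"state", "name.lastname"},
    7: {"state", "city", "name.lastname"},  # state IS NOT MISSING
}

for number, fields in QUERIES.items():
    n = usable_leading_keys(IDX_CX2, fields)
    verdict = "cannot use idx_cx2" if n == 0 else f"can scan on {n} leading key(s)"
    print(f"query {number}: {verdict}")
```

In this model, queries 4 and 5 cannot use the index because they have no predicate on `state`. Query 6 can scan on `state` only, so `name.lastname` would be applied as a filter after the scan.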
  13. Functional or Expression Index
      Index on a function or expression on key-values
      • CREATE INDEX idx_cxupper ON `travel-sample` (UPPER(state), UPPER(city), UPPER(name));
      • SELECT * FROM `travel-sample` WHERE UPPER(state) = 'CA';
      • CREATE INDEX idx_cxupper ON `travel-sample` (UPPER(state) || UPPER(city));
      • SELECT * FROM `travel-sample` WHERE UPPER(state) || UPPER(city) = 'CAMOUNTAINVIEW';
  14. Partial Index
      Index a subset of the items in the bucket
      CREATE INDEX idx_cx3 ON `travel-sample` (state, city, name.lastname) WHERE type = 'hotel';
      CREATE INDEX idx_cx4 ON CUSTOMER (state, city, name.lastname) WHERE type = 'hotel' AND country = 'United States' AND ratings > 2;
      • The WHERE clause helps you index your keyspaces
      SELECT * FROM `travel-sample` WHERE state = 'CA' AND status = 'premium';
      SELECT * FROM `travel-sample` WHERE state = 'CA' AND type = 'hotel' AND country = 'United States' AND ratings > 2;
      SELECT * FROM `travel-sample` WHERE type = 'hotel' AND country = 'United Kingdom' AND ratings >= 4;
      • Helps create an index on a subset of the data
      • Manually divide your keyspace indexes into distinct ranges
      • Manual workaround for a range-partitioned index
      • The query looks for a predicate that's a SUBSET of one index.
      • Each query block will use ONE index.
      • Cross-index query and automatic partitioning are on the roadmap
      • EVERY COUCHBASE INDEX SHOULD BE A PARTIAL INDEX. Except PRIMARY.
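The "predicate is a SUBSET of one index" check on the Partial Index slide can be modelled as a set comparison. The sketch below is an assumption-laden simplification: conditions are plain `(field, operator, value)` tuples and must match exactly, whereas a real optimizer may reason about implication between predicates. It covers only the WHERE-clause step; the separate leading-key check is not modelled. All names are illustrative.

```python
def index_where_satisfied(index_where, query_where):
    """True when every condition of the index WHERE clause appears in the query."""
    return set(index_where) <= set(query_where)


# WHERE clauses of the two partial indexes on the slide.
IDX_CX3 = {("type", "=", "hotel")}
IDX_CX4 = {
    ("type", "=", "hotel"),
    ("country", "=", "United States"),
    ("ratings", ">", 2),
}

# Predicates of the three queries on the slide.
Q1 = {("state", "=", "CA"), ("status", "=", "premium")}
Q2 = {
    ("state", "=", "CA"),
    ("type", "=", "hotel"),
    ("country", "=", "United States"),
    ("ratings", ">", 2),
}
Q3 = {
    ("type", "=", "hotel"),
    ("country", "=", "United Kingdom"),
    ("ratings", ">=", 4),
}

for name, query in (("Q1", Q1), ("Q2", Q2), ("Q3", Q3)):
    print(name,
          "idx_cx3:", index_where_satisfied(IDX_CX3, query),
          "idx_cx4:", index_where_satisfied(IDX_CX4, query))
```

Under these assumptions Q1 matches neither index, Q2 matches both, and Q3 matches only idx_cx3 because its country differs from the one in idx_cx4.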
  15. Array Index
      Index individual elements of the arrays
        "public_likes": [ "Julius Tromp I", "Corrine Hilll", "Jaeden McKenzie", "Vallie Ryan", "Brian Kilback", "Lilian McLaughlin", "Ms. Moses Feeney", "Elnora Trantow" ]
      CREATE INDEX idx_cx5 ON `travel-sample`(ALL public_likes)
      CREATE INDEX idx_cx6 ON `travel-sample`(DISTINCT public_likes)
        "cards": [
          { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" },
          { "type": "master", "cardnum": "6274-2542-5847-3949", "expiry": "2018-12" }
        ]
      SELECT * FROM `travel-sample` WHERE ANY x IN public_likes SATISFIES x = "Vallie Ryan" END;
      SELECT * FROM `travel-sample` WHERE EVERY x IN public_likes SATISFIES x = "Vallie Ryan" END;
      SELECT * FROM `travel-sample` WHERE ANY AND EVERY x IN public_likes SATISFIES x = "Vallie Ryan" END;
      EXPLAIN output (fragment):
        "#operator": "DistinctScan",
        "scan": {
          "#operator": "IndexScan2",
          "index": "iz1",
          "index_id": "78b05b69dffa2d1f",
          "index_projection": { "primary_key": true },
          "keyspace": "travel-sample",
          "namespace": "default",
          "spans": [
            {
              "exact": true,
              "range": [
                { "high": "\"Vallie Ryan\"", "inclusion": 3, "low": "\"Vallie Ryan\"" }
              ]
            }
          ],
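The three collection predicates used in the queries on the Array Index slide differ mainly in how they treat empty arrays. The following Python sketch mirrors those semantics as I understand them: EVERY is true for an empty array, while ANY AND EVERY additionally requires at least one element. The helper names are illustrative and are not part of any Couchbase SDK.

```python
def any_satisfies(items, predicate):
    """ANY x IN items SATISFIES predicate(x) END"""
    return any(predicate(x) for x in items)


def every_satisfies(items, predicate):
    """EVERY x IN items SATISFIES predicate(x) END (true for an empty array)"""
    return all(predicate(x) for x in items)


def any_and_every_satisfies(items, predicate):
    """ANY AND EVERY x IN items SATISFIES predicate(x) END (needs one element or more)"""
    return len(items) > 0 and all(predicate(x) for x in items)


def is_vallie(name):
    return name == "Vallie Ryan"


likes = ["Julius Tromp I", "Vallie Ryan", "Brian Kilback"]

print(any_satisfies(likes, is_vallie))            # one element matches
print(every_satisfies(likes, is_vallie))          # not all elements match
print(every_satisfies([], is_vallie))             # vacuously true
print(any_and_every_satisfies([], is_vallie))     # empty array is rejected
```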
  16. Array Index
      CREATE INDEX idx_cx6 ON `travel-sample`(ALL TOKENS(public_likes)) WHERE type = 'hotel';
      • Array indexes can be created on expressions
      • Array indexes can be created on nested arrays (arrays of arrays)
      • The query predicate has to match the array expressions.
      • Sizing
        • For scalar values, you'd have one index entry per document.
        • For ARRAYs, you'll have N index entries, N = number of elements in an array
        • When you have three hobbies, there will be three index entries.
      SELECT t.name, t.country, t.public_likes FROM `travel-sample` t WHERE t.type = 'hotel' AND ANY p IN TOKENS(public_likes) SATISFIES p = 'Vallie' END;
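The sizing bullets above can be turned into a rough entry count. This is a back-of-the-envelope sketch, assuming one entry per document for a scalar key, one entry per element for an ALL array key, one entry per unique element for a DISTINCT array key, and no entry for documents that lack the field. The sample documents and the `hobbies` field are made up for illustration.

```python
def index_entries(docs, field, mode="scalar"):
    """Estimate the number of index entries for one index key over docs."""
    total = 0
    for doc in docs:
        if field not in doc:
            continue  # assumed: documents without the key are not indexed
        value = doc[field]
        if mode == "scalar":
            total += 1
        elif mode == "all":
            total += len(value)
        elif mode == "distinct":
            total += len(set(value))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return total


docs = [
    {"name": "a", "hobbies": ["chess", "golf", "tennis"]},  # three hobbies
    {"name": "b", "hobbies": ["golf", "golf"]},              # duplicate element
    {"name": "c"},                                           # no array at all
]

print(index_entries(docs, "name"))                 # scalar key
print(index_entries(docs, "hobbies", "all"))       # ALL hobbies
print(index_entries(docs, "hobbies", "distinct"))  # DISTINCT hobbies
```

Actual index size also depends on key length and storage overhead, which this count ignores.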
  17. Array Index
        "schedule" : [
          {
            "day" : 0,
            "special_flights" : [
              { "flight" : "AI111", "utc" : "1:11:11" },
              { "flight" : "AI222", "utc" : "2:22:22" }
            ]
          },
          ...
        ]
      CREATE INDEX inested ON `travel-sample` (DISTINCT ARRAY (DISTINCT ARRAY y.flight FOR y IN x.special_flights END) FOR x IN schedule END) WHERE type = "route";
      • Array indexes can be created on nested arrays (arrays of arrays)
      • The nested array references can be expressions
      CREATE INDEX inested1 ON `travel-sample` (DISTINCT ARRAY (DISTINCT ARRAY y FOR y IN OBJECT_PAIRS(x.special_flights) END) FOR x IN schedule END) WHERE type = "route";
  18. Flexible Index
      • When you really need the flexibility!
      CREATE INDEX idx_cx6 ON `travel-sample` (ALL PAIRS(SELF)) WHERE type = 'hotel';
      SELECT COUNT(1) FROM `travel-sample` USE INDEX (idx_cx6) WHERE free_breakfast = true AND type = 'hotel';
      SELECT * FROM `travel-sample` USE INDEX (idx_cx6) WHERE free_breakfast = true AND free_internet = true AND free_parking = true AND type = 'hotel';
      • We do the intersection within the SAME index
      • Gives you flexibility
      • To exploit the intersect scan, you can also create an index on specific fields or create multiple single-key indexes.
      EXPLAIN output (fragment):
        "#operator": "IntersectScan",
        "scans": [
          {
            "#operator": "DistinctScan",
            "scan": {
              "#operator": "IndexScan2",
              "index": "idx_cx6",
              "index_id": "6ad923710d0f6d4b",
              "index_projection": { "primary_key": true },
              "keyspace": "travel-sample",
              "namespace": "default",
              "spans": [
                {
                  "exact": true,
                  "range": [
                    { "high": "[\"free_breakfast\", true]", "inclusion": 3, "low": "[\"free_breakfast\", true]" }
                  ]
                }
              ],
        ...
                  "range": [
                    { "high": "[\"free_internet\", true]", "inclusion": 3, "low": "[\"free_internet\", true]" }
                  ]
                }
        ...
                  "range": [
                    { "high": "[\"free_parking\", true]", "inclusion": 3, "low": "[\"free_parking\", true]" }
                  ]
        ...
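The idea of intersecting several scans of the same pairs index can be illustrated with an in-memory model. This sketch assumes that each top-level `(field, value)` pair of a document becomes one index entry pointing at the document key, and that the intersect scan keeps only the keys returned by every span. It ignores nested fields and is not how the index service is implemented. The document keys are hypothetical.

```python
from collections import defaultdict


def build_pairs_index(docs):
    """Map each (field, value) pair to the set of document keys containing it."""
    index = defaultdict(set)
    for doc_key, doc in docs.items():
        for field, value in doc.items():
            index[(field, value)].add(doc_key)
    return index


def intersect_scan(index, predicates):
    """Run one scan per (field, value) predicate and intersect the results."""
    scans = [index.get(pair, set()) for pair in predicates]
    return set.intersection(*scans) if scans else set()


hotels = {
    "hotel_1": {"free_breakfast": True, "free_internet": True, "free_parking": True},
    "hotel_2": {"free_breakfast": True, "free_internet": False, "free_parking": True},
    "hotel_3": {"free_breakfast": True, "free_internet": True, "free_parking": False},
}

pairs_index = build_pairs_index(hotels)
wanted = [("free_breakfast", True), ("free_internet", True), ("free_parking", True)]
print(intersect_scan(pairs_index, wanted))
```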
  19. Section 4: RULES FOR CREATING INDEXES
  20. Rule #1: USE KEYS
      • If you have the document key, you have options
      • Use the SDK to fetch the documents directly
      • SELECT * FROM `travel-sample` USE KEYS ["landmark_37588"];
      • SELECT * FROM `travel-sample` USE KEYS ["landmark_37588", "landmark_37603"];
      • JOINs are also done using the document keys.
      • SELECT * FROM ORDERS o INNER JOIN CUSTOMER c ON KEYS o.id;
      • SELECT * FROM ORDERS o USE KEYS ["ord::382"] INNER JOIN CUSTOMER c ON KEYS o.id;
  21. Rule #2: USE COVERING INDEX
      • Design your indexes so queries can be answered with index scans alone.
      • Index scans have projection, which reduces the amount of data transferred from the index to the query service.
      • Avoiding the document fetch saves additional memory and CPU.
      • Additional keys on the index need not be in the "leading N keys" order.
      CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium';
      SELECT * FROM CUSTOMER WHERE state = 'CA' AND status = 'premium';
      SELECT status, state, city FROM CUSTOMER WHERE state = 'CA' AND status = 'premium';
      EXPLAIN output (fragment):
        {
          "#operator": "IndexScan2",
          "covers": [
            "cover ((`CUSTOMER`.`state`))",
            "cover ((`CUSTOMER`.`city`))",
            "cover (((`CUSTOMER`.`name`).`lastname`))",
            "cover ((meta(`CUSTOMER`).`id`))"
          ],
          "filter_covers": { "cover ((`CUSTOMER`.`status`))": "premium" },
          "index": "idx_cx3",
          "index_id": "18f8209144215971",
          "index_projection": { "entry_keys": [ 0, 1 ] },
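Whether a query is covered comes down to a containment check: every field the query references has to be available from the index. The sketch below assumes the available fields are the index keys, the fields fixed by the index WHERE clause (the `filter_covers` entry in the EXPLAIN fragment), and the document key. `SELECT *` is represented by a `"*"` placeholder that no index can supply. Names are illustrative.

```python
def is_covered(index_keys, index_where_fields, query_fields):
    """True when the index alone can supply every field the query references."""
    available = set(index_keys) | set(index_where_fields) | {"meta().id"}
    return set(query_fields) <= available


# idx_cx3 from the slide: keys plus the field fixed by its WHERE clause.
IDX_CX3_KEYS = ["state", "city", "name.lastname"]
IDX_CX3_WHERE = ["status"]

# SELECT * needs the whole document, so it can never be covered.
select_star = ["*", "state", "status"]
# SELECT status, state, city ... WHERE state = 'CA' AND status = 'premium'
select_three = ["status", "state", "city"]

print(is_covered(IDX_CX3_KEYS, IDX_CX3_WHERE, select_star))
print(is_covered(IDX_CX3_KEYS, IDX_CX3_WHERE, select_three))
```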
  22. Rule #2: USE COVERING INDEX
      [figure: query flow between Clients, Query Service, Index Service and Data Service; step labels: 1. Query result; 2. Parse, Analyze, create Plan; 3. Scan Request; index filters; 4. Get qualified doc keys; No Fetch; 6. Evaluate: Documents to results; 7. Query result]
  23. Rule #3: USE THE INDEX REPLICATION
      1. Same index definition, but multiple instances
      2. Prior to 5.0, use equivalent indexes: same definition, but a different name.
      3. With 5.0, simply specify the num_replica value
      4. Used for load balancing
      5. Used for high availability (failover)
      CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium' WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]};
      CREATE INDEX idx_cx3 ON CUSTOMER(state, city, name.lastname) WHERE status = 'premium' WITH {"num_replica":3};
      curl -u Administrator:password node5:9102/settings -d '{"indexer.settings.num_replica": 2}'
  24. Rule #4: INDEX BY WORKLOAD, NOT BY BUCKET/KEYSPACE
      • Not every keyspace needs indexes.
      • Analyze the queries to find the common predicates and access patterns on the keyspace.
      • Find out the relative frequency and the latency and throughput SLAs for the queries.
      • Indexing isn't a substitute for best practices, e.g. prepared statements.
      • In Couchbase, every index should have a WHERE clause (except the primary index).
      • If you have a PRIMARY index in production, you may be asking for trouble.
  25. Rule #5: INDEX BY PREDICATE, NOT BY PROJECTION
      • When it comes to index selection, predicates rule.
      • First, ensure the rows selected by the query predicate are a subset of the rows selected by the WHERE clause of the index
      • Then, find the matching index for the rest of the predicates
      • Match the leading N keys of the index
      • In a composite key index, index scans only exploit consecutive leading N keys
      • Other keys can be exploited for post-scan filtering and projection.
      • The query should match and push down (spans in the EXPLAIN) as much as possible
      • Index the keys used in the ON KEYS clause of JOINs
      • Index keys can be any scalar or array expression
  26. Rule #6: ADD INDEXES TO MEET THE SLAs
      • Performance at scale matters most
      • While an index has to serve multiple queries, specialized indexes that speed up specific queries are fine
      • Use a SPECIFIC index with a complex WHERE clause to make the index smaller
      • The USE INDEX directive will ensure index selection
      • USE INDEX can take multiple indexes for HA
      • Query performance optimization at scale depends not only on index selection but also on efficient use of CPU by the query
      • Stay tuned for PROFILING
  27. Rule #7: INDEX TO AVOID SORTING
      • Matching the keys in the ORDER BY with the leading N keys of the index will avoid sorting.
      • The optimizer automatically takes care of equality predicates on leading keys
      • Optimizer approach: predicate matching first, avoiding ORDER BY later
      • Exploiting index ordering is even more beneficial with PAGINATION
      • Push both OFFSET and LIMIT to the index scan when all the predicates can be pushed
      • We push down COUNT, MIN, MAX to the index scan
      • We exploit both ASC and DESC keys
      • Optimizing pagination queries is usually the most critical part of tuning
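The first two bullets of Rule #7 can be expressed as a small matching routine. This is a sketch under stated assumptions, not the optimizer's actual algorithm: an index key bound by an equality predicate is treated as a constant and skipped, the remaining ORDER BY keys must then match the index keys in order, and ASC/DESC directions are assumed to agree. The index key list is a made-up example.

```python
def sort_avoided(index_keys, equality_fields, order_by):
    """True when the index order already satisfies the ORDER BY keys."""
    # ORDER BY keys fixed by an equality predicate are constant, so drop them.
    remaining = [key for key in order_by if key not in equality_fields]
    matched = 0
    for key in index_keys:
        if matched == len(remaining):
            break
        if key == remaining[matched]:
            matched += 1      # next ORDER BY key found in index order
        elif key in equality_fields:
            continue          # constant leading key, safe to skip
        else:
            break             # an unconstrained key breaks the ordering
    return matched == len(remaining)


keys = ["state", "city", "name"]

# WHERE state = 'CA' ORDER BY city
print(sort_avoided(keys, {"state"}, ["city"]))
# WHERE state = 'CA' ORDER BY name  (city sits in between, unconstrained)
print(sort_avoided(keys, {"state"}, ["name"]))
# WHERE state > 'A' ORDER BY state, city
print(sort_avoided(keys, set(), ["state", "city"]))
```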
  28. Rule #8: Number of indexes
      • When you initially create the indexes, use defer_build so the builds share a DCP stream
      • There is no real limit to the number of indexes themselves
      • The number of replicas can go up to (number of index nodes – 1)
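One way to follow the first bullet of Rule #8 is to create every index with `defer_build` and then build them with a single BUILD INDEX statement. The Python helper below only generates the N1QL text; it does not talk to a cluster. The helper name, index names and key lists are hypothetical.

```python
import json


def deferred_index_statements(keyspace, indexes):
    """Return CREATE INDEX statements with defer_build plus one BUILD INDEX.

    indexes maps an index name to its list of key expressions.
    """
    with_clause = json.dumps({"defer_build": True})
    statements = [
        f"CREATE INDEX {name} ON `{keyspace}`({', '.join(keys)}) WITH {with_clause};"
        for name, keys in indexes.items()
    ]
    # A single BUILD INDEX covers all the deferred indexes at once.
    statements.append(f"BUILD INDEX ON `{keyspace}`({', '.join(indexes)});")
    return statements


plan = deferred_index_statements(
    "travel-sample",
    {"idx_a": ["state", "city"], "idx_b": ["name"]},
)
for statement in plan:
    print(statement)
```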
  29. Rule #9: Index during INSERT, DELETE, UPDATE
      • INSERTs are done directly to KV; indexes are maintained asynchronously
      • DELETE and UPDATE use indexes to filter the data.
      • Index updates are still asynchronous.
  30. Rule #10: Indexing on objects, arrays, etc.
      • JSON is nested with objects and arrays.
      • There can ONLY be one array key in an index definition.
      • This array key can be on
        • An array of scalars: ALL hobbies
        • An expression returning an array of scalars: ALL TOKENS(comments)
        • An array of scalars within arrays of arrays
        • An array constructed from an expression: ALL ARRAY v FOR v IN [c1, c2, c3] END
      • Consider the sizing for array indexes
  31. Rule #11: INDEX KEY ORDER AND PREDICATE TYPES
      CREATE INDEX idx_order ON CUSTOMER (state, zipcode, salary, age, address, cid) WHERE type = 'premium';
      SELECT cid, address FROM CUSTOMER WHERE state = 'CA' AND zipcode IN [29482, 29284, 29482, 28472] AND salary < 50000 AND age > 45;
  32. Rule #11: INDEX KEY ORDER AND PREDICATE TYPES
      SELECT cid, address FROM CUSTOMER WHERE state = 'CA' AND type = 'premium' AND zipcode IN [29482, 29284, 29482, 28472] AND salary < 50000 AND age > 45;
      CREATE INDEX idx_order ON CUSTOMER (state, zipcode, salary, age, address, cid) WHERE type = 'premium';
      1. EQUALITY
      2. IN
      3. LESS THAN
      4. BETWEEN
      5. GREATER THAN
      6. Array predicates
      7. Look at expressions to move to the WHERE clause
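The ordering in Rule #11 can be applied mechanically to the scalar predicates of a query. The sketch below ranks predicate types in the order the slide lists them (items 1 to 5 only; array predicates and expressions moved to the WHERE clause are not modelled) and sorts the predicate fields accordingly. The rank table and function name are illustrative.

```python
# Rank of each predicate type, following the order listed on the slide.
PREDICATE_RANK = {"=": 1, "IN": 2, "<": 3, "BETWEEN": 4, ">": 5}


def order_index_keys(predicates):
    """Order predicate fields by predicate type; ties keep their input order."""
    ranked = sorted(predicates.items(), key=lambda item: PREDICATE_RANK[item[1]])
    return [field for field, _ in ranked]


# Predicates of the query on the slide, deliberately given out of order.
query_predicates = {"age": ">", "salary": "<", "zipcode": "IN", "state": "="}

print(order_index_keys(query_predicates))
```

For the slide's query this yields the leading keys of idx_order (state, zipcode, salary, age). The trailing `address` and `cid` keys in idx_order carry no predicate and are there so the projection can be served from the index.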
  33. Rule #12: Understand how to read EXPLAIN and PROFILING
      Full working examples with explanations are in the book and in the online article by Sitaram Vemulapalli & Marco Greco:
      https://dzone.com/articles/understanding-index-scans-in-couchbase-50-n1ql-que
      https://blog.couchbase.com
  34. AND MORE THINGS TO REMEMBER
      • Use INFER to understand your dataset
      • Use the index sizing spreadsheet
      • Understand the index join to exploit joins from parent to child
      • Study more on pagination, e.g. KEYSET PAGINATION
      • Consider SPLIT, TOKENS, FTS instead of the LIKE predicate '%joe%'
      • LIKE 'joe%' can be optimized well
      • Consider intersection when the predicate usage is non-deterministic
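Keyset pagination, mentioned in the list above, replaces a growing OFFSET with a predicate on the last key already returned. The Python sketch below contrasts the two on a sorted list of document keys standing in for an index. The keys are made up, and the corresponding N1QL shape (a predicate such as `META().id > $last_key` combined with ORDER BY and LIMIT) is my assumption about how this would typically be written.

```python
import bisect


def offset_page(sorted_keys, offset, limit):
    """OFFSET pagination: skip `offset` entries, then take `limit`."""
    return sorted_keys[offset:offset + limit]


def keyset_page(sorted_keys, last_key, limit):
    """Keyset pagination: seek just past the last key seen, then take `limit`."""
    start = 0 if last_key is None else bisect.bisect_right(sorted_keys, last_key)
    return sorted_keys[start:start + limit]


doc_keys = [f"hotel_{i:03d}" for i in range(10)]

first = keyset_page(doc_keys, None, 3)
second = keyset_page(doc_keys, first[-1], 3)

print(first)
print(second)
print(second == offset_page(doc_keys, 3, 3))
```

Both approaches return the same page here. The difference is cost: the offset version has to step over every skipped entry, while the keyset version seeks directly to the position after the last key.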
  35. END: RULES FOR INDEX CREATION
  36. INDEX ADVISOR: Rules for creating indexes
