Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Couchbase Tutorial: Big data Open Source Systems: VLDB2018

74 views

Published on

Introduction to Couchbase and N1QL tutorial on query and analytics service.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Couchbase Tutorial: Big data Open Source Systems: VLDB2018

  1. 1. COUCHBASE Keshav Murthy Senior Director, Couchbase R&D Rio de Janeiro, Brazil August, 27th, 2018
  2. 2. AGENDA 02 03 Introduction to N1QL Part 1: Setup, Getting Started and Querying 01 Introduction to Couchbase 04 Part 2: Querying and Modifying Complex Data 05 Part 3: Indexing and Query Tuning 06 Part 4: Inversion of JSON hierarchies 07 Part 5: Explore the analytics Service
  3. 3. 1 INTRODUCTION TO COUCHBASE
  4. 4. Couchbase Data Platform Develop with Agility. Deploy at any scale. World’sFirst Engagement Database
  5. 5. 5 Architecture App App Couchbase App App Couchbase App App Couchbase App App Couchbase Couchbase Cluster App App Couchbase Couchbase Single Node Deployment Couchbase Cluster Deployment
  6. 6. 6 Couchbase Server Cluster Service Deployment STORAGE Couchbase Server 1 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Managed Cache Storage Analytics Service STORAGE Couchbase Server 2 Managed Cache Cluster ManagerCluster Manager Data Service STORAGE Couchbase Server 3 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Data Service STORAGE Couchbase Server 4 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Query Service STORAGE Couchbase Server 5 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Query Service STORAGE Couchbase Server 6 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Index Service Managed Cache Storage Managed Cache Storage Storage STORAGE Couchbase Server 6 SHARD 7 SHARD 9 SHARD 5 SHARDSHARDSHARD Managed Cache Cluster ManagerCluster Manager Eventing Storage Managed Cache Managed Cache SDK SDK
  7. 7. ©2017 Couchbase. All rights reserved. 7Sample Production Deployment NODE 1 NODE 12 Cluster Manager Data Full Text Search Analytics Global Index Query Built for Change at Scale Application Eventing
  8. 8. 8 COUCHBASE • Buckets • Stores JSON documents. • Each JSON document has a unique key (primary key) • Document is hash-distributed into multiple nodes • Resource manager • Up to 10 buckets • Data Model • JSON • Simple key-value
  9. 9. 9 COUCHBASE • Data Manipulation • Direct key-value : get, set, sub-doc, extended attributes • Views : map-reduce views, written in Javascript • Query : N1QL language and engine. More shortly • Using the indexing service • FTS : Full-Text-Service for JSON • Analytics : N1QL for analytics • Couchbase implementation of SQL++ • Copies the data & changes from data service • Uses AsterixDB for data mgmt & query processing • Post Action • Eventing : Run Javascript procedure upon data change
  10. 10. 10 COUCHBASE – Open source • https://github.com/couchbase • AsterixDB used inside analytics service is an Apache Project • https://asterixdb.apache.org/
  11. 11. 2 N1QL = SQL + JSON
  12. 12. 12 ResultSet Relations/Tuples
  13. 13. 13 { "Name" : "Jane Smith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ], "Connections" : [ { "CustId" : "XYZ987", "Name" : "Joe Smith" }, { "CustId" : "PQR823", "Name" : "Dylan Smith" } { "CustId" : "PQR823", "Name" : "Dylan Smith" } ], "Purchases" : [ { "id":12, item: "mac", "amt": 2823.52 } { "id":19, item: "ipad2", "amt": 623.52 } ] } LoyaltyInfo Results Orders CUSTOMER • NoSQL systems provide specialized APIs • Key-Value get and set • Each task requires custom built program • Should test & maintain it
  14. 14. 14 Find High-Value Customers with Orders > $10000 Query customer objects from database • Complex codes and logic • Inefficient processing on client side For each customer object Find all the order objects for the customer Calculate the total amount for each order Sum up the grand total amount for all orders If grand total amount > $10000, Extract customer data Add customer to the high-value customer list Sort the high-value customer list LOOPING OVER MILLIONS OF CUSTOMERS IN APPLICATION!!!
  15. 15. 15 { "Name" : "Jane Smith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ], "Connections" : [ { "CustId" : "XYZ987", "Name" : "Joe Smith" }, { "CustId" : "PQR823", "Name" : "Dylan Smith" } { "CustId" : "PQR823", "Name" : "Dylan Smith" } ], "Purchases" : [ { "id":12, item: "mac", "amt": 2823.52 } { "id":19, item: "ipad2", "amt": 623.52 } ] } LoyaltyInfo ResultDocuments Orders CUSTOMER
  16. 16. 16 N1QL = SQL + JSON Give developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data.
  17. 17. 17 Why SQL for NoSQL?
  18. 18. 18 N1QL : Data Types from JSON Data Type Example Numbers { "id": 5, "balance":2942.59 } Strings { "name": "Joe", "city": "Morrisville" } Boolean { "premium": true, "balance pending": false} Null { "last_address": Null } Array { "hobbies": ["tennis", "skiing", "lego"]} Object { "address": {"street": "1, Main street", "city": Morrisville, "state":"CA", "zip":"94824"}} MISSING Arrays of objects of arrays [ { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" }, { "type": "master", "cardnum": "6274-2542-5847-3949", "expiry": "2018-12" } ]
  19. 19. 19 N1QL: Data Manipulation Statements •SELECT Statement- •UPDATE … SET … WHERE … •DELETE FROM … WHERE … •INSERT INTO … ( KEY, VALUE ) VALUES … •INSERT INTO … ( KEY …, VALUE … ) SELECT … •MERGE INTO … USING … ON … WHEN [ NOT ] MATCHED THEN … Note: Couchbase provides per-document atomicity.
  20. 20. 20 N1QL: SELECT Statement SELECT * FROM customers c WHERE c.address.state = 'NY' AND c.status = 'premium' ORDER BY c.address.zip Project Everything From the bucket customers Sort order Predicate (Filters)
  21. 21. 21 N1QL: SELECT Statement SELECT customers.id, customers.NAME.lastname, customers.NAME.firstname Sum(orderline.amount) FROM orders UNNEST orders.lineitems AS orderline INNER JOIN customers ON (orders.custid = META(customers).id) WHERE customers.state = 'NY' GROUP BY customers.id, customers.NAME.lastname, customers.NAME.firstname HAVING sum(orderline.amount) > 10000 ORDER BY sum(orderline.amount) DESC • Dotted sub-document reference • Names are CASE-SENSITIVE UNNEST to flatten the arrays JOINS with Document KEY of customers
  22. 22. 22 N1QL: SELECT Statement Highlights • Querying across relationships • INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN (5.5) • Subqueries • Aggregation (HUGE PERFORMANCE IMPROVEMENT IN 5.5) • MIN, MAX • SUM, COUNT, AVG, ARRAY_AGG [ DISTINCT ] • Combining result sets using set operators • UNION, UNION ALL, INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL
  23. 23. 23 N1QL : Query Operators [ 1 of 2 ] •USE KEYS … • Direct primary key lookup bypassing index scans • Ideal for hash-distributed datastore • Available in SELECT, UPDATE, DELETE •JOINs • INNER, LEFT OUTER, limited RIGHT-OUTER • Nested loop JOIN is the default • HASH JOIN for significantly better performance with larger amount of data. • Ideal for hash-distributed datastore
  24. 24. 24 N1QL : Query Operators [ 2 of 2 ] • UNNEST • Flattening JOIN that surfaces nested objects as top-level documents • Ideal for decomposing JSON hierarchies • Example: Flatten customer document to customer-orders •NEST • Does the opposite of UNNEST • Special JOIN that embeds external child documents under their parent • Ideal for JSON encapsulation •JOIN, NEST, and UNNEST can be chained in any combination
  25. 25. 25 UNNEST { "Name" : "Jane Smith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ] } "c": { "Name" : "Jane Smith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ] }, "type" : "master", "cardnum" : "6274-2842-2847-3909” } SELECT c, b.type, b.cardnum FROM customer c UNNEST c.Billing AS b "c": { "Name" : "Jane Smith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ] }, "type" : "visa", "cardnum" : "5827-2842-2847-3909” }
  26. 26. 26 N1QL : Expressions for JSON Ranging over collections • WHERE ANY c IN children SATISFIES c.age > 10 END • WHERE EVERY r IN ratings SATISFIES r > 3 END Mapping with filtering • ARRAY c.name FOR c IN children WHEN c.age > 10 END Deep traversal, SET, and UNSET • WHERE ANY node WITHIN request SATISFIES node.type = “xyz” END • UPDATE doc UNSET c.field1 FOR c WITHIN doc END Dynamic Construction • SELECT { “a”: expr1, “b”: expr2 } AS obj1, name FROM … // Dynamic object • SELECT [ a, b ] FROM … // Dynamic array Nested traversal • SELECT x.y.z, a[0] FROM a.b.c … IS [ NOT ] MISSING • WHERE name IS MISSING
  27. 27. 27 Global Secondary Indexes Primary Index Index on the document key on the whole bucket CREATE PRIMARY INDEX ON `travel-sample` CREATE PRIMARY INDEX idx_customer_p1 ON `travel-sample` Secondary Index Index on the key-value or document-key CREATE INDEX idx_cx_name ON `travel-sample`(name); Composite Index Index on more than one key-value CREATE INDEX idx_cx2 ON `travel-sample`(state, city, geo.lat, geo.lon) Functional or Expression Index Index on function or expression on key-values CREATE INDEX idx_cxupper ON `travel-sample`(UPPER(state), UPPER(city), geo.lat, geo.lon) Partial index Index subset of items in the bucket CREATE INDEX idx_cx3 ON `travel-sample` (state, city) WHERE type = 'hotel'; CREATE INDEX idx_cx4 ON `travel-sample` (state, city, name.lastname) WHERE type = 'hotel' and country = 'United Kingdom' ARRAY INDEX Index individual elements of the arrays CREATE INDEX idx_cx5 ON `travel-sample` (ALL public_likes) CREATE INDEX idx_cx6 ON `travel-sample` (DISTINCT public_likes) ARRAY INDEX on expressions CREATE INDEX idx_cx7 ON `travel-sample` (ALL TOKENS(public_likes)) WHERE type = ‘comments’;
  28. 28. 28 N1QL: Query Execution Flow Clients 1. Submit the query over REST API 8. Query result 2. Parse, Analyze, create Plan 7. Evaluate: Documents to results 3. Scan Request; index filters 6. Fetch the documents Index Service Query Service Data Service 4. Get qualified doc keys 5. Fetch Request, doc keys SELECT c_id, c_first, c_last, c_max FROM CUSTOMER WHERE c_id = 49165; { "c_first": "Joe", "c_id": 49165, "c_last": "Montana", "c_max" : 50000 }
  29. 29. 29 N1QL: Inside the Query Service Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service
  30. 30. 30 SQL is English for Relational Database SQL Invented by Don Chamberlin & Raymond Boyce at IBM N1QL, based on SQL, is English for JSON N1QL was invented by Gerald Sangudi at Couchbase SQL Instance Database Table Row Column Index Datatypes N1QL Cluster Bucket Bucket, Keyspace Document Attribute Index JSON Datatypes SQL Input and Output: Set(s) of Tuples N1QL STMT CREATE BUCKET CREATE INDEX None SELECT INSERT UPDATE DELETE MERGE Subqueries JOIN GROUP BY ORDER BY OFFSET, LIMIT EXPLAIN PREPARE EXECUTE GRANT ROLE REVOKE ROLE INFER PREPARE EXECUTE FLUSH Tuples SQL Model Set of JSON N1QL Model Set of Tuples Set of JSON N1QL Tooling Web Console Monitoring Profiling Dev workbench SDK Simba, Cdata BI Slamdata SQL Tooling ODBC, JDBC, .NET Hibernate BI Tools erwin TOAD N1QLResources query.couchbase.com SQL Indexes Primary Key Secondary Key Composite Range Partitioned Expression (Functional) Spatial Search N1QL Indexes Primary Secondary Composite Range Partitioned Partial Expression (Functional) Spatial Array Index Replica(HA) Adaptive SQL Logic 3 valued logic TRUE, FALSE, NULL/UNKNOWN N1QL Logic 4 valued logic TRUE, FALSE, NULL/UNKNOWN, MISSING SQL Transactions ACID Multi-Statement Savepoints Commit/Rollback Redo, Undo N1QL Transactions Single Document atomicity SQL Datatypes Numeric Boolean Decimal Character Date Time Timezone BLOB Spatial JSON N1QL Datatype Numeric Boolean Array Character Object Null JSON Conversion Functions SQL Optimizer Rule Based Cost Based Index Selection Query Rewrites NL, Hash, Merge join N1QL Optimizer Rule based Index Selection NL, Hash join SQL ACID ATOMIC Consistent Isolated Durable N1QL BASE Single doc Atomic Consistent Data* Optimistic Concurrency N1QL Index Scan Consistency* Unbounded AT_PLUS REQUEST_PLUS SQL Engine (SMP Scale UP) N1QL Engine (MPP Cluste Scale OUT) Additional SQL Features Triggers Stored Procedures XML Constraints SQL STMT CREATE TABLE CREATE INDEX ALTER TABLE SELECT INSERT UPDATE DELETE MERGE Subqueries JOIN GROUP BY ORDER BY OFFSET, LIMIT EXPLAIN PLAN PREPARE EXECUTE GRANT REVOKE DESCRIBE PREPARE EXECUTE TRUNCATE N1QL Input and Output: Set(s) of JSON
  31. 31. READ THIS!
  32. 32. READ THIS!
  33. 33. 3 PART 1: SETUP, GETTING STARTED AND QUERYING
  34. 34. 34 Setup • https://URL-SHOWN-IN-THE-TUTORIAL
  35. 35. 35 Setup • https://selfservice-4.rightscale.com/catalog
  36. 36. 36 Setup • https://selfservice-4.rightscale.com/catalog
  37. 37. 37 More workshop Part 2: Querying and Modifying Complex Data Part 3: Indexing and Query Tuning Part 4: Inversion of JSON hierarchies
  38. 38. 7 COUCHBASE ANALYTICS
  39. 39. 39 QUERY SERVICE Online search and booking, reviews and ratings • Property and room detail pages • Cross-sell links, up-sell links • Stars & likes & associated reviews • Their booking history Query Service behind every page display and click/navigation ANALYTICS SERVICE Reporting, Trend Analysis, Data Exploration • Daily discount availability report • Cities with highest room occupancy rates • Hotels with biggest single day drops • How many searches turn into bookings grouped by property rating? grouped by family size? Business Analysts ask these questions without knowing in advance every aspect of the question Query and Analytics Services - Examples
  40. 40. Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2018. All rights reserved. 40 Shadow data for processing What is Couchbase Analytics? Fast Ingest Complex Queries on large datasets Real-time Insights for Business Teams DATA DATA DATA ANALYTICS ANALYTICS ANALYTICS ANALYTICS MPP architecture: parallelization among core and servers
  41. 41. Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2018. All rights reserved. 41 Travel-sample model.
  42. 42. 42 Analytics: Setting up CREATE BUCKET travel WITH {"name":"travel-sample"}; CREATE DATASET hotel ON travel WHERE `type` = "hotel"; CREATE DATASET airline ON travel WHERE `type` = "airline"; CREATE DATASET airport ON travel WHERE `type` = "airport"; CREATE DATASET route ON travel WHERE `type` = "route"; CREATE DATASET landmarkON travel WHERE `type` = "landmark"; CONNECT BUCKET travel;
  43. 43. 43 Analytics: Queries SELECT airport.faa, count(*) route_count FROM airport LEFT OUTER JOIN route ON (airport.faa = route.sourceairport) GROUP BY airport.faa ORDER BY route_count desc
  44. 44. 44 Analytics: Queries SELECT airport.faa, airline.callsign,count(*) route_count FROM airport LEFT OUTER JOIN route ON (airport.faa = route.sourceairport) LEFT OUTERJOIN airline ON (route.airlineid = META(airline).id) GROUP BY airport.faa, airline.callsign ORDER BY route_count desc
  45. 45. 45 Analytics: Queries SELECT airport.faa, airline.callsign,count(*) route_count FROM airport INNER JOIN route ON (airport.faa = route.sourceairport) INNER JOIN airline ON (route.airlineid = META(airline).id) GROUP BY airport.faa, airline.callsign ORDER BY route_count desc
  46. 46. Couchbase Data Platform Develop with Agility. Deploy at any scale. World’sFirst Engagement Database
  47. 47. ©2017 Couchbase. All rights reserved. 47Sample Production Deployment NODE 1 NODE 12 Cluster Manager Data Full Text Search Analytics Global Index Query Built for Change at Scale Application Eventing
  48. 48. * N1QL FEATURES IN COUCHBASE 5.0 AND 5.5
  49. 49. 49 Couchbase N1QL and GSI features Query-Indexing Features • Large Indexing Keysize • Index key collation: ASC, DESC on each key • Index replicas, just like data replication • New storage engine: Plasma Query Language & Infrastructure • Subquery Expressions • Additional Date & time functions • Bitwise functions • CURL() within N1QL Query Optimizer • Complex Filters Pushdown • Pagination optimization • Optimization for ASC, DESC keys • Query-Index API optimization (projection, etc.) • Index projections, Intersect scans • Adaptive Indexes Security, Administration & Functionality • Security: RBAC: Statement level security • Query Monitoring, Profiling with UI • Query work bench and UI: Fully upgraded • Query UI: Visual Explain • Query on Ephemeral buckets • Application Continuity, Seamless Upgrade Performance • Core daily workload • YCSB • YCSB-JSON for Engagement Database http://query.couchbase.com
  50. 50. 50 Query-Indexing Enhancements Index key collation: ASC, DESC on each key • Prior to 5.0, each index key was sorted and kept in ASCENDING order only • To sort the key in descending order, you did • CREATE INDEX i1 ON t(c1 ASC, -c2, -c3) • SELECT * FROM t WHERE c1 = 10 and -c2 < -20 ORDER BY c1, -c2 • Query formulations becomes confusing • Cannot use this trick on all data types and expressions In Couchbase 5.0: • CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC) • SELECT * FROM t WHERE c1 = 10 and c2 < 20 ORDER BY c1,c2 DESC • You need to create an index to match the ORDER BY order • Reverse scans are still unsupported
  51. 51. 51 Query-Indexing Enhancements Large Indexing Keysize • Prior to 5.0, the sum of index key size could be up to 4096 bytes • This was controlled by the setting • For ARRAY keys, sum of all array key sizes could be up to 10240. • This is controlled by the setting max_array_seckey_size In Couchbase 5.0: • The total keysize could be pretty high – high up to 20 MB • This is true for single key, composite key, expressions and array indexes as well. • Simply do nothing, except create the index and issue the query. • The index entries that exceed 20MB will still generate error in the index log
  52. 52. 52 Query-Indexing Enhancements Index replicas, just like data replication • Prior to 5.0, you could create multiple indexes with same keys & condition • This is needed for load balancing and index high availabilitt CREATE INDEX i1 ON t(c1, c2, c3) CREATE INDEX i2 ON t(c1, c2, c3) CREATE INDEX i3 ON t(c1, c2, c3) • Indexer automatically recognizes these to be equivalent and does load balancing on all o these. In Couchbase 5.0: • Simply create one index and set the num_replica at CREATE or ALTER time • CREATE INDEX i1 ON t(c1, c2, c3) WITH {"num_replica":2} • Number of replicas can be up to number of nodes in the cluster • You can ALTER the number of replica dynamically
  53. 53. 53 Query-Indexing Enhancements New storage engine: Plasma • Index size can be arbitrarily large • Uses lock-free skip list • All the performance benefits of MOI – Memory Optimized Index • Automatically does IO as needed • From usage point of view: • Choose the standard secondary Index during installation • simply create any kind of index and use it.
  54. 54. 54 Query Language & Infrastructure Subquery Expressions • Provides rich functionality and Powerful subquery-expressions • Can be used in FROM-clause, projection, LET/WHERE-clauses etc., SELECT word, cnt FROM ARRAY split(i) FOR i IN (SELECT raw name FROM `travel-sample` WHERE type = "hotel") END AS words UNNEST words w GROUP BY w LETTING cnt = COUNT(w) ORDER BY cnt DESC;
  55. 55. 55 Query Language & Infrastructure Additional Date, time, timestamp functions • JSON does not directly support date and time related data types • Store the date and time in extended ISO 8901 format • "2017-10-16T18:44:43.308-07:00” • Need extract, conversion and arithmetic functions • Detailed article with all the functions and Oracle to Couchbase mapping https://dzone.com/articles/comparing-oracle-and-n1ql-support-for-the-date-tim • If you can’t do something, let us know!
  56. 56. 56 Query Language & Infrastructure CURL() within N1QL • CURL (URL, [options]) • The first argument is the URL, which represents any URL that points to a JSON endpoint. • Only URLs with the http:// or the https:// protocol are supported. • Redirection is disabled. • The second argument is a list of options. • This is a JSON object that contains a list of curl options and their corresponding values. • For a full list of options that we support, please refer to the Dzone article on CURL in N1QL by Isha Kandaswamy •
  57. 57. 57 CURL() from N1QL • Search for Santa Cruz in Spain using my Google dev api key SELECT CURL("GET","https://maps.googleapis.com/maps/api/geocode/json", {"data":"address=santa+cruz&components=country:ES&key=AIzaSyCT6niGCMsgegJkQ SYasfoLZ4_rSO59XQQ"}) ; • Live translate your text to another language. SELECT ginfo FROM ( SELECT r.content as english, curl("https://translation.googleapis.com/language/translate/v2?key=PUT YOUR KEYS HERE", {"request": "POST", "header":"Content-Type: application/json", "data": mydata }) AS french FROM `travel-sample` h USE KEYS "hotel_10142" UNNEST h.reviews r LET mydata = '{ "q":"' || r.content || '", "target": "fr"}') AS ginfo
  58. 58. 58 Query Language & Infrastructure CURL() within N1QL
  59. 59. 59 Query Language & Infrastructure BITWISE Functions • All bitwise functions can only take a number. All numbers are 64 bit signed numbers (integers). • If the Number is not an integer and for other data types, we throw an error. • When looking at the value in binary form, bit 1 is the Least Significant Bit (LSB) and bit 32 is the Most Significant Bit. (MSB) Bit 32 → 0000 0000 0000 0000 0000 0000 0000 0000 ← Bit 1 (LSB) BitAND BitOR BitNOT BitXOR BitSHIFT BitSET BitCLEAR BitTEST/ IsBitSET
  60. 60. 60 Query Optimizer & Execution: Stable Scans • IndexScan use to do single range scan (i.e single Span) • If the query has multiple ranges (i.e. OR, IN, NOT clauses) Query service used to do separate IndexScan for each range. • This causes Indexer can use different snapshot for each scan (make it unstable scan) • Number of IndexScans can grow and result increase in index connections • In 5.0.0 multiple ranges are passed into indexer and indexer uses same snapshot for all the ranges. • This makes stable Scan for given IndexScan (i.e. IndexScan2 in the EXPLAIN). • This will not make stable scan for query due to Subqueries, Joins etc • Example: CREATE INDEX ix1 ON default(k0); EXPLAIN SELECT META().id FROM default WHERE k0 IN [10,12,13];
  61. 61. 61 Query Optimizer & Execution: Pushdown Composite Filters • For composite Index the spans that pushed to indexer contains single range for all composite keys together. • Indexer will not applying range for each part of the key separately. This result in lot of false positives. • In 5.0.0 with IndexScan2 we push the each index key range separately and indexer will apply keys separately. • This results in no/less false positives and aides push more information to indexer. CREATE INDEX ix1 ON default(k0,k1); EXPLAIN SELECT meta().id FROM default WHERE k0 BETWEEN 0 AND 100 AND k1 = 200;
  62. 62. 62 Query Optimizer: ORDER, OFFSET, LIMIT pushdown • Pagination queries can contain any combination of ORDER, LIMIT, OFFSET clauses. • Performance of these queries are critical to applications. • When Predicates are completely and exactly pushed to indexer, by pushing offset, limit to indexer can improve query performance significantly. If that happened IndexScan2 section of EXPLAIN will have limit,offset. • If query ORDER BY matches index key order query can avoid index sort and performance can be improved significantly. If that happened order operator is not present in the EXPLAIN. • Example: CREATE INDEX ix1 ON default(k0,k1); EXPLAIN SELECT meta().id FROM default WHERE k0 > 10 AND k1 > 20 ORDER BY k0 LIMIT 10 OFFSET 100;
  63. 63. 63 Query Optimizer: MAX pushdown • If the MAX arguments matched with Index leading key exploit index order for MAX. • MAX can only DESC on index key. • MIN can only use ASC on index key. • Example : CREATE INDEX ix5 ON default(k0 DESC); SELECT MAX(k0) FROM default WHERE k0 > 10; • Above query able to exploit index order. In that case IndexScan2 section of EXPLAIN will have “limit” 1.
  64. 64. 64 Query Optimizer: Index Projection • The index can have many keys but query might be interested only subset of keys. • By only requesting required information can save lot of network transportation, memory, cpu, backfill etc. All this can help in performance and scaling the cluster. • The requested information can be found in “IndexScan2” Section of EXPLAIN as “index_projection” "index_projection": { "entry_keys": [1, 5 ], "primary_key": true } CREATE INDEX ix1 ON default(k0,k1,k2,k3,k4, k5); EXPLAIN SELECT meta().id, k1, k5 FROM default WHERE k0 > 10 AND k1 > 20;
  65. 65. 65 Query Optimizer: Index Projection CREATE INDEX ix1 ON default(k0,k1); Covered query SELECT k0 FROM default WHERE k0 = 10 AND k1 = 100; "index_projection": {"entry_keys": [0,1]} SELECT k0 FROM default WHERE k0 = 10; "index_projection": {"entry_keys": [0]} SELECT k0 ,META().idFROM default WHERE k0 = 10; "index_projection": {"entry_keys": [0],“primary_key”: true} Non-covered query SELECT k0 ,k5 FROM default WHERE k0 = 10 AND k1 = 100; "Index_projetion": { “primary_key”: true }
  66. 66. 66 Query Execution: CAS & Expiration • In 5.0.0 META().cas, META().expiration can be indexed and used in queries. • Example: • CREATE INDEX ix1 ON default( meta().id, meta().cas, meta().expiration); • SELECT meta().id , meta().cas, meta().expiration FROM default where meta().id > "" • Note: META().expiration will work in covered queries. For non covered queries it gives 0
  67. 67. 67 Query Execution: COUNT (DISTINCT expr) • If the expr matched with Index leading key COUNT DISTINCT can be pushed to indexer • Complete predicate needs to pushed to indexer exactly • No false positives are possible • No group or JOIN • Only single projection • Example : CREATE INDEX ix5 ON default(k0); SELECT COUNT(DISTINCT k0) FROM default WHERE k0 > 10; • Above query uses IndexCountDistinctScan2
  68. 68. 68 Customer Scenario • Customer document has 100 fields • They have multiple business entities sharing the same data • Each entity want to FILTER, GROUP, ORDER on distinct criteria • For Index selection, order of the keys in the composite index is important. Fields: c1 through c100 Filter fields: c1 through c50 Group, order and projection: Any from c1 through c100 SELECT c1, c2, c3, COUNT(c10), SUM(c5) FROM CUSTOMER WHERE c4 = "CXT-MULTI" AND c8 = "iPhone6" AND c9 BETWEEN 10 IN 20 GROUP BY c1, c2, c3; SELECT c12, COUNT(c19), SUM(c15) FROM CUSTOMER WHERE c44 = "CXT-MULTI" AND c18 = "Gpixel 2" AND c29 BETWEEN 10 IN 20 GROUP BY c12;
  69. 69. 69 Customer Scenario • What indexes to create for this? SELECT c1, c2, c3, COUNT(c10), SUM(c5) FROM CUSTOMER WHERE c4 = "CXT-MULTI" AND c8 = "iPhone6" AND c9 BETWEEN 10 IN 20 GROUP BY c1, c2, c3; CREATE INDEX i1 ON CUSTOMER(c8, c4, c9) CREATE INDEX i1 ON CUSTOMER(c8, c4, c9, c1, c2, c3, c10, c5); For Covering the query What about this? SELECT c12, COUNT(c19), SUM(c15) FROM CUSTOMER WHERE c44 = "CXT-MULTI" AND c18 = "Gpixel 2" AND c29 BETWEEN 10 IN 20 GROUP BY c12;
  70. 70. 70 Large, wide, composite indexes Filter fields: c1 through c50 To support all combinations of 50 predicates via composite indexes, you’ll need LOT of indexes. 50! =30414093201713378043612608166064768844377641568 960512000000000000
  71. 71. 71 Customer Scenario Solution: Intersection • Option 1 • Create indexes on individual fields • Scan individual indexes • Apply the full set of predicates (boolean expression from WHERE clause) • Then do the post processing. CREATE INDEX i1 on CUSTOMER(c1); CREATE INDEX i2 on CUSTOMER(c2); CREATE INDEX i3 on CUSTOMER(c3); • Option 2 • Too many indexes to maintain and manage. • Don’t even talk about equivalent indexes for each of these. CREATE INDEX i1to50 on CUSTOMER(DISTINCT PAIRS({c1, c2, c3, c4, c5,c6, c7, c8, c9, c10, c11, c23, c13, c14, …});
  72. 72. 72 Solution: Intersection • Option 3 • Too many keys to manage/specify • The document is flexible. I want the index to be flexible. CREATE INDEX ixpairon CUSTOMER(DISTINCT PAIRS(self)); SELECT * FROM CUSTOMER WHERE a = 10 and b < 20 and c between 30 and 40; "#operator": "IntersectScan", "scans": [ { "#operator": "DistinctScan", "scan": { "#operator": "IndexScan2", "index": "ixpair", "index_id": "466c0c5c4c3b21c1", "index_projection": { "primary_key": true }, "keyspace": "test", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "["a", 10]", "inclusion": 3, "low": "["a", 10]" } "range": [ { "high": "["b", 20]", "inclusion": 1, "low": "["b", false]" } "range": [ { "high": "[successor("c")]", "inclusion": 1, "low": "["c", 30]" } ]
  73. 73. 73 Flexible Indexing • This is not a silver bullet, yet. • TRY THIS OUT • SIZING is a concern because we {“Key“:“value“} • Give us feedback
  74. 74. 74 SECURITY : GRANT and REVOKE to roles • Query_select, query_insert, query_update, query_delete roles • Parameterized: query_select[customers] or query_insert[*] • Query_manage_index[foo] • Create, delete, build indexes on bucket foo • Query_system_catalog • Full access to the system tables (which are controlled now) • Query_external_access • Allows access to CURL() function (disabled by default) GRANT cluster_admin TO spock GRANT query_select ON default TO kirk REVOKE query_insert, query_delete ON bridge, engineering FROM mccoy, scotty
  75. 75. 75 Monitoring in UI 75Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved.
  76. 76. 76 Profiling in UI 76Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved.
  77. 77. 77 Profiling • We can collect execution timings and document processed on a per operator basis • If the functionality is turned on, timings are reported • with the metrics at the end of execution • in system:active_requests • in system:completed_requests • Profiling is turned on • at the request level via the “profile” REST API parameter, EG from cbq: • set –profile timings; • at the node level via the “profile” command line parameter or admin settings REST API parameter • takes 3 values, “off”, “phases”, “timings” • “phases” supplies total times for each operator class • “timings” supplies detailed information for each operator
  78. 78. 78 Profiling cbq> select * from `travel-sample` where source-airport is not missing; … "executionTimings": { "~children": [ { "#operator": "IndexScan2", "#stats": { "#itemsOut": 24024, "#phaseSwitches": 96099, "execTime": "55.370283ms", "kernTime": "5.397199311s" }, "index": "def_sourceairport", "index_id": "29702e564c9d2ca4", "index_projection": { "primary_key": true }, "keyspace": "travel-sample", "namespace": "default", "spans": [ { "exact": true, "range": [ { "inclusion": 1,
  79. 79. 79 Developer Tooling
  80. 80. 80 N1QL Performance: 5.0 vs. 4.5 • Run internally • YCSB is the public YCSB • other queries are written on Couchbase dataset • 50% higher throughput in YCSB workload E • 10-40x faster pagination queries • 10-30x better performance of queries with composite filters • 10-40x faster queries with COUNT function • 6-9x better performance of basic queries (Q1 & Q2) • 55x faster queries with UNNEST clause
  81. 81. 81 N1QL Performance: 5.0 vs. 4.5 • Up to 10x faster array indexing • Fast text search with TOKENS() • 10x better performance of lookup and index joins • Query performance on Windows is on par with Linux • Up to 100K index scans per second in DGM scenarios
  82. 82. 4 N1QL FEATURES IN COUCHBASE 5.5
  83. 83. 83 Language Features • ANSI Joins support • INNER JOIN • LEFT OUTER • RIGHT OUTER • NEST and UNNEST • JOIN on arrays Security & Infra Features • PREPARE Infrastructure • N1QL Auditing • X.509 Support • IPV6 Support • Backfill Performance Features • GROUP BY performance • Aggregation performance • Index Partitioning • parallelization with Partitioned index • Query pipeline performance • Hash join • YCSB-JSON Query Workbench Features • Visual Explain improvements • Tabular document editor • Parameters for Query • Easy copy results to Excel N1QL & Indexing features in Couchbase 5.5
  84. 84. 84 5.5 Features: ANSI JOIN What? • ANSI standard for SQL join specification • Supported in all major relational databases Why? • Lowering barrier for migration to Couchbase • Especially from relational databases • Address limitation of N1QL joins • Lookup join and index join requires joining on document key • Parent-child or child-parent join only • Only equi-join • Proprietary syntax How? • ON-clause to specify join condition, which can be any expression
  85. 85. 85 ANSI JOIN Examples SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c INNER JOIN orders o ON c.customerId = o.customerId; SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c LEFT OUTER JOIN orders o ON c.customerId = o.customerId SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c RIGHT OUTER JOIN orders o ON c.customerId = o.customerId SELECT meta(brewery).id brewery_id, brewery.name brewery_name FROM `beer-sample` brewery INNER JOIN `beer-sample` beer ON beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.type = "beer" WHERE brewery.type = "brewery" AND brewery.state = ”Kansas"
  86. 86. 86 ANSI JOIN Syntax SELECT … FROM keyspace1 <join_type> JOIN keyspace2 ON <join_expression> WHERE <filter_expression> • Supported JOIN Types • INNER, LEFT OUTER, RIGHT OUTER • ON-clause specifies join condition • <join_expression> is evaluated at time of join • Can have multiple JOIN clauses in one query block • WHERE-clause specifies filter condition • <filter_expression> is evaluated after the join is done, or “post-join” • One per query block
  87. 87. 87 ANSI JOIN : Designing Indexes • ANSI joins use indexes on both sides of the join. • JOINs are evaluated LEFT to RIGHT • For the first keyspace, optimizer chooses the index based on predicates in the WHERE clause and the ON clause. • For the second keyspace, only the ON clause is considered. • There should be an index on at least one key. • Composite lookup is used if there are matching composite keys • For hash join, need to have index on the build side. The probe side can make use of primary index, but not advisable.
  88. 88. 88 N1QL : Arrays Array { "hobbies": ["tennis", "skiing", "lego"]} { "orders": [582, 9721, 3814]} Object { "address": {"street": "1, Main street", "city": Morrisville, "state":"CA", "zip":"94824"} } Arrays of objects of arrays [ { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" }, { "type": "master", "cardnum": "6274-2542-5847-3949", "expiry": "2018-12" } ]
  89. 89. 89 ANSI JOIN Support for Arrays • Array is an important construct in NoSQL world • Although the SQL standard does not include array support, we added support for arrays in our implementation of ANSI JOIN • Support arrays on the left-hand-side of ANSI JOIN, on the right-hand- side of ANSI JOIN, or on both sides of ANSI JOIN • Right-hand-side: use array index • Left-hand-side: use IN clause or UNNEST operation • Both sides: combination of above
  90. 90. 90 Play with ANSI JOIN Support for Arrays - Setup CREATE PRIMARY INDEX ON product; "product01", {"productId": "product01", "category": "Toys", "name": "Truck", "unitPrice": 9.25} "product02", {"productId": "product02", "category": "Kitchen", "name": "Bowl", "unitPrice": 5.50} "product03", {"productId": "product03", "category": "utensil", "name": "Spoons", "unitPrice": 2.40} CREATE PRIMARY INDEX ON purchase; "purchase01", {"purchaseId": "purchase01", "customerId": "customer01", "lineItems": [ {"productId": "product01", "count": 3}, {"productId": "product02", "count": 1} ], "purchasedAt": "2017-11-24T15:03:22”} "purchase02", {"purchaseId": "purchase02", "customerId": "customer02", "lineItems": [ {"productId": "product03", "count": 2} ], "purchasedAt": "2017-11-27T09:08:37”}
  91. 91. 91 ANSI JOIN Support for Arrays – Right-hand-side • Utilize array index defined on the right-hand-side keyspace CREATE INDEX purchase_ix1 ON purchase(DISTINCT ARRAY l.productId FOR l IN lineItems END) USING GSI SELECT p.name, pu.purchasedAt FROM product p JOIN purchase pu ON ANY l IN pu.lineItems SATISFIES l.productId = p.productId END
  92. 92. 92 ANSI JOIN Support for Arrays – Left-hand-side with UNNEST • Utilize UNNEST to flatten the left-hand-side array first CREATE INDEX product_ix1 ON product(productId) USING GSI SELECT p.name, pu.purchasedAt FROM purchase pu UNNEST pu.lineItems AS pl JOIN product p ON pl.productId = p.productId
  93. 93. 93 ANSI JOIN Support for Arrays – Left-hand-side with IN • Utilize IN-clause with array SELECT p.name, pu.purchasedAt FROM purchase pu JOIN product p ON p.productId IN ARRAY l.productId FOR l IN pu.lineItems END
  94. 94. 94 Difference Between UNNEST and IN-clause • UNNEST first make copies of the left-hand-side document, one for each element of the array. There is no copying if using IN-clause • If there are duplicates in the array: • UNNEST makes copies for all duplicates • IN-clause does not care about duplicates • If performing LEFT OUTER JOIN • UNNEST makes copies and preserves all copies • IN-clause only preserves the original document
  95. 95. 95 ANSI JOIN Support for Arrays – Both-side with UNNEST SELECT pu1.purchaseId pid1, pu2.purchaseId pid2 FROM purchase pu1 UNNEST pu1.lineItems AS pl JOIN purchase pu2 ON ANY l IN pu2.lineItems SATISFIES l.productId = pl.productId END
  96. 96. 96 ANSI JOIN Support for Arrays – Both-side with IN-clause SELECT pu1.purchaseId pid1, pu2.purchaseId pid2 FROM purchase pu1 JOIN purchase pu2 ON ANY l2 IN pu2.lineItems SATISFIES l2.productId IN ARRAY l1.productId FOR l1 IN pu1.lineItems END END
  97. 97. 97 Block nested loop join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer” WHERE brewery.type = "brewery" AND brewery.state = "California"
  98. 98. 98 HASH join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer USE HASH(probe) ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer” WHERE brewery.type = "brewery" AND brewery.state = "California"
  99. 99. 99 HASH join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer USE HASH(build) ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer” WHERE brewery.type = "brewery" AND brewery.state = "California"
  100. 100. 100 Hash JOIN • beer is the build side. • Scan beer to create the hash table • Brewery automatically becomes the probe. • Each keyspace is scanned once. SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer USE HASH(build) ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer” WHERE brewery.type = "brewery" AND brewery.state = "California"
  101. 101. 101 5.5 Features: HASH JOIN • Enterprise Edition only – Performance feature • ANSI JOIN query only • Only considered when (new) USE HASH hint is specified • USE HASH(build) or USE HASH(probe) • Specify USE HASH hint on right-hand side keyspace • Can combine USE HASH with USE INDEX or USE KEYS • Requires equality join predicate(s) • Hash join is preferred when USE HASH is specified, however, if hash join cannot be generated for some reason, nested-loop join is considered • Beneficial for “large” joins SELECT DISTINCT route.destinationairport FROM `travel-sample` airport INNER JOIN `travel-sample` route USE HASH(probe) INDEX(route_airports) ON airport.faa = route.sourceairport AND route.type = "route" WHERE airport.type = "airport" AND airport.city = "San Jose";
  102. 102. 102 HASH join • At least one equality expression between two keyspaces • When this is absent, silently defaults to nested loop join. • Ditto on community edition • Requires memory to build the hash table on the BUILD side. • Unsuitable when only few documents qualify and can be done via index lookup. • The build side has be the keyspace with LESS qualified documents. • This selection is manual until cost based decisions can be made automatically • Typically good for reporting kind of queries. • Once the query latency goes to seconds, consider hash joins!
  103. 103. 103 5.5: Hash JOIN
  104. 104. 104 Grouping and Aggregation Performance (EE) CUSTOMER PROBLEM OR SCENARIO • Grouping and Aggregation are expensive operations • Latencies are high and cluster is not scaling BENEFITS • High scalability • Low query latencies • Low TCO • Automatic: No changes to query or index SOLUTION • If query is covered by index, Let Indexer perform Grouping and Aggregation. • Eliminate network transportation and disk I/0 due to backfill. These are slowest operations. EXAMPLE: • CREATE INDEX idx ON `travel- sample`(type); • SELECT type, COUNT(1) AS typecount FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type;
  105. 105. 105 Aggregation Queries SELECT type FROM `travel-sample` WHERE type is NOT NULL GROUP BY type; SELECT COUNT(1) AS typecount FROM `travel-sample` WHERE type IS NOT NULL; SELECT type, COUNT(1) AS cnt FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type;
  106. 106. 106 Aggregation Example
  107. 107. 107 What is Group and Aggregation? • Grouping and Aggregations are part of SQL syntax. Used in Dashboards, Analytical, Reporting queries. • Query contains GROUP BY clause • Example : SELECT type FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type; • Query contains Aggregate functions. • Aggregate functions are MIN(), MAX(), COUNT(), SUM(), AVG(), ARRAY_AGG() • Aggregate functions can only be in Projection, Having, Letting clause • Aggregate functions can be present with or without GROUP BY. • When GROUP BY is absent it considered as single group. • SELECT type, COUNT(1) AS cnt FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type; • Returns how many documents present in each group.
  108. 108. 108 N1QL: Inside the Query Service Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service
  109. 109. 109 CREATE INDEX idx ON `travel-sample`(type); Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service SELECT type, COUNT(type) AS typecount FROM `travel-sample` WHERE type is not missing GROUP BY TYPE ORDER BY typecount Covered Query Path Note: Grayed out phases (Fetch, Join) are skipped in the query because the query is completely covered by the index scan.
  110. 110. 110 CREATE INDEX idx ON `travel-sample`(type); Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service SELECT type, COUNT(type) AS typecount FROM `travel-sample` WHERE type is not missing GROUP BY TYPE ORDER BY typecount Query with GROUP & Aggregate Pushdown Note: Grayed out phases (Fetch, Join, filter, pre-aggregate, aggregate, sort) are skipped in the query because indexer can in Couchbase 5.5, when the query qualifies.
  111. 111. 111 5.5: Index grouping and aggregation Performance SELECT store, AVG(quantity) AS avg_quantity FROM store_sales WHERE DATE_PART_STR(sold_date, 'year') = $1 AND customer.state = $2 GROUP BY store;
  112. 112. 112 5.5: Index grouping and aggregation Performance SELECT DATE_PART_STR(sold_date, 'year') AS year, item.brand, SUM(sales_price) AS sum_agg FROM store_sales WHERE item.manufacturer_id = $1 AND DATE_PART_STR(sold_date, 'month') = $1 GROUP BY DATE_PART_STR(sold_date, 'year'), item.brand ORDER BY DATE_PART_STR(sold_date, 'year'), sum_agg DESC, item.brand;
  113. 113. 113 5.5: Index Partitioning Manageability: Scale out GSI Index create index route on bucket(airline, flight, source_airport, destination_airport) partition by hash(airline) • Scale out partitions as cluster size grows • Partition key must be immutable -- meta().id, immutable secondary key (e.g. airline) Performance: Partition Elimination Select flight from bucket where airline is not missing and source_airport = “SFO” • Scatter-gather across all partitions Select flight from bucket where airline in [“UA”, “AA”] and source_airport = “SFO” • Partition Elimination when predicate contains partition key : Only scan the partitions specified in predicate for faster range query response Performance: Parallelize Aggregate Scan Select count(flight) from bucket where airline is not missing group by source_airport, destination_airport • Ability to parallelize scan on aggregate query across partitions
  114. 114. 114 5.5 Features: Index Partitioning Scan availability: Index scan can span partitions across replica create index route on bucket(airline, flight, source_airport, destination_airport) partition by hash(airline) with {“num_replica”:1} • A single index scan can pick any available partition across all replica Manageability: Repair Lost Partition • If partitions are lost due to node failover, those partitions can be repaired (rebuilt) on remaining nodes during rebalancing Performance: Scan Load Balancing • Scan traffic is load balanced on partitioned index across replica • Partition of a replica can be skipped if it is falling behind from other replica
  115. 115. 115 5.5: BACKFILL SETTING CUSTOMER PROBLEM OR SCENARIO • Results from index scan come back faster than query can consume. • These results are saved in a temporary file, known as backfill • In 5.0, this location was /tmp by default. • Customers can (and have) run out of space. • Make the backfill location configurable. • Settable in Web console or REST API • By default the path is /tmp – this is now settable • By default the Quota is • If set to -1 it means unlimited • If set to 0 then backfill is disabled • There is no upper limit, it depends on user system SOLUTION
  116. 116. 116 Vulcan: ALTER INDEX CUSTOMER PROBLEM OR SCENARIO • An imbalance occurs due to a particular index growing faster than expected and is needed on a different node. • An imbalance occurs due to a cluster of indexes being dropped on a single node. • A machine is scheduled for removal, so its indexes need to move off its current node. • The automated process of rebalancing does not give the expected results. SOLUTION Alter Index support to change the placement of an existing index or replica among different GSI nodes. For example, if a node fails and you need to move it from node 172.23.130.24 to node 172.23.130.25 ALTER INDEX `travel-sample`.idx1 WITH {"action":"move","nodes": ["172.23.120.25:8091"]}
  117. 117. 117 5.5 Features: Query Workbench Improvements Auto Explain / Visual Explain • EXPLAIN is automatically run before every query • User can check plans to see why query ran slow or fast • Improved Query Plan Visualization • Layout in any direction, better panning/zooming • Easier to read • Improved tooltips High Performance Tabular View • Tabular results view used to get slow with 750kb data • Now scales to > 100MB • Column headers always visible Export/Copy as Tab-separated Text • Send results to Excel by exporting or copying as tab-separated values. Copy button
  118. 118. 118 Vulcan Features: Query Workbench Improvements Positional and Named Parameters • Run prepared queries • Parameters specified in Preferences dialog Tabular Document Editor, with N1QL! • Previous document editor text only • Only supported searching by document key • New version: • Shows document in editable table • Can filter documents with N1QL WHERE clause • One button to copy, delete, edit, save document
  119. 119. 119 5.5: N1QL Auditing Problem: no idea who is doing what in the system. Solution: N1QL auditing • Auditing available for all statement types • SELECT, INSERT, UPDATE, … • Auditing also available for all API endpoints of query engine. • /admin/stats, /admin/config, /admin/prepareds, … • Configurable from UI • Security/Audit tab • Selectable • Choose what query types to audit • Whitelist of trusted users who will not be audited • Cost varies depending on query type and how much to audit • Worst case: many small queries, everything audited (approx 20% throughput loss)
  120. 120. 120 5.5 Features : Prepared Statement • Prepared statements now automatically distributed across N1QL nodes • Both in push and pull configuration • N1QL service monitors resource usage prior of execution • Statements silently prepared again if indexes or keyspaces change • …and automatically distributed to other nodes if reprepared
  121. 121. 121 5.5 Features: X.509 support • Couchbase Server uses X.509 certificates to encrypt its client-server communication • Query service automatically refreshes certificates when server certificates are updated • Query doesn’t mandate the presence of authorization headers • It supports client certificate authorization • To run a query successfully with client certificates – curl --cacert ./root/ca.pem --cert-type PEM --cert ./client/client/chain.pem --key-type PEM –key ./client/client/client.key https://localhost:18093/query/ser vice -d "statement=select * from system:keyspaces"
  122. 122. 122 5.5: IPv6 support • Make query IPv6 compliant. • Server passes in --ipv6 to query which takes the value true or false. This determines the mode in which the query service needs to operate. The default value is false (IPv4). • IPv6 equivalents of 127.0.0.1 and 0.0.0.0 is ‘::1’ and ‘::’ respectively. • Construction of URLs • If we are using hostnames or fully qualified domain names there will be no difference. • For constructing URLs with raw IPv6 addresses – • the IPv6 address must be enclosed within ‘[‘ and ‘]’ brackets. When we construct urls using localhost for example. For example: http://[::1]:8091/pools/default • CBQ shell also supports connection to IPV6 addresses. curl --cacert ./root/ca.pem --cert-type PEM --cert ./client/client/chain.pem --key-type PEM –key ./client/client/client.key https://localhost:18093/query/service -d "statement=select * from system:keyspaces"
  123. 123. 123 5.5 Features : Curl Whitelist CUSTOMER PROBLEM OR SCENARIO • The curl_whitelist.json file needed to be created on every query node in … /var/lib/couchbase/n1qlcerts/ by the user • Could have different values for each query node • Needed to be part of cbcollect info SOLUTION • UI now supports setting curl whitelist. • This is propagated to all query nodes
  124. 124. 124 5.5: Query performance Query workload - TCO Improvement Queries/Sec • Average N1QL throughput improved by 50%+ • Latency improved by 20% in performance test bed • Memory consumption reduced substantially • Faster document loading from KV • Memory and CPU usage improvements in projector as well Index grouping and aggregation • Latency 5 times to 10 times lower • Throughput 10 times to 20 times higher Workload 5.0 5.5 Improvement USE KEYS 33K 64K 95% Equality predicate 22K 33K 55%
  125. 125. READ THIS!
  126. 126. 5 ANSI JOINS OVERVIEW
  127. 127. 127 ANSI JOIN What? • ANSI standard for SQL join specification • Supported in all major relational databases Why? • Lowering barrier for migration to Couchbase • Express business questions easily • Especially from relational databases • Address limitation of N1QL joins • Lookup join and index join requires joining on document key • Parent-child or child-parent join only • Only equi-join • Proprietary syntax How? SELECT beer.name beername, brewery.name breweryname FROM `beer-sample` beer INNER JOIN `beer-sample` brewery ON (beer.brewery_id = brewery.myid AND beer.name LIKE "A%") WHERE beer.type = "beer";
  128. 128. 128 HASH JOIN • Enterprise Edition only • ANSI JOIN query only • Only considered when (new) USE HASH hint is specified • USE HASH(build) or USE HASH(probe) • Specify USE HASH hint on right-hand side keyspace • Can combine USE HASH with USE INDEX or USE KEYS • Requires equality join predicate(s) • Hash join is preferred when USE HASH is specified, however, if hash join cannot be generated for some reason, nested-loop join is considered • Beneficial for “large” joins SELECT DISTINCT route.destinationairport FROM `travel-sample` airport JOIN `travel-sample` route USE HASH(probe) INDEX(route_airports)
  129. 129. * ANSI JOINS DETAILS
  130. 130. 130 What is ANSI JOIN? • ANSI standard for SQL join specification (SQL-99 standard) • Supported in all major relational databases • Clean separation of join conditions and filter conditions
  131. 131. 131 Why Support ANSI JOIN? • Lowering barrier for migration to Couchbase • Especially from relational databases • Address limitations of N1QL joins • Supports any expression in the ON clause. • Enables flexible data modeling • Queries can do more than child-to-parent and parent-to-children queries.
  132. 132. 132 N1QL Join Limitations • Requires document key from one side of the join to be produced by the other side of the join • Somewhat similar to primary_key/foreign_key join in relational world • parent-to-child or child-to-parent only • Only equi-join on document key (implicit join to meta().id) • single equi-join expression only – implied through join syntax • Join syntax is proprietary • Lack of flexibility
  133. 133. 133 N1QL Join – Lookup Join SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM orders o INNER JOIN customer c ON KEYS o.customerId SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM orders o INNER JOIN customer c ON (o.customerId = META(c).id); Logically equivalent to:
  134. 134. 134 N1QL Join – Index Join CREATE INDEX orders_ix1 ON orders(customerId) SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c INNER JOIN orders o ON KEY o.customerId FOR c • Uses ON KEY… FOR clause • Requires an index defined on the ON KEY expression, which corresponds to document key of the left-hand-side document • Use document key from left-hand-side document to probe the index to retrieve right-hand-side document, and perform the join
  135. 135. 135 How is ANSI JOIN Different? • Join can be done on arbitrary expressions • Don’t have to join on document key • Complex expressions (SUBSTR(), REPLACE(), etc)) • Non-equal join conditions (>, <, >=, <=, !) • Multiple join conditions with AND, OR, NOT, etc. • Standardized syntax • Ease of migration from relational databases • Separation of join conditions and filter conditions
  136. 136. 136 ANSI JOIN Syntax SELECT … FROM keyspace1 <join_type> JOIN keyspace2 ON <join_expression> WHERE <filter_expression> • JOIN TYPE: INNER, LEFT OUTER, RIGHT OUTER • ON-clause specifies join condition • <join_expression> is evaluated at time of join • One per join, can have multiple ON-clauses in a query block • WHERE-clause specifies filter condition • <filter_expression> is evaluated after the join is done, or “post-join” • One per query block
  137. 137. 137 ANSI JOIN Variations • INNER JOIN • default • LEFT OUTER JOIN • Preserves left-hand-side document if join condition not met • RIGHT OUTER JOIN • Preserves right-hand-side document if join condition not met • FULL OUTER JOIN • Preserves both sides if join condition not met • CROSS JOIN • No join condition (no ON clause), “Cartesian Join”
  138. 138. 138 ANSI JOIN Support in Couchbase 5.5 • INNER JOIN • default • LEFT OUTER JOIN • Preserves left-hand-side document if join condition not met • RIGHT OUTER JOIN • Preserves right-hand-side document if join condition not met • FULL OUTER JOIN • Preserves both sides if join condition not met • CROSS JOIN • No join condition (no ON clause), “Cartesian Join”
  139. 139. 139 Play with ANSI JOIN - Setup • Create buckets “customer”, “orders” CREATE PRIMARY INDEX ON customer CREATE PRIMARY INDEX ON orders INSERT INTO customer(KEY, VALUE) VALUES ("customer01", {"customerId": "customer01", "lastName": "Doe", "firstName": "John", "zipcode": "97040"}), VALUES("customer02", {"customerId": "customer02", "lastName": "Doe", "firstName": "Jane", "zipcode": "97040"}) INSERT INTO orders(KEY, VALUE) VALUES ("orders01", {"ordersId": "orders01", "customerId": "customer01", "productId": "product01", "count": 3, "unitPrice": 9.25, "totalPrice": 27.75}), VALUES ("orders02", {"ordersId": "orders02", "customerId": "customer01", "productId": "product02", "count": 1, "unitPrice": 5.50, "totalPrice": 5.50}), VALUES ("orders03", {"ordersId": "orders03", "productId": "product03", "count": 2, "unitPrice": 2.40, "totalPrice": 4.80})
  140. 140. 140 Play with ANSI JOIN - Setup • Create buckets “customer”, “orders” CREATE PRIMARY INDEX ON customer CREATE PRIMARY INDEX ON orders INSERT INTO customer(KEY, VALUE) VALUES ("customer01", {"customerId": "customer01", "lastName": "Doe", "firstName": "John", "zipcode": "97040"}), VALUES("customer02", {"customerId": "customer02", "lastName": "Doe", "firstName": "Jane", "zipcode": "97040"}) INSERT INTO orders(KEY, VALUE) VALUES ("orders01", {"ordersId": "orders01", "customerId": "customer01", "productId": "product01", "count": 3, "unitPrice": 9.25, "totalPrice": 27.75}), VALUES ("orders02", {"ordersId": "orders02", "customerId": "customer01", "productId": "product02", "count": 1, "unitPrice": 5.50, "totalPrice": 5.50}), VALUES ("orders03", {"ordersId": "orders03", "productId": "product03", "count": 2, "unitPrice": 2.40, "totalPrice": 4.80})
  141. 141. 141 ANSI JOIN Example – Inner Join SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c INNER JOIN orders o ON c.customerId = o.customerId • For inner joins, the INNER keyword is optional FROM customer c JOIN orders o ON …
  142. 142. 142 ANSI JOIN Support Requires an Appropriate Index • An appropriate secondary index must be defined on the right-hand-side keyspace for ANSI JOIN to work properly • Primary index is NOT considered for ANSI JOIN • Index selection utilizes the ON-clause filters as well as the WHERE clause filters CREATE INDEX customer_ix1 ON customer(customerId) USING GSI; CREATE INDEX orders_ix1 ON orders(customerId) USING GSI;
  143. 143. 143 Result of Inner Join { "customerId": "customer01", "firstName": "John", "lastName": "Doe", "ordersId": "orders01” }, { "customerId": "customer01", "firstName": "John", "lastName": "Doe", "ordersId": "orders02” }
  144. 144. 144 Index Hints is Allowed on Right-hand-side Keyspace • In N1QL joins (lookup join and index join) index hints can only be used on the left-hand-side keyspace • In ANSI JOIN index hints can be used on both left-hand-side and right-hand-side keyspaces SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c INNER JOIN orders o USE INDEX (orders_ix1) ON c.customerId = o.customerId;
  145. 145. 145 ANSI JOIN Example – Left Outer Join SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c LEFT OUTER JOIN orders o ON c.customerId = o.customerId • For left outer joins, the OUTER keyword is optional FROM customer c LEFT JOIN orders o ON …
  146. 146. 146 Result of Left Outer Join { "customerId": "customer01", "firstName": "John", "lastName": "Doe", "ordersId": "orders01” }, { "customerId": "customer01", "firstName": "John", "lastName": "Doe", "ordersId": "orders02” }, { "customerId": "customer02", "firstName": "Jane", "lastName": "Doe”
  147. 147. 147 ANSI JOIN Example – Left Outer Join Variation SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c LEFT OUTER JOIN orders o ON c.customerId = o.customerId WHERE c.customerId IS NOT MISSING • In case you don’t want to preserve the left-hand-side documents that do not have customerId field • Without it all left-hand-side documents will be preserved
  148. 148. 148 ANSI JOIN Example – Right Outer Join SELECT c.lastName, c.firstName, c.customerId, o.ordersId FROM customer c RIGHT OUTER JOIN orders o ON c.customerId = o.customerId • For right outer joins, the OUTER keyword is optional FROM customer c RIGHT JOIN orders o ON …
  149. 149. 149 Result of Right Outer Join { "customerId": "customer01", "firstName": "John", "lastName": "Doe", "ordersId": "orders01” }, { "customerId": "customer01", "firstName": "John", "lastName": "Doe", "ordersId": "orders02” }, { "ordersId": "orders03” }
  150. 150. 150 ON-clause Complexity – Complex Expression CREATE INDEX beer_brewery_id ON `beer-sample` (brewery_id) WHERE type = "beer" CREATE INDEX brewery_state ON `beer-sample` (state) WHERE type = "brewery" SELECT meta(brewery).id brewery_id, brewery.name brewery_name FROM `beer-sample` brewery INNER JOIN `beer-sample` beer ON beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.type = "beer" WHERE brewery.type = "brewery" AND brewery.state = ”Kansas"
  151. 151. 151 ON-clause Complexity – Multiple Join Keys CREATE INDEX airport_location ON `travel-sample`(country, city) WHERE type = "airport" SELECT * FROM `travel-sample` hotel JOIN `travel-sample` airport ON hotel.country = airport.country AND hotel.city = airport.city AND airport.type = "airport" WHERE hotel.type = "hotel" AND hotel.free_internet = true LIMIT 5
  152. 152. 152 ON-clause Complexity – Non-equal Join Keys CREATE INDEX beer_updated ON `beer-sample`(updated) WHERE type="beer" SELECT * FROM `beer-sample` brewery JOIN `beer-sample` beer ON brewery.updated >= beer.updated AND beer.type = "beer" WHERE brewery.type = "brewery" AND brewery.state = ”California" LIMIT 5
  153. 153. 153 ON-clause Complexity – OR clause SELECT * FROM `beer-sample` brewery JOIN `beer-sample` beer ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) OR beer.updated = brewery.updated) AND beer.type = "beer" WHERE brewery.type = "brewery" AND brewery.state = ”California” LIMIT 5
  154. 154. 154 Separation of Join Conditions and Filter Conditions • ON clause • Join condition • Evaluated as part of the join • For outer join, part of determination of whether to preserve the dominant side • WHERE clause • Filter condition • Evaluated after all joins (i.e. post-join) • Can potentially disqualify “preserved” result of an outer join
  155. 155. 155 Be Careful with Filters in an Outer Join SELECT brewery.name brewery_name, beer.name beer_name FROM `beer-sample` brewery LEFT OUTER JOIN `beer-sample` beer ON beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) WHERE beer.type = "beer" AND brewery.type = "brewery" AND brewery.state = "KS" • Left outer join may produces result with left-hand-side preserved (right-hand- side missing • However the WHERE clause filter will discard all such results • Net effect is an INNER JOIN
  156. 156. 156 Put Filters that are Part of a JOIN on the ON-clause SELECT brewery.name brewery_name, beer.name beer_name FROM `beer-sample` brewery LEFT OUTER JOIN `beer-sample` beer ON beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.type = "beer" WHERE brewery.type = "brewery" AND brewery.state = "KS" • ON-clause is evaluated as part of join processing • Left Outer Join can still produce “preserved” result
  157. 157. 157 N1QL : Arrays Array { "hobbies": ["tennis", "skiing", "lego"]} { "orders": [582, 9721, 3814]} Object { "address": {"street": "1, Main street", "city": Morrisville, "state":"CA", "zip":"94824"} } Arrays of objects of arrays [ { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" }, { "type": "master", "cardnum": "6274-2542-5847-3949", "expiry": "2018-12" } ]
  158. 158. 158 ANSI JOIN Support for Arrays • Array is an important construct in NoSQL world • Although the SQL standard does not include array support, we added support for arrays in our implementation of ANSI JOIN • Support arrays on the left-hand-side of ANSI JOIN, on the right-hand-side of ANSI JOIN, or on both sides of ANSI JOIN • Right-hand-side: use array index • Left-hand-side: use IN clause or UNNEST operation • Both sides: combination of above
  159. 159. 159 Play with ANSI JOIN Support for Arrays - Setup • Create buckets “product”, “purchase” CREATE PRIMARY INDEX ON product CREATE PRIMARY INDEX ON purchase INSERT INTO product(KEY,VALUE) VALUES ("product01", {"productId": "product01", "category": "Toys", "name": "Truck", "unitPrice": 9.25}), VALUES("product02", {"productId": "product02", "category": "Kitchen", "name": "Bowl", "unitPrice": 5.50}), VALUES ("product03", {"productId": "product03", "category": "utensil", "name": "Spoons", "unitPrice": 2.40}) INSERT INTO purchase(KEY, VALUE) VALUES ("purchase01", {"purchaseId": "purchase01", "customerId": "customer01", "lineItems": [ {"productId": "product01", "count": 3}, {"productId": "product02", "count": 1} ], "purchasedAt": "2017-11-24T15:03:22"}), VALUES ("purchase02", {"purchaseId": "purchase02", "customerId": "customer02", "lineItems": [ {"productId": "product03", "count": 2} ], "purchasedAt": "2017-11- 27T09:08:37"})
  160. 160. 160 ANSI JOIN Support for Arrays – Right-hand-side • Utilize array index defined on the right-hand-side keyspace CREATE INDEX purchase_ix1 ON purchase(DISTINCT ARRAY l.productId FOR l IN lineItems END) USING GSI SELECT p.name, pu.purchasedAt FROM product p JOIN purchase pu ON ANY l IN pu.lineItems SATISFIES l.productId = p.productId END
  161. 161. 161 ANSI JOIN Support for Arrays – Left-hand-side with UNNEST • Utilize UNNEST to flatten the left-hand-side array first CREATE INDEX product_ix1 ON product(productId) USING GSI SELECT p.name, pu.purchasedAt FROM purchase pu UNNEST pu.lineItems AS pl JOIN product p ON pl.productId = p.productId
  162. 162. 162 ANSI JOIN Support for Arrays – Left-hand-side with IN • Utilize IN-clause with array SELECT p.name, pu.purchasedAt FROM purchase pu JOIN product p ON p.productId IN ARRAY l.productId FOR l IN pu.lineItems END
  163. 163. 163 Difference Between UNNEST and IN-clause • UNNEST first make copies of the left-hand-side document, one for each element of the array. There is no copying if using IN-clause • If there are duplicates in the array: • UNNEST makes copies for all duplicates • IN-clause does not care about duplicates • If performing LEFT OUTER JOIN • UNNEST makes copies and preserves all copies • IN-clause only preserves the original document
  164. 164. 164 ANSI JOIN Support for Arrays – Both-side with UNNEST SELECT pu1.purchaseId pid1, pu2.purchaseId pid2 FROM purchase pu1 UNNEST pu1.lineItems AS pl JOIN purchase pu2 ON ANY l IN pu2.lineItems SATISFIES l.productId = pl.productId END
  165. 165. 165 ANSI JOIN Support for Arrays – Both-side with IN-clause SELECT pu1.purchaseId pid1, pu2.purchaseId pid2 FROM purchase pu1 JOIN purchase pu2 ON ANY l2 IN pu2.lineItems SATISFIES l2.productId IN ARRAY l1.productId FOR l1 IN pu1.lineItems END END
  166. 166. 166 ANSI NEST • NEST is specific to NoSQL • N1QL currently supports lookup nest and index nest, similar to lookup join and index join • ANSI NEST support is added, similar to ANSI JOIN SELECT * FROM customer c NEST orders o ON c.customerId = o.customerId
  167. 167. * HASH JOINS
  168. 168. 168 JOIN implementation so far • Block nested loop join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer"WHERE brewery.type = "brewery" AND brewery.state = "California"
  169. 169. 169 HASH join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer USE HASH (probe) ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer"WHERE brewery.type = "brewery" AND brewery.state = "California"
  170. 170. 170 HASH join • Block nested loop join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer USE HASH (build) ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer"WHERE brewery.type = "brewery" AND brewery.state = "California"
  171. 171. 171 Hash JOIN • Block nested loop join SELECT COUNT(1) FROM `beer-sample` brewery JOIN `beer-sample` beer USE HASH (build) ON (beer.brewery_id = LOWER(REPLACE(brewery.name, " ", "_")) AND beer.updated = brewery.updated) AND beer.type = "beer"WHERE brewery.type = "brewery" AND brewery.state = "California" • beer is the build side. • Scan beer to create the hash table • Brewery automatically becomes the probe. • Each keyspace is scanned once.
  172. 172. 172 HASH join • At least one equality expression between two keyspaces • When this is absent, silently defaults to nested loop join. • Ditto on community edition • Requires memory to build the hash table on the BUILD side. • Unsuitable when only few documents qualify and can be done via index lookup. • The build side has be the keyspace with LESS qualified documents. • This selection is manual until cost based decisions can be made automatically • Typically good for reporting kind of queries. • Once the query latency goes to seconds, consider hash joins!
  173. 173. * ANSI JOINS DEMO
  174. 174. 6 GROUP BY & AGGREGATION PERFORMANCE
  175. 175. 175 Grouping and Aggregation Performance (EE) CUSTOMER PROBLEM OR SCENARIO • Grouping and Aggregation are expensive operations • Latencies are high and cluster is not scaling BENEFITS • High scalability • Low query latencies • Low TCO • Automatic: No changes to query or index SOLUTION • If query is covered by index, Let Indexer perform Grouping and Aggregation. • Eliminate network transportation and disk I/0 due to backfill. These are slowest operations. EXAMPLE: • CREATE INDEX idx ON `travel- sample`(type); • SELECT type, COUNT(1) AS typecount FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type;
  176. 176. 176 N1QL: Inside the Query Service Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service
  177. 177. 177 CREATE INDEX idx ON `travel-sample`(type); Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service SELECT type, COUNT(type) AS typecount FROM `travel-sample` WHERE type is not missing GROUP BY TYPE ORDER BY typecount Covered Query Path Note: Grayed out phases (Fetch, Join) are skipped in the query because the query is completely covered by the index scan.
  178. 178. 178 CREATE INDEX idx ON `travel-sample`(type); Client FetchParse Plan Join Filter Pre-Aggregate Offset Limit ProjectSortAggregateScan Query Service Index Service Data Service SELECT type, COUNT(type) AS typecount FROM `travel-sample` WHERE type is not missing GROUP BY TYPE ORDER BY typecount Query with GROUP & Aggregate Pushdown Note: Grayed out phases (Fetch, Join, filter, pre-aggregate, aggregate, sort) are skipped in the query because indexer can in Couchbase 5.5, when the query qualifies.
  179. 179. * GROUP BY & AGGREGATION PERFORMANCE DETAILS
  180. 180. 180 Aggregation Queries SELECT type FROM `travel-sample` WHERE type is NOT NULL GROUP BY type; SELECT COUNT(1) AS typecount FROM `travel-sample` WHERE type IS NOT NULL; SELECT type, COUNT(1) AS cnt FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type;
  181. 181. 181 Aggregation Example
  182. 182. 182 What is Group and Aggregation? • Grouping and Aggregations are part of SQL syntax. Used in Dashboards, Analytical, Reporting queries. • Query contains GROUP BY clause • Example : SELECT type FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type; • Query contains Aggregate functions. • Aggregate functions are MIN(), MAX(), COUNT(), SUM(), AVG(), ARRAY_AGG() • Aggregate functions can only be in Projection, Having, Letting clause • Aggregate functions can be present with or without GROUP BY. • When GROUP BY is absent it considered as single group. • SELECT type, COUNT(1) AS cnt FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type; • Returns how many documents present in each group.
  183. 183. 183 Improvements in Couchbase 5.5 • Performance, Performance, Performance. • SELECT type, COUNT(1) AS cnt FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type; • Pushes the predicates as part of the spans • Indexer produces all qualified items • Group the documents based on type • COUNT the number of documents for each group • Return the one row for each GROUP • If the GROUP and Aggregation is done on indexer lot of network transportation, disk I/0 due to back fill will be saved. • This results in • low latency • high scalability • low TCO • No changes to query/index syntax
  184. 184. 184 Type of Aggregations • FULL Aggregations • NO GROUP BY or all of GROUP keys must match the leading index keys (Example: Index (c0,c1,c2,c3,c4), GROUP BY c0,c1). • In case of partition index (PARTITION BY HASH, NOT partial index though WHERE clause) the partition keys needs to match with group keys. • Aggregation is done only once. The results are final. One entry per GROUP. • As the GROUP matched with leading index keys indexer knows when the group is done and produce the results. • Partial Aggregations (n-step aggregation, n can be > 1) • GROUP keys are not matched leading index keys ((Example : Index (c0, c1,c2,c3,c4), GROUP BY c1).. • In case of partition index partition key not matched with group keys. • As the GROUP did NOT matched with leading index keys indexer doesn't know when the group is done so they do partial aggregation and produces duplicates in the GROUP. • Query engine will do another level aggregation on partial aggregated results produce FULL aggregation. • SUM becomes SUM of SUM, COUNT becomes SUM of COUNT, MIN becomes MIN of MIN, MAX becomes MAX of MAX. • AVG is special case will discussed later.
  185. 185. 185 GSI Aggregations - Overview • For 5.5, all Aggregations are computed On-The-Fly. • There are no pre-computed/cached results. • All the computations are done in Indexer Scan Pipeline. • The implementation is storage agnostic i.e. it works equally for MOI, Plasma, Forestdb. • Pre-computed aggregates require storage support. Will be added in a future release. • N1QL-Indexer API3(protocol) has already been defined to support pre-computed aggregates whenever it is ready.
  186. 186. 186 Supported GROUP/Aggregates • GROUP BY • Index keys, document key, expressions on document key/index keys • GROUP key order is irrelevant (GROUP BY c1, c0 is same as GROUP BY c0,c1), • GROUP BY duplicates are irrelevant ( GROUP BY c1,c1 is same as GROUP BY c1). • No Pushdowns : Group key depends on query/named parameters, sub query, LET variables • Aggregates • Supported : MIN(), MAX(), COUNT(), COUNTN(), SUM(), AVG(), COUNT(DISTINCT x), COUNTN( DISTINCT x), SUM( DISTINCT x), AVG(DISTINCT x) • COUNTN(X), COUNTN(DISTINCT X) – Counts value when X is numeric value. • Aggregate argument can be Constant, Index keys, document key, expression on document key/index keys • No pushdowns : ARRAY_AGG(), ARRAY_AGG(DISTINCT x) • No Pushdowns : Aggregate argument depends on query/named parameters, sub query, LET variables.
  187. 187. 187 Supported GROUP/Aggregates • Duplicate aggregates are calculated only once. • All aggregates in query block are considered (i.e. Projection, LETTING, HAVING, ORDER BY). • To support partial aggregation AVG() is rewritten as SUM()/COUNTN() • Distinct Aggregates • Query contains DISTINCT aggregates following additional rules apply to push the aggregates to indexer. • Aggregations needs to result in FULL aggregation. • The aggregate argument needs to be one of n+1 leading keys, n is number of non-duplicate group keys. • CREATE INDEX ix1 ON `orders` (c0,c1,c2,c3,c4) WHERE type = "numeric"; • SELECT SUM(DISTINCT X), AVG(DISTINCT X) FROM orders WHERE c0 > 0 and type = "numeric" GROUP BY c0,c1; • X can be c0 or c1 or c2. • The above restriction is required so that indexer can apply DISTINCT values with out sort. (FULL aggregations means groups keys matched with leading n keys, n+1 key is already sorted). • Special case for partition Index, With NO GROUP BY: Distinct aggregates will not push down.
  188. 188. 188 Array Index • CREATE INDEX aa1 ON orders (c0,c1, ALL ARRAY v FOR v IN a1 END,c3,c4) ; • CREATE INDEX da1 ON orders (c0,c1, ALL DISTINCT ARRAY v FOR v IN a1 END,c3,c4); • MIN, MAX aggregates are allowed. • For other aggregates the following additional restrictions apply. • The restrictions are to produce one document for document key. • The Array Index requires DISTINCT keyword (ex: da1). aa1 not qualify due to DistinctScan required to eliminate duplicate document keys • Only when equality predicate is present in SATISFIES clause (i.e. ANY v IN a1 SATISIFES v = 10 END) • Rest of other rules described earlier are still applicable.
  189. 189. 189 EXPLAIN Group /Aggregate Pushdowns • IndexScan section will have "index_group_aggs" • Full aggregation • index_group_aggs.partial will be MISSING • No InitialGroup, IntermediateGroup,FinalGroup operators. • Partial aggregation • index_group_aggs.partial will be true "index_preojection" contains all group and aggregates ORDER pushdown • "index_order" will present in IndexScan section and Order operator will not present at the end. OFFSET pushdown • "offset" will present in IndexScan section and Offset operator will not present at the end. LIMIT pushdown • "limit" will present in IndexScan section { "#operator": "IndexScan3", "keyspace": "travel-sample", "index_group_aggs": { "aggregates": [{ "aggregate": "COUNT", "expr": "cover ((`t`.`city`))", "depends": [ 1 ], "keypos": 1, "id": 4 } ], "group": [ { "expr": "cover ((`t`.`country`))", "depends": [ 0 ], "keypos": 0, "id": 0 } ], "depends": [ 0, 1 ] , "partial": false, = Appears only when it is true } , "index_projection": { "entry_keys": [ 0, 4 ] }, "index_order": [ { "keypos": 0 } ], "limit": "5", "offset": "1" }
  190. 190. 190 Group/Aggregation Example CREATE INDEX ix_ttype ON `travel-sample` (type); SELECT type, COUNT(1) AS cnt FROM `travel-sample` WHERE type IS NOT NULL GROUP BY type; • The above query counts the number of documents for each type of the document. • Group/Aggregates are pushed to indexer • This result in Full aggregations.
  191. 191. 191 Group/Aggregation with pagination Example CREATE INDEX ix_airport ON `travel-sample` (country,city,airportname) WHERE type = "airport"; SELECT t.country, COUNT(t.city) FROM `travel-sample` t WHERE t.type = "airport" AND t.country IS NOT NULL GROUP BY t.country ORDER BY t.country OFFSET 1 LIMIT 5; • The above pagination query counts the number of airports in the country • Group/Aggregates are pushed to indexer • This result in Full aggregations. • ORDER,OFFSET,LIMIT are pushed to indexer
  192. 192. 192 Group/Aggregation with pagination Example CREATE INDEX ix_airport ON `travel-sample` (country,city,airportname) WHERE type = "airport"; SELECT t.city, COUNT(t.city) FROM `travel-sample` t WHERE t.type = "airport" AND t.country IS NOT NULL GROUP BY t.city ORDER BY t.city OFFSET 1 LIMIT 5; • The above pagination query counts the number of airports in the country • Group/Aggregates are pushed to indexer • This result in Paritial aggregations. • ORDER,OFFSET,LIMIT are pushed to indexer
  193. 193. 193 Complex Group/Aggregation Example CREATE INDEX ix1 ON `orders` (c0,c1,c2,c3,c4) WHERE type = "numeric"; SELECT c0,c1,c2, MIN(meta().id) AS minid, MAX(ABS(c2)) AS maxac2, MAX(c2) AS maxc2, MAX([c2,{c1,c2}])[1] AS maxc1c2, ABS(MAX(c2)) AS absmaxc2, AVG(c1+c3) AS avgc1c2, AVG ( DISTINCT c1) AS avgdc1, SUM ( DISTINCT c2) AS sumdc2, COUNT ( DISTINCT c3) AS sumdc3, cnt AS countc1 FROM orders WHERE (type = 'numeric') AND c0 > 0 GROUP BY c1,c2,c0 LETTING cnt = COUNT(c1) HAVING SUM(c1) > 5 AND cnt > 1 ORDER BY c0,c1,c2 OFFSET 1
  194. 194. 194 How To Disable Feature • As there is no syntax change feature is enabled automatically by default. • If results are wrong due to bugs customers need bug fix this is expensive for customers and as well as Couchbase. • Added following Query Service settings and request level query parameters • max-index-api :2 This feature added as part of Index API 3. By setting 2 disables all features added above 2 • n1ql-feat-ctrl : 1 Don't change max-index-api. Setting Bit 1 i.e 0x01 disable aggregate pushdown feature. JSON doesn't recognize the hex values you need to convert hex to numeric and set that value. • Request level setting names are _ instead of – (i.e. max_index_api, n1ql_feat_ctrl). • Request level settings can down grade query level settings but can't upgrade. • As of now there are no plans to document this. Support can recommend customer
  195. 195. * GROUP BY & AGGREGATION DEMO
  196. 196. 196 REGISTER FOR COUCHBASE CONNECT – NEW YORK May 9th and 10th
  197. 197. ∞ COUCHBASE.COM QUERY.COUCHBASE.COM FORUMS.COUCHBASE.COM BLOG.COUCHBASE.COM
  198. 198. TITLE Subtitle here if needed Date Name | Title
  199. 199. 01/ 02/ Fill in agenda items and numbers Delete text boxes not used AGENDA
  200. 200. #
  201. 201. 201 Headline Content Goes Here • Bullet level 1, Arial, 22 pt • Bullet level 2, Arial, 20 pt • Bullet level 3, Arial, 18 pt • Bullet level 4, Arial, 16 pt • Bullet level 5, Arial, 16 pt
  202. 202. 202 Headline Content Goes Here • Bullet level 1, Arial, 22 pt • Bullet level 2, Arial, 20 pt • Bullet level 3, Arial, 18 pt • Bullet level 4, Arial, 16 pt • Bullet level 5, Arial, 16 pt • Bullet level 1, Arial, 22 pt • Bullet level 2, Arial, 20 pt • Bullet level 3, Arial, 18 pt • Bullet level 4, Arial, 16 pt • Bullet level 5, Arial, 16 pt
  203. 203. 203 • Bullet level 1, Arial, 22 pt • Bullet level 2, Arial, 20 pt • Bullet level 3, Arial, 18 pt • Bullet level 4, Arial, 16 pt • Bullet level 5, Arial, 16 pt • Bullet level 1, Arial, 22 pt • Bullet level 2, Arial, 20 pt • Bullet level 3, Arial, 18 pt • Bullet level 4, Arial, 16 pt • Bullet level 5, Arial, 16 pt Headline Content Goes Here • Bullet level 1, Arial, 22 pt • Bullet level 2, Arial, 20 pt • Bullet level 3, Arial, 18 pt • Bullet level 4, Arial, 16 pt • Bullet level 5, Arial, 16 pt
  204. 204. 204 This Is a Blank Slide
  205. 205. 205 Headline Content Goes Here Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
  206. 206. 206 Headline Content Goes Here Lorem ipsumLorem ipsum Lorem ipsumLorem ipsum • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut (ICON) (ICON) (ICON) (ICON) To add an icon: 1. Go to the icon library: https://drive.google.com/drive/fold nXRE02d1dNUnJNWjA?usp=shar 2. Go the menu click on “Insert” an the “Pictures” icon then choose “P File…” in the drop down menu. 3. Click on the icon file then press icon will appear on your slide or do as needed.
  207. 207. 207 Lorem ipsum dolor sit amet consectetur adipiscing elit Headline Content Goes Here (ICON OR TEXT) (ICON OR TEXT) Lorem ipsum dolor sit amet consectetur adipiscing elit Lorem ipsum dolor sit amet consectetur adipiscing elit • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor • Lorem ipsum dolor sit amet, consectetur (ICON OR TEXT) To add an icon: 1. Go to the icon library: https://drive.google.com/drive/folders/0B3tU353ksD nXRE02d1dNUnJNWjA?usp=sharing 2. Go the menu click on “Insert” and then click on the “Pictures” icon then choose “Pictures from File…” in the drop down menu. 3. Click on the icon file then press “Insert” and the icon will appear on your slide or document. Resize as needed.
  208. 208. 208 Headline Content Goes Here Lorem ipsum Lorem ipsum Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor Lorem ipsum Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor Lorem ipsum Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor Lorem ipsum Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
  209. 209. 209 Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Lorem ipsum dolor Headline Content Goes Here
  210. 210. 210 Headline Content Goes Here 0 1 2 3 4 5 6 Category 1 Category 2 Category 3 Category 4 Chart Title Series 1 Series 2 Series 3 To edit, select and right-click on chart, “Edit data in Excel” To change color of a data series, click and select “Format Data series” To change the color of an individual da and select “Format Data Point” (DELETE THIS TEXT BOX WHEN FIN
  211. 211. 211 Headline Content Goes Here 0 1 2 3 4 5 6 Category 1 Category 2 Category 3 Category 4 Chart Title Series 1 Series 2 Series 3 To edit, select and right-click on char “Edit data in Excel” To change color of a data series, clic and select “Format Data series” To change the color of an individual d and select “Format Data Point” (DELETE THIS TEXT BOX WHEN F
  212. 212. 212 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum. Headline Content Goes Here 4.3 2.5 3.5 4.5 2.4 4.4 1.8 2.8 2.0 2.0 3.0 5.0 0 1 2 3 4 5 6 Category 1 Category 2 Category 3 Category 4 Chart Title Series 1 Series 2 Series 3 To edit, select and right-click on cha “Edit data in Excel” To change color of a data series, clic and select “Format Data series” To change the color of an individual and select “Format Data Point” (DELETE THIS TEXT BOX WHEN F

×