
Couchbase Analytics: Under the Hood – Connect Silicon Valley 2018

Speaker: Mike Carey, Consulting Chief Architect and UC Irvine Bren Professor of Computer and Information Sciences

With Couchbase Analytics, the operational data in Couchbase Server is available for analytical processing in real time. Join us for an architectural overview of the new service and learn more about its MPP (massively parallel processing) architecture, data and index placement, and storage structures. There will also be a discussion of typical query execution plans.


  1. Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2018. All rights reserved. COUCHBASE ANALYTICS: UNDER THE HOOD (Or: How the Sausage is Made) September 19, 2018 Michael Carey | Consulting Architect (Professor, UC Irvine)
  2. Analytics Requirements: Rapid Time to Insight!
      • Flexible data model (JSON, just like the operational system)
      • Full query capability (N1QL for Analytics, based on SQL++)
      • Continuous data ingestion (from the Data Service)
      • Support for Big Data analytics (including exploration)
        • Volume of data necessitates secondary storage
        • Ad hoc questions demand parallel processing
        • Query cost proportional to query complexity
        • Best practices from years of parallel database R&D!
      • Optional secondary index support (for tuning)
  3. (A Compelling Use Case) "Explore beer characteristics by city"
      SELECT bw.city, COUNT(*) AS num_beers, AVG(br.abv) AS beer_strength
      FROM beers br, breweries bw
      WHERE br.brewery_id = meta(bw).id
      GROUP BY bw.city
      HAVING COUNT(*) > 1
      ORDER BY beer_strength DESC
      LIMIT 3;
      Result:
      [{ "num_beers": 5, "beer_strength": 12.02, "city": "Vorchdorf" },
       { "num_beers": 8, "beer_strength": 10.3125, "city": "Buggenhout" },
       { "num_beers": 11, "beer_strength": 10.045454545454545, "city": "Fraserburgh" }]
  4. AGENDA
      1. Design Principles
      2. Analytics Architecture
      3. Ingestion and Storage
      4. Big Data Operators
      5. Parallel Processing
      6. Memory Management
      7. Summary
  5. Section 1: DESIGN PRINCIPLES
  6. Parallel Database Performance Objectives. Source: Principles of Distributed Database Systems, 3rd Edition, T. Özsu and P. Valduriez, Springer, 2011.
  7. Shared-Nothing Parallel Databases (a.k.a. MPP)
      • Notable research systems
        • Gamma, Grace, Bubba, …
      • Many commercial examples
        • Teradata
        • IBM DB2 Parallel
        • Microsoft PDW
        • Vertica
        • …
      • Over 30 years of R&D!
  8. Two Forms of Intra-Query Parallelism (including pipelining). Source: Principles of Distributed Database Systems, 3rd Edition, T. Özsu and P. Valduriez, Springer, 2011.
  9. Section 2: ANALYTICS ARCHITECTURE
  10. System Architecture
  11. Section 3: INGESTION AND STORAGE
  12. Data Ingestion: Data Service → Analytics Service
      [Diagram labels: OLTP side with DATA nodes, Analytics side with ANALYTICS nodes, linked by DCP]
      • Separate services, separate nodes
        • Needed for performance isolation
        • Allows separate scaling based on needs
      • Parallel shadowing of datasets (DCP)
        • Low impact on Data nodes
        • High data currency
      • Other notes
        • M:N connectivity
        • Not unlike GSI
  13. LSM-Based Storage and Indexing
      [Diagram labels: Memory, Disk, sequential writes to disk, periodically merge disk trees]
      Log-Structured Merge Trees
      • Support for fast ingestion
      • B+ tree based components
      • Bloom filters (search efficiency)
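The LSM behavior named on this slide can be sketched in a few lines. This is a toy illustration only, not the Analytics storage engine: the class name `TinyLSM`, the `memory_limit` threshold, and the use of a Python `set` as a stand-in for a Bloom filter are all assumptions made for the example, and deletes (which a real LSM tree handles with tombstones) are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM index: one in-memory component plus immutable sorted disk components."""

    def __init__(self, memory_limit=4):
        self.memory_limit = memory_limit  # hypothetical flush threshold, in entries
        self.memory = {}                  # in-memory component
        self.disk = []                    # newest-first list of (keys, values, members)

    def upsert(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.flush()

    def flush(self):
        # Write the in-memory component out as one sorted run (a sequential write).
        keys = sorted(self.memory)
        values = [self.memory[k] for k in keys]
        # The set stands in for a Bloom filter (an exact one, for simplicity).
        self.disk.insert(0, (keys, values, set(keys)))
        self.memory = {}

    def merge(self):
        # Merge all disk components into one; entries from newer components win.
        merged = {}
        for keys, values, _ in reversed(self.disk):  # oldest first
            merged.update(zip(keys, values))
        keys = sorted(merged)
        self.disk = [(keys, [merged[k] for k in keys], set(keys))] if keys else []

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        for keys, values, members in self.disk:      # newest first
            if key not in members:                   # the filter lets us skip this component
                continue
            return values[bisect.bisect_left(keys, key)]
        return None
```

Writes only ever touch the in-memory component, which is what makes ingestion fast; reads pay for that by possibly consulting several components, which is the cost the per-component filter is meant to reduce.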
  14. An Indexed Analytics Dataset
      [Diagram labels: Dataset on Node 1 … Node N, Primary Index, Primary Key Index, Secondary Index on Name, Secondary Index on Zipcode, Bloom Filter]
      Partitioned local storage approach
      • Hashed on primary key (PK)
      • Primary index w/ PK + record
      • Secondary index(es) with SK + PK
      • Record updates are always local
  15. Exploiting a Secondary Index
      • On all partitions, in parallel:
        1. Search the secondary index to find the PKs with the desired SK values
        2. Sort the resulting primary key list (to avoid page re-accesses)
        3. For each primary key, fetch the associated record
        4. Recheck the query predicate (to filter any just-changed records)
        5. Pass the resulting records on to the next operator in the query plan
      • Regarding the sort in step 2:
        • Many qualifying PKs → sorting avoids potentially many random I/Os
        • Few qualifying PKs → sorting is in-memory and minor (∴ the safe bet)
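The five steps above can be sketched for a single partition as follows. The partition layout (a dict with a `"secondary"` list of `(sk, pk)` pairs and a `"primary"` dict) and the function name are hypothetical, chosen only to make the steps concrete; the real indexes are LSM B+ trees, and the search runs on all partitions in parallel rather than in a Python loop.

```python
def secondary_index_search(partition, sk_value, predicate):
    """Run steps 1-5 on one partition.

    partition is a hypothetical dict with:
      "secondary": list of (sk, pk) index entries
      "primary":   dict mapping pk -> record
    """
    # 1. Search the secondary index for the PKs with the desired SK value.
    pks = [pk for sk, pk in partition["secondary"] if sk == sk_value]
    # 2. Sort the PKs so the primary index is visited in key order.
    pks.sort()
    results = []
    for pk in pks:
        # 3. Fetch the associated record from the primary index.
        record = partition["primary"].get(pk)
        # 4. Recheck the predicate: the index entry may describe a just-changed record.
        if record is not None and predicate(record):
            results.append(record)
    # 5. Hand the surviving records to the next operator (here: the caller).
    return results


# Usage: the entry for PK 2 is stale (its zip has changed), so step 4 drops it.
partition = {
    "primary": {
        1: {"id": 1, "zip": "92697"},
        2: {"id": 2, "zip": "10001"},
        3: {"id": 3, "zip": "92697"},
    },
    "secondary": [("92697", 3), ("92697", 1), ("92697", 2)],
}
matches = secondary_index_search(partition, "92697", lambda r: r["zip"] == "92697")
```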
  16. Section 4: BIG DATA OPERATORS
  17. Push-Based Runtime System
      • Operators push pages of data from one to the next
      • Memory-intensive operators get a budget (an execution memory limit)
      • Pipelined parallelism between operators
      • Connectors reroute data between operators (when necessary)
        • 1:1 if local, M:N hash partitioning when needed, etc.
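A minimal sketch of the push model, assuming nothing about the actual runtime's interfaces: each operator exposes a hypothetical `push(page)` method, and a producer drives the pipeline by pushing fixed-size pages downstream. The class and function names (`Filter`, `Collect`, `scan`) are invented for this example.

```python
class Collect:
    """Terminal operator: gathers every page it receives."""

    def __init__(self):
        self.rows = []
        self.pages = 0

    def push(self, page):
        self.pages += 1
        self.rows.extend(page)


class Filter:
    """Keeps the records that satisfy the predicate and pushes them on."""

    def __init__(self, predicate, downstream):
        self.predicate = predicate
        self.downstream = downstream

    def push(self, page):
        out = [rec for rec in page if self.predicate(rec)]
        if out:  # do not push empty pages
            self.downstream.push(out)


def scan(records, page_size, downstream):
    """Source operator: cuts the input into pages and pushes each one."""
    for start in range(0, len(records), page_size):
        downstream.push(records[start:start + page_size])


sink = Collect()
scan(list(range(10)), 4, Filter(lambda x: x % 2 == 0, sink))
```

Because data moves a page at a time, a downstream operator starts working before the upstream one has finished, which is the pipelined parallelism the slide refers to.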
  18. Sorting on a (Memory) Budget
  19. Joining on a (Memory) Budget
      [Diagram labels, partitioning phase: Original Dataset on disk, INPUT buffer, hash function h, OUTPUT to B-1 partitions on disk, B main memory buffers (execution memory); first do R, then do S]
      [Diagram labels, join phase: partitions of R & S on disk, hash table for partition Ri (k < B-1 pages) built with hash fn h', input buffer for Si, output buffer, join result, B main memory buffers]
      • GRACE hash join of R and S
        • Partition datasets R and then S into memory-sized partitions Ri (and Si) using hash function h( ) on their join fields
        • For each pair i, read Ri into memory, build a hash table on it using h'( ), and then scan Si while probing Ri for matches
      • Dynamic hash join in Analytics
        • Spill partitions Ri dynamically, each time the memory budget is reached
        • Build phase ends with some Ri's in memory, then probe phase begins with Si's
        • Recurse and/or repeat as necessary
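The two GRACE phases can be sketched as below. This is an in-memory illustration under simplifying assumptions: partitions are Python lists rather than disk files, the function and parameter names are invented, Python's built-in `hash` plays the role of h( ), and a `dict` plays the role of the h'( ) hash table. It does not model the dynamic spilling that the Analytics hash join adds on top.

```python
from collections import defaultdict


def grace_hash_join(r, s, key_r, key_s, num_partitions=4):
    """Join lists of dict records r and s on r[key_r] == s[key_s]."""

    def h(value):
        return hash(value) % num_partitions

    # Phase 1: partition R, then S, on their join fields using h.
    r_parts = defaultdict(list)
    s_parts = defaultdict(list)
    for rec in r:
        r_parts[h(rec[key_r])].append(rec)
    for rec in s:
        s_parts[h(rec[key_s])].append(rec)

    # Phase 2: for each pair (Ri, Si), build a hash table on Ri and probe it with Si.
    out = []
    for i in range(num_partitions):
        table = defaultdict(list)
        for rec in r_parts[i]:
            table[rec[key_r]].append(rec)
        for rec in s_parts[i]:
            for match in table.get(rec[key_s], []):
                out.append((match, rec))
    return out


breweries = [{"id": 1, "city": "A"}, {"id": 2, "city": "B"}]
beers = [
    {"name": "ale", "brewery_id": 1},
    {"name": "pils", "brewery_id": 2},
    {"name": "stout", "brewery_id": 1},
    {"name": "orphan", "brewery_id": 9},
]
pairs = grace_hash_join(breweries, beers, "id", "brewery_id")
```

Matching records always hash to the same partition number, so only partition pairs with equal index ever need to be joined.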
  20. Joining on a (Memory) Budget – Probe Phase Details [Figure labels: Rec i, Rec j]
  21. Section 5: PARALLEL PROCESSING
  22. Parallel Processing Example: Grouped Aggregation
      SELECT br.brewery_id, COUNT(*) AS num_beers FROM beers br GROUP BY br.brewery_id;
      Input partitions, each followed by its local aggregation result:
      • { "name": "dubbel", "brewery_id": 3 } { "name": "saison", "brewery_id": 3 } { "name": "pils", "brewery_id": 2 }
        local aggregation: { "num_beers": 1, "brewery_id": 2 } { "num_beers": 2, "brewery_id": 3 }
      • { "name": "kölsch", "brewery_id": 2 } { "name": "lager", "brewery_id": 1 } { "name": "ale", "brewery_id": 1 }
        local aggregation: { "num_beers": 2, "brewery_id": 1 } { "num_beers": 1, "brewery_id": 2 }
      • { "name": "tripel", "brewery_id": 3 } { "name": "stout", "brewery_id": 1 } { "name": "weizen", "brewery_id": 2 }
        local aggregation: { "num_beers": 1, "brewery_id": 1 } { "num_beers": 1, "brewery_id": 2 } { "num_beers": 1, "brewery_id": 3 }
      After repartition on brewery_id, each followed by its final aggregation result:
      • { "num_beers": 2, "brewery_id": 3 } { "num_beers": 2, "brewery_id": 1 } { "num_beers": 1, "brewery_id": 1 } { "num_beers": 1, "brewery_id": 3 }
        final aggregation: { "num_beers": 3, "brewery_id": 1 } { "num_beers": 3, "brewery_id": 3 }
      • { "num_beers": 1, "brewery_id": 2 } { "num_beers": 1, "brewery_id": 2 } { "num_beers": 1, "brewery_id": 2 }
        final aggregation: { "num_beers": 3, "brewery_id": 2 }
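The three stages on this slide (local aggregation, repartition, final aggregation) can be reproduced with the slide's own beer records. The sketch below is an assumption-laden simulation: the function names are invented, partitions are plain lists processed one after another rather than in parallel, and Python's `hash` stands in for whatever partitioning function the system uses.

```python
from collections import Counter


def local_aggregate(partition):
    # Each partition counts its own beers per brewery_id.
    return Counter(rec["brewery_id"] for rec in partition)


def repartition(partials, num_partitions):
    # Route every partial count to the partition that owns its brewery_id.
    routed = [[] for _ in range(num_partitions)]
    for partial in partials:
        for brewery_id, count in partial.items():
            routed[hash(brewery_id) % num_partitions].append((brewery_id, count))
    return routed


def final_aggregate(routed_partition):
    # Sum the partial counts; all partials for a brewery_id arrive at one partition.
    totals = Counter()
    for brewery_id, count in routed_partition:
        totals[brewery_id] += count
    return [{"num_beers": n, "brewery_id": b} for b, n in totals.items()]


def grouped_count(partitions, num_partitions):
    partials = [local_aggregate(p) for p in partitions]
    results = []
    for routed in repartition(partials, num_partitions):
        results.extend(final_aggregate(routed))
    return sorted(results, key=lambda row: row["brewery_id"])


partitions = [
    [{"name": "dubbel", "brewery_id": 3}, {"name": "saison", "brewery_id": 3},
     {"name": "pils", "brewery_id": 2}],
    [{"name": "kölsch", "brewery_id": 2}, {"name": "lager", "brewery_id": 1},
     {"name": "ale", "brewery_id": 1}],
    [{"name": "tripel", "brewery_id": 3}, {"name": "stout", "brewery_id": 1},
     {"name": "weizen", "brewery_id": 2}],
]
result = grouped_count(partitions, num_partitions=3)
```

Aggregating locally first means only one small partial count per group and partition crosses the network, instead of every input record.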
  23. Parallel Processing Example: Hash Join [Diagram labels: partitions of R and S, hash function h1(jk), hash function h2(jk) on each node, resulting R and S partition pairs]
  24. Parallel Processing Example: Hash Join [Diagram labels: partitions of R and S, resulting R and S partition pairs] R join S = union(R_ij join S_ij)
  25. Parallel Join Communication Strategies
      • Hash join (default choice)
        • Each dataset (R, S) is repartitioned (if needed) by hashing its records on the join key
        • A local hash join is performed between each of the resulting partition pairs (as we just saw)
      • Nested loops index join
        • The “outer” dataset R is replicated to all partitions of the “inner” indexed dataset S
        • Arriving R tuples are used (as they arrive) to probe the S index for matches
      • Broadcast join
        • The build dataset R is broadcast (replicated) to all partitions of the probe dataset and incoming tuples are treated as the build tuples in a local hash join
        • The probe dataset S is then used to probe the build dataset
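As one illustration of these strategies, here is a sketch of the broadcast join. It is a single-process simulation under stated assumptions: the function name and arguments are invented, each loop iteration stands in for one node holding a partition of S, and "broadcasting" R is simply reusing the same list on every iteration.

```python
def broadcast_join(r, s_partitions, key_r, key_s):
    """Join build dataset r against a partitioned probe dataset.

    r:            list of dict records (the build side, sent to every partition)
    s_partitions: list of lists of dict records (the probe side, left in place)
    """
    out = []
    for s_part in s_partitions:      # each iteration stands in for one node
        # Build: every partition receives all of R and hashes it locally.
        table = {}
        for rec in r:
            table.setdefault(rec[key_r], []).append(rec)
        # Probe: the local S partition never moves across the network.
        for rec in s_part:
            for match in table.get(rec[key_s], ()):
                out.append((match, rec))
    return out


breweries = [{"id": 1, "city": "A"}, {"id": 2, "city": "B"}]
beer_partitions = [
    [{"name": "ale", "brewery_id": 1}, {"name": "pils", "brewery_id": 2}],
    [{"name": "stout", "brewery_id": 1}],
]
joined = broadcast_join(breweries, beer_partitions, "id", "brewery_id")
```

The trade-off against the repartitioned hash join is visible in the loop: R is copied once per partition, so this strategy only pays off when R is small relative to S.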
  26. Operator and Plan Choices
      • The Analytics runtime includes a variety of different physical operators and strategies
      • Parallel Search
        • Searches (e.g., secondary key searches) run in parallel on all storage partitions
      • Parallel Join
        • Hash join, index nested-loop join, broadcast join
      • Parallel Grouping and Aggregation
        • Presorted group-by, hash-based group-by
      • Parallel Sort
        • Local sorting in parallel followed by a global merge (for now)
      • “Safe” choices made by default, with hints available to override them as necessary
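The parallel sort strategy listed above (local sorts followed by a global merge) can be sketched with the standard library. The function name is made up for this example, and the local sorts run one after another here, whereas the slide describes them running in parallel on each partition.

```python
import heapq


def parallel_sort(partitions, key):
    # Local sorting: each partition sorts its own records independently.
    local_runs = [sorted(p, key=key) for p in partitions]
    # Global merge: a single consumer merges the already-sorted runs.
    return list(heapq.merge(*local_runs, key=key))


beers_by_partition = [
    [{"name": "stout", "abv": 8.0}, {"name": "pils", "abv": 4.8}],
    [{"name": "tripel", "abv": 9.5}, {"name": "lager", "abv": 5.0}],
    [{"name": "saison", "abv": 6.5}],
]
ordered = parallel_sort(beers_by_partition, key=lambda rec: rec["abv"])
```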
  27. Query Plans and Runtime Scheduling
      [Figure labels: join tree over R1, R2, R3, R4; operator tree with ScanR1 to ScanR4, Build1 to Build3, Probe1 to Probe3; pipeline chain. Source: Principles of Distributed Database Systems, 3rd Edition, T. Özsu and P. Valduriez, Springer, 2011.]
      • The query scheduler oversees parallel-schedule-able pipelines (stages)
      • Staging is based on control dependencies (e.g., build before probe) within operators
      • Multiple queries can be executed concurrently (of course)
  28. Section 6: MEMORY MANAGEMENT
  29. Memory Sections
      • In-memory LSM components
        • Initial destination for inserts, upserts, deletes from DCP
      • Buffer cache
        • Used to hold disk components’ pages from disk while they’re being read and used
      • Working memory
        • Allocated for usage by memory-intensive (budgeted) runtime operators
      [Diagram labels: Java Virtual Machine heap, in-memory components (Dataset 1 … Dataset n, Metadata), buffer cache, working memory, disk with LSM indexes of user datasets (primary index, secondary index(es)), Flush, Read (pin)]
  30. Admission Control (Memory and Parallelism)
      • New queries are admitted into the system if resources are available, else queued
  31. Resource Usage
      • Tunable memory parameters include
        • Global size of in-memory components area, maximum number of active datasets, and number of in-memory components per dataset (2 by default)
        • Size of buffer cache
        • Remainder goes to working memory area
      • Tunable parallelism parameters include
        • Degree of parallelism for a query (defaults to the number of storage partitions)
        • Core multiplier (number of queries per core, defaults to 3)
      • Per-query resource requirements are its requested number of cores and its maximum working memory footprint (considering its stages and their scheduling constraints)
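Resource-based admission control of the kind described on the last two slides might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the class and method names, the FIFO queueing policy, and in particular the capacity formula (physical cores times the core multiplier), which the slides do not spell out.

```python
from collections import deque


class AdmissionController:
    """Admit a query only if its cores and working memory fit; otherwise queue it."""

    def __init__(self, cores, core_multiplier, working_memory):
        # Assumption: schedulable capacity is cores x core multiplier.
        self.free_slots = cores * core_multiplier
        self.free_memory = working_memory
        self.queue = deque()   # waiting queries, in arrival order
        self.running = {}      # query_id -> (cores, memory)

    def submit(self, query_id, cores, memory):
        self.queue.append((query_id, cores, memory))
        self._admit()

    def finish(self, query_id):
        cores, memory = self.running.pop(query_id)
        self.free_slots += cores
        self.free_memory += memory
        self._admit()

    def _admit(self):
        # FIFO: stop at the first queued query that does not fit.
        while self.queue:
            query_id, cores, memory = self.queue[0]
            if cores > self.free_slots or memory > self.free_memory:
                break
            self.queue.popleft()
            self.free_slots -= cores
            self.free_memory -= memory
            self.running[query_id] = (cores, memory)


controller = AdmissionController(cores=2, core_multiplier=1, working_memory=100)
controller.submit("q1", cores=2, memory=60)   # fits, runs immediately
controller.submit("q2", cores=1, memory=30)   # no free slots, so it waits
```

Calling `controller.finish("q1")` releases its resources and lets the queued query start.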
  32. Section 7: SUMMARY
  33. What We’ve Covered Today
      • Design Principles: Have your data and query it too (without fear!)
      • Analytics Architecture: Shared-nothing parallel NoSQL DBMS (or BDMS)
      • Ingestion and Storage: LSM-based primary and secondary indexes, fed by DCP
      • Big Data Operators: Each individual operator works within a strict memory budget
      • Parallel Processing: MPP-based divide and conquer algorithms
      • Memory Management: 3 types of memory, resource-based admission control
      • Summary: NoETL for NoSQL → your data, in near real-time → rapid insights!
  34. For Further Reading
      • Couchbase Analytics Documentation (e.g., for parameters and tuning)
      • “Parallel Database Systems: The Future of High Performance Database Systems”, D. DeWitt and J. Gray, Comm. ACM 35(6), June 1992.
      • Principles of Distributed Database Systems, 3rd Edition, T. Özsu and P. Valduriez, Springer, 2011. (4th Edition coming in 2019.)
      • “AsterixDB: A Scalable, Open Source BDMS”, S. Alsubaiee et al., Proc. VLDB 7(14), September 2014.
      • “Storage Management in AsterixDB”, S. Alsubaiee et al., Proc. VLDB 7(10), June 2014.
  35. QUESTIONS? Web: couchbase.com, Twitter: twitter.com/couchbase, Facebook: facebook.com/couchbase, Email: Mike.Carey@couchbase.com
  36. THANK YOU
