
Couchbase Analytics: an overview – Connect Silicon Valley 2017

Speaker: Till Westmann

Due to the lack of built-in analytics, many NoSQL database customers export data from their operational databases into analytical platforms that are installed and managed separately. This approach is complex to manage and costly to operate, and it makes application queries across both repositories difficult and error-prone. Most importantly, it creates barriers to real-time operational analytics. The Couchbase Server Analytics service adds analytical capabilities to Couchbase Server, enabling real-time and ad hoc analytics over operational data, all within the same Couchbase cluster.

The Couchbase Analytics service, currently a Developer Preview feature, is the newest member of the Couchbase Data Platform family of services. This session will provide an overview of this service, which delivers parallel evaluation of analytical queries on current data with minimal impact on the operational performance of the Couchbase cluster. We will examine the architecture of the new service, describe its SQL++ based query interface, and share the roadmap for this important new capability.

Published in: Technology

  1. COUCHBASE ANALYTICS: An Overview
     Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved.
  2. AGENDA
     01/ What is Couchbase Analytics
     02/ How to use it?
     03/ From the inside out
     04/ Developer Preview 4
  3. Why Couchbase Analytics?
     • Support OLTP and OLAP processing in a single platform
     • Eliminate the need for a separate OLAP system
     • Eliminate ETL
     • Reduce latency
     • Reduce complexity
     • Enable more intelligent applications
     • Enable data exploration and ad hoc analytics
  4. What is Couchbase Analytics?
     • Common programming model & data model
     • Unified management
     • Fast data synchronization
     • Extend Couchbase Platform to power real-time analytics
     • Ad-hoc queries ("Ask me anything!")
     • Workload isolation
     • Independent scaling
     [Diagram labels: Core Database Engine; Query; Search; Mobile & IoT; Analytics (Preview); Scale-out architecture; Memory-first architecture; Unified Programming]
  5. HOW TO USE IT?
  6. Data: Beer Sample
     Brewery document:
     {
       "name": "Commonwealth Brewing #1",
       "city": "Boston",
       "state": "Massachusetts",
       "code": "",
       "country": "United States",
       "phone": "",
       "website": "",
       "type": "brewery",
       "updated": "2010-07-22 20:00:20",
       "description": "",
       "address": [ ],
       "geo": { "accuracy": "APPROXIMATE", "lat": 42.3584, "lng": -71.0598 }
     }
     Beer document:
     {
       "name": "Piranha Pale Ale",
       "abv": 5.7,
       "ibu": 0,
       "srm": 0,
       "upc": 0,
       "type": "beer",
       "brewery_id": "110f04166d",
       "updated": "2010-07-22 20:00:20",
       "description": "",
       "style": "American-Style Pale Ale",
       "category": "North American Ale"
     }
  7. Simple Join
     "Get 3 beers with their breweries"
     Query:
     SELECT bw.name AS brewer, br.name AS beer
     FROM breweries bw, beers br
     WHERE br.brewery_id = meta(bw).id
     ORDER BY bw.name, br.name
     LIMIT 3;
     Result:
     [{ "brewer": "(512) Brewing Company", "beer": "(512) ALT" },
      { "brewer": "(512) Brewing Company", "beer": "(512) Bruin" },
      { "brewer": "(512) Brewing Company", "beer": "(512) IPA" }]
  8. Non-key Self Join
     "Get 3 beer names used by different breweries"
     Query:
     SELECT b1.name AS beer, b1.brewery_id AS brewer1, b2.brewery_id AS brewer2
     FROM beers b1, beers b2
     WHERE b1.name = b2.name AND b1.brewery_id != b2.brewery_id
     ORDER BY b1.brewery_id
     LIMIT 3;
     Result:
     [{ "brewer1": "aberdeen_brewing", "brewer2": "hoffbrau_steaks_brewery_2", "beer": "Scottish Ale" },
      { "brewer1": "aberdeen_brewing", "brewer2": "carlyle_brewing", "beer": "Scottish Ale" },
      { "brewer1": "aberdeen_brewing", "brewer2": "belhaven_brewery", "beer": "Scottish Ale" }]
  9. Nested Outer Join
     "Get 2 breweries and the list of their beers"
     Query:
     SELECT bw.name AS brewer,
            ( SELECT br.name, br.abv
              FROM beers br
              WHERE br.brewery_id = meta(bw).id ) AS beers
     FROM breweries bw
     ORDER BY bw.name
     LIMIT 2;
     Result:
     [{ "beers": [ { "abv": 8.2, "name": "(512) Pecan Porter" },
                   { "abv": 5.8, "name": "(512) Pale" }, ... ],
        "brewer": "(512) Brewing Company" },
      { "beers": [ { "abv": 7.2, "name": "21A IPA" },
                   { "abv": 5.8, "name": "North Star Red" }, ... ],
        "brewer": "21st Amendment Brewery Cafe" }]
  10. Grouping and Aggregation
      "Get all breweries that produce more than 37 beers"
      Query:
      SELECT br.brewery_id, COUNT(*) AS num_beers
      FROM beers br
      GROUP BY br.brewery_id
      HAVING num_beers > 37
      ORDER BY num_beers DESC;
      Result:
      [{ "num_beers": 57, "brewery_id": "midnight_sun_brewing_co" },
       { "num_beers": 49, "brewery_id": "rogue_ales" },
       { "num_beers": 38, "brewery_id": "anheuser_busch" }]
  11. Putting it all together
      "Explore beer characteristics by city"
      Query:
      SELECT bw.city, COUNT(*) AS num_beers, AVG(br.abv) AS beer_strength
      FROM beers br, breweries bw
      WHERE br.brewery_id = meta(bw).id
      GROUP BY bw.city
      HAVING num_beers > 1
      ORDER BY beer_strength DESC
      LIMIT 3;
      Result:
      [{ "num_beers": 5, "beer_strength": 12.02, "city": "Vorchdorf" },
       { "num_beers": 8, "beer_strength": 10.3125, "city": "Buggenhout" },
       { "num_beers": 11, "beer_strength": 10.045454545454545, "city": "Fraserburgh" }]
  12. Couchbase Analytics DDL: Lifecycle
      • DDL for shadow datasets
      CREATE BUCKET `beer-sample`;
      CREATE SHADOW DATASET beers ON `beer-sample` WHERE `type` = "beer";
      CREATE SHADOW DATASET breweries ON `beer-sample` WHERE `type` = "brewery";
      CONNECT BUCKET `beer-sample`;
      SELECT * FROM beers ORDER BY abv DESC LIMIT 12;
      DISCONNECT BUCKET `beer-sample`;
      DROP DATASET breweries;
      DROP DATASET beers;
      DROP BUCKET `beer-sample`;
  13. Couchbase Analytics DDL: Lifecycle
      • DDL for shadow datasets for external data
      CREATE BUCKET `beer-sample` WITH { "nodes": "node1.mydomain.com,node2.mydomain.com" };
      CREATE SHADOW DATASET beers ON `beer-sample` WHERE `type` = "beer";
      CREATE SHADOW DATASET breweries ON `beer-sample` WHERE `type` = "brewery";
      CONNECT BUCKET `beer-sample` WITH { "password": "!@#", "timeout": 2000 };
      SELECT * FROM beers ORDER BY abv DESC LIMIT 12;
      DISCONNECT BUCKET `beer-sample`;
      DROP DATASET breweries;
      DROP DATASET beers;
      DROP BUCKET `beer-sample`;
  14. FROM THE INSIDE OUT
  15. Why another service?
      • Common programming model & data model
      • Unified management
      • Fast data synchronization
      • Extend Couchbase Platform to power real-time analytics
      • Ad-hoc queries ("Ask me anything!")
      • Workload isolation
      • Independent scaling
      [Diagram labels: Core Database Engine; Query; Search; Mobile & IoT; Analytics (Preview); Scale-out architecture; Memory-first architecture; Unified Programming]
  16. Couchbase Query and Analytics
      Couchbase Query: Optimized for Operations (OLTP)
      • Many queries
      • Each touches a little data
      Couchbase Analytics: Optimized for Analytics (OLAP)
      • Fewer queries
      • Each touches a lot of data
  17. Example: Join, Grouping, and Aggregation
      "Get the 10 chattiest users in a timeframe"
      Query:
      SELECT user.id, COUNT(message) AS count
      FROM gbook_messages AS message, gbook_users AS user
      WHERE message.author_id = user.id
        AND message.send_time BETWEEN "2001-11-28T09:57:13" AND "2001-11-29T09:57:13"
      GROUP BY user.id
      ORDER BY count DESC
      LIMIT 10;
  18. Couchbase Query and Analytics – Performance Tradeoff
      [Two charts. Series: "Join GBy CBA" and "Join GBy N1QL GSI". X axis: interval (# records). First chart: 1m (<10), 1h (<500), 1d (<5000). Second chart: 1w (<25K), 1mo (<100K), 3mo (<300K), 6mo (<600K).]
  19. "Secret" Sauce: Query Parallelism
      • Massively Parallel Query Processor (MPP) executes complex queries on large datasets
      • Comprehensive query language
      [Diagram captions: "Query takes 1 minute"; "Query takes 15 seconds"]
  20. Couchbase Analytics Coupling
      • Separate services, separate nodes
      • Multi-Dimensional Scaling
      • Workload isolation
      • Parallel shadowing of data(sets) via DCP
      • Low impact on data nodes
      • Low latency
      [Diagram: four ANALYTICS nodes and three DATA nodes]
  21. DEVELOPER PREVIEW 4
  22. What is in the Developer Preview?
      • Common programming model & data model
      • Unified management
      • Fast data synchronization
      • Extend Couchbase Platform to power real-time analytics
      • Ad-hoc queries ("Ask me anything!")
      • Workload isolation
      • Independent scaling
      [Diagram labels: Core Database Engine; Query; Search; Mobile & IoT; Analytics (Preview); Scale-out architecture; Memory-first architecture; Unified Programming]
      [The slide shows six check marks (✔); their placement is not recoverable from the transcript.]
  23. Workbench
  24. Get it: https://www.couchbase.com/downloads
  25. THANK YOU
  26. APPENDIX
  27. Couchbase Analytics and friends
      [Diagram labels: Operations, Analytics; Online, Batch; Key Value, CB Query, CB Analytics, Spark, Hadoop; latency: μs, ms, 30s, Minutes+; data volume: 1 record to trillions of records; Start up overhead; Job-based; Parallel query; ETL]
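
The DDL lifecycle slides create shadow datasets by filtering one bucket on the `type` field. The Python sketch below only illustrates that filtering idea on in-memory documents. It is not how the Analytics service is implemented, and the function name `build_shadow_datasets` and the predicate dictionary are hypothetical names introduced for this illustration.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]


def build_shadow_datasets(
    bucket: List[Document],
    filters: Dict[str, Callable[[Document], bool]],
) -> Dict[str, List[Document]]:
    """Route each bucket document into every dataset whose filter accepts it.

    Hypothetical helper: mimics the effect of
    CREATE SHADOW DATASET <name> ON <bucket> WHERE <predicate>.
    """
    datasets: Dict[str, List[Document]] = {name: [] for name in filters}
    for doc in bucket:
        for name, predicate in filters.items():
            if predicate(doc):
                datasets[name].append(doc)
    return datasets


# Two documents shaped like the beer-sample slide (fields abbreviated).
bucket = [
    {"type": "brewery", "name": "Commonwealth Brewing #1", "city": "Boston"},
    {"type": "beer", "name": "Piranha Pale Ale", "abv": 5.7,
     "brewery_id": "110f04166d"},
]

# One predicate per dataset, mirroring the WHERE clauses on the DDL slides.
filters = {
    "beers": lambda d: d.get("type") == "beer",
    "breweries": lambda d: d.get("type") == "brewery",
}

datasets = build_shadow_datasets(bucket, filters)
print(sorted((name, len(docs)) for name, docs in datasets.items()))
# prints [('beers', 1), ('breweries', 1)]
```

In the service itself the datasets are kept current by shadowing changes via DCP, as the coupling slide states. This sketch takes a one-time snapshot and does not model that synchronization.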

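
The query parallelism slide credits a massively parallel query processor for the speedup on large datasets. One common way to parallelize a grouped count is to compute partial counts per partition and then merge them. The Python sketch below shows that pattern for the "beers per brewery" aggregation. It is an assumption-laden illustration, not the service's actual execution strategy: the partitioning scheme, the function names, and the sample data are all hypothetical. Python threads are used only to show the structure of the computation, not to demonstrate a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Document = Dict[str, object]


def partial_count(partition: List[Document]) -> Counter:
    """Local phase: count beers per brewery_id within one partition."""
    return Counter(doc["brewery_id"] for doc in partition)


def parallel_group_count(docs: List[Document], num_partitions: int = 4) -> Counter:
    """Split docs round-robin, count each partition, then merge the partials."""
    partitions = [docs[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_count, partitions))
    # Global phase: summing the partial counters gives the same result
    # as a single pass over all documents.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample data: 10 beers spread over 3 breweries.
beers = [{"name": f"beer-{i}", "brewery_id": f"brewery-{i % 3}"} for i in range(10)]

counts = parallel_group_count(beers)
print(sorted(counts.items()))
# prints [('brewery-0', 4), ('brewery-1', 3), ('brewery-2', 3)]
```

Counting works with this split because partial counts can be added in any order. An aggregate such as `AVG` would need each partition to return a sum and a count so the merge step can combine them correctly.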