Due to the lack of built-in analytics, many NoSQL database customers export data from their operational databases into analytical platforms that are installed and managed separately. This approach is complex to manage, costly to operate, and makes application queries across both repositories difficult and error-prone. But most importantly, it creates barriers to getting real-time operational analytics. The Couchbase Server Analytics service adds analytical capabilities to Couchbase Server, enabling real-time and ad hoc analytics over operational data, all within the same Couchbase cluster.
The Couchbase Analytics service, currently a Developer Preview feature, is the newest member of the Couchbase data platform family of services. This session will provide a sneak peek at this service, which delivers parallel evaluation of analytical queries on current data with minimal impact on the operational performance of the Couchbase cluster. We will share an architectural overview of the new service, the SQL++-based query interface, and a glimpse at the roadmap for this important new capability. We will also review the current technical resources and documentation, and deliver a demo.
Similar to the previous query, but a join on a non-key value. Currently not possible in N1QL (for good reasons!).
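To make "a join on a non-key value" concrete, here is a minimal Python sketch of a hash join, one common way a parallel query engine can evaluate an equi-join on an arbitrary attribute. The documents and field names (`customers`, `stores`, `city`) are hypothetical, and this is an illustration of the technique, not the Analytics service's actual implementation.

```python
from collections import defaultdict

def hash_join(left, right, left_attr, right_attr):
    """Equi-join two lists of documents on arbitrary (non-key) attributes."""
    # Build phase: index the right-hand documents by their join attribute.
    index = defaultdict(list)
    for doc in right:
        value = doc.get(right_attr)
        if value is not None:  # missing/null values never match, as in SQL
            index[value].append(doc)
    # Probe phase: look up each left-hand document's attribute value.
    pairs = []
    for doc in left:
        for match in index.get(doc.get(left_attr), []):
            pairs.append((doc, match))
    return pairs

# Hypothetical sample data: the join attribute "city" is not a document key.
customers = [
    {"id": "c1", "name": "Ana", "city": "Boston"},
    {"id": "c2", "name": "Ben", "city": "Denver"},
]
stores = [
    {"id": "s1", "city": "Boston", "manager": "Kim"},
    {"id": "s2", "city": "Boston", "manager": "Lee"},
    {"id": "s3", "city": "Austin", "manager": "Ray"},
]

matches = [(c["name"], s["manager"]) for c, s in hash_join(customers, stores, "city", "city")]
print(matches)  # [('Ana', 'Kim'), ('Ana', 'Lee')]
```

A key join (like N1QL's `ON KEYS`) can fetch the right-hand document directly by its key; a join on an arbitrary value needs a build/probe step over potentially large inputs, which is exactly the kind of work a parallel analytical engine is designed for.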
Query nested data and produce nested data: easier consumption by applications, fully composable, no additional transformation.
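The "nested in, nested out" idea can be illustrated in plain Python. The sketch below reads hypothetical order documents that keep their line items inline and returns one nested document per customer, with no flattening step in between. The data and field names are assumptions for illustration; in the Analytics service itself this would be written as a SQL++ query, not Python.

```python
# Hypothetical order documents with nested line items (illustrative only).
orders = [
    {"orderno": 1, "custid": "c1", "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"orderno": 2, "custid": "c2", "items": [{"sku": "A", "qty": 5}]},
    {"orderno": 3, "custid": "c1", "items": [{"sku": "C", "qty": 3}]},
]

def orders_by_customer(orders):
    """Group orders per customer, producing nested output from nested input."""
    result = {}
    for o in orders:
        entry = result.setdefault(o["custid"], {"custid": o["custid"], "orders": []})
        entry["orders"].append({
            "orderno": o["orderno"],
            # Aggregate directly over the nested item array; no flat table needed.
            "total_qty": sum(i["qty"] for i in o["items"]),
            "skus": [i["sku"] for i in o["items"]],
        })
    return list(result.values())

nested = orders_by_customer(orders)
```

The result is already in the shape an application would consume (customers containing orders containing SKUs), and it can itself be fed into another query step, which is what "fully composable" means here.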
The base of the food pyramid for analytical queries.
Two similar languages, N1QL and SQL++, derived from a common, open specification. The differences are mostly minor … but there is enough variation that we need two names; otherwise, mass confusion. N1QL syntax is optimized for OLTP for performance reasons (JOIN … ON KEYS).
Have your data and query it, too. Give the full "ETL" picture here. Ask all questions on "your" data: no flat-schema committee with monthly meetings :-), no batchy ETL delays before data becomes available. Example: line items still in orders, phone numbers still in user profiles, … ⇒ much shorter TTI (time to insight), and less painful, too.
Two services that support queries. Why two?
Different target use cases, different workloads, different architectures. Separating such workloads is not new (data marts, data warehouses).
Transition: claim that there is a need for a different architecture, then show an experiment to support that claim.
For example, this query: it can be small or big, might not be a key join, might read a lot of data, and might impact the front-end user experience.
X-axis: size of the interval, i.e., number of records. Blue: Analytics; red: N1QL.
Query service: single node, multiple threads; translation/execution optimized for low latency.
Analytics does translation/optimization, creates a distributed job, and schedules the job on all nodes of the cluster.
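The contrast between the two execution models can be sketched in Python. Below, a hypothetical dataset is hash-partitioned across simulated nodes, each worker computes a partial aggregate over its own partition, and a coordinator merges the partials. Threads stand in for cluster nodes, and the dataset and function names are assumptions. This shows the general partitioned-parallel pattern, not the actual job format or scheduler of the Analytics service.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(docs, n):
    """Hash-partition documents by key, as a stand-in for data spread over n nodes."""
    parts = [[] for _ in range(n)]
    for doc in docs:
        # crc32 gives a stable hash across runs (unlike Python's built-in str hash).
        parts[zlib.crc32(doc["id"].encode()) % n].append(doc)
    return parts

def local_count(part):
    """Work done on one node: a partial count per city over its own partition."""
    return Counter(doc["city"] for doc in part)

def distributed_count(docs, nodes=4):
    parts = partition(docs, nodes)
    # Every partition is processed concurrently (threads simulate nodes).
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(local_count, parts))
    # Coordinator step: merge the partial aggregates into the final answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

# Hypothetical dataset: 10 user documents spread over three cities.
docs = [{"id": f"u{i}", "city": ["Boston", "Denver", "Austin"][i % 3]} for i in range(10)]
result = distributed_count(docs)
```

Because each node only touches its own partition, adding nodes spreads both the scan and the aggregation work; the single-node, multi-threaded model of the Query service instead keeps all work in one process, which favors low latency for small, targeted queries.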
Talk about design points -> parallel + distributed query processor
Parallel Query processor??
Separate services usually run on separate nodes (MDS, Multi-Dimensional Scaling).
DCP = Database Change Protocol: one protocol to propagate data (events) through the cluster, optimized for low latency.
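How a change stream can keep an analytical copy current is sketched below in Python. The event shape (sequence number, key, document or deletion marker) is a simplified, hypothetical stand-in, not the actual DCP message format or client API. The sketch only shows the idea of applying ordered mutations and deletions to a shadow dataset while remembering the last applied sequence number, so that replayed events are skipped.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    """Hypothetical change event; NOT the real DCP wire format."""
    seqno: int
    key: str
    doc: Optional[dict]  # None means the document was deleted

def apply_changes(shadow, events, last_seqno=0):
    """Apply mutations/deletions in sequence order, skipping already-applied events."""
    for ev in sorted(events, key=lambda e: e.seqno):
        if ev.seqno <= last_seqno:
            continue  # already applied, e.g. replayed after a reconnect
        if ev.doc is None:
            shadow.pop(ev.key, None)  # deletion
        else:
            shadow[ev.key] = ev.doc   # insert or update
        last_seqno = ev.seqno
    return last_seqno

shadow = {}
events = [
    ChangeEvent(1, "u1", {"name": "Ana"}),
    ChangeEvent(2, "u2", {"name": "Ben"}),
    ChangeEvent(3, "u1", None),
]
seq = apply_changes(shadow, events)
```

After this runs, `shadow` holds only `u2` and `seq` is 3; feeding the same events again with `last_seqno=seq` leaves the shadow dataset unchanged.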
Ad-hoc queries: yes. We have the query language and a powerful parallel processor to run it.
Workload isolation: yes. Separate nodes, low load on data nodes.
Independent scaling: partially. Analytics can scale independently, but the quality of service is not yet at the level of a true topology-aware service in Couchbase (data nodes or FTS). Checking another checkbox.
Common programming and data model: partially. Aligned HTTP APIs and SDK support are there; the query languages are very similar but not identical (work in progress).
Unified management: no. Not yet integrated into Couchbase Server (work in progress that should be available as a DP in a few months).
Fast data synchronization: partially. Data synchronization is there, but the quality of service is not yet at the full level.
Install on macOS, Windows, Linux.
These are the processes:
CCDriver: cluster management, APIs, SQL++ compiler
NCService: lifecycle management for an NCDriver
NCDriver: storage and query evaluation
Works with Couchbase Server 5.0 Beta.
Sneak peek: Couchbase Analytics – Couchbase Connect New York 2017